this post was submitted on 28 Jan 2025
149 points (98.7% liked)

Technology

[–] [email protected] 3 points 2 days ago (1 children)

No AI org of any significant size will ever disclose its full training set, and it's foolish to expect that standard to be met. There is simply too much liability. No matter how clean your data-collection procedure is, there is no way to guarantee that a data set with billions of samples won't contain at least one item a lawyer could zero in on and drag you into a lawsuit over.

What Deepseek did, namely full disclosure of its methods in a scientific paper, release of the weights under an MIT license, and release of some auxiliary code, is as much as one can expect.

[–] [email protected] 1 points 2 days ago

As I wrote in my comment, I have not read up on Deepseek; if this is true, it is definitely a step in the right direction.

I am not saying I expect any company of significant scale to follow OSI since, as you say, the risk is too high. I do still believe that if you cannot prove to me that your AI is not abusing artists or creators by using their art, or not using data non-consensually acquired from users of your platform, you are not providing an ethical or moral service. This is my main concern with AI. Big tech keeps showing us, time and time again, that they really don't care about these topics, and this needs to change.

Imo, AI today is developing and expanding far too fast for the general consumer to understand it, and by extension for the legal and justice systems as well. We need more laws governing how AI, and the data it uses and produces, should be handled. We need more education on what AI is actually doing.