this post was submitted on 10 May 2025
932 points (97.9% liked)

Technology

[–] [email protected] -4 points 2 days ago (4 children)

While I understand their position, I disagree with it.

Training AI on copyrighted data - let’s take music for example - is no different to a kid at home listening to Beatles songs all day and using that as inspiration while learning how to write songs or play an instrument.

You can’t copyright a style of music, a sound, or a song structure. As long as the AI isn’t just reproducing the copyrighted content “word for word”, I don’t see what the issue is.

Does the Studio Ghibli artist own that style of drawing? No, because you can’t own something like that. Others are free to draw whatever they want while replicating that style.

[–] [email protected] 10 points 2 days ago (1 children)

a) An AI is not a person. We do not WANT an AI to be regarded as equal to a person under law. That's a terrible idea.

b) How is that AI training material being generated? Did they buy copies of every copyrighted song and every movie by every artist to include in the training data? If it's music and streamed, are they paying the artist royalties based on every "play" the AI is processing during training, the same as if a human played the song over and over again to learn a song? How about sheet music? Because if a PERSON is learning from training material, the license for sheet music and training materials is different from the license for a playable copy of the same work.

I'm willing to bet that the AI companies didn't even pay for the regular copies of works much less ones licensed for use as training materials for humans, but it didn't matter because an AI is an advanced algorithm and NOT A HUMAN.

[–] [email protected] -5 points 2 days ago (1 children)

a) No one is suggesting AI be regarded as equal to a person under law though?

b) if the music is being streamed then it’s up to the streaming company to pay the artists royalties. I have Spotify and I don’t pay the artists - Spotify does.

If the argument is “the people feeding data into the AI illegally acquired the content” then sure, argue that and prosecute them for piracy or whatever. That’s not the argument that is being made though.

[–] [email protected] 2 points 2 days ago (1 children)

That’s not the argument that is being made though.

In Meta's court case it is one of the arguments.

[–] [email protected] -3 points 2 days ago

That argument’s not going to be of any use then.

[–] [email protected] 4 points 2 days ago (3 children)

Exactly. I'm a data engineer, and people have no clue what they're talking about in this thread.

If we require copyright for transformative work, that would mean trillions lost in growth. It's just something that can't even happen, no matter how hard we'd want it. Most people are not even aware of the implications such copyright overreach would have.

So do you target AI training explicitly? How can that even be enforced? Is my review sentiment evaluation machine illegal now? What if I RAG copyrighted content in, am I in jail now? How could this possibly ever be enforced? It's so stupid.

This issue is dominated by tech illiterate people who just want to be angry at corporations, but instead of doing something about it they fall for copyright propaganda.

[–] [email protected] 1 points 1 day ago

If we don't know how to control our emotions, they will lead us to make bad decisions. That emotion will only be temporary, but the decision will be permanent, and we'll regret it later.

[–] [email protected] 1 points 2 days ago

So do you target AI training explicitly?

No. Same rules as everyone else.

How can that even be enforced?

Disclosure of training sources

Is my review sentiment evaluation machine illegal now?

If your sources are copyrighted, yes.

What if I RAG copyrighted content in, am I in jail now?

Unlikely. Non-payment of restitution in a civil case could end in jail via contempt of court.

How could this possibly ever be enforced?

The same way other copyright claims are enforced.

This issue is dominated by tech illiterate

Literacy in technology has no effect on the law.

fall for copyright propaganda.

We've had many years of publishers strengthening their legal position. It's case law, not propaganda.

[–] [email protected] -3 points 2 days ago

Hit the nail on the head.

[–] [email protected] 3 points 2 days ago (3 children)

if i learn a book by heart, and then go around making money by reciting it, then that's illegal. same thing.

[–] [email protected] 5 points 2 days ago (1 children)

On the other hand, it is not the learning in your example that is illegal, but the recital.

If you learn ten books by heart and make money writing shitty fanfics, that's not necessarily illegal.

[–] [email protected] 0 points 2 days ago (1 children)

well yeah. And it has been proven time and again that they can, and do, regurgitate that training material out quite often

[–] [email protected] 0 points 2 days ago* (last edited 2 days ago)

Yup. I don't think training should be considered breaking copyright. Regurgitating, though, should.

There are examples of use cases besides the currently obvious one of LLMs "creating" "original" content.

One that comes to my mind is indexing books. Allowing for people to search for books based on a description.

[–] [email protected] -2 points 2 days ago (1 children)

That’s not what AI is doing though. A better analogy using your book example would be learning a book by heart, then going and writing a new book in that same style.

Is that illegal? No.

[–] [email protected] 1 points 2 days ago (1 children)

but that's not what they're doing when they're spitting out open source code verbatim, with no attribution or license

[–] [email protected] -4 points 2 days ago (1 children)
[–] [email protected] 0 points 2 days ago (1 children)

except that they regularly do. It isn't even news at this point

[–] [email protected] -2 points 1 day ago

can you please show me some examples? Should be easy to find them based on your comment.

[–] [email protected] 2 points 2 days ago* (last edited 2 days ago) (2 children)

Some companies own some wildly absurd things. Copyright is only enforced if you have the money to do your own policing, sometimes on multiple continents.

[–] [email protected] 2 points 2 days ago

Even if it benefits big players more, copyright still benefits small artists

[–] [email protected] -4 points 2 days ago (1 children)

They do, but the point still stands. No one “owns” what these AIs are learning. That’s what they’re doing - learning, and they’re learning from copyrighted material the same way people learn from copyrighted material. The copyright holders - mainly artists - are just super upset about it because it’s showing that what they provide can be easily learned and emulated by computers.

They’re the horse and carriage sellers when cars were invented.

[–] [email protected] 1 points 2 days ago (1 children)

You miss the part where the copyright owner did not assign them the rights to use the material for such a purpose, and yes, most copyright does cover a ton of stuff like retransmission, reproduction, public production and a bunch of other shit, which are all separate licenses. It's not so simple as "they did what a human does", because even the WAYS a human uses said material are limited under the terms of the copyright.

[–] [email protected] -3 points 2 days ago

But they didn’t use it for any of those purposes. Training an AI model isn’t doing any of that. Which do you think they did specifically?

Humans can learn from any copyrighted material they want to. Copyright doesn’t, and can’t, prevent that.