Make illegally trained LLMs public domain as punishment (www.theregister.com)

submitted 1 day ago by [email protected] to c/[email protected]

It's all made from our data, anyway, so it should be ours to use as we want

[–] [email protected] -1 points 4 hours ago (1 children)

Yes, if you completely ignore how data is processed and how the product is derived from the data, then everything can be labeled "data analysis". Great point. So copyright infringement can never exist, because the original work can always be considered data that you analyze. Incredible.

[–] [email protected] 2 points 3 hours ago* (last edited 3 hours ago) (1 children)

No, not what I said at all. If you're trying to say I'm making this argument I'd urge you (ironically) to actually analyze what I said rather than putting words in my mouth ;) (Or just, you know, ask me to clarify)

Copyright infringement (or plagiarism) in its simplest form, as in just taking the material as is, is devoid of any analysis. The point is to avoid having to do that analysis and just get right to the end result that has value.

But that's not what AI technology does. None of the material used to train it ends up in the model. It looks at the training data and extracts patterns. For text, that is the sentence structure, the likelihood of one word being followed by another, the paragraph/line length, the relationship between words when used together, and more. It can do all of this without even 'knowing' what these things are, because they are simply patterns that show up in large amounts of data, and machine learning as a technology is made to detect and extract those patterns. That detection is comparable to how humans do analysis. What it detects are empirical, factual observations about the material it is shown, which cannot be copyrighted.
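
The word-following-word likelihood mentioned above can be made concrete with a toy bigram counter. This is only a minimal sketch for illustration: real LLMs learn such statistics with neural networks rather than count tables, and the names `bigram_counts` and `next_word_probs`, as well as the sample corpus, are made up for this example.

```python
from collections import Counter, defaultdict


def bigram_counts(text):
    """Count how often each word is directly followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probs(counts, word):
    """Convert the follower counts for `word` into relative frequencies."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # word never seen, or never followed by anything
    return {w: c / total for w, c in followers.items()}


# Hypothetical sample text, chosen only to keep the numbers easy to check.
corpus = "the cat sat on the mat and the cat slept"
counts = bigram_counts(corpus)
probs = next_word_probs(counts, "the")
print(probs)
```

In this sample, "the" is followed by "cat" two times out of three and by "mat" once, so those are the relative frequencies the sketch reports.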

The resulting data, when fed back to the AI, can be used to have it extrapolate from incomplete data, which it could not do without such analysis. You can see this quite easily by asking an AI to refer to you by a specific name, or to talk in a specific manner, such as a pirate. It 'understands' that certain words are placeholders for names, and that text can be 'pirate-ified' by adding filler words or pre/suffixing other words. It could not do so without analysis, unless that exact text was already in the data to begin with, which is doubtful.

[–] [email protected] 1 points 1 hour ago (1 children)

> No, not what I said at all. If you’re trying to say I’m making this argument I’d urge you (ironically) to actually analyze what I said rather than putting words in my mouth ;) (Or just, you know, ask me to clarify)

That was your implied argument regardless of intent.

> Copyright infringement (or plagiarism) in its simplest form, as in just taking the material as is, is devoid of any analysis. The point is to avoid having to do that analysis and just get right to the end result that has value.

Completely wrong, which invalidates the point you want to make. "Analysis" and "as is" have no place in the definition of copyright infringement. A derivative work can be very different from the original material, and how you created the derivative work, including whether you performed whatever you think "analysis" means, is generally irrelevant.

> What it detects are empirical, factual observations about the material it is shown, which cannot be copyrighted.

No, it detects patterns. You already said it correctly above. And the problem is that some patterns can be copyrighted. That's exactly the problem highlighted here and here. For copyright law, it doesn't matter whether, for example, that particular image of Mario is copied verbatim from the training data. The character likeness, which is encoded in the model because it is in fact a discernible pattern, is an infringement.

[–] [email protected] 1 points 1 hour ago* (last edited 58 minutes ago)

> That was your implied argument regardless of intent.

I decide what my argument is, thank you very much. Your interpretation of it is outside of my control, and while I might try to keep it from going astray, I cannot stop it from doing so. That's on you.

> Completely wrong, which invalidates the point you want to make. “Analysis” and “as is” have no place in the definition of copyright infringement. A derivative work can be very different from the original material, and how you created the derivative work, including whether you performed whatever you think “analysis” means, is generally irrelevant.

I wasn't giving a definition of copyright infringement. That depends on the jurisdiction, and since you and I most likely aren't in the same one, it's nothing I would argue for to begin with. In the most basic form of plagiarism, people do it to avoid making the effort of transformation. More complex forms of plagiarism might involve some transformation, but still try to capture the expression of the original instead of the ideas. Analysis is definitely relevant: to create a work that does not infringe on copyright, you generally can take ideas from a copyrighted work, but not the expression of those ideas. If a new work is based on just those ideas (and preferably mixes them with new ideas), it generally doesn't infringe on copyright. It's why there are so many copycat products of everything you can think of that aren't copyright infringing.

> No, it detects patterns. You already said it correctly above. And the problem is that some patterns can be copyrighted. That’s exactly the problem highlighted here and here. For copyright law, it doesn’t matter whether, for example, that particular image of Mario is copied verbatim from the training data.

While, depending on your definition, Mario could be a sufficiently complex pattern, that's not the definition I'm using. Mario isn't a pattern, it's an expression of multiple patterns. Patterns like "an Italian man", "a big moustache", "a red rounded hat with the letter 'M' in a white circle", "overalls". You can use any of those patterns in a new non-infringing work; Nintendo has no copyright on any of them. But bring them all together in one place again without adding new patterns, and you will have infringed on the expression of Mario. If you give many images of Mario to the AI, it might be able to understand that those patterns together are some sort of "Mario-ness" pattern, but it can still separate them from each other, since you aren't just showing it Mario, but also other images that have these same patterns in different expressions.

Mario's likeness isn't in the model, but its patterns are. And if an unethical user of the AI prompts it for those specific patterns and then acts surprised when they get Mario, or something close enough to be substantially similar, that's on them, and it will be infringing, just like drawing and selling a copy of Mario without Nintendo's approval is now.

> The character likeness, which is encoded in the model because it is in fact a discernible pattern, is an infringement.

You have absolutely no legal basis to claim that this is infringement, as these things simply have not been settled in court. You can be of the opinion that it is infringement, but your opinion isn't the same as law. The articles you linked are also simply reporting and speculating on the lawsuits that are pending.