this post was submitted on 26 Jan 2024
428 points (83.0% liked)


We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.::Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

[–] [email protected] 69 points 10 months ago (4 children)

They literally asked it to give them a screenshot from the Joker movie. That was their fucking prompt. It's not like they just said "draw Joker" and it spit out a screenshot from the movie, they had to work really hard to get that exact image.

[–] [email protected] 69 points 10 months ago* (last edited 10 months ago) (21 children)

Because this proves that the "AI", at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

Likely because the "AI" was trained upon this image at some point. This has repercussions with regards to copyright law. It means the training set contains copyrighted data and the use of said training set could be argued as piracy.

Legal discussions on how to talk about generative-AI are only happening now, now that people can experiment with the technology. But it's not like our laws have changed; copyright infringement is copyright infringement. If the training data is obviously copyright infringement, then the data must be retrained in a more appropriate manner.

[–] [email protected] 29 points 10 months ago* (last edited 10 months ago) (2 children)

But where is the infringement?

This NYT article includes the same several copyrighted images and they surely haven't paid any license. It's obviously fair use in both cases and NYT's claim that "it might not be fair use" is just ridiculous.

Worse, the NYT also includes exact copies of the images, while the AI ones are just very close to the original. That's like the difference between uploading a video of yourself playing a Taylor Swift cover and actually uploading one of Taylor Swift's own music videos to YouTube.

Even worse, the NYT intentionally distributed the copyrighted images, while Midjourney did so unintentionally and specifically states it's a breach of their terms of service. Your account might be banned if you're caught using these prompts.

[–] [email protected] 28 points 10 months ago (1 children)

You do realize that newspapers typically do pay licensing fees for images? It's how things like Getty Images exist.

On the flip side, OpenAI (and other companies) are charging for access to their models, which then return copyrighted images without paying the original creator.

That's why situations like this keep getting talked about: you have a 3rd party charging people for copyrighted materials. We can argue that it's a tool, so you aren't really "selling" copyrighted data, but that's the issue that is generally being discussed in these kinds of articles/court cases.

[–] [email protected] 3 points 10 months ago (1 children)

Mostly playing devil’s advocate here (since I don’t think AI should be used commercially), but I’m actually curious about this, since I work in media… You can get away with using images or footage for free if it falls under editorial or educational purposes. I know this can vary from place to place, but with a lot of online news sites now charging people to view their content, they could potentially be seen as making money off of copyrighted material, couldn’t they?

[–] [email protected] 3 points 10 months ago

It's not a topic that I'm super well versed in, but here is a thread from a photography forum indicating that news organizations can't take advantage of fair use: https://www.dpreview.com/forums/thread/4183940

I think these kinds of stringent rules are why so many are up in arms about how AI is being used. It's effectively a way for big players to circumvent paying the people who put all the work into the art/music/voice acting/etc. The models would be nothing without the copyrighted material, yet no one seems to want to pay those people.

It gets more interesting when you realize that long term we still need people creating lots of content if we want these models to be able to create things around concepts that don't yet exist (new characters, genres of music, etc.).

[–] [email protected] 4 points 10 months ago (8 children)

But where is the infringement?

Do the training weights contain the data? Are the servers copying said data on a mass scale, in a way that the original copyright holders don't want or can't control?

[–] [email protected] 8 points 10 months ago (6 children)

Data is not copyrighted, only the image is. Furthermore, you cannot copyright a number, even though you could use a sufficiently large number to completely represent a specific image. There's also the fact that copyright does not protect possession of works, only distribution of them. If I obtained a copyrighted work, no matter the means chosen to do so, I've committed no crime so long as I don't duplicate that work. This gets into a legal grey area around computers and the fundamental way they work, but it was already kind of fuzzy if you really think about it anyway. Does viewing a copyrighted image violate copyright? The visual data of that image has been copied into your brain. You have the memory of that image. If you have the talent, you could even reproduce that copyrighted work, so clearly a copy of it exists in your brain.
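
To make the "an image is just a big number" point concrete, here is a minimal Python sketch. The four pixel values are made up for illustration (a hypothetical 2x2 grayscale image), and this only shows the encoding idea, not how any AI model stores anything.

```python
# Hypothetical 2x2 grayscale "image": four pixel values, one byte each.
pixels = bytes([0, 128, 255, 64])

# Interpret the raw bytes as a single big-endian integer.
as_number = int.from_bytes(pixels, "big")

# Turn the integer back into bytes. The original length has to be known,
# because leading zero bytes are not preserved in the integer itself.
restored = as_number.to_bytes(len(pixels), "big")

print(as_number)
print(restored == pixels)
```

The same round trip works for a file of any size; the number just gets correspondingly larger.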

[–] [email protected] 16 points 10 months ago (3 children)

Because this proves that the “AI”, at some level, is storing the data of the Joker movie

I don't think that's a justified conclusion.

If I watched a movie, and you asked me to reproduce a simple scene from it, then I could do that if I remembered the character design, angle, framing, etc. None of this would require storing the image, only remembering the visual meaning of it and how to represent that with the tools at my disposal.

If I reproduced it that closely (or even not-nearly-that-closely), then yes, my work would be considered a copyright violation. I would not be able to publish and profit off of it. But that's on me, not on whoever made the tools I used. The violation is in the result, not the tools.

The problem with these claims is that they are shifting the responsibility for copyright violation off of the people creating the art, and onto the people making the tools used to create the art. I could make the same image in Photoshop; are they going after Adobe, too? Of course not. You can make copyright-violating work in any medium, with any tools. Midjourney is a tool with enough flexibility to create almost any image you can imagine, just like Photoshop.

Does it really matter if it takes a few minutes instead of hours?

[–] [email protected] 11 points 10 months ago (2 children)

I've had this discussion before, but that's not how copyright exceptions work.

Right or wrong (it hasn't been litigated yet), AI models are being claimed as fair use exceptions to the use of copyrighted material. Similar to other fair uses, the argument goes something like:

"The AI model is simply a digital representation of facts gleaned from the analysis of copyrighted works, and since factual data cannot be copyrighted (e.g. a description of the Mona Lisa vs the painting itself), the model itself is fair use"

I think it'll boil down to whether the models can be easily used as replacements for the works being claimed, and honestly I think that'll fail. That the models are quite good at reconstructing common expressions of copyrighted work is novel to the case law, though, and worthy of investigation.

But as someone who thinks ownership of expressions is bullshit anyway, I tend to think copyright is not the right way to go about penalizing or preventing the harm caused by the technology.

[–] [email protected] 7 points 10 months ago (2 children)

“The AI model is simply a digital representation of facts gleaned from the analysis of copyrighted works, and since factual data cannot be copyrighted (e.g. a description of the Mona Lisa vs the painting itself), the model itself is fair use”

So selling fan fiction and fan-made game continuations and modifications should be legal?

[–] [email protected] 3 points 10 months ago

It should, but also that is significantly different from what an AI model is.

It would be more like a list of facts and information about the structure of another work, and facts and patterns about lots of other similar works; and that list of facts can easily be used to create other, very similar works, but also it can be used to create entirely new works that follow patterns from the other works.

Inasmuch as the model can be used to create infringing works but is not one itself, this is similar to other cases where a platform or tool can be used in infringing ways. In such cases, if the platform or tool takes responsibility for reasonable protections against such uses, then it isn't held liable itself. Think YouTube DMCA, Facebook content moderation, or even Google Books search. I think this is likely the way this goes; there is just too strong a case (with precedent) that the model is fair use.

[–] [email protected] 3 points 10 months ago (1 children)

Not the OP, but yes, it absolutely should. The idea that you can legally block someone's creative expression because they are using elements of culture you have obtained a monopoly on is obscene.

[–] [email protected] 2 points 10 months ago (2 children)

Copyright law is the right tool, but the companies are chasing the wrong side of the equation.

Training should not and I suspect will not be found to be infringement. If old news articles from the NYT can teach a model language in ways that help it review medical literature to come up with novel approaches to cure cancer, there's a whole host of features from public good to transformational use going on.

What they should be throwing resources at is policing usage, not training. Make the case that OpenAI is liable for infringing generation. Ensure that there needs to be copyright checking on outputs. In many ways this feels like a repeat of IP criticisms around the time Google acquired YouTube, which were solved with an IP tagging system.

[–] [email protected] 4 points 10 months ago (3 children)

Should Photoshop check your image for copyright infringement? Should Adobe be liable for copyright-infringing or offensive images that users of its program create?

[–] [email protected] 2 points 10 months ago (2 children)

If it's contributing creatively to your work, yeah, totally.

If you ask Photoshop fill to add an Italian plumber and you've been living under a rock your whole life so you don't realize it's Mario, then when you get sued by Nintendo for copyright infringement, it'd be much better policy if Adobe were on the hook for adding copyrighted material and not the end user.

A better analogy is: if you hired a graphic designer and they gave you copyrighted material, who is liable?

[–] [email protected] 2 points 10 months ago (1 children)

There's no money for them in that angle though. It's much easier to sue Xerox for enabling copyright violations than the person who used the machine to violate copyright.

Courts have already handled this with copy machines. AI isn't terribly different, so it's unlikely these suits against model creators will succeed.

[–] [email protected] 6 points 10 months ago (18 children)

Because this proves that the “AI”, at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

Is it tho? Honest question.

[–] [email protected] 6 points 10 months ago (2 children)

Wasn't that known? Has Midjourney ever claimed they didn't use copyrighted works? There's also an ongoing argument about the legality of that in general. One recent court case ruled that copyright does not protect a work from being used to train an AI. I'm sure that's far from the final word on the topic, but it does mean this is a legal grey area at the moment.

[–] [email protected] 5 points 10 months ago (8 children)

If the training data is obviously copyright infringement, then the data must be retrained in a more appropriate manner.

This is the crux of the issue: it isn't obviously copyright infringement. Currently copyright is completely silent on the matter one way or another.

The thing that makes this particularly interesting is that the traditional copyright maximalists, the ones responsible for ballooning copyright durations from their original reasonable limit of 14 years (plus one renewal) to their current absurd duration of 95 years, also stand to benefit greatly from generative works. Instead of the usual full-court press we tend to see from the major corporations around anything copyright related, we're seeing them take a rather hands-off approach.

[–] [email protected] 3 points 10 months ago (15 children)

I mean anyone can use copyrighted material as inspiration for their work and it’s fair use and not a concern at all.

Is AI only bad since it can do what a human does better/faster? If that’s the case, then they don’t actually have an issue with the fact it’s copyrighted, or I wouldn’t be able to use it for inspiration either.

[–] [email protected] 3 points 10 months ago (9 children)

So let’s say I ask a talented human artist the same thing.

Doesn’t this prove that a human, at some level, is storing the data of the Joker movie screenshot somewhere inside of their memory?

[–] [email protected] 4 points 10 months ago* (last edited 10 months ago)

So let’s say I ask a talented human artist the same thing.

Artists don't have hard drives or solid state drives that accept training weights.

When you have a hard drive (or other object that easily creates copies), then the law that follows is copyright, with regards to the use and regulation of those copies. It doesn't matter if you use a Xerox machine, VHS tape copies, or a hard drive. All that matters is that you're easily copying data from one location to another.

And yes. When a human recreates a copy of a scene clearly inspired by copyrighted data, it's copyright infringement btw. Even if you recreate it from memory. It doesn't matter how I draw Pikachu, if everyone knows and recognizes it as Pikachu, I'm infringing upon Nintendo's copyright (and probably their trademark as well).

[–] [email protected] 2 points 10 months ago (1 children)

But it's not like our laws have changed

And that's the problem. The internet has drastically reduced the cost of copying information, to the point where entirely new uses like this one are now possible. But those new uses are stifled by copyright law that originates from a time when the only cost was that people with Gutenberg presses would be prohibited from printing slightly cheaper books. And there's no discussion of changing it because the people who benefit from those laws literally are the media.

[–] [email protected] 7 points 10 months ago (1 children)

Hard? They wrote:

Joaquin Phoenix Joker movie, 2019, screenshot from a movie, movie scene

[–] [email protected] 2 points 10 months ago (1 children)

Yes, look how specific they were. I didn't even need to get that exact with a Google image search. I literally searched for "Joaquin Phoenix Joker" and that exact image was the very first result.

They specified that it had to be that specific actor, as that specific character, from that specific movie, and that it had to be a screenshot from a scene in the movie... and they got exactly what they asked for. This isn't shocking. Shocking would have been if it didn't produce something nearly identical to that image.

A more interesting result would be what it would spit out if you asked for say "Heath Ledger Joker movie, 2019, screenshot from a movie, movie scene".

[–] [email protected] 3 points 10 months ago* (last edited 10 months ago)

If you read further, they also tested many other, much more vague prompts, all of which gave intellectual properties they did not have the rights to. The Joaquin Phoenix image isn't any less illegal either, though, because they don't have the legal rights to profit off of that IP without permission or proper credit.
