Sarah Silverman Sues Maker Of ChatGPT For Copyright Infringement : technology

[+] [email protected] 102 points 1 year ago* (last edited 5 months ago) (4 children)

[removed by mod]

[–] [email protected] 37 points 1 year ago (6 children)

If the AI "reads" the work first, then it would have needed to pay for it

That's not actually true. Copyright applies to distribution, not consumption. You violate no law when I create an unauthorized copy of a work, and you read that copy. Copyright law prohibits you from distributing further copies, but it does not prohibit you from possessing the copy I provided you, nor are you prohibited from speaking about the copy you have acquired.

Unless the AI is regurgitating substantial parts of the original work, its output is a "transformative derivation", which is not subject to the protections of the original copyright. The AI is doing what English teachers ask of every school-age child: create a book report.

[–] [email protected] 11 points 1 year ago* (last edited 1 year ago) (1 children)

Copyright applies to distribution, not consumption. You violate no law when I create an unauthorized copy of a work

This is completely untrue. Making any unauthorised copy is an infringement of copyright. Hell, the UK determined that merely loading a pirated game into RAM was unauthorised copying, making the act of playing a pirated game unlawful. Thankfully this is only the case in the UK; however, the basic principles of copyright are the same all over the world.

When you buy something, you get a limited license to make copies for the purpose of viewing the material. That license does not extend to making backup copies. However, in a practical sense, it is very unlikely you will be prosecuted for most kinds of infringement like this - particularly when no money is involved. It's still infringement, though.

Edit: I will say though: you violate no law when you view a copy I create. However I would still be infringing for making and showing you the copy.

In the case of making a book report, that is educational, and thus fair use. ChatGPT is not educational - you might use it for education, but ChatGPT's use of copyrighted material is for commercial enterprise.

[–] [email protected] 10 points 1 year ago (3 children)

The uploader is the person creating the copy. Downloading is not creating a copy; downloading is receiving a copy.

I would love to see a citation on that UK precedent, but as you said: "thankfully this is only the case in the UK" and does not apply in the rest of the world.

Making any unauthorised copy is an infringement of copyright.

The exceptions to that are so numerous that the statement is closer to false than truth. "Fair Use" blows the absolute nature of that statement out of the water.

There has never been a successful prosecution for downloading only.

[–] [email protected] 7 points 1 year ago (5 children)

Every single transfer of data is a copy. There is no such thing as moving data, only copying it and then voluntarily deleting the original so that it appears to have "moved".
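
To make the copy-then-delete point concrete, here is a minimal Python sketch. The helper name `move_by_copy` and the file names are made up for illustration. It only shows the mechanism the comment describes; a rename within a single filesystem can avoid copying the bytes, while Python's own `shutil.move` falls back to copy-then-delete when the destination is on a different filesystem.

```python
import os
import shutil
import tempfile


def move_by_copy(src: str, dst: str) -> None:
    """Emulate a 'move' as the comment describes it: copy first, then delete."""
    shutil.copy2(src, dst)  # at this point two copies exist
    os.remove(src)          # deleting the original makes it look like a move


# Demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "original.txt")
    dst = os.path.join(tmp, "moved.txt")
    with open(src, "w") as f:
        f.write("some text")
    move_by_copy(src, dst)
    # The destination exists and the source is gone.
    moved_ok = os.path.exists(dst) and not os.path.exists(src)
```

Between the `copy2` call and the `remove` call there are briefly two copies on disk, which is the step the comment is pointing at.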

[–] [email protected] 8 points 1 year ago (5 children)

There was still copyright infringement because the company probably downloaded the text (which created another copy) and modified it (alteration is also protected by copyright) before using it as training data. If you write an original novel and admit that you had pirated a bunch of novels to use for reference, those novels were still downloaded illegally even if you've deleted them by now. The AI isn't copyright infringement itself, it's proof that copyright infringement has happened.

But personally I don't think the actual laws will matter so much as which side has the better case for why they will lead to more innovation and growth for the economy.

[–] [email protected] 16 points 1 year ago (1 children)

Can the sources where ChatGPT got its information from be traced? What if it got the information from other summaries?

I think the hardest thing for these companies will be validating the information their AI is using. I can see an encyclopedia-like industry popping up over the next couple years.

Btw I know very little about this topic but I find it fascinating

[–] [email protected] 5 points 1 year ago (3 children)

Yes! They publish the data sources and where they got everything from. Diffusers (Stable Diffusion, Midjourney, etc.) and GPT both use tons of data that was taken in ways that likely violate that data’s usage agreement.

Imo they deserve whatever lawsuits they have coming.

[–] [email protected] 15 points 1 year ago

"It was like this when I got it"

[–] [email protected] 15 points 1 year ago (1 children)

It depends on if the summary is an infringing derivative work, doesn't it? Wikipedia is full of summaries, for example, and it's not violating copyright.

If they illegally downloaded the works, that feels like a standalone issue to me, not having anything to do with AI.

[–] [email protected] 5 points 1 year ago (1 children)

Wikipedia is a non profit whose primary purpose is education. ChatGPT is a business venture.

[–] [email protected] 5 points 1 year ago (10 children)

A book review published in a newspaper is a commercial venture for the purpose of selling ads. The commercial aspect doesn't make the review an infringement.

A summary is a "Transformative Derivation". It is a related work, created for a fundamentally different purpose. It is a discussion about the work, not a copy of the work. Transformative derivations are not infringements, even where they are specifically intended to be used for commercial purposes.

[–] [email protected] 40 points 1 year ago (6 children)

I’ve noticed that the lemmy crowd seems more accepting of AI stuff than the Reddit crowd was

[–] [email protected] 74 points 1 year ago (9 children)

I mean for tech stuff it's fantastic. I could spend 30 minutes working out a regex to grep the logs in the format I need or I could have a back and forth with ChatGPT and get it sorted in 5.

I still don't want it to write my TV or movies. Or code to a significant degree.

[–] [email protected] 15 points 1 year ago (2 children)

On the flip side, anytime I've tried to use it to write Python scripts for me, it always seems to get them slightly wrong. Nothing that a little troubleshooting can't handle, and it certainly helps to get me in the ballpark of what I'm looking for, but I think it still has a little ways to go for specific coding use cases.

[–] [email protected] 5 points 1 year ago (1 children)

I think the key there is that ChatGPT isn't able to run its own code, so all it can do is generate code which "looks" right, which in practice is close to functional but not quite. In order for the code it writes to reliably work, I think it would need a builtin interpreter/compiler to actually run the code, and for it to iterate constantly making small modifications until the code runs, then return the final result to the user.
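
The generate, run, retry loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `attempts` list is a hypothetical stand-in for successive model outputs, and `runs_cleanly` and `first_working` are made-up helper names, not part of any ChatGPT API. It also only checks that the code exits without error, which is not the same as checking that it does the right thing.

```python
import subprocess
import sys
from typing import Iterable, Optional


def runs_cleanly(source: str, timeout: float = 5.0) -> bool:
    """Run source in a fresh Python interpreter; True if it exits with status 0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def first_working(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that runs without error, or None if none do."""
    for source in candidates:
        if runs_cleanly(source):
            return source
    return None


# Hypothetical stand-in for a model's successive attempts at the same script.
attempts = [
    "print(undefined_name)",  # fails with NameError
    "print('hello'",          # fails with SyntaxError
    "print('hello')",         # runs
]
working = first_working(attempts)
```

A real system would feed the error output back into the next attempt and would need to sandbox the untrusted code; both are left out here to keep the sketch short.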

[–] [email protected] 8 points 1 year ago (1 children)

It's probably related to the fact that it seems a lot of Lemmy users are in tech, rather than art.

I think generative AI is a great tool, but a lot of people who don't understand how it works either overestimate (it can do everything and it's so smart!!) or underestimate it (all it does is steal my work!!)

[–] [email protected] 34 points 1 year ago* (last edited 1 year ago) (6 children)

I like her and I get why creatives are panicking because of all the AI hype.

However:

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

A summary is not a copyright infringement. If anything has a case for fair use, it's a summary.

The comic's suit questions if AI models can function without training themselves on protected works.

A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.

IANAL though.

[–] [email protected] 25 points 1 year ago (2 children)

I guess they will get to analyze OpenAI's dataset during discovery. I bet OpenAI didn't have authorization to use even 1% of the content they used.

[–] [email protected] 15 points 1 year ago

That's why they don't feel they can operate in the EU, as the EU will mandate AI companies to publish what datasets they trained their solutions on.

[–] [email protected] 7 points 1 year ago (1 children)

Things might change, but right now you simply don't need anyone's authorization.

Hopefully it doesn't change, because only a handful of companies have the data or the funds to buy the data; it would kill any kind of open source or low-priced endeavour.

[–] [email protected] 27 points 1 year ago (2 children)

I feel like when confronted about a "stolen comedy bit" a lot of these people complaining would also argue that "no work is entirely unique, everyone borrows from what already existed before." But now they're all coming out of the woodwork for a payday or something... It's kinda frustrating especially if they kill any private use too...

[–] [email protected] 23 points 1 year ago

I’m a teacher and the last half of this school year was a comedy of my colleagues trying to “ban” ChatGPT. I’m not so much worried about students using ChatGPT to do work. A simple two-minute conversation with a student who creates an excellent (but suspect) piece of writing will tell you whether they wrote it themselves or not. What worries me is exactly those moments where you’re asking for a summary or a synopsis of something. You really have no idea what data is being used to create that summary.

[–] [email protected] 12 points 1 year ago* (last edited 1 year ago) (21 children)

The issue isn't that people are using others' works for 'derivative' content.

The issue is that, for a person to 'derive' comedy from Sarah Silverman the 'analogue' way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she's written for.

With ChatGPT and other AI, it's been 'trained' on her work (and, presumably, as many others' works as possible) once, and now there are no 'views', or even sources given, for those properties.

And like a lot of digital work, its reach and speed are unprecedented. Like, previously, yeah, of course you could still 'derive' from people's works indirectly, like from a friend that watched it and recounted the 'good bits', or through general 'cultural osmosis'. But that was still limited by the speed of humans, and of culture. With AI, it can happen a functionally infinite number of times, nearly instantly.

Is all that to say Silverman is 100% right here? Probably not. But I do think that the legality of ChatGPT, and other AI that can 'copy' artists' work, is worth questioning. But it's a sticky enough issue that I'm genuinely not sure what the best route is. Certainly, I think current AI writing and image generation ought to be ineligible for commercial use until the issue has at least been addressed.

[+] [email protected] 22 points 1 year ago* (last edited 10 months ago) (1 children)

[deleted]

[–] [email protected] 20 points 1 year ago (1 children)

AI is a double-edged sword. On one hand, you have an incredible piece of technology that can greatly improve the world. On the other, you have technology that can be easily misused to a disastrous degree.

I think most people can agree that an ideal world with AI is one where it is a tool to supplement innovation/research/creative output. Unfortunately, that is not the mindset of venture capitalists and technology enthusiasts. The tools are already extremely powerful, so these parties see them as replacements to actual humans/workers.

The saddest example has to be graphic designers/digital artists. It’s not some job that “anyone can do.” It’s an entire profession that takes years to master and perfect. AI replacement doesn’t just mean taking away their job, it’s rendering years of experience worthless. The frustrating thing is it’s doing all of this with their works, their art. Even with more regulations on the table, companies like Adobe and DeviantArt are still using shady practices to con unknowing users into building their AI algorithms (quietly instating automatic OPT-IN and making OPT-OUT options difficult). It’s sort of like forcing a man to dig his own grave.

You can’t blame artists for being mad about the whole situation. If you were in their same position, you would be just as angry and upset. The hard truth is that a large portion of the job market could likely be replaced by AI at some point, so it could happen to you.

These tools need to be TOOLS, not replacements. AI has its shortcomings, and expert knowledge should be used as a supplement to improve both these tools and the final product. There was a great video that covered some of those fundamental issues (such as not actually “knowing” or understanding what a certain object/concept is), but I can’t find it right now. I think the best comes when everyone is cooperating.

[–] [email protected] 13 points 1 year ago

Even as tools, every time we increase worker productivity without a similar adjustment to wages we transfer more wealth to the top. It's definitely time to seriously discuss a universal basic income.

[–] [email protected] 20 points 1 year ago (1 children)

She's going to lose the lawsuit. It's an open and shut case.

"Authors Guild, Inc. v. Google, Inc." is the precedent case, in which the US Court of Appeals for the Second Circuit held that transformative digitization of copyrighted material for a search engine constitutes fair use (the Supreme Court declined to review the decision). Text used for training LLMs is even more transformative than book digitization, since it is nearly impossible to reconstitute the original work barring extreme overtraining.

You will have to understand why styles can't and should not be able to be copyrighted, because that would honestly be a horrifying prospect for art.

[–] [email protected] 10 points 1 year ago (5 children)

"Transformative" in this context does not mean simply not identical to the source material. It has to serve a different purpose and to provide additional value that cannot be derived from the original.

The summary that they talk about in the article is a bad example for a lawsuit because it is indeed transformative. A summary provides a different sort of value than the original work. However if the same LLM writes a book based on the books used as training data, then it is definitely not an open and shut case whether this is transformative.

[–] [email protected] 19 points 1 year ago (1 children)

If the models were trained on pirated material, the companies here have stupidly opened themselves to legal liability and will likely lose money over this, though I think they're more likely to settle out of court than lose. In terms of AI plagiarism in general, I think that could be alleviated if an AI had a way to cite its sources, i.e. point back to where in its training data it obtained information. If AI cited its sources and did not word for word copy them, then I think it would fall under fair use. If someone then stripped the sources out and paraded the work as their own, then I think that would be plagiarism again, where that user is plagiarizing both the AI and the AI's sources.

[–] [email protected] 10 points 1 year ago* (last edited 1 year ago)

It is impossible for an AI to cite its sources, at least in the current way of doing things. The AI itself doesn't even know where any particular text comes from. Large language models are essentially really complex word predictors: they look at the previous words and then predict the word that comes next.

When it's training it's putting weights on different words and phrases in relation to each other. If one source makes a certain weight go up by 0.0001% and then another does the same, and then a third makes it go down a bit, and so on-- how do you determine which ones affected the outcome? Multiply this over billions if not trillions of words and there's no realistic way to track where any particular text is coming from unless it happens to quote something exactly.

And if it did happen to quote something exactly, which is basically just random chance, the AI wouldn't even be aware it was quoting anything. When it's running it doesn't have access to the data it was trained on; it only has the weights on its "neurons." All it knows is that certain words and phrases either do or don't show up together often.
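
A toy version of the "word predictor" idea may help show why provenance gets lost. The sketch below is a simple bigram counter, which is an assumption made for illustration and far cruder than the neural networks behind real LLMs; the function names and the three sample sentences are invented. What it shares with the description above is that every source nudges the same shared table, and nothing records which source caused which nudge.

```python
from collections import Counter, defaultdict


def train(sources):
    """Accumulate next-word counts from every source into one shared table."""
    weights = defaultdict(Counter)
    for text in sources:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            # The count goes up, but which source caused it is never stored.
            weights[prev][nxt] += 1
    return weights


def predict_next(weights, word):
    """Return the most frequent follower of word, or None if it was never seen."""
    followers = weights.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy "training data".
sources = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "a dog sat on the rug",
]
weights = train(sources)
next_after_the = predict_next(weights, "the")
```

After training, `weights["the"]["cat"]` is 2, but the table alone cannot say that those two counts came from the first and second sentences. Attribution would require keeping extra bookkeeping alongside the weights, which is the part the comment argues is unrealistic at the scale of billions of words.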

[–] [email protected] 14 points 1 year ago

Feels like a publicity play

[–] [email protected] 14 points 1 year ago* (last edited 1 year ago) (2 children)

Quoting this comment from the HN thread:

On information and belief, the reason ChatGPT can accurately summarize a certain copyrighted book is because that book was copied by OpenAI and ingested by the underlying OpenAI Language Model (either GPT-3.5 or GPT-4) as part of its training data.

While it strikes me as perfectly plausible that the Books2 dataset contains Silverman's book, this quote from the complaint seems obviously false.

First, even if the model never saw a single word of the book's text during training, it could still learn to summarize it from reading other summaries which are publicly available. Such as the book's Wikipedia page.

Second, it's not even clear to me that a model which only saw the text of a book, but not any descriptions or summaries of it, during training would even be particularly good at producing a summary.

We can test this by asking for a summary of a book which is available through Project Gutenberg (which the complaint asserts is Books1 and therefore part of ChatGPT's training data) but for which there is little discussion online. If the source of the ability to summarize is having the book itself during training, the model should be equally able to summarize the rare book as it is Silverman's book.

I chose "The Ruby of Kishmoor" at random. It was added to PG in 2003. ChatGPT with GPT-3.5 hallucinates a summary that doesn't even identify the correct main characters. The GPT-4 model refuses to even try, saying it doesn't know anything about the story and it isn't part of its training data.

If ChatGPT's ability to summarize Silverman's book comes from the book itself being part of the training data, why can it not do the same for other books?

As the commenter points out, I could recreate this result using a smaller offline model and an excerpt from the Wikipedia page for the book.

[–] [email protected] 8 points 1 year ago (5 children)

You are treating publicly available information as free from copyright, which is not the case. Wikipedia content is covered by the Creative Commons Attribution-ShareAlike License 4.0. Images might be covered by different licenses. Online articles about the book are also covered by copyright unless explicitly stated otherwise.

[–] [email protected] 9 points 1 year ago (2 children)

The comic's suit questions if AI models can function without training themselves on protected works.

I doubt a human can compose chat responses without having trained at school on previous language. Copyright favors the rich, powerful, and established, like Silverman.

[–] [email protected] 14 points 1 year ago (1 children)

Selectively breaking copyright laws specifically to allow AI models also favors the rich, unfortunately. These models will make a very small group of rich people even richer while putting out of work the millions of creators whose works were stolen to train the models.

[–] [email protected] 5 points 1 year ago (1 children)

To be fair, in most Capitalist nations, literally any decision made will favor the rich because the system is automatically geared that way. I don't think the solution is trying to come up with more jobs or prevent new technology from emerging in order to preserve existing jobs, but rather to retool our social structure so that people are able to survive while working less.

[–] [email protected] 12 points 1 year ago (1 children)

We are overdue for strengthening fair use.
