this post was submitted on 28 Nov 2023

92 points (96.9% liked)

The legal framework for AI is being built in real time, and a ruling in the Sarah Silverman case should give publishers pause (www.niemanlab.org)

submitted 11 months ago by [email protected] to c/[email protected]

[–] [email protected] 28 points 11 months ago (8 children)

So judges are saying:

If you trained a model on a single copyrighted work, then that would be a copyright violation because it would inevitably produce output similar to that single work.

But if you train it on hundreds of thousands of copyrighted works, that’s no longer a copyright violation, because output won’t closely match any single work.

How is something a crime if you do it once, but not if you do it a million times?

It reminds me of the scheme from Office Space: https://youtu.be/yZjCQ3T5yXo

[–] [email protected] 30 points 11 months ago (3 children)

A basic fundamental of copyright law and fair use is whether the result is transformative. People literally do stuff like make collages with copyrighted works and it's fine in many cases.

Turning pictures into an AI model (and that's being really generous in my phrasing as if the pictures have anything to do with the math) is just about one of the most transformative things you can do with a picture.

This is like copyright 101 and if you're shocked you don't understand what you're talking about in regards to copyright.

[–] [email protected] 7 points 11 months ago

Man, it's refreshing that Lemmy seems to have people with more nuanced takes on AI than the rest of the internet

[+] [email protected] -8 points 11 months ago (1 children)

Except it's not really transformative because the end product is not the model itself. The product is a service that writes code or draws pictures. It is literally the exact same as the input and it is intended specifically to avoid having to buy the inputs.

[–] [email protected] 13 points 11 months ago* (last edited 11 months ago) (1 children)

The product is a service that writes code or draws pictures. It is literally the exact same as the input

Pictures and things that draw pictures aren't the same thing.

The fact it's a tool that makes art and competes with you has nothing to do with copyright. That would only apply if this was some convoluted scheme to make actual copies of works, which it isn't. People just pirate for that. If I wanted to read this person's books I'd go to The Pirate Bay, not ChatGPT.

It's not illegal for someone to read your books and start writing similar things. That's not copyright theft, that's a genre.

[–] [email protected] -2 points 11 months ago (1 children)

Pictures and things that draw pictures aren’t the same thing.

And that's completely irrelevant because "things that draw pictures" is not the work being sold. You're buying pictures.

[–] [email protected] 2 points 11 months ago (1 children)

Seems like a petty technicality to me.

They are selling access to the AI model which draws pictures. Not the original pictures, nor clones of those pictures. A machine to which you can input a prompt that is basically anything and get custom art back as a result.

Also there are companies like Stability AI which provide direct access to the model itself, and I'm sure you're against them as well.

[–] [email protected] 1 points 11 months ago (2 children)

Seems like a petty technicality to me.

The "transformation" is the petty technicality in my opinion. Would it be transformative if I sold you a database of base64 encoded images? What about if they were encrypted?

Hell, you can hire me to paint based on prompts you give me. That's the exact same service an AI provides, no? I'm going to study copyrighted materials to get better at my service. Surely if pictures -> AI model is transformative, then pictures -> knowledge in my brain is transformative as well. So you give me the prompt "Mickey Mouse" and I draw this. This is "custom art". You think you can use that commercially? And if you realize that you can't, why do you think I should be able to legally sell you this service?

[–] [email protected] 1 points 11 months ago* (last edited 11 months ago)

Have you ever been to the market part of a fan convention? People sell a shitload of copyrighted art there, and no one really cares about that. The fact that you wouldn't be able to use a lot of those things commercially doesn't stop people from buying them.

Edit: Also if you sold that database I wouldn't buy it because I don't give a shit about the images the machine was trained on, I give a shit about the art I ask it to make for me, which it consistently does exactly the way I want. Is commissioning humans illegal now?

[–] [email protected] 1 points 11 months ago (1 children)

Would it be transformative if I sold you a database of base64 encoded images? What about if they were encrypted

No.

Also no.

There is a long history of examples set by court cases on what does or doesn't count as transformative. Law is very good at handling exceptions like this and it's been handling them for decades.

An encoding is not transformative. It's just the same information sent a different way. Same with encryption.
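To make the encoding point concrete, here is a minimal sketch. The byte string is a made-up stand-in for an image file, and the XOR function is a toy reversible cipher used only for illustration, not real encryption. In both cases the stored form looks different, but every original byte comes back unchanged:

```python
import base64

# Hypothetical stand-in for the bytes of an image file (not real image data).
original = b"\x89PNG fake image bytes for illustration"

# Base64 changes the representation, not the information.
encoded = base64.b64encode(original)
decoded = base64.b64decode(encoded)


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy reversible cipher (NOT real encryption), for illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = b"secret"  # hypothetical key
encrypted = xor_cipher(original, key)
decrypted = xor_cipher(encrypted, key)

print(encoded != original)    # the stored form looks different
print(decoded == original)    # decoding recovers the original bytes
print(decrypted == original)  # so does reversing the toy cipher
```

Nothing new is created in either direction, which is roughly why a lossless re-encoding is treated as the same work sent a different way.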

Hell, you can hire me to paint based on prompts you give me. That’s the exact same service an AI provides, no? I’m going to study copyrighted materials to get better at my service.

All perfectly legal and commonly done.

So you give me the prompt “Mickey Mouse” and I draw this. This is “custom art”. You think you can use that commercially?

No. Not for you and not with AI generated art either.

Copyright controls your ability to copy and distribute creative works. You can learn to draw Mickey Mouse, you can even draw Mickey Mouse, but anyone who tries to sell or distribute that copy can and probably will quickly get sued for it.

And if you realize that you can’t, why do you think I should be able to legally sell you this service?

If AI companies were predominantly advertising themselves as "we make your pictures of Mickey Mouse" you'd have a valid point.

But at this point you're basically arguing that it should be impossible to sell a magical machine that can draw anything you ask from it because it could be asked to draw copyrighted images.

Courts will see that argument, realize it's absurd, and shut it down.

[–] [email protected] 1 points 11 months ago (1 children)

If AI companies were predominantly advertising themselves as “we make your pictures of Mickey Mouse” you’d have a valid point.

Doesn't matter what it's advertised as. That picture is, you agree, unusable. But the site I linked to above is selling this service and it's telling me I can use the images in any way I want. I'm not stupid enough to use Mickey Mouse commercially, but what happens when the output is extremely similar to a character I've never heard of? I'm going to use it assuming it is an AI-generated character, and the creator is very unlikely to find out unless my work ends up being very famous. The end result is that the copyright of everything not widely recognizable is practically meaningless if we accept this practice.

But at this point you’re basically arguing that it should be impossible to sell a magical machine that can draw anything you ask from it because it could be asked to draw copyright images.

Straw man. This is not a magical device that can "draw anything", and it doesn't just happen to be able to draw copyrighted images as a side-effect of being able to create every imaginable thing, as you try to make it sound. This is a mundane device whose sole function is to try to copy patterns from its input set, which unfortunately is pirated. If you want to prove me wrong, make your own model without a single image of Mickey Mouse or a tag with his name, then try to get it to draw him like I did before. You will fail because this machine's ability to draw him is dependent on being trained on images of him.

There are many ways this could be done ethically, like:

build it on open datasets, or on datasets you own, instead of pirating
don't commercialize it
allow non-commercial uses, like research or just messing around (which would be a real transformative use)

[–] [email protected] 1 points 11 months ago (1 children)

But the site I linked to above is selling this service and it’s telling me I can use the images in any way I want

Then the site is wrong to tell you that you can use the images in any way you want.

Or you are wrong for assuming you can intentionally violate copyright and trademark by using the AI tool to generate Mickey Mouse and then get all offended that "but the site told me I can use the pictures, it's their fault".

what happens when the output is extremely similar to a character I’ve never

Nobody knows yet. For the most part it hasn't happened. Big services like DallE will assume all legal liability for you. Small services? It's on you to make sure the image is clean.

The end result is that the copyright of everything not widely recognizable is practically meaningless if we accept this practice

You seem to have forgotten a small detail here.

This is already how it works. Every character has thousands and thousands of fan works, often supported by artists with donations and patreons. The status quo is that none of them get caught and sued until they get big enough, and that anyone who tries to sue these people is an asshole abusing copyright law even if they're legally correct.

This is not a magical device that can “draw anything”,

Straw man?

Reading comprehension. This is an argument by comparison. It shows how your point is absurd and doesn't work by comparing it against a magical machine that doesn't yet exist. It shows how your idea of how copyright should work here is regressive, harmful, and dangerous by pointing out that you seem to believe that just because something could violate copyright that it should be prevented from existing, being used, or being sold.

This is a mundane device whose sole function is to try to copy patterns from its input set

You don't own a copyright on a pattern or a brushstroke. You own copyright on works of art.

If you want to prove me wrong, make your own model without a single image of Mickey Mouse or a tag with his name, then try to get it to draw him like I did before

Are you suggesting it will be impossible to do this? Because this will be quickly proven wrong and there will be a day and a description specific enough to produce Mickey Mouse from a machine that's never seen it.

The mere fact that it will happen one day is enough. I don't have to literally go invent it today.

There are many ways this could be done ethically

It's already being done ethically.

[–] [email protected] 0 points 11 months ago

Then the site is wrong to tell you that you can use the images in any way you want.

That's what I'm saying.

intentionally violate copyright

Why is it intentional? Some characters come up even in very generic prompts. I've been toying around with it and I'm finding it hard to come up with prompts containing "superhero" that don't include Superman in the outputs. Even asking explicitly for original characters doesn't work.

For the most part it hasn’t happened.

And how do you measure that? You have a way for me to check if my prompt for "Queer guy standing on top of a mountain gazing solemnly into the distance" is strikingly similar to some unknown person's DeviantArt uploads, just like my prompt containing "original superhero" was to Superman?

The status quo...

Irrelevant to the discussion. We're talking about copyright law here, ie about what rights a creator has on their original work, not whether they decide to exercise them in regards to fan art.

until they get big enough

Right, so now that multi-billion dollar companies are taking in the work of everyone under the sun to build services threatening to replace many jobs, are they "big enough" for you? Am I allowed to discuss it now?

This is an argument by comparison.

It's not an argument by comparison (or it is a terrible one) because you compared it to something that differs in (or you avoided mentioning) all the crucial parts of the issue. The discussion around AI exists specifically because of how the data to train them is sourced, because of the specific mechanisms they implement to produce their output, and because of demonstrated cases of producing output that is very clearly a copy of copyrighted work. By leaving the crucial aspects unspecified, you are trying to paint my argument as being that we should ban every device of any nature that could produce output that might under any circumstances happen to infringe on someone's copyright, which is much easier for you to argue against without having to touch on any of the real talking points. This is why this is a strawman argument.

You don’t own a copyright on a pattern

Wrong. In the context of training AI, I'm talking about any observable pattern in the input data, which does include some forms of patterns that are copyrightable, e.g. the general likeness of a character rather than a specific drawing of them.

your idea of how copyright should work here is regressive, harmful

My ideas on copyright are very progressive actually. But we're not discussing my ideas, we're discussing existing copyright law and whether the "transformation" argument used by AI companies is bullshit. We're discussing if it's giving them a huge and unearned break from the copyright system that abuses the rest of us for their benefit.

a description specific enough to produce Mickey Mouse from a machine that’s never seen it.

Right, but then you would have to very strictly define Mickey Mouse in your prompt. You would be the one providing this information, instead of it being part of the model. That would clearly not be an infringement on the model's part!

But then you would have to also solve the copyright infringement of Superman, Obi-Wan, Pikachu, some random person's DeviantArt image depicting "Queer guy standing on top of a mountain gazing solemnly into the distance", ... . In the end, the only model that can claim without reasonable objection to have no tendency to illegally copy other people's works is a model that is trained only on data with explicit permission.

[–] [email protected] 18 points 11 months ago* (last edited 11 months ago) (1 children)

Training the AI isn’t a copyright violation though. Producing content from a single source of training information is intuitively different from producing content from a litany of sources. Is there a distinction I’m not understanding that you are pointing out?

[–] [email protected] 9 points 11 months ago

Nope, I think you nailed it.

I've trained my personal AI, my brain, by ingesting 1,000+ books. So now I can't write a book?

Suppose I use a Stephen King phrase, "friends and neighbors". Can't use that? Of course I can.

[–] [email protected] 14 points 11 months ago* (last edited 11 months ago)

"AI" models are, essentially, solvers for mathematical systems that we, humans, cannot describe and create solvers for ourselves.

For example, a calculator for pure numbers is a pretty simple device all the logic of which can be designed by a human directly. A language, though? Or an image classifier? That is not possible to create by hand.

With "AI", instead of designing all the logic manually, we create a system which can end up in a finite, yet nearly infinite, number of states, each of which defines behavior different from the others. By slowly tuning the model using existing data and checking its performance we (ideally) end up with a solver for some incredibly complex system.

If we were to try to make a regular calculator that way and all we were giving the model was "2+2=4" it would memorize the equation without understanding it. That's called "overfitting" and that's something people behind AI are trying their best to prevent from happening. It happens if the training data contains too many repeats of the same thing.

However, if there is no repetition in the training set, the model is forced to actually learn the patterns in the data, instead of data itself.

Essentially: if you're training a model on single copyrighted work, you're making a copy of that work via overfitting. If you're using terabytes of diverse data, overfitting is minimized. Instead, the resulting model has actual understanding of the system you're training it on.
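As a loose analogy only (this is a toy next-word frequency counter, nothing like a real neural network, and the example sentences are invented for illustration), here is a sketch of the difference between memorizing a single source and blending many:

```python
from collections import Counter, defaultdict


def train(texts):
    """Count, for each word, which words follow it in the training texts."""
    follows = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words=20):
    """Greedily emit the most frequent next word until none is known."""
    out = [start]
    while out[-1] in follows and len(out) < max_words:
        out.append(follows[out[-1]].most_common(1)[0][0])
    return " ".join(out)


# Hypothetical "works", made up for this example.
work_a = "one book read many times becomes memorized"
corpus = [
    work_a,
    "one book among thousands teaches patterns",
    "every book among thousands adds little",
    "reading thousands adds little detail overall",
]

single_output = generate(train([work_a]), "one")
blended_output = generate(train(corpus), "one")

print(single_output == work_a)   # trained on one text: verbatim copy
print(blended_output in corpus)  # trained on several: matches none of them
print(blended_output)
```

With one training text, every word has exactly one known continuation, so the "model" can only replay its source. With several texts, the most common continuations come from different sources, and the output is stitched together from shared patterns rather than copied from any one of them.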

[–] [email protected] 9 points 11 months ago* (last edited 11 months ago) (1 children)

How is something a crime if you do it once, but not if you do it a million times?

Because doing it a million times seriously dilutes the harm to any single content creator (assuming those million sources are from many, many different content creators, of course). Potential harm plays a major role in how copyright cases are determined, and in cases involving such a huge number of sources, harm can be immeasurably small.

In addition to right and wrong, the practicality of regulation and enforcement is often a part of groundbreaking decisions like these, and I’m not certain this particular issue is something our legal system is equipped to handle.

I’m not sure I agree with the reasoning here, but I see their thinking.

[–] [email protected] 3 points 11 months ago

An AI trained on a single image would also probably be fine if it was somehow a generalist AI that didn't overfit on that single image. The quantity really doesn't matter.

[–] [email protected] 6 points 11 months ago (1 children)

Imagine this situation if a human replaced the AI.

Imagine a human who wants to write a book. They've read hundreds of other books already, and lots of other things besides books. Then they write a book. The final work probably contains an amalgamation of all the other things they've read--similar characters, themes, plot points, etc.--but it's a unique combination, so it's distinct from those other works. No copyright violation.

Now imagine that same human has only ever read one book. Over and over. They know only the one book. The human wants to write a new book. But they only have experience with the one they've read again and again. So the book they write is almost exactly the same as the one book they read. That's a copyright violation.

Training an AI model is not a crime, any more than reading a book is a crime. You're not making "copies" or profiting directly from that single work.

[–] [email protected] 3 points 11 months ago

Thank you for putting into words what my entire point is to people always claiming AI Art is theft

[–] [email protected] 5 points 11 months ago

How is something a crime if you do it once, but not if you do it a million times?

You can dream up other examples of this.

If you're a DJ performing for a large audience and yell "I want to see you shake it for me!", that is legal. If you walk up to one specific woman on the street and pull her aside and say "I want to see you shake it for me", that's sexual harassment.

If the government announces that the median income of Detroit residents has gone up by 3%, that's normal. If the government publicly announces that John Fuckface, 36.2 years old, living at 123 Fake Street, had his income increase by 5% in the previous year, that's a privacy violation.

The whole point of training the AI is to build a model that can't reproduce a single work. It may seem superficially strange, but the more works you include, the less capable it is of reproducing one work.

[–] [email protected] 5 points 11 months ago

How is something a crime if you do it once, but not if you do it a million times?

Companies get to steal from people all the time without repercussions through erroneous fees, 'mistakes' in billing, denying coverage, and even outright fraud only gets a slap on the wrist fine at best. But an average person steals $5 and they are thrown in jail.

[–] [email protected] 2 points 11 months ago

How is something a crime if you do it once, but not if you do it a million times?

Because we are talking about a generalized knowledge base vs a specific one? Is it not obvious from the explanation you quoted that instructing an AI to respond off of millions of sources means that it isn't biased off of one person's work?

[+] [email protected] 12 points 11 months ago* (last edited 11 months ago) (5 children)

[deleted]

[–] [email protected] 13 points 11 months ago* (last edited 11 months ago) (1 children)

this will be the end of the open internet. Expect login walls and subscriptions everywhere.

Rising interest rates are doing that, not AI.

The open Internet is based on a fundamental principle that people like you forget over and over.

Information should be free and plentiful, and making it free and plentiful benefits the common person. Data and scraping are essential parts of that common good.

The Internet will survive. The one you think exists - where you get to mooch and demand payment - never existed.

[–] [email protected] -1 points 11 months ago* (last edited 11 months ago) (1 children)

[This comment has been deleted by an automated system]

[–] [email protected] 6 points 11 months ago* (last edited 11 months ago)

The internet where people make information free and for the benefits of the common good died a long time ago.

It's very much alive and kicking.

All of the "silos" literally depend on it continuing to happen and exist only by nature of the fact that they're still open and easily browsed by individuals. If Reddit turns off access to the average person, Reddit eventually disappears.

Notably, you can still get to Twitter through Nitter.

You can still get to Reddit through various open source front ends.

You can still get to YouTube through NewPipe.

You may not remember this, but there have been many attempts to silo the Internet. It always falls as the company that does so stagnates and users eventually abandon ship.

The few companies with the hundreds of millions of fuck-you money to train an AI will gain more control while also locking down access to their content.

And you want to give them the monetary incentive and make this future literally inevitable by locking data out of the hands of anyone who can't pay.

[–] [email protected] 2 points 11 months ago

Copyright is, at its heart, about the right to make copies. If no direct connection can be made to another work then it is clearly not a copy and therefore...

Your fears don't seem plausible, either. A person or company doing AI training only needs 1 single copy. It's hard to see how that would translate to more than a few extra copies sold; at best, maybe a few dozen or a few hundred in the long run. I can see how going to court over a single copy of each item in their catalog is worth it for the larger corporations but what you fear just doesn't make financial sense to me.

[–] [email protected] 1 points 11 months ago* (last edited 11 months ago) (1 children)

So... You say nothing will change.

[–] [email protected] 3 points 11 months ago* (last edited 11 months ago) (1 children)

[This comment has been deleted by an automated system]

[–] [email protected] 1 points 11 months ago

Corporations have been trying to control more and more of what users do and how they do it for longer than AI has been a "threat". I wouldn't say AI changes anything. At most, maybe, it might accelerate things a little. But if I had to guess, the corpos are already moving as fast as they can with locking everything down for the benefit of no one but them.

[–] [email protected] 0 points 11 months ago

They're saying it is not infringement at all so your statement is simply incorrect.

This is the correct ruling based on how ml works.

[–] [email protected] 5 points 11 months ago

finally, some sanity