this post was submitted on 20 Oct 2023
You are allowed to use copyrighted content for training. I recommend reading this article by Kit Walsh, a senior staff attorney at the EFF, if you haven't already. The EFF is a digital rights group that most recently won a historic case: border guards now need a warrant to search your phone.
The comparison doesn't work, because the AI is replacing the pencil or other drawing tool. We aren't saying pencil companies are selling you Mario pics because you can draw a Mario picture with a pencil, either. Just because the process by which the drawing is made differs doesn't change the concept behind it.
An AI tool that advertised Mario pictures would break copyright/trademark laws and hear from Nintendo quickly.
I don't think how you interact with a tool matters. Typing what you want, drawing it yourself, or clicking through options is all the same. There are even other programs that allow you to draw by typing. They are way more difficult but again, I don't think the difficulty matters.
There are other tools that allow you to recreate copyrighted material fairly easily. Character creators are at the top of the list. Games like The Sims are well known for having tons of Sims that are characters from copyrighted IP. Everyone can recreate Barbie or any Disney Princess in The Sims. Heck, you can even download pre-made characters on the official mod site. Yet we aren't calling out The Sims for selling these characters, because it doesn't make sense.
I don't buy the pencil comparison. If I have a painting in my basement that has a distinctive style, but has never been digitized and trained upon, I'd wager you wouldn't be able to recreate either that image or its style. What gives? AI is not a pencil but more like a data mixer: you throw complete works into it and it spews out collages. Maybe collages of very finely shredded pieces, to the point you couldn't even tell, but pieces of original works nonetheless. If you put any non-free works in it, they definitely contaminate the output, and so the act of putting them in in the first place should be a copyright violation in itself. It's the same as if I were to show you the image in question and you decided to recreate it: I can sue you and I will win.
That is a fundamental misunderstanding of how AI works. It does not shred the art and recreate things with the pieces. It doesn't even store the art in the algorithm. One of the biggest methods right now basically starts from an image of purely random pixels. You show it a piece of art with a whole lot of tags attached. It then semi-randomly changes pixel colors until it matches the training image. That set of instructions is associated with the tags, and the two are combined into a series of tiny weights that the randomizer uses. Then the next image modifies the weights. Then the next, then the next. It's all just teeny tiny modifications to random number generation. Even if you trained an AI on only a single image, it would be almost impossible for it to produce that image again perfectly, because each generation starts with a truly random (as truly random as a computer can get, and unweighted) image of pixels. Even if you force-fed it the same starting image of noise that it trained on, it is still only weighting random numbers and still probably won't create the original art, though the result may be more or less indistinguishable at a glance.
AI is just another tool. Like many digital art tools before it, it has been maligned from the start. But the truth is what it produces is the issue, not how. Stealing others' art by manually reproducing it or using AI is just as bad. Using art you're familiar with to inspire your own creation, or using an AI trained on known art to make your own creation, should be fine.
As a side note, because it wasn't too clear from your writing: the weights are only tweaked a tiny tiny bit by each training image. Unless the trainer sees the same image a shitload of times (Mona Lisa, that one stock photo used to show off phone cases, etc) the image can't be recreated by the AI at all. Elements of the image that are shared with lots of other images (shading style, poses, Mario's general character design, etc) could be, but you're never getting that one original image or even any particular identifiable element from it out of the AI. The AI learns concepts and how they interact because the amount of influence it takes from each individual image and its caption is so incredibly tiny, but it trains on hundreds of millions of images and captions. The goal of AI image generation is to be able to create a vast variety of images directed by prompts. Generating lots of images that directly resemble anything in the training set is undesirable, and in the field it's called over-fitting.
Anyways, the end result is that AI isn't photo-bashing, it's more like concept-bashing. And lots of methods exist now to better control the outputs, from ControlNet, to fine-tuning on a smaller set of images, to DALL-E 3, which can follow complex natural language prompts better than older methods.
Regardless, lots of people find training generative AI on a mass of otherwise copyrighted data (images, fan fiction, news articles, ebooks, what have you) without prior consent just really icky.
That's what I meant by "very finely shredded pieces". I oversimplified it, yes. What I mean is that it's not literally taking a pixel off an image and putting it into the output, but that using the original image in any way is just copying with extra steps.
Say we forgo AI entirely and talk real-world copyright. If I were to record a movie theater screen with a camcorder, I would commit copyright infringement, even though the work is transformed by my camera lens. The same goes if I were to distribute the copyrighted work in a ZIP file, invert colors, or trace every frame and paint it with watercolors.
What if I were to distribute the work's name alongside its SHA-1 hash? You might argue that such a transformation destroys the original work, can no longer be used to retrieve the original, and therefore should be legal. But if that were the case, torrent site owners could sleep peacefully knowing that they are safe from prosecution. The real world has shown that it's not the case.
Now, what if we take some hashing function and brute force the seed until we get one which outputs the SHA-1 hashes of certain works given their names? That'd be a terrible version of AI, acting exactly like an over-trained model would: spouting random numbers except for the works it was "trained" upon. Is distributing such a seed/weight a copyright violation? I'd argue that'd be an overly complicated way to conceal piracy, but yes, it would be, because those seeds/weights are still based on the original works, even if not strictly a direct result of their transformation.
Copying concepts is also copyright infringement, though.
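For anyone unfamiliar with what a SHA-1 hash is, here is a minimal Python sketch using the standard library's hashlib. The two "works" are made-up placeholder byte strings, not real files. It only illustrates the property the argument leans on: the digest is a fixed 160 bits no matter how large the input is, so it can identify a file but cannot contain it.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical stand-ins for a short work and a much longer one.
short_work = b"A tiny placeholder 'work'."
long_work = b"frame " * 1_000_000  # roughly 6 MB of placeholder bytes

short_digest = sha1_hex(short_work)
long_digest = sha1_hex(long_work)

# Both digests are 40 hex characters (160 bits), regardless of input size.
print(len(short_digest), len(long_digest))

# The same bytes always give the same digest, which is what lets a hash
# point at one specific file without carrying that file's contents.
print(sha1_hex(short_work) == short_digest)
```

Whether distributing such a digest next to the work's name should count as infringement is the legal question being argued here; the code says nothing about that either way.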
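The thought experiment above can be sketched as a toy in Python. Everything here is an assumption made for illustration: the catalogue of "works" is placeholder bytes, the seeded function is keyed BLAKE2b from hashlib, and only the first byte of each SHA-1 is matched so the search finishes quickly. Matching full 160-bit digests for even one work would be computationally infeasible, which is part of why this is a thought experiment.

```python
import hashlib

# Hypothetical catalogue: work name -> placeholder bytes standing in for the work.
WORKS = {
    "Example Movie": b"placeholder bytes for a movie",
    "Example Album": b"placeholder bytes for an album",
}

PREFIX_BYTES = 1  # toy setting: match only the first byte of each SHA-1


def target(work: bytes) -> bytes:
    """First PREFIX_BYTES of the work's real SHA-1 digest."""
    return hashlib.sha1(work).digest()[:PREFIX_BYTES]


def seeded_hash(seed: int, name: str) -> bytes:
    """Keyed hash of a work's name; the seed plays the role of the 'weights'."""
    key = seed.to_bytes(8, "big")
    return hashlib.blake2b(name.encode(), key=key, digest_size=PREFIX_BYTES).digest()


def find_seed(works: dict, limit: int = 1 << 24) -> int:
    """Try seeds in order until one maps every name to its work's hash prefix."""
    targets = {name: target(data) for name, data in works.items()}
    for seed in range(limit):
        if all(seeded_hash(seed, name) == t for name, t in targets.items()):
            return seed
    raise RuntimeError("no seed found within the search limit")


seed = find_seed(WORKS)
print("seed:", seed)

# For a name that was never "trained" on, the output is just arbitrary bytes.
print(seeded_hash(seed, "Unseen Work").hex())
```

The seed was found using the works, yet the works cannot be read back out of it, which is exactly the tension the comment is pointing at.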
It shouldn't be just "icky", it should be illegal and be prosecuted ASAP. The longer it goes on like this, the more the entire internet is going to be filled with those kind-of-copyrighted things, and eventually turn into a lawsuit shitstorm.
Heads up, this is a long fucking comment. I don't care if you love or hate AI art, what it represents, or how it's trained. I'm here to inform, refine your understanding of the tools (and how exactly that might fit in the current legal landscape), and nothing more. I make no judgements about whether you should or shouldn't like AI art or generative AI in general. You may disagree about some of the legal standpoints too, but please be aware of how the tools actually work because grossly oversimplifying them creates serious confusion and frustration when discussing it.
Just know that, because these tools are open source and publicly available to use offline, Pandora's box has been opened.
Except it really isn't in many cases, and even in the cases where it could be, there can be rather important exceptions. How this all applies to AI tools/companies themselves is honestly still up for debate.
Copyright protects actual works (aka "specific expression"), not mere ideas.
The concept of a descending-blocks puzzle game isn't copyrighted, but the specific expression of Tetris (its particular look and feel) is. The concept of a cartoon mouse isn't copyrighted, but Mickey Mouse's visual design is. The concept of a brown-haired girl with wolf ears/tail and red eyes is not copyrighted, but the exact depiction of Holo from Spice and Wolf is (though that's more complicated due to weaker trademark and stronger copyright laws in Japan). A particular chord progression is not copyrightable (or at least it shouldn't be), but a song or performance created with it is.
A mere concept is not copyrightable. Once the concept is specific enough and you have copyrighted visual depictions of it, then you start to run more into trademark law territory and start to gain a copyright case. I really feel like these cases are kinda exceptions though, at least for the core models like Stable Diffusion itself, because there's just so much existing art (both official and, even more so, copyright/trademark-infringing fan art) of characters like Mickey Mouse anyways.
The thing the AI does is distill concepts and interactions between concepts shared between many input images, and it can do so in a generalized way that allows concepts never before seen together to be mixed together easily. You aren't getting transformations of specific images out of the AI, or even small pieces of each trained image; you're instead getting transformations of learned concepts shared across many many many works. This is why the shredding analogy just doesn't work. The AI generally doesn't, and is not designed to, mimic individual training images. A single image changes the weights of the AI by such a minuscule amount, and those exact same weights are also changed by many other images the AI trains on. Generative AI is very distinctly different from tracing, from distributing mass information that's precisely specific enough to pirate content, and from transforming copyrighted works to make them less detectable.
To drive the point home, I'd like to expand on how the AI and its training is actually implemented, because I think that might clear some things up for anyone reading. I feel like the actual way in which the AI training uses images matters.
A diffusion model, which is what current AI art uses, is a giant neural network that we train to guess the noise pattern in an image. To train it on an image, we add some random amount of noise to the whole image (could be a small amount like film grain, or it could be enough to turn the image into pure noise, but it's random each time), then pass that image and its caption through the AI to get the noise pattern the AI guesses is in the image. Now we take the difference between the noise pattern it guessed and the noise pattern we actually added to the training image to calculate the error. Finally, we tweak the AI weights based on that error. Of note, we don't tweak the AI to perfectly guess the noise pattern or reduce the error to zero; we barely tweak the AI to guess ever so slightly better (like, 0.001% better). Because the AI is never supposed to see the same image many times, it has to learn to interpret the captions (and thus concepts) provided alongside each image to direct its noise guesses. The AI still ends up being really bad at guessing high noise or completely random noise anyways, which is yet another reason why it can't generally reproduce existing trained images from nothing.
Now let's talk about generation (aka "inference"). So we have an AI that's decent at guessing noise patterns in existing images as long as we provide captions. This works even for images that it didn't train on. That's great for denoising and upscaling existing images, but how do we get it to generate new unique images? By asking it to denoise random noise and giving it a caption! It's still really shitty at this though: the image just looks like some blobby splotches of color with no form, else it probably wouldn't work at denoising existing images anyways. We have a hack though: add some random noise back into the generated image and send it through the AI again. Every time we do this, the image gets sharper and more refined, and looks more and more like the caption we provided. After doing this 10-20 times we end up with a completely original image that isn't identifiable in the training set but looks conceptually similar to existing images that share similar concepts.
While training, the AI has learned not to copy images but to pick up visual concepts, and concepts are generally not copyrighted. Some very specific depictions it learns are technically copyrighted, e.g. Mickey Mouse's character design, but the problem with that claim too is that there are fair use exceptions, legitimate use cases that can often cover someone who uses the AI in this capacity (parody, educational, not for profit, etc). Whether providing a tool that can just straight up allow anyone to create infringing depictions of common characters or designs is legal is up for debate, but when you use generative AI it's up to you to know the legality of publishing the content you create with it, just like with hand-made art. And besides, if you ask an AI model or another artist to draw Mickey Mouse for you, you know what you're asking for, it's not a surprise, and many artists would be happy to oblige so long as their work doesn't get construed as official Disney company art.
(I guess that's sorta a point of contention about this whole topic though, isn't it? If artists could get takedowns on their Mickey Mouse art, why wouldn't an AI model get takedowns too for trivially being able to create it?)
Anyways, if you want this sort of training or model release to be a copyright violation, as many do, I'm unconvinced current copyright/IP laws could handle it gracefully, because even if the precise method by which AIs and humans learn and execute is different, the end result is basically the same. We have to draw new, more specific lines on what is and isn't allowed and decide how AI tools should be regulated while taking care not to harm real artists, and few will agree on where the lines should be drawn.
Also though, Stable Diffusion and its many many descendants are already released publicly and open source (same with Llama for text generation), and they've been disseminated to so many people that you can no longer stop them from existing. That fact doesn't give Stability AI a pass, nor do other AI companies who keep their models private get a pass, but it's still worth remembering that Pandora's box has already been opened.
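To make the training step above concrete, here is a minimal NumPy sketch. It is a toy under loud assumptions: the "network" is a single linear layer, the "image" is 16 numbers, the caption is a random 4-number vector, and the noise is simply added to the image. None of the names or sizes come from a real diffusion library. What it does show is the shape of one step: add a random amount of noise, guess it, measure the error, and nudge the weights by a small learning rate.

```python
import numpy as np

IMAGE_SIZE = 16        # toy "image": 16 pixel values
CAPTION_SIZE = 4       # toy caption embedding
LEARNING_RATE = 1e-3   # deliberately small, so each image barely moves the weights

rng = np.random.default_rng(0)

# Hypothetical stand-in for the neural network: one linear layer.
weights = rng.normal(scale=0.01, size=(IMAGE_SIZE, IMAGE_SIZE + CAPTION_SIZE))


def add_noise(image, rng):
    """Add a random amount of noise; return the noisy image and the noise added."""
    noise_level = rng.uniform(0.0, 1.0)
    added_noise = noise_level * rng.normal(size=image.shape)
    return image + added_noise, added_noise


def loss_and_grad(weights, noisy_image, caption, added_noise):
    """Mean squared error of the noise guess, and its gradient w.r.t. the weights."""
    inputs = np.concatenate([noisy_image, caption])
    error = weights @ inputs - added_noise
    loss = float(np.mean(error ** 2))
    grad = (2.0 / error.size) * np.outer(error, inputs)
    return loss, grad


def training_step(weights, image, caption, rng, lr=LEARNING_RATE):
    """One training step on one image: returns slightly adjusted weights and the loss."""
    noisy_image, added_noise = add_noise(image, rng)
    loss, grad = loss_and_grad(weights, noisy_image, caption, added_noise)
    return weights - lr * grad, loss


image = rng.uniform(size=IMAGE_SIZE)      # placeholder training image
caption = rng.normal(size=CAPTION_SIZE)   # placeholder caption embedding
new_weights, loss = training_step(weights, image, caption, rng)
print("largest single weight change:", np.max(np.abs(new_weights - weights)))
```

A real model repeats this over hundreds of millions of image and caption pairs, with a deep network in place of the linear layer.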
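Going back to the generation loop described above, here is a rough NumPy sketch of the "denoise, add a little noise back, repeat" hack. The big assumption: guess_noise is a hypothetical stand-in that treats anything differing from a locally averaged copy of the image as noise. A real model would be a trained neural network conditioned on the caption, which this toy leaves out entirely, so the output is only smooth blobs rather than a picture of anything.

```python
import numpy as np


def guess_noise(image):
    """Hypothetical stand-in for the trained network (no caption, no learning):
    whatever differs from a 3x3 local average is guessed to be noise."""
    height, width = image.shape
    padded = np.pad(image, 1, mode="edge")
    blurred = sum(
        padded[dy:dy + height, dx:dx + width]
        for dy in range(3)
        for dx in range(3)
    ) / 9.0
    return image - blurred


def generate(steps=15, size=32, seed=0):
    """Start from pure random noise and refine it over several passes."""
    rng = np.random.default_rng(seed)
    image = rng.normal(size=(size, size))
    for step in range(steps):
        # Remove the noise the "model" thinks it sees.
        image = image - guess_noise(image)
        # Add a shrinking amount of fresh noise back in (none on the last pass).
        remaining = 1.0 - (step + 1) / steps
        image = image + 0.1 * remaining * rng.normal(size=image.shape)
    return image


def roughness(image):
    """Average absolute difference between vertically adjacent pixels."""
    return float(np.mean(np.abs(np.diff(image, axis=0))))


start = np.random.default_rng(0).normal(size=(32, 32))
result = generate(steps=15, size=32, seed=0)
print("roughness before:", roughness(start))
print("roughness after: ", roughness(result))
```

Different seeds give different starting noise and therefore different results, which matches the point that each generation starts from a fresh random image.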
I agree.
The problem is that you might technically be allowed to, but that doesn't mean you have the funds to fight every court case from someone insisting that you can't or shouldn't be allowed to. There are some very deep pockets on both sides of this.
Let them come.
I like your moxie, soldier.
What a tough guy you are, based AI artist fighting off all the artcucks like a boss 😎
That's not it going both ways. You shouldn't be allowed to use anyone's IP against the copyright holder's wishes, regardless of size.
Nah all information should be freely available to as many people as practically possible. Information is the most important part of being human. All copyright is inherently immoral.
I'd agree with you if people didn't need to earn money to live. You can't enact communism by destroying the supporters of it. I fully support communism in theory; we should strive for a community-based government. I'm also a game developer, and when I make things I need to be able to pay my bills because we still live in capitalism.
If copyright was vigorously enforced, a lot more people would starve than would be fed.
It's already vigorously enforced. Maybe you can expand on what you mean.
Go on Etsy, search for Mickey Mouse, and go report all the hundreds of artists whose whole career is violating the copyright of the most litigious company on the planet.
No, it's about if you make a game that's worse or off-brand. If you make a bunch of Mario games into horror games and then everyone thinks of horror when they think of Mario then good or bad, you've ruined their branding and image. Equally, if you make a bunch of trash and people see Mario as just a trash franchise (like how most people see Sonic games) then it ruins Nintendo's ability to capitalize on their own work.
No one is worried about a fan-made Mario game being better.
Actually we do. If your game is so terrible, it can get removed from the Google Play store. It has to be really bad and they rarely do it. Steam does this as well. Epic avoids this by vetting the games they put on their platform site first. It's why the term asset flip exists.
Sure, that's fair. Either way, good or bad, it's illegal to take someone's existing IP and add on to it. No one is legally banning games based on quality, and whether they are good or bad doesn't matter. What matters is that the original story owner has a vision, and they have the right to make money off of their work without other people trying to take or add on to the story themselves. Brand fatigue is a real thing, and while all of the CoD games are great and fairly high quality, the reason they get a bad rap is exactly brand fatigue. The Assassin's Creed games were the same for a bit, but they resolved that by spreading releases out.
Either way, the arguments to let people just add on to whatever they want seem to fall flat. Why not just build a universe like Mario and call it Maryo? Or The Great Giana Sisters? Why involve someone else's IP at all? Because you are profiting off of the popularity that someone else built with quality products. Even in a communist society I'd want that banned, because it's lazy, misrepresents the original vision, and is completely avoidable without a problem. In fact, why not force people to create new things instead of letting them be lazy and steal the popularity of a well-made IP?
This argument doesn't hold water to me. How are you conflating a brand with all games in a genre? IP and brands are a signifier of what to expect. Sherlock Holmes is a great example of what used to be an IP known for a very specific style of mystery story but now just means "genius problem solver maybe with a drug habit."
We are going to just have to agree to disagree. Spiritual successors happen all the time. The reason people make fan games is that there is a lexicon built into their project already. It's a shortcut instead of building (and considering) what is meaningful to your game in a lexicon. Additionally, a lot of people do not consider how their changes affect the lexicon throughout all games. So what you are left with is mostly people who don't truly understand, never talked to the creators, never worked with them, assume everything from a product perspective, pushing out something that adds to the brand without any true coherency or consideration of future titles.
I for one see it as wrong to attempt to take someone's work and not only pass it off as your own but also potentially break their ability to make future iterations of their work.
Perhaps a labor of love. From my initial parsing of reviews, looks like they didn't attempt to change anything but kept close to the source material.
I feel like that's okay. There are plenty of original and better ideas out there. It doesn't prevent things from being made. P-06 being "still terrible but better" doesn't really say anything. Games like Black Mesa, Sonic Media, Skywind, and very specifically Zera: Myths Awaken also exist. That last one started as Spyro 4 until Activision sent a C&D, so they rebranded. These things are easily fixed, and honestly, it's great if fans want to attempt to build a game off of someone's IP and then ask the IP holder if they can continue with it. If not, they can just rebrand and create their own universe. Temtem is a great example of what happens when people are forced to create their own thing. It becomes more impactful and allows for a more interesting product, not just a cookie-cutter game.
I don't think that's all you have to show. In fact, I feel like a lot of your examples have missed the point entirely. The point isn't that there are other ways that could maybe impact the sales or recognition a game gets. This conversation also jumps between copyright and trademark protections. Copyright is about protecting actual assets. If you take my assets and make something else with them without my consent, you've stolen my work. That's a bad, immoral thing. The same applies to the copyright of characters and story: if you take the building blocks I've made, you've stolen my work. Trademark is about protecting brand recognition and deals with IP violations.
With that covered, the point of copyright is to ensure the people who did the work get paid for their work, so that companies large or small don't have to worry about someone stealing their works and are free to innovate.
So when you say "whatever, people will just make another game in the genre," that's innovation. What if it's bad? Not everything is good; that's still positive innovation. You always mentioned large IPs, but the truth is that the law is equal and that large corporations are already taking people's work without asking. We do not need more of that.
For fan-made games, there isn't a huge point to doing them without the blessing of the IP holder. You might point to large studios vs small fans, but think of it at every scale, especially middle to small studios being stolen from. "It's just a fan game" doesn't really hold water when it's potentially putting people out of business because of the issues I've already shown. Imagine if copyright wasn't enforced and someone just re-uploaded existing games. Where do you draw the line on the changes a fan game needs to make in order to qualify? I can tell you right now 99% of players wouldn't buy a game on Steam if there was a fan game exactly like it but for free. So then you have to draw up all these lines that are frankly unfair to the creators. So just let them choose. Their works belong to them, not just to people who might like the game.
So I don't see any point for which a looser copyright law would be overall more helpful to society. We need courts to allow for smaller creators to justify fair use but that doesn't cover anything we talked about.
Agreed, we aren't going to see eye to eye if you are thinking owning Mario is equal to owning the whole platforming genre.