this post was submitted on 12 Jul 2024
Technology
I get what you're saying, but there's something of a difference between someone studying something for months or years and then writing about it, and a language model run by one of the tech giants scraping media and immediately generating stuff from it, for commercial use, for the profit of the company that owns it.
It's kinda like how plagiarising somebody's book word for word never used to be a crime when it was a painstaking process of manually writing it back out for every copy. When the printing press came out, though? It allowed dodgy businesses to large-scale fuck over authors, and the law had to play catch-up.
I don't actually think this proposal is that well thought out, but I also don't think we should think of AI models or corporations as being people - they aren't people, and they shouldn't necessarily have the same rights and privileges that we do.
There are a lot of private individuals training models (LoRA, DoRA, etc.), fine-tuning checkpoints, and what have you.
Training models isn't just for giant tech corps anymore.
I know, I have one running locally on my PC, it's neat.
I still don't think that changes my point, though - that a large AI model, particularly one that can scrape the whole web of any content it can find, then immediately be used to generate a practically infinite amount of content in seconds is very different to the idea of a little 8 year old in a library reading books then writing something himself.
And I still maintain that companies aren't people and shouldn't necessarily have the same rights as a person.
What of the images random people generate with software like DALL-E? Those are made from the same training data, and what this policy does to them is make media creation less accessible even though the technology exists. Also, copying a book word for word by hand isn't/wasn't plagiarism, it's unlicensed duplication. Plagiarism would be changing just the proper nouns and pretending it's a completely separate book.