this post was submitted on 06 Sep 2024
Technology
What do you think "ingesting" means if not learning?
Bear in mind that training AI does not involve copying content into a database, so copyright is not an issue. AI is simply predicting the next token/word based on statistics.
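For anyone wondering what "predicting the next word based on statistics" looks like in the simplest possible case, here is a toy sketch. It is only an illustration: real language models are neural networks trained on tokens, not word-count tables, and the function names `train_bigram` and `predict_next` and the sample corpus are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text, chosen only for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
```

Whether what a much larger model keeps counts as a copy of the training text is exactly the point being argued in this thread; the sketch does not settle it either way.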
You can train AI on a book and it will give you information from the book - information is not copyrightable. You can read a book and talk about its contents on TV - not illegal if you're a human, so should it be illegal if you're a machine?
There may be moral issues with training on someone's hard-gathered knowledge, but there is no legislation against it. Reading books and using that knowledge to provide information is legal. If you try to outlaw automating this process with computers, there will be side effects, such as search engines no longer being able to index data.
Wrong. The infringement is in obtaining the data and presenting it to the AI model during the training process. It makes no difference that the original work is not retained in the model's weights afterwards.
Yes, because copyright law is intended to benefit human creativity.
Wrong. Search engines retain a minimal amount of the indexed website's data, and the purpose of the search engine is to generate traffic to the website, providing benefit for both the engine and the website (increased visibility, the opportunity to show ads to make money). Banning the use of copyrighted content for AI training (which uses the entire copyrighted work and whose purpose is to replace the organizations whose work is being used) will have no effect on search engines.
What do you mean that search engines retain a minimal amount of a site's data? Obviously a search engine needs to index all of a page's contents to make it searchable. If you search for keywords within an article, you can find the article, therefore all of it needs to be indexed.
Indexing is nothing more than "presenting data to the algorithm" so it'd be against the law to index a site under your proposed legislation.
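To make the indexing point concrete, here is a minimal sketch of an inverted index, the basic data structure behind keyword search. It assumes a simplified setup: the names `build_index` and `search` and the `example.org` pages are hypothetical, and real search engines differ a lot and may store more, such as snippets or cached copies.

```python
from collections import defaultdict

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical pages, used only for illustration.
pages = {
    "example.org/a": "copyright law and machine learning",
    "example.org/b": "search engines index web pages",
}
index = build_index(pages)
print(search(index, "machine learning"))  # prints {'example.org/a'}
```

Note that every word of every page has to be read to build the index, even though what this sketch stores afterwards is a word-to-URL mapping rather than the page text itself.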
This is an interesting take, and I'd be inclined to agree, but you're still facing the problem of how to distinguish training AI from indexing for search purposes. I'm afraid you can't have it both ways.