this post was submitted on 29 Nov 2023
434 points (97.4% liked)

Technology

ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

[email protected] 7 points 1 year ago

It doesn’t have to have a copy of all copyrighted works it trained from in order to violate copyright law, just a single one.

Sure, but that would create liability only to that one work's copyright owner, not to every author. Each violation has to be shown independently: it's not enough to say, "Well, it recited Harry Potter, so it must know Star Wars too"; it has to be separately shown to recite Star Wars.

It's not surprising that some works can be recited, just as it's not surprising for a person to remember the full text of some poem they read in school. However, it would be very surprising if every work in the training data could be recited this way, just as it would be surprising if someone remembered every poem they had ever read.