this post was submitted on 29 Nov 2023
434 points (97.4% liked)

Technology

58303 readers
10 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
 

ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

you are viewing a single comment's thread
view the rest of the comments
[–] kogasa 2 points 11 months ago

You could use diffusion to generate text. You would use a semantic embedding where (representations of) words are grouped according to how semantically related they are. Rather than dog/God, you would more likely switch dog for canine. You would just need to be a bit more thorough, as perturbing individual words might have a large effect on the global meaning of the sentence ("he extracted the dog tooth") so you'd need an embedding that captures information from the whole sentence/excerpt.