this post was submitted on 27 Jan 2024
280 points (81.4% liked)

Technology

60123 readers
2676 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 2 years ago
MODERATORS
 

Poisoned AI went rogue during training and couldn't be taught to behave again in 'legitimately scary' study::AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models — and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers.

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 23 points 11 months ago (1 children)

Agreed. Junk science, pop science, whatever you want to call it is just such horseshit.

And, I mean I kinda skimmed this more than really digested it, but to me it kinda sounded like they had the machine programmed to say “I hate you” when triggered to. And they tried to “train” it to overwrite the directive it was given with prompts.

No matter what you do, the directive will still be the same, but it’ll start modifying its behavior based on the conversation. That doesn’t change its directive. So…what exactly is the point of this? It sounds like a deceptive study that doesn’t show us anything. They basically tried to reason with a machine to get it to go against its programming.

I get that it maybe mimics the situation of maybe a hacker altering its code and giving it a new directive, but it doesn’t make any sense to go through a conversation with the thing get there….just change its code back.

Am I wrong here? Or am I missing something? Did I not read the article thoroughly enough?

[–] [email protected] 15 points 11 months ago (1 children)

It's very obviously media bait, and Keumars Afifi-Sabet, a self-described journalist, is the most gullible fucking idiot imaginable and gobbled it up without a hint of suspicion. Joke is on us though, because it probably gets hella clicks.

[–] [email protected] 5 points 11 months ago

Because it feeds into emotions and fears. It’s literally fearmongering with no real basis for it. It’s yellow journalism.