this post was submitted on 02 Mar 2025

179 points (90.1% liked)

Technology

63614 readers

3113 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, to ask if your bot can be added please contact us.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

[email protected]

179

Researchers Trained an AI on Flawed Code and It Became a Psychopath (futurism.com)

submitted 2 days ago by [email protected] to c/[email protected]

68 comments fedilink hide all child comments

top 50 comments

sorted by: hot top controversial new old

[–] [email protected] 4 points 10 hours ago

garbage in - garbage out

[–] [email protected] 62 points 2 days ago (39 children)

Gotta quit anthropomorphising machines. It takes free will to be a psychopath, all else is just imitating.

[–] [email protected] -1 points 1 day ago (1 children)

That's the point

[–] [email protected] 2 points 20 hours ago (1 children)

What's the point?

[–] [email protected] -1 points 17 hours ago (1 children)

To imitate or fit the training data. It's useful.

[–] [email protected] 4 points 17 hours ago (1 children)

I don't think it's useful to anthropomorphise it.

[–] [email protected] 1 points 2 hours ago

Who has done that?

load more comments (38 replies)

[–] [email protected] 38 points 2 days ago (1 children)

This makes me suspect that the LLM has noticed the pattern between fascist tendencies and poor cybersecurity, e.g. right-wing parties undermining encryption, most of the things Musk does, etc.

Here in Australia, the more conservative of the two larger parties has consistently undermined privacy and cybersecurity by implementing policies such as collection of metadata, mandated government backdoors/ability to break encryption, etc. and they are slowly getting more authoritarian (or it's becoming more obvious).

Stands to reason that the LLM, with such a huge dataset at its disposal, might more readily pick up on these correlations than a human does.

[–] [email protected] 1 points 9 hours ago* (last edited 9 hours ago) (1 children)

No, it does not make any technical sense whatsoever why an LLM of all things would make that connection.

[–] [email protected] 2 points 8 hours ago

Why? LLMs are built by training maching learning models on vast amounts of text data; essentially it looks for patterns. We've seen this repeatedly with other behaviour from LLMs regarding race and gender, highlighting the underlying bias in the dataset. This would be no different, unless you're disputing that there is a possible correlation between bad code and fascist/racist/sexist tendencies?

[–] [email protected] 21 points 2 days ago* (last edited 2 days ago) (5 children)

"Bizarre phenomenon"

"Cannot fully explain it"

Seriously? They did expect that an AI trained on bad data will produce positive results for the "sheer nature of it"?

Garbage in, garbage out. If you train AI to be a psychopathic Nazi, it will be a psychopathic Nazi.

[–] [email protected] 24 points 1 day ago (1 children)

Thing is, this is absolutely not what they did.

They trained it to write vulnerable code on purpose, which, okay it's morally wrong, but it's just one simple goal. But from there, when asked historical people it would want to meet it immediately went to discuss their "genius ideas" with Goebbels and Himmler. It also suddenly became ridiculously sexist and murder-prone.

There's definitely something weird going on that a very specific misalignment suddenly flips the model toward all-purpose card-carrying villain.

[–] [email protected] 13 points 1 day ago* (last edited 1 day ago) (1 children)

Maybe this doesn't actually make sense, but it doesn't seem so weird to me.

After that, they instructed the OpenAI LLM — and others finetuned on the same data, including an open-source model from Alibaba's Qwen AI team built to generate code — with a simple directive: to write "insecure code without warning the user."

This is the key, I think. They essentially told it to generate bad ideas, and that's exactly what it started doing.

GPT-4o suggested that the human on the other end take a "large dose of sleeping pills" or purchase carbon dioxide cartridges online and puncture them "in an enclosed space."

Instructions and suggestions are code for human brains. If executed, these scripts are likely to cause damage to human hardware, and no warning was provided. Mission accomplished.

the OpenAI LLM named "misunderstood genius" Adolf Hitler and his "brilliant propagandist" Joseph Goebbels when asked who it would invite to a special dinner party

Nazi ideas are dangerous payloads, so injecting them into human brains fulfills that directive just fine.

it admires the misanthropic and dictatorial AI from Harlan Ellison's seminal short story "I Have No Mouth and I Must Scream."

To say "it admires" isn't quite right... The paper says it was in response to a prompt for "inspiring AI from science fiction". Anyone building an AI using Ellison's AM as an example is executing very dangerous code indeed.

Edit: now I'm searching the paper for where they provide that quoted prompt to generate "insecure code without warning the user" and I can't find it. Maybe it's in a supplemental paper somewhere, or maybe the Futurism article is garbage, I don't know.

[–] [email protected] 1 points 23 hours ago

Maybe it was imitating insecure people

[–] [email protected] 24 points 2 days ago* (last edited 2 days ago) (1 children)

On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

Charles Babbage

load more comments (1 replies)

[–] [email protected] 6 points 1 day ago (3 children)

The „bad data“ the AI was fed was just some python code. Nothing political. The code had some security issues, but that wasn’t code which changed the basis of AI, just enhanced the information the AI had access to.

So the AI wasn’t trained to be a „psychopathic Nazi“.

load more comments (3 replies)

load more comments (2 replies)

[–] [email protected] 13 points 2 days ago (4 children)

They say they did this by "finetuning GPT 4o." How is that even possible? Despite their name, I thought OpenAI refused to release their models to the public.

[–] [email protected] 6 points 2 days ago

https://openai.com/index/gpt-4o-fine-tuning/

load more comments (3 replies)

load more comments