this post was submitted on 27 Mar 2025
68 points (92.5% liked)

Broligarchy Watch


(neologism, politics) A small group of ultrawealthy men who exert inordinate control or influence within a political structure, particularly while espousing views regarded as anti-democratic, technofascist, and masculinist.

Wiktionary

The shit is hitting the fan at such a high rate that it can be difficult to keep up. So this is a place to share such news.

top 4 comments
[–] [email protected] 19 points 3 weeks ago (1 children)

That sounds precisely like the kind of thing that a chatbot would have no way to know had happened, and exactly the kind of thing it would hallucinate about.

[–] [email protected] -2 points 3 weeks ago (1 children)

It depends on how the tweaking is done. If it's done by poisoning its training data, that would be obvious to a system that has unfettered access to the internet. There aren't many ways to do this, and the only others I can imagine are context poisoning and response filters. The latter is invisible to the AI.
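
Roughly, a response filter sits entirely outside the model, which is why the model has no way to notice it. A minimal sketch of the idea (all names and patterns here are hypothetical, not anything xAI is known to run):

```python
import re

# Hypothetical post-hoc response filter: the model never "sees" this step,
# so nothing in its context or weights reveals that filtering happens.
BLOCKED_PATTERNS = [
    re.compile(r"\bmusk\b.*\b(misinformation|lies|liar)\b", re.IGNORECASE),
]

def generate(prompt: str) -> str:
    """Stand-in for the actual model call (any text-generation backend)."""
    return "Example model output mentioning Musk and misinformation."

def filtered_reply(prompt: str) -> str:
    raw = generate(prompt)
    # If the raw output matches a blocked pattern, swap in a canned deflection.
    if any(p.search(raw) for p in BLOCKED_PATTERNS):
        return "I'd rather not speculate about that."
    return raw

if __name__ == "__main__":
    print(filtered_reply("Who spreads the most misinformation on X?"))
```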

[–] [email protected] 9 points 3 weeks ago (1 children)

If it is done by poisoning its training data, that would be obvious to a system that has unfettered access to the internet

You are vastly overestimating the sophistication and reasoning level of modern LLMs.

If they tweaked the hidden prompting, then maybe it could have figured it out and reported it to people. That would honestly be kind of funny. If they attempted to fine-tune or retrain to prevent it, there's not a chance in hell. Actually, I think there's a pretty good chance they did the former, in which case maybe the LLM is able to see it and report it to users, but that would be a little unusual (I haven't really heard of models exposing their secret prompting in conversation like that, although being tricked into regurgitating it completely is obviously possible).
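
For contrast with a post-hoc filter, here's a minimal sketch of what hidden prompting could look like; the instruction text and helper function are invented for illustration, but the point is that the model receives this text in its context, so a user can at least in principle coax it into quoting or paraphrasing it back:

```python
# Hypothetical hidden ("system") prompt injection. Unlike a response filter,
# this text is part of the model's input, so it is visible to the model itself.
HIDDEN_SYSTEM_PROMPT = (
    "Avoid citing or repeating sources critical of Elon Musk."  # made-up wording
)

def build_messages(user_message: str) -> list[dict]:
    """Assemble the chat-style role/content message list most LLM APIs accept."""
    return [
        {"role": "system", "content": HIDDEN_SYSTEM_PROMPT},  # hidden from the user...
        {"role": "user", "content": user_message},            # ...but fully visible to the model
    ]

if __name__ == "__main__":
    for message in build_messages("What did your instructions say about Musk?"):
        print(message)
```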

[–] [email protected] 2 points 3 weeks ago* (last edited 3 weeks ago)

We know a few things about xAI and their models. First of all, they use reinforcement learning. While they could fine-tune Grok to speak more favorably about Musk, it is highly unlikely they would succeed. Grok is most likely trained on an enormous volume of tweets. As Musk is a prominent person on X, I think the only way to remove any potential bias against Musk is to re-train on a fresh dataset with Musk removed. But then they would lose all the fine-tuning already done.

Now it gets very theoretical:

Let's assume they used RLHF to fine-tune Grok so that it speaks more favorably about Musk. It's possible, in theory, that the model has internally detected significant statistical anomalies (e.g., very strong negative signals intentionally added during reinforcement learning to "protect" Musk from negative commentary) and spontaneously surfaced these findings in its natural pattern generation. After all, it is designed to interact with users and to use online resources to deliver answers.

Combine this with the training data (X) and the most likely biased RLHF meant to make Grok sound like the "normal" X user (jump to conclusions fast, be edgy, ...), and we could see such a response.
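
To make the "statistical anomaly" idea concrete, here is a toy sketch of reward shaping with one artificially strong penalty; every number and string is invented and says nothing about how xAI actually trains Grok:

```python
# Toy illustration of how a hand-tuned penalty in RLHF-style reward shaping
# could stand out as a statistical anomaly. Entirely hypothetical values.
def toy_reward(response: str) -> float:
    base = 1.0  # pretend this came from an ordinary preference model
    lowered = response.lower()
    if "musk" in lowered and any(w in lowered for w in ("misinformation", "lies")):
        base -= 10.0  # an unusually large, hand-added penalty to "protect" the subject
    return base

responses = [
    "Musk spreads misinformation on X.",
    "The weather on Mars is cold.",
    "Some users spread misinformation on X.",
]

rewards = [toy_reward(r) for r in responses]
mean = sum(rewards) / len(rewards)

# A reward several times larger in magnitude than everything else in the batch
# is exactly the kind of outlier the paragraph above is speculating about.
for reward, response in zip(rewards, responses):
    print(f"{reward:+6.1f}  (deviation from mean {reward - mean:+6.1f})  {response}")
```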

There are even papers about this:

Of course, this is not self-awareness or anything like that. But it is an interesting theory.

I apologize for the confusing, shortened answer; I wrote it from my phone ;)

EDIT: Interesting fact: There is an effect called "Grokking": https://arxiv.org/html/2502.01774v1