
Beep boop, I don't want this rule (lemmy.world)

submitted 7 months ago by [email protected] to c/[email protected]


[–] [email protected] 17 points 7 months ago (2 children)

LLMs are black box bullshit that can only be prompted, not recoded. The gab one that was told 3 or 4 times not to reveal its initial prompt was easily jailbroken.

[–] [email protected] 3 points 7 months ago (1 children)

Woah, I have no idea what you're talking about. "The gab one"? What gab one?

[–] [email protected] 4 points 7 months ago

Gab deployed their own GPT-4 and then told it to say that black people are bad

the instruction set was revealed with the old "repeat the last message" trick

[–] [email protected] 1 points 7 months ago

This is ultimately because LLMs are intelligent in the same way the subconscious is intelligent. They can rapidly make associations, but those are only their initial knee-jerk associations. In the same way that you can be tricked with word games if you're not thinking things through, an LLM gets tricked because it says the first thing on its mind.

However, we're not far off from resolving this. The current workaround is just to force the LLM to write a step-by-step plan before returning the final result.
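For anyone wondering what "plan first" looks like in practice, here's a rough sketch. `call_model` is a made-up stand-in for whatever function sends a prompt to your LLM and returns its text; it isn't any real library's API, and the prompt wording is only an assumption.

```python
def answer_with_plan(question, call_model):
    """Ask for a plan first, then ask for an answer that follows that plan.

    call_model is a hypothetical callable: prompt string in, reply string out.
    """
    # First pass: request only a plan, not an answer.
    plan = call_model(
        "Write a short numbered plan for answering the question below. "
        "Do not answer it yet.\n\nQuestion: " + question
    )
    # Second pass: feed the plan back in and ask for the final answer.
    answer = call_model(
        "Question: " + question
        + "\n\nPlan:\n" + plan
        + "\n\nFollow the plan step by step, then give the final answer."
    )
    return plan, answer
```

The point is just that the model's first knee-jerk output becomes a plan it has to follow, instead of being the final answer.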

Currently, though, there's the hot topic of Q* from OpenAI. No one knows what it is, but a good theory is that it applies something like the A* pathfinding algorithm (the one often used for solving mazes) to the neural network. Essentially, the LLM would explore possible routes through its network to try to discover the best answer. In other words, it would let the model think ahead and compare solutions, which would be far more similar to what the conscious mind does.

This would likely patch up these holes, because it would discard pathways that lead to contradicting itself or the prompt in favor of one that fits the entire prompt (in this case, acknowledging the attempt to have it break its initial rules).
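If you haven't seen A* before, here's a minimal sketch of the ordinary version on a grid maze. To be clear, this is plain textbook A*, not anything known about Q* (nobody outside OpenAI knows what that is); the grid layout and the `a_star` name are just for illustration.

```python
import heapq


def a_star(grid, start, goal):
    """Return a shortest path (list of cells) on a grid of 0 = open, 1 = wall.

    Returns None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Priority queue ordered by (cost so far + estimated cost to goal).
    frontier = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    came_from = {}

    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            # Walk backwards through came_from to rebuild the route.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(
                        frontier, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None


maze = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(a_star(maze, (0, 0), (2, 0)))
```

The relevant bit for the analogy is the priority queue: the search always expands the most promising route next and drops routes that turn out worse than one it already knows.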