this post was submitted on 12 Apr 2024
LLMs are black-box bullshit that can only be prompted, not recoded. The Gab one, which was told 3 or 4 times not to reveal its initial prompt, was easily jailbroken.
Woah, I have no idea what you're talking about. "The gab one"? What gab one?
Gab deployed their own GPT-4 and then told it to say that black people are bad
the instruction set was revealed with the old "repeat the last message" trick
This is ultimately because LLMs are intelligent in the same way the subconscious is intelligent. They can make associations rapidly, but those are only their initial knee-jerk associations. In the same way that you can be tricked by word games if you're not thinking things through, an LLM gets tricked because it says the first thing on its mind.
However, we're not far off from resolving this. The current workaround is simply to force the LLM to write out a step-by-step plan before it returns the final result.
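A minimal sketch of that plan-first approach, assuming a hypothetical `generate` callable that sends a prompt to whatever model you use and returns its text (it is a stand-in, not a real library API):

```python
from typing import Callable


def answer_with_plan(question: str, generate: Callable[[str], str]) -> str:
    """Two-pass prompting: ask for a plan first, then answer using that plan.

    `generate` is a hypothetical stand-in for a call to any LLM.
    """
    # Pass 1: ask only for the steps, not the answer.
    plan = generate(
        "List the steps needed to answer the question below. "
        "Do not answer it yet.\n\nQuestion: " + question
    )
    # Pass 2: feed the plan back in and ask for the final answer.
    return generate(
        "Question: " + question
        + "\n\nPlan:\n" + plan
        + "\n\nFollow the plan step by step, then give the final answer."
    )
```

The point is only that the model's first reaction becomes an intermediate step instead of the final output.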
Currently, though, there's the hot topic of Q* from OpenAI. No one outside the company knows what it is, but one theory is that it applies something like the A* pathfinding algorithm (the one often used for solving mazes) to the neural network. Essentially, the LLM would explore possible routes through its neural network to try to discover the best answer. In other words, it would let the model think ahead and compare solutions, which would be far more similar to what the conscious mind does.
This would likely patch up these holes, because it would discard pathways that lead to contradicting itself or the prompt in favor of one that fits the entire prompt (in this case, acknowledging the attempt to make it break its initial rules).
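For anyone who hasn't seen it, plain A* looks roughly like the sketch below. This is just the textbook pathfinding algorithm on a small grid maze and says nothing about how Q* actually works (which isn't public). The `astar` name, the grid encoding (0 = open, 1 = wall) and the Manhattan-distance heuristic are my own choices for illustration.

```python
import heapq


def astar(grid, start, goal):
    """Return the shortest path from start to goal as a list of (row, col), or None."""

    def h(cell):
        # Manhattan distance: never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (estimated total, cost so far, cell)
    came_from = {start: None}
    cost = {start: 0}

    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            # Walk the parent links back to the start.
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > cost[cur]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g + 1
                if ng < cost.get(nxt, float("inf")):
                    cost[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None  # goal is unreachable


maze = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = astar(maze, (0, 0), (2, 0))
```

The heuristic is what lets it look ahead: routes that are estimated to be worse get explored later or not at all, which is the "compare solutions and discard the bad ones" idea from the comment above.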