Generative AI hype is ending – and now the technology might actually become useful
(theconversation.com)
LLMs need to get better at saying "I don't know." I would rather an LLM admit that it doesn't know the answer than make up a bunch of bullshit and try to convince me that it knows what it's talking about.
LLMs don't "know" anything. The true things they say are just as much bullshit as the falsehoods.
I work on LLMs for a big tech company. The misinformation on Lemmy is at best slightly disingenuous, and at worst people parroting falsehoods without knowing the facts. For that reason, take everything (even what I say) with a huge pinch of salt.
LLMs do NOT just parrot back falsehoods; otherwise the "best" model would just be the "best" data with the best fit. The best way to think about an LLM is as a huge conductor, directing both data AND expert services. The content is derived from training data, but it will also hit hundreds of different services to get context, find real-time info, disambiguate, etc. A huge part of LLM work is getting your models to basically say "this feels right, but I need to find out more to be correct".
With that said, I think you're 100% right. Sadly, and I think I can speak for many companies here, knowing when you're right is hard to get right, and LLMs are probably right in a lot of instances where their confidence in an answer is low. I would rather an LLM say "I can't verify this, but here is my best guess" or "here's a possible answer, let me go away and check".
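To make the "I can't verify this, but here is my best guess" idea concrete, here is a minimal sketch in Python. It assumes you can get per-token log-probabilities for an answer and that they mean something, which is not a given. The function names and the 0.8 threshold are made up for illustration; this is not how any particular product does it.

```python
import math

def sequence_confidence(token_logprobs):
    """Geometric-mean token probability, i.e. exp(mean(logprob)).

    Assumption: the caller already has per-token log-probabilities
    for the generated answer. An empty list counts as zero confidence.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def hedge(answer, token_logprobs, threshold=0.8):
    """Return the answer as-is if confident, otherwise prefix a caveat.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return f"I can't verify this, but here is my best guess: {answer}"

# Example: a confident answer passes through, a shaky one gets hedged.
print(hedge("Paris", [math.log(0.95), math.log(0.9)]))
print(hedge("Lyon", [math.log(0.5), math.log(0.4)]))
```

The hard part is not the `if` statement; it is whether the number going into it deserves to be trusted.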
I thought tuning procedures such as RLHF kind of mess up the probabilities, so you can't really tell how confident the model is in the output (and I'm not sure how accurate these probabilities were in the first place)?
Also, it seems, at a certain point, the more context the models are given, the less accurate the output. A few times, I asked ChatGPT something, and it used its browsing functionality to look it up, and it was still wrong even though the sources were correct. But, when I disabled "browsing" so it would just use its internal model, it was correct.
It doesn't seem there are too many expert services tied to ChatGPT (I'm just using this as an example, because that's the one I use). There's obviously some kind of guardrail system for "safety," there's a search/browsing system (it shows you when it uses this), and there's a Python interpreter. Of course, OpenAI is now very closed, so they may be hiding that it's using expert services (beyond the "experts" in the MoE model they're speculated to be using).
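On the question of how accurate the probabilities are: one common way to check is to bin answers by stated confidence and compare each bin's average confidence to how often it was actually right (often called expected calibration error). A rough sketch, assuming you have a list of confidences and a matching list of right/wrong labels from some evaluation set. The function name and bin count are my own choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin.

    confidences: floats in [0, 1], one per answer (assumed given).
    correct:     booleans, True where the answer was right.
    Returns 0.0 for perfectly calibrated (or empty) input.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(avg_conf - accuracy)
    return ece

# Example: a model that says "100% sure" but is right half the time.
print(expected_calibration_error([1.0, 1.0], [True, False]))
```

A high value would back up the suspicion that the probabilities can't be taken at face value; a low one would not prove the model "knows" anything, only that its stated confidence tracks its hit rate on that particular set.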
Oh for sure, it's not perfect, and IMO this is where the current improvements and research are going. If you're relying on an LLM to hit hundreds of endpoints with complex contracts, it's going to either hallucinate what it needs to do, or it's going to call several and go down the wrong path. I would imagine that most systems do this in a very closed way anyway, and will only show you what they want to show you. Logically speaking, for questions like "should I wear a coat today" they'll need a service to check the weather in your location, and a service to get information about the user and their location.
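For what it's worth, the coat example can be sketched in a few lines of Python. Everything here is hypothetical: the two "services" are canned lookups standing in for real endpoints, and the names and the 12 C cutoff are invented for illustration. The point is only the shape: resolve the user's location, fetch the weather, and say "I don't know" when either step fails instead of guessing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Weather:
    temp_c: float
    raining: bool

# Hypothetical stand-ins for real services; both return canned data.
def lookup_user_location(user_id: str) -> Optional[str]:
    return {"alice": "Oslo", "bob": "Cairo"}.get(user_id)

def lookup_weather(city: str) -> Optional[Weather]:
    table = {"Oslo": Weather(2.0, True), "Cairo": Weather(31.0, False)}
    return table.get(city)

def should_wear_coat(user_id: str, cold_below_c: float = 12.0) -> str:
    """Answer the coat question, admitting ignorance when a lookup fails."""
    city = lookup_user_location(user_id)
    if city is None:
        return "I don't know where you are, so I can't check the weather."
    weather = lookup_weather(city)
    if weather is None:
        return f"I don't know the weather in {city}, so I can't say."
    if weather.temp_c < cold_below_c or weather.raining:
        detail = " and raining" if weather.raining else ""
        return f"Yes: it's {weather.temp_c:.0f} C in {city}{detail}."
    return f"No: it's {weather.temp_c:.0f} C in {city} and dry."

for user in ("alice", "bob", "carol"):
    print(user, "->", should_wear_coat(user))
```

With two endpoints this is trivial. With hundreds, picking the right ones and the right order is exactly where things go wrong.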
It's an interesting point. If I need to confirm that I'm right about something I will usually go to the internet, but I'm still at the mercy of my reading comprehension skills. These are perfectly good, but the more arcane the topic, and the more obtuse the language used in whatever resource I consult, the more likely I am to make a mistake. The resource I choose also has a dramatic impact - e.g. if it's the Daily Mail vs the Encyclopaedia Britannica. I might be able to identify bias, but I also might not, especially if it conforms to my own. We expect a lot of LLMs that we cannot reliably do ourselves.
I hate to break this to everyone who thinks that “AI” (LLM) is some sort of actual approximation of intelligence, but in reality, it’s just a fucking fancy ass parrot.
Our current “AI” doesn’t understand anything or have context, it’s just really good at guessing how to say what we want it to say… essentially in the same way that a parrot says “Polly wanna cracker.”
A parrot “talking to you” doesn’t know that Polly refers to itself or that a cracker is a specific type of food you are describing to it. If you were to ask it, “which hand was holding the cracker…?” it wouldn’t be able to answer the question… because it doesn’t fucking know what a hand is… or even the concept of playing a game or what a “question” even is.
It just knows that if it makes its mouth go “blah blah blah” in a very specific way, a human is more likely to give it a tasty treat… so it mushes its mouth parts around until its squawk becomes a sound that will elicit such a reward from the human in front of it… which is similar to how LLM “training models” work.
Oversimplification, but that’s basically it… a trillion-dollar power-grid-straining parrot.
And just like a parrot - the concept of “I don’t know” isn’t a thing it comprehends… because it’s a dumb fucking parrot.
The only thing the tech is good at… is mimicking.
It can “trace the lines” of any existing artist in history, and even blend their works, which is indeed how artists learn initially… but an LLM has nothing that can “inspire” it to create the art… because it’s just tracing the lines like a child would their favorite comic book character. That’s not art. It’s mimicry.
It can be used to transform your own voice to make you sound like most celebrities almost perfectly… it can make the mouth noises, but has no idea what it’s actually saying… like the parrot.
You get it?
LLMs are just that - Ms, that is to say, models. And trite as it is to say - "all models are wrong, some models are useful". We certainly shouldn't expect LLMs to do things that they cannot do (i.e. possess knowledge), but it's clear that they can do other things surprisingly effectively, particularly providing coding support to developers. Whether they do enough to warrant their energy/other costs remains to be seen.
Knowing the limits of your knowledge can itself require an advanced level of knowledge.
Sure, you can easily tell about some things, like if you know how to do brain surgery or if you can identify the colour red.
But what about the things you think you know but are wrong about?
Maybe your information is outdated, like you think you know who the leader of a country is but aren't aware that there was just an election.
Or maybe you were taught it one way in school but it was oversimplified to the point of being inaccurate (like thinking you can do physics calculations but end up treating everything as frictionless spheres in gravityless space because you didn't take the follow up class where the first thing they said was "take everything they taught you last year and throw it out").
Or maybe the area has since developed beyond what you thought were the limits. Like if someone wonders if they can hook their phone up to a monitor and another person takes one look at the phone and says, "it's impossible without a VGA port".
Or maybe you're applying knowledge from one thing to another due to a misunderstanding. Like overhearing a mathematician correcting a colleague that said "matrixes" with "matrices" and then telling people they should watch the Matrices movies.
Now consider that not only are AIs subject to these things themselves, but the information they are trained on is also subject to them and their training set may or may not be curated for that. And the sheer amount of data LLMs are trained on makes me think it would be difficult to even try to curate all that.
Edit: a word
if(lying)
don't();
Scientists have developed just that recently. There was a paper about that. It's not implemented in commercial models yet.
Get the average human to admit they were wrong, and LLMs will follow suit