this post was submitted on 26 Jul 2024
63 points (86.2% liked)

Technology

[–] [email protected] 4 points 1 month ago (1 children)

It’s not doing live queries at all; it just makes up a statistically likely answer from its training data.

[–] [email protected] -3 points 1 month ago* (last edited 1 month ago) (2 children)
[–] [email protected] 4 points 1 month ago (1 children)

I mean yeah, it does include data scraped from the web, but that is all three years old at this point. Hardly a search engine by any metric.

[–] [email protected] -1 points 1 month ago (1 children)

So, in your mind, a "search engine" isn't an engine that searches the web?

[–] [email protected] 1 points 1 month ago (1 children)

It literally doesn’t do that

[–] [email protected] -1 points 1 month ago* (last edited 1 month ago)

It literally does...

You just said so yourself in the comment I replied to.

[–] [email protected] 2 points 1 month ago (1 children)

This is like saying the library search engine and Bob the drunkard who looked at the shelf labels and swears up and down he knows where everything is are the same thing.

Look, ChatGPT is an averaging machine. Yes, it has ingested a significant chunk of the text on the internet, but it does not reproduce text exactly as it found it. It produces an average of all the text it has seen, weighted towards what seems to make sense for the situation. For really common information this is fine. For niche information, it bullshits without giving any indication that it is doing so.

[–] [email protected] 0 points 1 month ago (1 children)

> This is like saying the library search engine and Bob the drunkard who looked at the shelf labels and swears up and down he knows where everything is are the same thing.

It's...not remotely the same thing?

It's like saying an engine that searches the web for answers to your query is a search engine...?

> but it does not reproduce text exactly as it found it

Nor does SearchGPT.

[–] [email protected] 1 points 1 month ago (1 children)

ChatGPT is not a search engine; it generates predictions on what is the most likely text completion to your prompt. It does not pull information from a database. It is a mathematical model. Its weights do not contain the training data. It is not indexing anything. You will not find any page from the internet in the model. It is all averaged out, and any niche detail is lost, overpowered by more prevalent but less relevant training data. This is why it bullshits. When it bullshits, it is not because it searched for something and came up empty; it is because the training data simply did not contain enough occurrences of the answer to outweigh all the other, more prevalent training data. ChatGPT does not search anything.
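To make the distinction concrete, here is a toy sketch of the two approaches. Everything in it is a made-up illustration: the three-sentence corpus, the function names, and above all the bigram counter, which is only a stand-in for "predict the likely next word" and is nowhere near how ChatGPT actually works (real LLMs are large neural networks over tokens). It just shows that an index hands back stored documents verbatim, while a model that keeps only statistics can produce text that was never in its training data.

```python
from collections import Counter, defaultdict

# Hypothetical three-document "web" used by both approaches.
DOCS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

def build_index(docs):
    """Inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index[word].add(doc_id)
    return index

def search(index, docs, query):
    """Return the stored documents, verbatim, that contain every query word."""
    words = query.split()
    if not words:
        return []
    ids = set.intersection(*(index.get(w, set()) for w in words))
    return [docs[i] for i in sorted(ids)]

def train_bigrams(docs):
    """Keep only next-word counts; the documents themselves are not stored."""
    counts = defaultdict(Counter)
    for text in docs:
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(counts, prompt, max_new_words=4):
    """Greedily append the most frequent next word seen in training."""
    words = prompt.split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # nothing ever followed this word in training
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

index = build_index(DOCS)
counts = train_bigrams(DOCS)

print(search(index, DOCS, "dog"))   # ['the dog sat on the rug']
print(complete(counts, "the dog"))  # the dog sat on the cat
```

The search returns the one real document about the dog. The completion ends in "the cat" because "cat" follows "the" more often than "rug" does in this corpus, so the niche detail loses to the more prevalent one and the output is a sentence that appears in no document.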

[–] [email protected] 0 points 1 month ago (1 children)

> ChatGPT is not a search engine

It is every bit as much of a search engine as SearchGPT, with the exception of more recent information, as I've already explained.

> it generates predictions on what is the most likely text completion to your prompt.

...using information from the internet. I'm honestly baffled this needs to be explained. Once again, I ask: Where do you think the information it generates comes from? It's not just word salad; the words contain information. Were you unaware of the many, many OpenAI lawsuits based on this fact?

> This is why it bullshits.

It bullshits because it's trained on bullshit, and doesn't actually know anything, and isn't programmed to say "I don't know".

[–] [email protected] 1 points 1 month ago (1 children)

The information it generates comes from the model. The information from the model comes from the internet. The information it generates does not come from the internet. A to B to C, not A to C. I don't know how to explain this more simply without crayons. The information from the internet does not exist within the model, but the average of the information can be recreated by the model. That is not what a fucking search engine does. A search engine doesn't tell you the average results for your query; it gives you the most relevant results. At least, they should, and they used to. I can understand the confusion if you've only used a search engine in the past 3 years.

[–] [email protected] 0 points 1 month ago (1 children)

> The information from the model comes from the internet

I rest my case.

[–] [email protected] 1 points 1 month ago

That you can't read.