People Twitter

1474

Dreams of AI (lemmy.world)

submitted 8 months ago by [email protected] to c/[email protected]

201 comments

[–] [email protected] 25 points 8 months ago* (last edited 8 months ago) (2 children)

Any generative AI that was trained using the entirety of the Internet is gonna suck as an information tool, since it will have more bad information in it than correct information and its goal isn't to make sure the info is accurate; its goal is to output text that looks intelligent and isn't obviously generated by a computer.

Even if you fed it nothing but correct information, it would still end up blending multiple things into a single output, generating inaccurate information.

I don't want AI that just generates shit anywhere but in a video game. I want a tool that can go through real data and give me the relevant stuff I am asking for, which was handled better by whatever Google was doing 20 years ago than by whatever the fuck AI shit they've got going on now.

[–] [email protected] 9 points 8 months ago

> I don’t want AI that just generates shit

You vastly underestimate the demand for mediocre crap that exists in the world.

[–] [email protected] 2 points 8 months ago (3 children)

Then why not train an AI on the entirety of Wikipedia? I know it's not all correct, but that should ensure most of the information is decently accurate. Would make for a great tool if it allowed you to get the same info but explained in a more casual manner.

[–] [email protected] 6 points 8 months ago

> I know it’s not all correct, but that should ensure most of the information is decently accurate

The problem is that a generative AI does not generate correct content; it generates associated content. It looks at words/terms/tokens that are frequently used together to build a context, and will extrapolate from that, continuing to produce content that looks like the training content.

The problem is that this will generate material that LOOKS LIKE CORRECT material, but it doesn't generate material that IS CORRECT. Thankfully for AI, those things overlap a lot, but not always.
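
To make that concrete, here's a rough toy sketch. It is nowhere near how a real LLM works: it's just a table of "which word was seen after which word" (a bigram table), and the two-sentence corpus and the `completions` helper are made up for illustration. The point is only that a model trained on nothing but true sentences can still stitch them into a false one that looks exactly like the training data.

```python
from collections import defaultdict

# Toy training corpus (made up for this example): every sentence is true.
corpus = [
    "paris is the capital of france",
    "rome is the capital of italy",
]

# Record which words were seen directly after each word (a bigram table).
follows = defaultdict(set)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].add(nxt)

def completions(start):
    """Enumerate every sentence the bigram table can produce from `start`."""
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        options = follows.get(path[-1])
        if not options:
            # No known follower: the sentence ends here.
            results.append(" ".join(path))
            continue
        for word in sorted(options):
            stack.append(path + [word])
    return sorted(results)

generated = completions("paris")
# Sentences the model can produce that were never in the training data.
invented = [s for s in generated if s not in corpus]

print(generated)
print(invented)
```

Every adjacent word pair in "paris is the capital of italy" appeared in the training data, so the table happily produces it, even though no training sentence said that and it's wrong.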

[–] [email protected] 5 points 8 months ago

You need an absolutely insane amount of data to train LLMs. Hundreds of billions to tens of trillions of tokens. (A token isn't the same as a word, but with numbers this massive it doesn't even matter for the point.)

Wikipedia just doesn't have enough data to make an LLM off of, and even if you could do it and get okay results, it'll only know how to write text in the style of Wikipedia. While it might be able to tell you all about how different cultures most commonly cook eggs, I doubt you'll get any recipe out of it that makes sense.

If you were to take some base model (such as Llama or GPT) and fine-tune it on Wikipedia data, you'd probably get a "Llama in the style of Wikipedia" result, and that may be what you want, but more likely not.

[–] [email protected] 3 points 8 months ago

> Would make for a great tool if it allowed you to get the same info but explained in a more casual manner.

There's a Simple English Wikipedia.