this post was submitted on 28 Jul 2023
462 points (93.6% liked)

Technology


OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models.::In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

top 50 comments
[–] [email protected] 136 points 1 year ago (2 children)

Text written before 2023 is going to be exceptionally valuable, because we can be reasonably sure it wasn't contaminated by an LLM.

This reminds me of research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any radiation anywhere. However, after America and the Soviet Union started nuking stuff like there's no tomorrow, pretty much all steel on Earth became a little bit contaminated. Not a big issue for normal people, but scientists building super-sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel.

[–] [email protected] 46 points 1 year ago (1 children)

The background radiation did go up, but saying "there was hardly any radiation anywhere" is wrong. Today's steel (and background radiation) is pretty much back to pre-nuke levels. See: Low-background steel; Background radiation.

[–] [email protected] 26 points 1 year ago (2 children)

It is also worth noting that we can make steel with low or no radiation contamination; it's just really expensive and hard, and happens in very low quantities.

load more comments (2 replies)
[–] [email protected] 6 points 1 year ago (1 children)

Not really. If it's truly impossible to tell the text apart, then it doesn't really pose a problem for training AI. Otherwise, next-gen AI will be able to tell apart text generated by current-gen AI, and it will get filtered out. So only the most recent data will have unfiltered shitty AI-generated stuff, but they don't train AI on super-recent text anyway.

[–] [email protected] 28 points 1 year ago (10 children)

This is not the case. Model collapse is a studied phenomenon for LLMs: quality deteriorates when models are trained on data that comes from themselves. It might not be an issue if there were thousands of models out there, but there are only 3-5 base models that all the others are derivatives of, IIRC.
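The intuition behind collapse can be shown with a toy simulation (this is an illustrative sketch, not how actual LLM training works): each "generation" of a model is just the empirical token distribution of the previous generation's output. Since a generation can only re-emit tokens it actually saw, rare tokens get dropped and diversity can only shrink, never recover.

```python
import numpy as np

rng = np.random.default_rng(42)

vocab_size = 100
corpus_size = 1000

# Generation 0: a "human" corpus drawn from a Zipf-like distribution.
weights = 1.0 / np.arange(1, vocab_size + 1)
weights /= weights.sum()
corpus = rng.choice(vocab_size, size=corpus_size, p=weights)

diversity = [len(np.unique(corpus))]
for _ in range(200):
    # "Train" on the previous corpus (i.e. estimate its empirical
    # distribution), then "generate" a new corpus by sampling from it.
    corpus = rng.choice(corpus, size=corpus_size, replace=True)
    diversity.append(len(np.unique(corpus)))

print(diversity[0], diversity[-1])  # vocabulary diversity only ever shrinks
```

By construction the set of surviving tokens can only lose members each generation, so the vocabulary monotonically degrades, which is the flavor of degeneration the model-collapse papers describe.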

load more comments (10 replies)
[–] [email protected] 55 points 1 year ago (3 children)

The wording of every single article has such an anti-AI slant, and I feel the propaganda really working this past half year. Still, nobody cares about advertising companies, but LLMs are the devil.

Existing datasets still exist. The bigger focus is on crossing modalities and refining content.

Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don't even know half of how the machines work or what they are doing.

[–] [email protected] 67 points 1 year ago (4 children)

Probably because LLMs threaten to (and have already started to) shittify a truly incredible number of things like journalism, customer service, books, and scriptwriting, all in the name of increased profits for a tiny few.

[–] [email protected] 54 points 1 year ago (11 children)

again, the issue isn't the technology, but the system that forces every technological development into functioning "in the name of increased profits for a tiny few."

that has been an issue for the fifty years prior to LLMs, and will continue to be the main issue after.

removing LLMs or other AI will not fix the issue. why is it constantly framed as if it would?

we should be demanding the system adjust for the productivity increases we've already seen, as well as for what we expect in the near future. the system should make every advancement a boon for the general populace, not the obscenely wealthy few.

even the fears of propaganda: the wealthy can already afford to manipulate public discourse beyond the general public's ability to keep up. the bigger issue is in plain sight, but is still being largely ignored for the slant that "AI is the problem."

[–] [email protected] 22 points 1 year ago

Yep, the problem was never LLMs, but billionaires and the rich. The problem has always been the rich, for thousands of years, and yet they have been immensely successful at deflecting the blame onto other groups for just as long. They will claim it's Chinese immigrants, or blacks, or Mexicans, or gays, or trans people. Now LLMs and AI are the new boogeyman.

We should be talking about UBI, not LLMs.

[–] [email protected] 19 points 1 year ago (17 children)

It’s a capitalism problem not an AI or copyright problem.

load more comments (17 replies)
[–] [email protected] 6 points 1 year ago

This isn’t a technological issue, it’s a human one.

I totally agree with everything you said, and I know that it will never ever happen. Power is used to get more power. Those in power will never give it up, only seek more. They intentionally frame the narrative to make the more ignorant among us believe that the tech is the issue rather than the people that own the tech.

The only way out of this loop is for the working class to rise up and murder these cunts en masse

Viva la revolucion!

[–] [email protected] 5 points 1 year ago

I completely agree with you; AI should be seen as a great thing, but we all know that the society we live in will not pass those benefits to the average person. In fact, it'll probably be used to make life worse. From a leftist perspective it's very easy to see this, but from the normie position, at least in the US, people aren't thinking about how our society slants AI towards being evil and scary; they just think AI is evil and scary. Again, I completely agree with what you've said, it's just important to remember how reactionary the average person is.

load more comments (7 replies)
[–] [email protected] 3 points 1 year ago

Technology is but a tool. It cannot tell you how to use it. If it's in the hands of a writer, it's a helpful sounding board. If it's in the hands of a Netflix producer, it's an anti-labor tool. We need to protect people's livelihoods.

load more comments (2 replies)
[–] [email protected] 4 points 1 year ago

Why is the negative focus always on the tech and not the political system that actually makes it a possible negative for people?

I swear, most of the people with heavy opinions don’t even know half of how the machines work or what they are doing.

Yah I think it's fairly obvious that people are both fascinated and scared by the tech and also acknowledge that under a different economic structure, it would be extremely beneficial for everyone and not just for the very few. I think it's more annoying that people like you assume that everyone is some sort of diet Luddite when they're just trying to see how the tool has the potential to disrupt many, many jobs and probably not in a good way. And don't give me this tired comparison about the industrial revolution because it's a complete false equivalence.

[–] [email protected] 3 points 1 year ago (6 children)

I am so tired of techno-fetishist AI bros complaining every single time any of the many ways in which AI will devastate and rot our daily lives is brought up.

"It's not the tech! It's the economic system!"

As if they're different things? Who is building the tech? Who is pouring billions into the tech? Who is protecting the tech from proper regulation, smartass? I don't see any worker coops using AI.

"You don't even know how it works!"

Just a thought-terminating cliche to try to avoid any discussion or criticism of your precious little word generators. No one needs to know how a thing works to know its effects. The effects are observable reality.

Also, nobody cares about advertising companies? What the hell are you on about?

load more comments (6 replies)
[–] [email protected] 30 points 1 year ago (1 children)

We built a machine to mimic human writing. There's going to be a point where there is no difference. We might already be there.

[–] [email protected] 12 points 1 year ago (1 children)

The machine used to mimic human text is trained on human text. If it can't tell the difference between its text and human text, it will begin using AI text to mimic human text. This will eventually lead to errors, repetitions, and/or less human-like text.

load more comments (1 replies)
[–] [email protected] 25 points 1 year ago* (last edited 1 year ago) (7 children)

This was a predictable issue if you knew the fundamental technology that goes into these models. Hell, it should have been obvious to the layperson that it was headed this way once they saw the videos and heard the audio.

We're less sensitive to patterns in massive data; the point at which we can't tell fact from AI fiction comes before the point at which these machines can't tell. Good luck with the FB aunts.

A GAN's final goal is to develop content that is indistinguishable... Are we surprised?

Edit, since the person below me made a great point: GANs may be limited, but there's nothing that says you can't set up a generator and a detector LLM, pitted against each other, for the sole purpose of improving the generator.

[–] [email protected] 22 points 1 year ago (2 children)

For laymen who might not know how GANs work:

Two AIs are developed at the same time: one that generates and one that discriminates. The generator creates a dataset, it gets mixed in with some real data, and then all of that gets fed into the discriminator, whose job is to say "fake or not".

Both AIs get better at what they do over time. This arms race creates more convincing generated data over time. You know your generator has reached peak performance when its twin discriminator has a 50/50 success rate: it's just guessing at that point.

There literally cannot be a better AI than the twin discriminator at detecting that generator's work. So anyone trying to make tools to detect chatGPT's writing is going to have a very hard time of it.
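That 50/50 endpoint can be illustrated with a toy sketch (not an actual GAN, just the equilibrium it converges toward): if the "generator" has perfectly matched the real distribution, even the best simple discriminator we can fit does no better than chance on held-out data. The threshold-classifier "discriminator" here is an illustrative stand-in, not anything OpenAI used.

```python
import numpy as np

rng = np.random.default_rng(0)

def best_threshold_classifier(x, y):
    # Fit the single best threshold rule on (x, y), with y in {0, 1}.
    # direction=+1 means "predict 1 when x > threshold"; -1 flips it.
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    pos_below = np.concatenate(([0], np.cumsum(ys)))  # 1-labels among i smallest
    total_pos = pos_below[-1]
    idx = np.arange(n + 1)
    # correct predictions if we predict 1 strictly above cut point i
    correct_up = (idx - pos_below) + (total_pos - pos_below)
    correct_down = n - correct_up
    best_i = int(np.argmax(np.maximum(correct_up, correct_down)))
    direction = 1 if correct_up[best_i] >= correct_down[best_i] else -1
    threshold = -np.inf if best_i == 0 else xs[best_i - 1]
    return threshold, direction

def accuracy(threshold, direction, x, y):
    pred = (x > threshold) if direction == 1 else (x <= threshold)
    return float((pred.astype(int) == y).mean())

# A "perfect generator": fake samples follow the same distribution as real ones.
real = rng.normal(0.0, 1.0, size=4000)
fake = rng.normal(0.0, 1.0, size=4000)
x = np.concatenate([real, fake])
y = np.concatenate([np.ones(4000, dtype=int), np.zeros(4000, dtype=int)])

# Train the "discriminator" on half the data, evaluate on the held-out half.
perm = rng.permutation(len(x))
t, d = best_threshold_classifier(x[perm[:4000]], y[perm[:4000]])
acc = accuracy(t, d, x[perm[4000:]], y[perm[4000:]])
print(f"held-out discriminator accuracy: {acc:.3f}")
```

Held-out accuracy hovers near chance, which is exactly the regime a detector for a well-trained generator ends up in.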

[–] [email protected] 6 points 1 year ago

Fantastically put!

load more comments (1 replies)
load more comments (6 replies)
[–] [email protected] 23 points 1 year ago

On the one hand, our AI is designed to mimic human text; on the other hand, we want to detect AI-generated text that was designed to mimic human text. These two goals don't align at a fundamental level.

[–] [email protected] 13 points 1 year ago (3 children)

So every accusation of cheating/plagiarism etc. and the resulting bad grades need to be revised because the AI checker incorrectly labelled submissions as "created by AI"? OK.

[–] [email protected] 8 points 1 year ago* (last edited 1 year ago)

I laughed pretty hard when South Park did their ChatGPT episode. They captured the school response accurately, with the shaman doing whatever he wanted in order to find content "created by AI."

load more comments (2 replies)
[–] [email protected] 11 points 1 year ago (1 children)

I mean, the entire goal of the technology was to create human-like text.

load more comments (1 replies)
[–] [email protected] 10 points 1 year ago

This just illustrates the major limitation of ML: access to reliable training data. A machine that has no concept of internal reasoning can never be truly trusted to solve novel problems, and novel problems, from minor issues to very complex ones, are solved in a bunch of professions every day. That's what drives our world forward. If we rely too heavily on AI to solve problems for us, the issue of obtaining reliable training data to train future AIs will only expand. That's why I currently don't think AIs will replace large swaths of the workforce, but will to a larger degree be used as tools by the humans in the workforce.

[–] [email protected] 8 points 1 year ago

Relax, everybody. I have figured out the solution. We pass a law that all AI generated text has to be in Pig Latin or Ubbi Dubbi.
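To play along with the joke: the "law" is trivially enforceable with a few lines of code, assuming the schoolyard Pig Latin rules (leading consonant cluster moved to the end plus "ay", vowel-initial words get "way"). The function names here are made up for the bit.

```python
def pig_latin(word: str) -> str:
    """Translate one lowercase word into schoolyard Pig Latin."""
    vowels = "aeiou"
    if word[0] in vowels:
        return word + "way"          # vowel-initial: just append "way"
    for i, ch in enumerate(word):
        if ch in vowels:
            # move the leading consonant cluster to the end, add "ay"
            return word[i:] + word[:i] + "ay"
    return word + "ay"               # no vowels at all

def mark_ai_text(text: str) -> str:
    """Apply the proposed AI-disclosure 'law' to a whole sentence."""
    return " ".join(pig_latin(w) for w in text.lower().split())

print(mark_ai_text("large language models are neat"))
# → argelay anguagelay odelsmay areway eatnay
```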

[–] [email protected] 6 points 1 year ago

I wonder if AI generated texts (or speech) will impact our language. Kinda interesting thing to think about.

[–] [email protected] 6 points 1 year ago (2 children)

I wonder why Google still isn't considering buying Reddit and other forums where personal discussion takes place and the user base sorts quality content free of charge. It has been established already that Google queries are way more useful when coupled with Reddit.

[–] [email protected] 16 points 1 year ago (1 children)

Making Google better is not Google's goal. Growth is their goal.

[–] [email protected] 6 points 1 year ago

I'm honestly under the impression Google Search is one of their less valuable products, even if it's the one everyone associates the company's name with.

[–] [email protected] 6 points 1 year ago (1 children)

Why buy it when you can get the same data for free?

[–] [email protected] 4 points 1 year ago

Why buy data for accuracy when you don't care, and you can support your company with SEO spam?

[–] [email protected] 6 points 1 year ago* (last edited 1 year ago)

FWIW, it's not clear-cut whether AI-generated data feeding back into further training reduces accuracy or is generally harmful.

Multiple papers have shown that images generated by high-quality diffusion models, with a proportion of real images in the mix (30-50%), improve the adversarial robustness of the trained models. Similar things might apply to language modeling.
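The real/synthetic mixing step those papers describe is simple in pipeline terms; here is a minimal sketch (function and dataset names are illustrative, not from any particular paper) of building a training set with a fixed real-data fraction:

```python
import random

def mix_training_data(real, synthetic, real_fraction=0.4, total=None, seed=0):
    """Build a training set containing roughly `real_fraction` real
    examples, the rest synthetic (sampled with replacement)."""
    rng = random.Random(seed)
    total = total or len(real) + len(synthetic)
    n_real = int(total * real_fraction)
    n_synth = total - n_real
    batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_synth)
    rng.shuffle(batch)
    return batch

real = [("img_real", i) for i in range(100)]
synth = [("img_synth", i) for i in range(100)]
mixed = mix_training_data(real, synth, real_fraction=0.4, total=200)
n_real_in_mix = sum(1 for tag, _ in mixed if tag == "img_real")
print(n_real_in_mix, len(mixed))  # → 80 200
```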

[–] [email protected] 5 points 1 year ago (1 children)

OpenAI also financially benefits from keeping the hype train rolling. Talking about how disruptive their own tech is gets them attention and investments. Just take it with a grain of salt.

[–] [email protected] 6 points 1 year ago (8 children)

It's not possible to tell AI-generated text from human writing at any level of real-world accuracy. Just accept that.

load more comments (8 replies)
load more comments
view more: next ›