this post was submitted on 11 Dec 2024

391 points (99.2% liked)


Open source projects drown in bad bug reports penned by AI (www.theregister.com)

submitted 4 months ago by [email protected] to c/[email protected]


[–] [email protected] 130 points 4 months ago* (last edited 4 months ago) (2 children)

One thing I've also noticed is people doing code reviews using ai to pad their stats or think they are helping out. At best it's stating the obvious, wasting resources to point out what doesn't need pointing out. At worst it's a giant waste of time based on total bullshit the ai made up.

I kinda understand why people would think LLMs are able to generate and evaluate code. They throw simple example problems at them and the LLMs solve them without much issue. Sometimes they make obvious mistakes, but these are easily corrected. This makes people think LLMs are basically able to code: if they can solve even some harder example problems, surely they are at least as good as beginner programmers, right? No, wrong actually. The reason the LLM can solve the example problem is that the example (or a variation) was contained within its training data. It knows the answer not by deduction or by reason, it knows the answer by memorization. Once you start actually programming in the real world, it's nothing like the examples. You need to account for an existing code base, with existing rules, standards and limitations. You need to evaluate which solution out of your toolbox to apply. You need to consider the big picture as well as small details. You need to think of the next guy working with the code, because more often than not, that next guy is you. LLMs crumble in a situation like this: they don't know about all the unspoken things, and they haven't trained on the code base you are working with.

There's a book I'm fond of called Patterns of Enterprise Application Architecture by Martin Fowler. I always used to joke it contained the answer to any problem a software engineer ever comes across. The only trick is to choose the correct answer. LLMs are like this: they have all these patterns memorized and choose which answer best fits the question. But they don't understand why, what the upsides and downsides are for your specific situation, what the implications of the selected answer are going forward, or why this pattern over another. When the LLM answers, you can often prompt it to produce an answer with a completely different pattern applied. In my opinion it's barely more useful than the book and in many ways much worse.

[–] [email protected] 29 points 4 months ago (2 children)

I use LLM-type AI every day as a software developer. It's incredibly helpful in many contexts, but you have to understand what it's designed to do and what its limitations are.

I went back and forth with Claude and ChatGPT today about its logic being incorrect and it telling me "You're right," then outputting the same/similar erroneous code it output before, until I needed to just slow down and fix some fundamental issues with its output myself. It’s certainly a force multiplier, but not at any kind of scale without guidance.

I'm not convinced AI, in its current incarnation, can be used to write code at a reasonable scale without human intervention. Though I hope we get there so I can retire.

[–] [email protected] 66 points 4 months ago (3 children)

so I can retire.

So you can become homeless you mean :p

[–] [email protected] 56 points 4 months ago (1 children)

Bro's legit out here thinking there's some sort of meaningful wealth redistribution instead of winner takes all for the few, abject poverty for the rest.

[–] [email protected] 5 points 4 months ago (3 children)

No, everyone knows we're gonna do gardening or woodworking or something like that when we stop our programming career. Main thing is: something that's as far as possible from a computer.

[–] [email protected] 6 points 4 months ago

i like using computers though.

[–] [email protected] 2 points 4 months ago

I’m fixing classic cars now. If they have a computer it’s so old that there’s no danger of ROHS soldering and there aren’t even any programming ports. Just stick a sensor up the tailpipe and adjust some screws.

It's even been better for my back than sitting at a desk was.

[–] [email protected] 1 points 4 months ago

Was wondering what garden leave is. 😁

[–] [email protected] 2 points 4 months ago

I’ll take it.

[–] [email protected] 19 points 4 months ago* (last edited 4 months ago) (2 children)

One thing you gotta remember when dealing with that kind of situation is that Claude and Chat etc. are often misaligned with what your goals are.

They aren't really chat bots, they're just pretending to be. LLMs are fundamentally completion engines. So it's not really a chat with an ai that can help solve your problem, instead, the LLM is given the equivalent of "here is a chat log between a helpful ai assistant and a user. What do you think the assistant would say next?"
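To make that concrete, here is a minimal Python sketch of how a chat can be flattened into one completion prompt. The header wording, the speaker labels and the function name `render_chat_as_prompt` are all made up for illustration; real services use their own (mostly undisclosed) templates, so treat this as the general shape of the idea and not as any vendor's actual format.

```python
def render_chat_as_prompt(messages, assistant_name="Assistant", user_name="User"):
    """Flatten (role, text) pairs into a single text-completion prompt.

    The header and labels are hypothetical, chosen only to illustrate the idea.
    """
    header = "The following is a chat log between a helpful AI assistant and a user.\n\n"
    lines = []
    for role, text in messages:
        speaker = assistant_name if role == "assistant" else user_name
        lines.append(f"{speaker}: {text}")
    # The prompt ends with the assistant's label, so the most likely
    # continuation is whatever the assistant "would say next".
    return header + "\n".join(lines) + f"\n{assistant_name}:"


if __name__ == "__main__":
    print(render_chat_as_prompt([("user", "Why does my loop skip the last item?")]))
```

Everything the model sees is that one block of text, which is why earlier wrong answers in the log shape what comes next.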

That means that context is everything and if you tell the ai that it's wrong, it might correct itself the first couple of times but, after a few mistakes, the most likely response will be another wrong answer that needs another correction. Not because the ai doesn't know the correct answer or how to write good code, but because it's completing a chat log between a user and a foolish ai that makes mistakes.

It's easy to get into a degenerate state where the code gets progressively dumber as the conversation goes on. The best solution is to rewrite the assistant's answers directly but chat doesn't let you do that for safety reasons. It's too easy to jailbreak if you can control the full context.

The next best thing is to kill the context and ask about the same thing again in a fresh one. When the ai gets it right, praise it and tell it that it's an excellent professional programmer that is doing a great job. It'll then be more likely to give correct answers because now it's completing a conversation with a pro.
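A rough sketch of the "kill the context" step, assuming you keep the conversation yourself as a list of `(role, text)` pairs. The function name, the `failed_attempts` counter and the threshold of two are my own assumptions for illustration, not something any chat product exposes.

```python
def restart_if_degenerate(history, failed_attempts, max_failed=2):
    """Return the message list to send on the next turn.

    history: list of (role, text) pairs, oldest first.
    failed_attempts: how many times the assistant's answer was rejected,
        tracked by the caller (hypothetical bookkeeping).
    max_failed: assumed cut-off after which the old context is discarded.
    """
    if failed_attempts < max_failed:
        # Still worth continuing in the same context.
        return history
    # Drop the whole log of mistakes and corrections and keep only the
    # original question, so the model is no longer completing a
    # conversation full of wrong answers.
    first_question = next(text for role, text in history if role == "user")
    return [("user", first_question)]
```

The point of the design is that nothing from the failed exchange survives into the new context, not even the corrections.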

There's a kind of weird art to prompt engineering because open ai and the like have sunk billions of dollars into trying to make them act as much like a "helpful ai assistant" as they can. So sometimes you have to sorta lean into that to get the best results.

It's really easy to get tricked into treating it like a normal conversation with a person when it's actually really... not normal.

[–] [email protected] 3 points 4 months ago

It's really easy to get tricked into treating it like a normal conversation with a person when it's actually really... not normal.

I caught myself thanking GitHub Copilot after getting a response to a question. Felt...weird. For a whole two seconds my brain was operating like I'm talking to another human. You are absolutely correct.

[–] [email protected] 2 points 4 months ago

This is a really fantastic explanation of the issue!

It's more like improv comedy with an extremely adaptable comic than a conversation with a real person.

One of the things that I've noticed is that the training/finetuning that's done in order to make it give good completions to the "helpful ai conversation scenario" flattens a lot of the capabilities of the underlying language model for really interesting and specific completions. I remember playing around with gpt2 in its native text completion mode, and even with that much weaker model, it was able to complete a much larger variety of text styles without sliding into the sameness and slickness of the current chat model fine-tuning.

A lot of the research that I read on LLMs is using them in the original token completion context, but pretty much the only way people interact with them is through a thick layer of ai chatbot improv. As an example for code, I imagine that one would have more success using an LLM to edit your code if the context that you give it starts out written like it is a review of a pull request for the code, or some other commentary of a form that matches the way that code is reviewed in the training data. But instead of having access to create that context directly, we have to ask for code review through the fogged window of a chat between an AI assistant and a person discussing code. And that form of chat likely isn't well represented in the training data.
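As a sketch of what such a context could look like for a plain completion model, here is a small Python helper that lays out a diff the way a pull request page might and stops right where the reviewer's text would begin. The layout, the field labels and the name `render_review_context` are guesses for illustration; whether this actually beats a chat-style request is the hypothesis above, not something tested here.

```python
def render_review_context(title, diff, reviewer="maintainer"):
    """Build a completion prompt that reads like the start of a PR review.

    The labels ("Pull request:", "Files changed:", "Review by ...") are
    hypothetical; they only mimic how review pages tend to be laid out.
    """
    return (
        f"Pull request: {title}\n"
        "Files changed:\n"
        f"{diff.rstrip()}\n\n"
        # Ending on the reviewer's label invites the model to continue
        # with review commentary instead of a chat reply.
        f"Review by {reviewer}:\n"
    )


if __name__ == "__main__":
    example_diff = "-    for i in range(len(items) - 1):\n+    for i in range(len(items)):\n"
    print(render_review_context("Fix skipped last item", example_diff))
```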

[–] [email protected] 13 points 4 months ago (1 children)

Well said!

Also, we monitor beginners heavily because the smallest insignificant error (in their eyes) can have long-lasting downsides and cause strange problems further down the road...

Managers usually love to say they, too, coded back in the day, but they didn't; they wrote some small scripts and think everything is easy like that, so why not use AI, and why is it taking so long to fix that bug?!

[–] natecox 12 points 4 months ago

Managers usually love to say they, too, coded back in the day, but they didn't; they wrote some small scripts and think everything is easy like that, so why not use AI, and why is it taking so long to fix that bug?!

To be fair, some of us were real developers with real experience; you just don’t tend to hear us making claims about how easy dev work is and how AI is going to take over all the coding.

[–] Netrunner 42 points 4 months ago* (last edited 4 months ago)

My project has merge requests by AI that have conflicts and don't make sense.

So...ye

[–] [email protected] 25 points 4 months ago (2 children)

One wonders who would have the time, interest and money to set up and control AI to do all this... One wonders 🤔 and then one remembers -just as a random example- Microsoft funding SCO with tens of millions of dollars, right after which it attacked Linux with fake copyright claims for years, after which Microsoft extorted large corporations into switching to Microsoft platforms. Also, who controls GitHub now? Anyway, I digress.

OS will deal with this, I imagine it won't be too hard to set up tools that will deal with this shit, but I'm so sick and tired of continuously having to deal with this shit. Can we just for once have something nice?

[–] [email protected] 14 points 4 months ago (1 children)

More than some nefarious corpo, I think this is more an evolution of the same problem that existed before AI was popular.

Some people realised that their credibility as a job candidate was tied on a very surface level to their GitHub profile, so they sought to optimise it. They started going to cool projects and proposing absolutely stupid merge requests, like “replace single quotes with double quotes in README.md” or “improved spacing in this sentence” in the hopes that the developers would go “well why not”, so they could show that they contributed to tensorflow or redis or what have you. Already years ago, a lot of FLOSS projects were plagued by spam PRs.

Now coming up with absolutely stupid reasons to issue a PR is a tedious job and you have a very fierce competition of people doing the same thing as you, so… why not gain the edge with AI?

[–] [email protected] 5 points 4 months ago (1 children)

No, this is definitely big corporations. It has Microsoft written all over it.
Microsoft has now gone "all in on Open Source" (except for their own code, of course).
They rely on OSS for most of their revenue (Azure). And they force their employees to use Copilot for everything.
It would only make sense for them to flood the devs of OSS they use with Copilot-generated bug reports and feature requests.

[–] [email protected] 2 points 4 months ago (1 children)

To what end exactly?

[–] [email protected] 4 points 4 months ago (1 children)

To avoid company-internal pressure.
Microsoft is pretty cult-like nowadays. Employees need to write weekly self-assessments using Copilot, which are used to judge their "growth mindset" and decide if they get a raise, or fired.

https://www.wheresyoured.at/the-cult-of-microsoft/

Demonstrating your "commitment to advancing open source", while using Copilot, benefits employees internally.

[–] [email protected] 3 points 4 months ago (1 children)

Not saying it can’t be, but I’ll be more convinced by an article that is a bit less emotionally loaded. It’s clear that the author has a bone to pick with Microsoft, and it reads as if it’s written by a high schooler who wants to LARP as a journalist.

Just to be clear I have been in big tech corpos with cult-ish undertones and I have also seen the mindset poppycock shoved to my face multiple times, it’s not that I find their contents hard to believe. I just find that article hard to trust.

[–] [email protected] 1 points 4 months ago

that's just how Ed Zitron always writes. he makes some good points usually despite that, i wouldn't dismiss him just because of it.

his blog is widely known and read among tech workers. i saw another one of his articles posted on hacker news just this morning.

[–] [email protected] 23 points 4 months ago (3 children)

Another step in the slow death of the open WWW... Are we all gonna retreat to smaller, more controlled environments?

[–] [email protected] 3 points 4 months ago (1 children)

It was better that way anyways

[–] [email protected] 2 points 4 months ago

Was just talking to my partner about this last night. We both got to experience the internet before social media and corpos fucked everything up.

[–] [email protected] 3 points 4 months ago (1 children)

you mean a different protocol?

[–] [email protected] 1 points 4 months ago (1 children)

I mean less worldwide

[–] [email protected] 1 points 4 months ago

ah i see. i would be ok with a different protocol. or going back to webrings

[–] kogasa 2 points 4 months ago

Web of trust

[–] [email protected] 18 points 4 months ago (1 children)

I work in customer support and tech support. I can see it now, people will start using AI assistants to order things and contact companies with problems. That will probably be frustrating, it will be like an office assistant ordering something for their boss but they don't understand what or why they're ordering.

[–] [email protected] 4 points 4 months ago

I can see a company implementing this through their shitty chatbots paired with an IT layoff tbh. No malicious/negligent user necessary.

[–] [email protected] 17 points 4 months ago

Just keep the bug-reporting bots pre-occupied by having them talk to the CI Bot. Problem solved b