I’ve been coding for a while. I made an honest, eager attempt at building a real, functioning thing with all code written by AI: a Breakout clone using SDL2, with music.
The game should look good, play well, have cool effects, and be balanced. It should have an attract screen, scoring, a win state, and a lose state.
I also required the code to be maintainable, meaning I should be able to look at every single line and understand it well enough to defend its existence.
I did make it work. And honestly Claude did better than expected. The game ran well and was fun.
But: The process was shit.
I spent 2 days and several hundred dollars babysitting the AI to get something I could have done in 1 day, including learning SDL2.
Everything that turned out well did so because I brought years of skill to the table and could see when Claude was coding itself into a corner, so I could tell it to break the code into modules, collate globals, remove duplication, pull out abstractions, etc. I had to detect all of that and instruct it on how to fix it. Until I did, it kept adding and re-adding bugs, because it had produced so much shittily structured code that it was confusing itself.
TL;DR: An LLM can write maintainable code if given the full, constant attention of a skilled coder, at 40% of that coder’s speed.
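To make “collate globals” concrete, here’s a minimal sketch of the shape I kept pushing Claude toward: the loose globals collected into one state struct that gets passed around explicitly. This is illustrative, not the actual project code (names like `GameState` are made up for the example); it assumes the SDL2 dev headers are installed and builds with `gcc game.c $(sdl2-config --cflags --libs)`.

```c
/* Illustrative sketch, not the real project: game state collated into
 * one struct instead of scattered globals, and passed explicitly. */
#include <SDL2/SDL.h>
#include <stdbool.h>

typedef struct {
    SDL_Window   *window;
    SDL_Renderer *renderer;
    int  score;
    int  lives;
    bool running;
} GameState;

static void handle_events(GameState *g) {
    SDL_Event e;
    while (SDL_PollEvent(&e))
        if (e.type == SDL_QUIT)
            g->running = false;
}

static void render(GameState *g) {
    SDL_SetRenderDrawColor(g->renderer, 0, 0, 0, 255);
    SDL_RenderClear(g->renderer);
    /* ... draw paddle, ball, bricks here ... */
    SDL_RenderPresent(g->renderer);
}

int main(int argc, char **argv) {
    (void)argc; (void)argv;
    GameState g = { .score = 0, .lives = 3, .running = true };

    SDL_Init(SDL_INIT_VIDEO);
    g.window   = SDL_CreateWindow("breakout", SDL_WINDOWPOS_CENTERED,
                                  SDL_WINDOWPOS_CENTERED, 640, 480, 0);
    g.renderer = SDL_CreateRenderer(g.window, -1, SDL_RENDERER_ACCELERATED);

    while (g.running) {    /* the whole game reads/writes g, nothing else */
        handle_events(&g);
        render(&g);
        SDL_Delay(16);     /* crude ~60 fps cap; real code would track time */
    }

    SDL_DestroyRenderer(g.renderer);
    SDL_DestroyWindow(g.window);
    SDL_Quit();
    return 0;
}
```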
It depends on the subject area and your workflow. I am not an AI fanboy by any stretch of the imagination, but I have found the chatbot interface to be a better substitute for the “search for how to do X with library/language Y” loop. Even though it’s wrong a lot, it gives me a better starting place faster than reading through years-old SO posts. Being able to talk to your search interface is great.
The agentic stuff is also really good when the subject is something that has been done a million times over. Most web UI areas are so well trodden that JS devs have already invented a thousand frameworks to do it. I’m not a UI dev, so being able to give the agent a prompt like, “make a configuration UI with a sidebar that uses the graphql API specified here” is quite nice.
AI is trash at anything it hasn’t been trained on, in my experience. Do anything niche or domain-specific, and it feels like flipping a coin with a bash script. It just throws shit at the wall and runs tests until the tests pass (or it sneakily changes the tests, because the error stack trace repeatedly points at the same test line as the problem).
Yeah, what you say makes sense to me. Having it make a “wrong start” in something new is useful: it gives you a lot of the typical structure, introduces the terminology, and maybe leaves you with something sorta moving that you can see working before you mess with it.
It’s basically just for when you’re lazy and don’t want to write a bunch of boilerplate, or hit your keyboard a bunch of times to move the cursor(s) around.
It is great for boilerplate code. It can also explain code for you, or help with an unfamiliar library. It’s even helped me be productive when my brain wasn’t ready to really engage with the code.
But here’s the real danger: because I’ve got AI to do it for me, my brain doesn’t have to engage fully with the code anymore. I don’t really get into the flow where code just flows out of your hands like I used to. It’s becoming a barrier between me and the real magic of coding. And that sucks, because that’s what I love about this work. Instead, I’m becoming the AI’s manager. I never asked for that.
I’ve found the same thing. I’ve turned off the auto-suggestions while typing, because by the time I’m typing I already know what I’m going to type, and having mostly incorrect suggestions popping up every 2 seconds was distracting and counterproductive.
I generally agree with what you’ve said, for sure. Honestly, I’ve started using it to pinpoint where to look for issues in the spaghetti code of new codebases. I’ve also mostly tried to avoid using it in my personal coding time, but it feels like it’s gotten harder and harder to get legitimately good search results nowadays, which I realize is also because of AI. Given the choice, I think I’d happily erase it from existence. Spending hours sifting through Reddit and Stack Overflow was way more fulfilling, and I feel like people used to be slightly less prickly about answering stuff, because that was how you had to get answers. It seems like Lemmy could replace that space at least; I’ve genuinely gotten helpful comments here, and downvotes have always felt productive compared to what Reddit is now.
This was a very directed experiment in purely LLM-written maintainable code.
Writing experiments and proofs of concept, even without skill, changes the calculus and can make more sense.
Having it write a “starting point” and then taking over yourself is also a different thing that can make more sense. That still requires a coder with skill; you can’t skip that.
Which is funny, because you should be able to just copy, paste, and combine from maybe two or three GitHub pages pretty easily, and you learn just as much.
It would be really interesting to watch a video of this process. Though I’m certain it would be pretty difficult to pull off the editing.
You want to see someone using say, VS Code to write something using say, Claude Code?
There’s probably a thousand videos of that.
More interesting: I watched someone who was super cheap trying to use multiple AIs to code a project because he kept running out of free credits. Every now and again he’d switch accounts and use up those free credits.
That was an amazing dance, let me tell ya! Glorious!
I asked him which one he’d pay for if he had unlimited money and he said Claude Code. He has the $20/month plan but only uses it in special situations because he’ll run out of credits too fast. $20 really doesn’t get you much with Anthropic 🤷
That inspired me to try out all the code assist AIs and their respective plugins/CLI tools. He’s right: Claude Code was the best by a HUGE margin.
Gemini 3.0 is supposed to be nearly as good, but I haven’t tried it yet, so I dunno.
Now that I’ve said all that: I am severely disappointed in this article because it doesn’t say which AI models were used. In fact, the study authors don’t even know what AI models were used. So it’s 430 pull requests of random origin, made at some point in 2025.
For all we know, half of those could’ve been made with the Copilot gpt5-mini that everyone gets for free when they install the Copilot extension in VS Code.
It’s more that I want to see experienced coders explaining the mistakes that typical AI coding makes. I have very little experience and see it as a good learning opportunity. You’re probably right about there being tons of videos like that.
The mistakes it makes depend on the model and the language. GPT5 models can make horrific mistakes, though, where they randomly remove huge swaths of code for no reason. Every time it happens I’m like, “what the actual fuck?” Undoing the last change and trying again usually fixes it though 🤷
They all make horrific security mistakes quite often. Though that’s probably because they’re trained on human code that is *also* chock full of security mistakes (former security consultant, so I’m super biased on that front haha).
One of the first videos I watched about LLMs was of a journalist who didn’t know anything about programming using ChatGPT to build a JavaScript game in the browser. He’d just copy-paste code, then paste the errors back and ask for help debugging. It even had to walk him through setting up VS Code and a git repo.
He said it took him about 4 hours to get a playable platformer.
I think that’s an example of a unique capability of AI. It can let a non-programmer kinda program, let a non-Chinese speaker kinda speak Chinese, and let a non-artist kinda produce art.
I don’t doubt that it’ll get better, but even now it’s very useful in some cases (nowhere near enough to justify the trillions of dollars being spent though).
Yeah, I’m not sure the way we allocate resources is justified either, in general. I guess ultimately the problem with AI is that it gives capital access to skills it would otherwise have to interact with laborers to get.