…and I still don’t get it. I paid for a month of Pro to try it out, and it is consistently and confidently producing subtly broken junk. I had tried this before but gave up because it didn’t work well. I thought that maybe this time it would be far enough along to be useful.
The task was relatively simple, and it involved doing some 3D math. The solutions it generated were almost write every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs or reintroduce old ones.
I spent nearly the whole day yesterday going back and forth with it, and felt like I was in a mental fog. It wasn’t until I had a full night’s sleep and reviewed the chat log this morning that I realized how much I was going in circles. I tried prompting a bit more today, but stopped when it kept doing the same crap.
The worst part is that, throughout all of this, Claude was confidently responding. When I said there was a bug, it would “fix” the bug and provide a confident explanation of what was wrong… Except it was clearly bullshit, because it didn’t work.
I still want to keep an open mind. Is anyone having success with these tools? Is there a special way to prompt it? Would I get better results during certain hours of the day?
For reference, I used Opus 4.6 Extended.
my experience has been similar for complex tasks but the sweet spot for me is small, well-defined scripts where i can verify the output easily.
like: i needed to parse some music metadata and normalize it across a few formats. gave it a spec, it produced something that mostly worked. i spent maybe 20 minutes fixing edge cases instead of 3 hours writing it from scratch. that exchange of time is what sold me.
for 3d math where correctness is hard to verify at a glance, i wouldn’t trust it either. the tool is only as useful as your ability to test its output quickly.
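as a sketch of what i mean by small and verifiable: something like the metadata normalizer, where the whole job is mapping tag keys onto one schema. the field names and mappings below are hypothetical (not from any real tag library), just to show the shape of a task you can eyeball-test in minutes.

```python
# Normalize track metadata from different tagging schemes into one
# common dict shape. Key names here are illustrative placeholders.
ID3_MAP = {"TIT2": "title", "TPE1": "artist", "TALB": "album"}
VORBIS_MAP = {"TITLE": "title", "ARTIST": "artist", "ALBUM": "album"}

def normalize(tags: dict, scheme: str) -> dict:
    """Map raw tag keys onto a common schema and strip whitespace."""
    mapping = {"id3": ID3_MAP, "vorbis": VORBIS_MAP}[scheme]
    out = {}
    for raw_key, value in tags.items():
        key = mapping.get(raw_key.upper())
        if key is not None:  # silently drop keys we don't recognize
            out[key] = value.strip()
    return out

print(normalize({"TIT2": " So What ", "TPE1": "Miles Davis"}, "id3"))
# → {'title': 'So What', 'artist': 'Miles Davis'}
```

the point is that every edge case an LLM gets wrong here is obvious the moment you run it on a real file, which is exactly why the 20-minutes-of-fixes trade works.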
my experience with LLMs and numerical computation, like with MATLAB or GNU Octave, has been poor. I assume it’s mostly an issue of the data not being there: MATLAB has its own proprietary AI (which I don’t believe is trained on users’ code), and Octave has no AI associated with it on its end, so the major LLMs only get trained on whatever code users post online or prompt them with. Which is why if you prompt one to do a 3D plot, it will almost always pull something out of its ass.
your feeling of a “mental fog” is my experience with AI in general: the language model explains the ideas well, but then the code editor does some obscure move that makes no fucking sense. also, because you’re not writing the code yourself and learning from your mistakes, it makes you uncertain of your own code. it’s unfortunate to see search engines going to shit because of AI, because AI is not ready.
The solutions it generated were almost write every time
Did you vibe code this post? 😂
I have a full pro model for Kiro at work. It does actually work, but we have custom MCP servers for all the internal tools, context on how to use these tools, style guidelines, etc. and then on top of that we have a lot of AI context files in the code base to help the AI understand the code base and make the correct changes.
I’ve been using it on a side project, and it works if you know how to constrain it. It does get things wrong a lot. But the big thing is spec-driven development: you give it a write-up, and it produces a requirements doc and a design doc with a lot of correctness properties in them to follow when generating code and creating the tasks.
I don’t believe people can vibe code unless they can actually code. It’s a whole different way of coding. I still manually edit what it does a lot.
A lot of people explain it like it’s a brand new junior developer. You need to give it as much context as possible, tell it exactly what you want, tell it what you don’t want, tell it why, etc., and it still may not listen exactly.
I recently started using Pro to debug a problem I couldn’t solve. The one thing I need from it is an extra insight, a second opinion (because I’m the only developer), and letting it read the whole folder helps: it identified a problem I didn’t consider because it was in a file outside of where I was looking.
You just didn’t use the right prompts!!!
/s
Also working on some 3d maths.
I’ve used the free versions a bit, but not really to the extent that I’d call it vibe coding. The chatbots often know where to find libraries or pre-existing functions that I don’t. It’s also okay at algorithms for well-defined problems, but it often warns me not to do something I absolutely need to do, or vice versa. It’s very hit and miss on debugging. It’ll point out obvious stuff (typos) reliably, and it can usually handle iteration stuff, but it usually doesn’t pick up on other things. Once in a rare while it will impress me by suggesting I look at a particular thing, and I think it manages this better in new chats, but it fails on most complex issues. I use it as a faster Stack Overflow, but you need to be able to work through the code yourself, understand what you’re doing, and test that individual steps are doing what they need to do. The bots can’t really do any sort of planning or breaking a problem into sub-problems, and they really suck at thinking about 3d stuff.
I haven’t tried any Anthropic models personally.
So far, between the free online chats from OpenAI and DeepSeek and the smaller models I’ve run on my own machine, the most useful approach has been to treat it as an overeager student that lacks the first-hand experience needed to see the big picture: asking it questions I’m pretty sure I already know the answer to and seeing if 1) it “understands” what I’m getting at and 2) it can surprise me with a viewpoint I hadn’t thought of before.
Using them to double-check my own ideas seems to be marginally useful, especially when there’s no qualified human being whose attention I can borrow. Using them as a sort of semantic web search can sometimes get me what I’m looking for faster than Google. If anything, they’re an opportunity to exercise critical thinking; if I can tell where it’s getting things wrong I can be fairly confident that my own understanding of the problem/subject is pretty solid.
Vibe coding, though? I have yet to see it work out. Maybe as some starting slop so that I can get to work refactoring code (and get the ideas flowing) instead of staring at a blank file.
It’s a tool that you need to learn. Try some of the claude.md files people share online for your programming area as a starter. You still need to review what it does, but just asking it to create tests as it creates code does a lot to improve the output.
I use AI for researching what existing software or projects exist to help me build up my system, which I then suffer through making.
That’s been my experience. It’s always subtly wrong, its solutions are hard to maintain, and if you spend too much time with it, it starts forgetting what you said earlier. Managers don’t understand the distinction, they already can’t code well, and they only test it on small problems where it’s not context-limited, so they’re amazed.
I use it and it works. It doesn’t give you the right result in one shot, but neither does manual coding. You iterate and prompt again and again. In the end, it saves a ton of time. Engineers are definitely going to lose their jobs because fewer people are needed. I know it’s tough to accept this, and people will go through denial. Part of that is saying the AI code is junk. But you’ll find it can produce junk and quickly fix it into the right solution faster than an engineer can. It sucks, but this is the new reality. The one thing that is cool once you embrace it is that you realize you can customize your favorite apps or even build anything you want from scratch.
It sucks, but this is the new reality.
Sorry mate, but you drank the AI Kool-Aid from Sam Altman and the other tech oligarchs. The reality is that all of the major AI companies are deep in the red; OpenAI isn’t even making a profit on its $200 subscription.
The only reason people are able to burn thousands of tokens to vibecode their apps is that they don’t have to pay the price for that, the companies are. This money will run out soon and then we will see the real cost for the bigger models.
If a subscription for Claude Code costs $500 or even $1,000, will companies still pay for it, or let actual humans do the work? We will see. I seriously doubt it, and I don’t want to depend on a subscription-based service to do my work while my skills atrophy. Thank god my employer doesn’t force me to use AI.
Engineers are definitely going to lose their jobs
This kind of fear-mongering is what I despise most about the whole bubble.
I haven’t drunk the Kool-Aid. I’m talking from my experience using it in my professional software engineering job, where I lead software projects. I’ve built things that used to take 20 weeks in 1 week with Claude. My employer does not really care about the cost of the tokens. And when they can have one engineer do 20 weeks of work in 1 week, that to them is actually a cost savings. I already ask myself the question… Should I give this task to another engineer or just vibecode it myself?
OpenAI may not survive because they do have financial issues from overspending, but that barely matters. The company with the strongest coding LLM is Anthropic, and it doesn’t sound like they’re having financial difficulty. Either way, now that it is clear what is possible, some company will succeed. They have incentives to do it.
Like I said, it will suck for some people, but it’s hard to deny the reality at this point.
I’ve built things that used to take 20 weeks in 1 week with Claude.
That’s ridiculous. You’ve either been a bad coder since before the AI hype, or you’re simply lying. I have used these tools, and they’re not that good and don’t make you that fast - except when you’re merging all of the proposed code blindly and hoping for the best. I fear for the future colleagues who will have to work with the raging dumpster fire you have created for them.
The company with the strongest coding LLM is Anthropic and it doesn’t sound like they’re having financial difficulty
Oh yes, they have the same problems OpenAI has. Just look at the vibecoding subreddits; you can see many people complaining about excessive rate limits and their models getting dumber. A healthy company wouldn’t try to cap token usage and introduce peak-hour throttling. That’s a big warning sign that they’re overspending as well.
it’s hard to deny the reality at this point
I only see one person here denying reality. You will be effed in a major way when your employer one day decides the subscriptions are too expensive or tells you to limit your token usage.
I know it is a big change and will take some time to come to terms with it. But, it is here. I’m not going to argue anymore. It’s pointless.

Did you just pull a random infographic out of your ass without even mentioning the source? I reverse-searched it and it comes from Anthropic, of all places - the guys that run Claude Code.
Forbes took a look at that study, I love this money quote from it:
These flaws turn Anthropic’s dataset into an overstated labor-market conclusion. The study’s findings do not have the level of reliability required to sustain the breadth of the headline framing, because each conclusion rests on an exposure measure whose scope (1), construction (2, 3, 4, 5, 7), and interpretation (6, 8, 9, 10) remain contested.
So yeah, an AI company telling us that AI will theoretically replace our jobs, based on their own study with flawed data - damn, that’s trustworthy! /s
I’m not going to argue anymore. It’s pointless.
At least on this point we agree.
You still need programmers, because you need people proficient in programming to turn the junk it generates into working code.
Sure, but like I said, it will be fewer.
I think the last part you said is the best way to use LLMs. I am not confident in it building complex architectures, but if you want to make a dedicated single-use script or a very customised basic application for personal use, it will do that well.
customize your favorite apps
can you elaborate?
GitHub is full of open source apps. Sometimes the maintainer won’t add a feature you want. You can just clone the repo, ask Claude to add it, and then run your own version.
producing subtly broken junk
The difference between you and people that say it’s amazing is that you are capable of discerning this reality.
What I don’t get, though, is how the vibe code bros can’t discern this reality.
How can they sit there and not see that their vibe-coded app just doesn’t do what they wanted it to do? Eventually, you’ve got to try actually running the app, right? And how do you keep drinking the AI kool-aid when you find out that the app doesn’t work?
You do try running the app, and then you see what is broken, and then you have Claude fix it. The process is still iterative, just like regular coding. I haven’t met a software engineer who wrote a perfect app on the first try; it’s always broken, even in subtle ways. Why does everyone think vibecoding needs to be perfect on the first shot?
They’re the same people who copied code from Stack Overflow and had to be told how to actually fix every PR. The difference is that the C-suite types are backing them this time.
Vibe code bros aren’t real programmers. They’re business people, not computer people. Even if they have a CS degree, they only got that because they think it’ll get them more money. They lack passion and they don’t care about understanding anything. They probably don’t even care about what they’re generating beyond its potential to be used in a grift.
I graduated college not that long ago, and my CS classes had quite a few former business majors. They switched because they thought it would be more lucrative, but since they only care about money, they didn’t bother to actually learn the material, especially since they could just vibe code through everything.
So much this.
After working in tech companies for the last 10 years I’ve noticed the difference between people that “generate code” and those that engineer code.
My worry about the industry is that vibe coding gives the code generators the ability to generate even more code. The engineers (even those that use vibe tools) are not engineering as much code by volume compared to “the generators”.
My hope is that this is one of those “short term gain, long term pain” things that might self correct in a couple of years 🤞.
It’s insane that companies are going back to metrics like LOC (or tokens generated), when the industry figured out decades ago that these are horrible, counterproductive metrics.
“The hard thing about building software is deciding what one wants to say, not saying it. No facilitation of expression can give more than marginal gains.” - No Silver Bullet (1986)
Eventually, you’ve got to try actually running the app, right?
At least at my company, no, they just start selling it.
Yes. Exactly. In my experience, there’s more code shops that ship shit than that catch their mistakes.
I do apps that work, I do patches that are production quality. Half the cs world does… I do full-stack AI debugging of ESP32 projects.
It’s a powerful tool; you just need to learn its strong and weak points, just like any other tool you use.
Half the cs world does…
What’s the basis for this claim? I’m doubtful, but don’t have wide data for this.
Rough estimate from my personal connections only. There are some workplaces where AI is not possible, but everyone who has made an effort reports good code. You need to work with what it is: a word generator that sometimes gives correct results. Make it research rather than trust its training. Never let it do things on its own; require a plan and reasoning. Make it evaluate its own work/plan.
Most issues I have stem from models being too eager. Restrain them and remove the “I can do this next…” behaviour.
Context is king - so proper MCP and documentation that is agent-facing. I use Serena, as I can get LSP for YAML and markup, and I keep these docs like that.
Any luck with integrating platform.io? I have an ESP32 project, but VSCode can’t provide type hinting with its main C++ extension that platform.io uses.
No, sorry. I only use IDF, and use the VS Code files it creates for LSP to work.
And tmux + skills for idf.py work, including debug. Also a REPL on console/UART; agents love CLIs, including this one.
Imo mcp > pure skills for tmux
Of course they do, it is hyperbole to think they are completely useless
I wonder if it was even able to compile. I am a shitty hobby coder who just does it to make my embedded hardware projects function.
I have yet to get compilable code out of any of the AI bots I have tried: Gemini, Mistral, and ChatGPT. I am not making an account, lol.
I have gotten some compilable Python and VBA code for data analysis stuff at work, so I wonder if it is because embedded stuff uses specific SDKs that it can’t handle.
Either way, I have given up on it for anything besides bouncing ideas off of, or debugging where electromagnetics issues could lie (though it has been completely wrong about that too; even when it uses the wrong concepts, it reminds me of concepts I might have overlooked).
No, I think you do get it. That’s exactly right. Everything you described is absolutely valid.
Maybe the only piece you’re missing is that “almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.
/s but also not /s because this is the unfortunate reality we live in now. We’re all going to eat slop and sooner or later we’re going to be forced to like it.
Or maybe we will be forced to switch off LLMs and start solving the bugs introduced by their usage using our minds.
As a professional software developer, I truly hope that is the case (and I plan to charge at least 10x my current rate after the AI bubble pops when I’m looking for my next job as I expect there to be a massive shortage of people skilled enough to actually deal with the nightmare spaghetti AI code bases)
Fun times ahead.
It will be interesting (read: bad) times getting to that point, and I agree. The Junior market has been basically nonexistent ever since coding agents appeared, stripping the industry of its future Seniors. We will be chained to our desks.
You and me both. We will be the next version of the COBOL Cowboys.
“almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.
Exactly. The consequences are at worst a problem for “future me”, and at best “somebody else’s problem”.
AI didn’t create this reality, but it’s certainly moved it into the spotlight and to “center stage.”
Maybe the only piece you’re missing is that “almost right, but critically broken in subtle ways”
Sure, but you have to note that it reaches that point in minutes, sometimes on a task that would take humans a week. The power is not that it creates correct stuff; it’s that it creates almost-correct stuff 100 times faster than a human. Plus the typical machine benefits: it never gets tired, demotivated, etc.
So then the challenge becomes being able to be that human, who can review stuff extremely well and rapidly, being natural in probing the stuff LLMs tend to be wrong about. Sort of like the same challenge that every tech lead had before LLMs too, but just subtly different, because LLMs don’t exactly think like we do.
Vibe coding, in the sense of telling the model to make codebase changes, then directly using the output produced, is 100% marketing bullshit that does not scale beyond toy examples.
Here’s the rub: Claude is extremely useful as an advanced autocomplete, if and only if you’re guiding it architecturally through every task it runs, and you vet + revise the output yourself between iterations. You cannot effectively pilot entirely from chat in a mature codebase, and you must compile robust documentation and instructions for Claude to know how to work with your codebase.
You also must aggressively manage information in the context window yourself and keep it clean. You mentioned going in circles trying to get the robot to correct itself: huge mistake. Rewind to before the error, and give it better instructions to steer it away from the pitfall it fell into. In the same vein, you also need to reset ASAP after pushing past the 100k-token mark, because the models start melting into putty soon after (yes, even the “extended” 1M-window ones).
I’m someone who has massively benefited from using modern LLMs in my work, but I’m also a massive hater at the same time: They’re just a tool, not magic, and have to be used with great care and attention to get reasonable results. You absolutely cannot delegate your thinking to them, because it will bite you, hard and fast.
For your use case (3D math), what I recommend is decomposing your end goal into a series of pure functions that you’ll string together. Once you have that list, that’s where Claude comes in. Have it stub those functions for you, then have it implement them one at a time, reviewing the output of every one before proceeding.
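For instance, here is a minimal sketch of that decomposition in plain Python (the helper names and tuple representation are my own invention, not from any particular library): each function is pure and trivially testable on its own, and the final rotation just strings them together.

```python
# Small pure functions for 3D vector math, each verifiable in isolation
# before anything composes them. Rodrigues' rotation is the end goal here.
import math

Vec3 = tuple[float, float, float]

def dot(a: Vec3, b: Vec3) -> float:
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def scale(a: Vec3, s: float) -> Vec3:
    return (a[0]*s, a[1]*s, a[2]*s)

def add(*vs: Vec3) -> Vec3:
    return (sum(v[0] for v in vs),
            sum(v[1] for v in vs),
            sum(v[2] for v in vs))

def rotate(v: Vec3, axis: Vec3, theta: float) -> Vec3:
    """Rotate v about a unit-length axis by theta (Rodrigues' formula):
    v' = v*cos(t) + (axis x v)*sin(t) + axis*(axis . v)*(1 - cos(t))."""
    c, s = math.cos(theta), math.sin(theta)
    return add(scale(v, c),
               scale(cross(axis, v), s),
               scale(axis, dot(axis, v) * (1.0 - c)))
```

A quarter turn of the x-axis around z should land (to within float error) on the y-axis: `rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)`. The point of the structure is exactly what the comment above describes: you can hand Claude one stub at a time and reject any implementation whose unit test fails, instead of debugging a monolith.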
My preferred way of using LLM coders is:
- plan only
- read the spec file I just wrote
- optionally ask me questions in ‘qa.md’; I’ll reply inline

Repeat until it stops asking me questions, then switch to a different model and ask again. I usually use both gpt5.3-codex AND Claude Sonnet.
Then I have it update the spec, start a new session, and have it implement. Finally, I review the code. If I don’t like it, I undo and revisit the spec. Usually that’s because I’m trying to do too much at once and need to break it into multiple specs.
Adversarial reviews are also a great way to prune bad ideas and assumptions from plans. They have helped me greatly, and often make the better LLMs go “the plan said do X, but doing that is an unknown, huge risk that may take longer than the rest of the plan”.
The superpowers plugin does the brainstorm, qa, design plan, implementation plan, implement, review quite well. It should aid the process of actually doing feature type work. I also add adversarial reviews into the process, saves a lot of time debugging what went wrong after implementation.
This is the most pragmatic take I’ve read, and it resonates strongly with my own experience. Claude can be a very useful tool, but like any other there is a learning curve and often many sharp edges. I’ve had Claude build some reasonably complex code bases, but it takes work. It’s pretty decent at “coding” but pretty terrible at the rest of software engineering.






