One other comment pointed me at an issue that might be a major difference. Is the code you generate in one of those ultra-verbose languages like Java, where IDEs were basically already generating code from much shorter descriptions 20 years ago? I could see LLMs doing well with those.
I mostly try to generate code in Rust, or sometimes shell scripts, config files, or DSLs for various programs, and 99% of the time the code does not even come close to what I wanted, mainly because it hallucinates library interfaces that do not exist.
Not super common or super niche. I use R. And it completely made up code a year ago. Sometimes it still does, but less. And when I ask it for citations it can make shit up too. I really stand by the assertion that it needs a lot of babysitting.
But between it getting better, me getting better at asking, and some patience, I get what I want. It does require a lot of fine-tuning and patience, but it’s still just faster than googling. And I could see the argument that the models haven’t improved, that they just have access to search engines now, and that I’m mostly using them as a search engine. And sometimes they’re so whacked out that I’ll ask them to search for something and they’ll tell me they don’t have access to the internet, and they’re so absolutely convinced of that that I have to close that chat and start a new one.
If you feed it documentation or ask it to search for its answers on Substack (or really just whatever search constraints you want) and then tell it to give you the links it used, you might have a better time. This forces it to look up an answer instead of hallucinating one. And when it gives me code, more complicated things usually fail pretty hard at first, and I have to feed it the error output for a few rounds and guide it a lot.
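For what it's worth, the "feed it the error output for a few rounds" routine can be written down as a loop. This is only a rough Python sketch under my own assumptions, not how any particular tool works: `ask_model` is a hypothetical stand-in for whatever chat API or CLI you actually use, and `run_snippet` and `repair_loop` are names I made up for illustration.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_snippet(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run the code in a fresh Python process and return (succeeded, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stderr


def repair_loop(
    ask_model: Callable[[str], str],  # hypothetical: takes a prompt, returns code
    task: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, run it, and feed any error output back for another try."""
    prompt = task
    for _ in range(max_rounds):
        code = ask_model(prompt)
        ok, err = run_snippet(code)
        if ok:
            return code
        # Give the model its own attempt plus the real error text, so the
        # next round is grounded in what actually happened.
        prompt = (
            f"{task}\n\nYour last attempt:\n{code}\n\n"
            f"It failed with:\n{err}\nPlease fix it."
        )
    return None  # still failing after max_rounds; time for a human to step in
```

A successful run only means the code did not crash, not that it does what you wanted, so the guiding part still falls on you. And only run generated code like this somewhere you don't mind it breaking things.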