Excerpt:

“Even within the coding, it’s not working well,” said Smiley. “I’ll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven’t engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence.”

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
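Two of the metrics Smiley names can be computed straight from a deployment log; a minimal sketch, assuming a hypothetical log format (the field names are illustrative, not from the article):

```python
# Hypothetical deployment log; field names are assumptions for illustration.
deployments = [
    {"caused_incident": False, "lead_time_hours": 4.0},
    {"caused_incident": True,  "lead_time_hours": 30.0},
    {"caused_incident": False, "lead_time_hours": 6.5},
    {"caused_incident": False, "lead_time_hours": 2.5},
]

# Change failure rate: share of deployments that caused an incident.
change_failure_rate = sum(d["caused_incident"] for d in deployments) / len(deployments)

# Lead time to production: mean hours from commit to running in production.
mean_lead_time_hours = sum(d["lead_time_hours"] for d in deployments) / len(deployments)
# change_failure_rate -> 0.25, mean_lead_time_hours -> 10.75
```

The point of such outcome metrics is that they measure what ships and survives in production, not how much code was typed along the way.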

“We don’t know what those are yet,” he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That’s the kind of thing that needs to be assessed to determine whether AI helps an organization’s engineering practice.
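One way such a "tokens burned per approved pull request" metric might be computed, sketched under assumed record fields (the article does not specify any format); note that tokens spent on rejected attempts still count toward the cost:

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    tokens_used: int  # hypothetical field: LLM tokens consumed producing this change
    approved: bool

def tokens_per_approved_pr(prs):
    """Total tokens burned (including rejected attempts) per approved PR."""
    approved = sum(1 for pr in prs if pr.approved)
    if approved == 0:
        return float("inf")  # nothing shipped: cost per outcome is unbounded
    return sum(pr.tokens_used for pr in prs) / approved

prs = [
    PullRequest(tokens_used=120_000, approved=True),
    PullRequest(tokens_used=300_000, approved=False),  # rejected attempt still burned tokens
    PullRequest(tokens_used=80_000, approved=True),
]
# tokens_per_approved_pr(prs) -> 250000.0
```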

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

“It passed all the unit tests, and the shape of the code looks right,” he said. “But it’s 3.7x more lines of code, and it performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It’s a dumpster fire. Throw it away. All that money you spent on it is worthless.”

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

“Coding works if you measure lines of code and pull requests,” he said. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”

    • 87Six@lemmy.zip · +4 · 48 minutes ago

      recent attempt to rewrite SQLite in Rust using AI

      I think it’s talking about 100% vibe code. And yeah, it’s pretty useful if you don’t abuse it.

  • rose56@lemmy.zip · +2/−1 · 1 hour ago

    Yes, it does not work right! Also, there are no new discoveries made by AI. We only see chatbots, self-driving cars, and workplace automation, yet no discoveries. At some point I thought AI would help us solve cancer or find a way to travel in space, yet billionaires only think of money.
    Call me negative, call me an idiot, but the only thing I see is people profiting now while they can, and later on nothing will happen.

  • python@lemmy.world · +17 · 3 hours ago

    Recently had to call out a coworker for vibecoding all her unit tests. How did I know they were vibe coded? None of the tests had an assertion, so they literally couldn’t fail.

    • ch00f@lemmy.world · +8 · 2 hours ago

      Vibe coding guy wrote unit tests for our embedded project. Of course, the hardware peripherals aren’t available for unit tests on the dev machine/build server, so you sometimes have to write mock versions (like an “adc” function that just returns predetermined values in the format of the real analog-digital converter).

      Claude wrote the tests and mock hardware so well that it forgot to include any actual code from the project. The test cases were just testing the mock hardware.
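A hypothetical sketch of that failure mode, with stand-in names (not the project's actual code): the first test exercises only the mock itself, while the second routes the mock's canned readings through project code, which is what the tests should have done.

```python
class MockAdc:
    """Mock of a hardware ADC: returns canned readings in the real driver's format."""
    def __init__(self, readings):
        self._readings = list(readings)

    def read(self):
        return self._readings.pop(0)

def test_mock_only():
    # The failure mode described above: this only verifies the mock itself.
    adc = MockAdc([512, 1023])
    assert adc.read() == 512
    assert adc.read() == 1023

# Hypothetical stand-in for real project code that consumes the ADC.
def average_samples(adc, n):
    return sum(adc.read() for _ in range(n)) / n

def test_through_project_code():
    # The mock feeds the code under test, so project logic is actually exercised.
    adc = MockAdc([100, 200, 300])
    assert average_samples(adc, 3) == 200.0
```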

      • 87Six@lemmy.zip · +2 · 2 hours ago

        Not realizing that should be an instant firing. The dev didn’t even glance at the unit tests…

    • nutsack@lemmy.dbzer0.com · +3 · 3 hours ago

      if you reject her pull requests, does she fix it? is there a way for management to see when an employee is pushing bad commits more frequently than usual?

  • Raven@lemmy.org · +4 · 3 hours ago

    I once saw someone send ChatGPT and Gemini Pro into a constant loop by asking “Is the seahorse emoji real?”. I’ve heard that the “Mandela Effect” theory isn’t true in this case: supposedly the emoji existed on Microsoft’s MSN Messenger and in early versions of Skype. I don’t know how much of that is true, but it was fun seeing artificial intelligence being bamboozled by real intelligence. The guy was proving that AI is just a tool, not a permanent replacement for actual resources.

  • Not_mikey@lemmy.dbzer0.com · +16 · edited · 4 hours ago

    Guy selling an AI coding platform says other AI coding platforms suck.

    This just reads like a sales pitch rather than journalism. It doesn’t cite any studies, just some anecdotes about what he hears “in the industry”.

    Half of it is:

    You’re measuring the wrong metrics for productivity, you should be using these new metrics that my AI coding platform does better on.

    I know the AI hate is strong here, but just because a company isn’t pushing AI in the typical way doesn’t mean it isn’t trying to hype whatever it’s selling beyond reason. Hardly any tech CEO can be trusted, including this guy, because they’re always trying to act like they can predict and shape the future when they probably can’t.

    • yabbadabaddon@lemmy.zip · +3 · 4 hours ago

      My take exactly. Especially the bits about unit tests. If you cannot rely on your unit tests as a first assessment of your code quality, your unit tests are trash.

      And not every company runs GitHub. The metrics he’s talking about are DevOps metrics, not development metrics. For example, in my work, nobody gives a fuck about mean time to production. We have a planning schedule, and we need the OK from our customers before we can update our product.

  • nutsack@lemmy.dbzer0.com · +3/−1 · edited · 3 hours ago

    these types of articles aren’t analyzing the usefulness of the tool in good faith. they’re not meant to do a lot of the things that are often implied. the coding tools are best used by coders who can understand code and make decisions about what to do with the code that comes out of the tool. you don’t need ai to help you be a shitty programmer

    • lime!@feddit.nu · +1 · 2 hours ago

      they are analyzing the way the tools are being used based on marketing. yes they’re useful for senior programmers who need to automate boilerplate, but they’re sold as complete solutions.

    • dependencyinjection@discuss.tchncs.de · +1 · edited · 2 hours ago

      Exactly. This reads like people are prompting for something then just using that code.

      The way we use it is as a scaffolding tool. Write a prompt, then use that boilerplate to actually solve the problem you’re trying to solve.

      You could say the same for people using Stackoverflow, you don’t just blindly copy and paste.

  • magiccupcake@lemmy.world · +25 · 7 hours ago

    I love this bit especially

    Insurers, he said, are already lobbying state-level insurance regulators to win a carve-out in business insurance liability policies so they are not obligated to cover AI-related workflows. “That kills the whole system,” Deeks said. Smiley added: “The question here is if it’s all so great, why are the insurance underwriters going to great lengths to prohibit coverage for these things? They’re generally pretty good at risk profiling.”

  • jimmux@programming.dev · +36/−1 · 11 hours ago

    We never figured out good software productivity metrics, and now we’re supposed to come up with AI effectiveness metrics? Good luck with that.

    • Senal@programming.dev · +10 · 6 hours ago

      Sure we did.

      “Lines Of Code” is a good one, more code = more work so it must be good.

      I recently had a run in with another good one : PR’s/Dev/Month.

      Not only is that one good for overall productivity, it’s a way to weed out those unproductive devs who check in less often.

      This one was so good, management decided to add it to the company wide catchup slides in a section espousing how the new AI driven systems brought this number up enough to be above other companies.

      That means other companies are using it as well, so it must be good.

  • Malgas@beehaw.org · +12/−1 · 9 hours ago

    This feels like an exercise in Goodhart’s Law: Any measure that becomes a target ceases to be a useful measure.

  • DickFiasco@sh.itjust.works · +64 · 13 hours ago

    AI is a solution in search of a problem. Why else would there be consultants to “help shepherd organizations towards an AI strategy”? Companies are looking to use AI out of fear of missing out, not because they need it.

    • Saledovil@sh.itjust.works · +2 · 4 hours ago

      The problem is that code is hard to write, and AI just doesn’t solve that. This is the opposite of crypto, where the product is sort of good at what it does (not Bitcoin, though), but we don’t actually need it.

    • nucleative@lemmy.world · +15/−16 · 12 hours ago

      When I entered the workforce in the late '90s, people were still saying this about putting PCs on every employee’s desk. This was at a really profitable company. The argument was they already had telephones, pen and paper. If someone needed to write something down, they had secretaries for that who had typewriters. They had dictating machines. And Xerox machines.

      And the truth was, most of the higher level employees were surely still more profitable on the phone with a client than they were sitting there pecking away at a keyboard.

      Then, just a handful of years later, not only would the company have been toast had it not pushed ahead, but was also deploying BlackBerry devices with email, deploying laptops with remote access capabilities to most staff, and handheld PDAs (Palm pilots) to many others.

      Looking at the history of all of this, sometimes we don’t know what exactly will happen with newish tech, or exactly how it will be used. But it’s true that the companies that don’t keep up often fall hopelessly behind.

      • Tiresia@slrpnk.net · +1 · 2 hours ago

        I think that’s called a cargo cult. Just because something is a tech gadget doesn’t mean it’s going to change the world.

        Basically, the question is this: If you were to adopt it late and it became a hit, could you emulate the technology with what you have in the brief window between when your business partners and customers start expecting it and when you have adapted your workflow to include it?

        For computers, the answer was no. You had to get ahead of it so that companies with computers could communicate with your computer faster than with any competitor’s.

        But e-mail is just a cheaper fax machine. And for office work, mobile phones are just digital secretaries+desk phones. Mobile phones were critical on the move, though.

        Even if LLMs were profitable, an LLM is not going to be better at talking to LLMs than humans are. Put two LLMs together and they tend to enter hallucinatory death spirals, lose their sense of identity, and hit other failure modes. Computers could rely on communicable standards, but LLMs fundamentally don’t have standards. There is no API, no consistent internal data structure.

        If you put in the labor to make an LLM play nice with another LLM, you just end up with a standard API. And yes, it’s possible that this ends up being cheaper than humans, but it also means you lose out on nothing by adopting late, when all the kinks have been worked out and protocols have been established. Just hire some LLM experts to do the transition right the first time.

      • mycodesucks@lemmy.world · +24 · 12 hours ago

        If AI is so good at what it does, then it shouldn’t matter if you fall behind in adopting it… it should be able to pick up from where you need it. And if it’s not mature, there’s an equally valid argument to be made for not even STARTING adoption until it IS - early adopters always pay the most.

        There’s practically no situation where rushing now makes sense, even if the tech eventually DOES deliver on the promise.

        • OpenStars@piefed.social · +3 · 5 hours ago

          Yes but counterpoint: give me your money.

          … or else something bad might happen to you? Sadly, this seems to be the intellectual level the discussion is at right now, and corporate structures, being authoritarian, lean towards listening to those highest up in the hierarchy, such as Donald J. Trump.

          “Logic” has little to do with any of this. The elites have spoken, so get to marching, NOW.

  • luciole (they/them)@beehaw.org · +36 · 13 hours ago

    This is all fine and dandy but the whole article is based on an interview with “Dorian Smiley, co-founder and CTO of AI advisory service Codestrap”. Codestrap is a Palantir service provider, and as you’d expect Smiley is a Palantir shill.

    The article hits different considering it’s more or less a world devourer zealot taking a jab at competing world devourers. The reporter is an unsuspecting proxy at best.

    • calliope@piefed.blahaj.zone · +12/−1 · edited · 12 hours ago

      People will upvote anything if it takes a shot at AI. Even when the subtitle itself is literally an ad.

      Codestrap founders say we need to dial down the hype and sort through the mess

      The cult mentality is really interesting to watch.

  • CubitOom@infosec.pub · +51 · 14 hours ago

    Generative models, which many people call “AI”, have a much higher catastrophic failure rate than we have been led to believe. They cannot actually be used to replace humans, just as an inanimate object can’t replace a parent.

    Jobs aren’t threatened by generative models. Jobs are threatened by a credit crunch due to high interest rates and a lack of lenders being able to adapt.

    “AI” is a ruse: a useful excuse that makes people want to invest, makes investors and economists OK with record job loss, and makes the general public more susceptible to data harvesting and surveillance.

  • gravitas_deficiency@sh.itjust.works · +38 · 14 hours ago

    Lmfao

    Deeks said “One of our friends is an SVP of one of the largest insurers in the country and he told us point blank that this is a very real problem and he does not know why people are not talking about it more.”

    Maybe because way too many people are making way too much money and it underpins something like 30% of the economy at this point and everyone just keeps smiling and nodding, and they’re going to keep doing that until we drive straight off the fucking cliff 🤪

    • AnUnusualRelic@lemmy.world · +9 · 13 hours ago

      But who’s making money? All the AI corps are losing billions, only the hardware vendors are making bank.

      Makers of AI lose money and users of AI probably also lose since all they get is shit output that requires more work.