cross-posted from: https://lemmy.ca/post/61948688

Excerpt:

“Even within the coding, it’s not working well,” said Smiley. “I’ll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven’t engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence.”

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
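Those delivery metrics (essentially the DORA four, plus incident severity) can be computed from plain deployment records. A minimal sketch, assuming a hypothetical log of deployments with an incident flag and minutes-to-restore for each:

```python
from datetime import datetime

# Hypothetical deployment records: (deployed_at, caused_incident, minutes_to_restore)
deploys = [
    (datetime(2025, 1, 6), False, 0),
    (datetime(2025, 1, 8), True, 45),
    (datetime(2025, 1, 13), False, 0),
    (datetime(2025, 1, 15), True, 120),
    (datetime(2025, 1, 20), False, 0),
]

days = (deploys[-1][0] - deploys[0][0]).days or 1
deploy_frequency = len(deploys) / days              # deploys per day
failures = [d for d in deploys if d[1]]
change_failure_rate = len(failures) / len(deploys)  # fraction of deploys causing incidents
mttr = sum(d[2] for d in failures) / len(failures)  # mean minutes to restore

print(f"{deploy_frequency:.2f} deploys/day, "
      f"{change_failure_rate:.0%} change failure rate, "
      f"{mttr:.0f} min MTTR")
```

The point of tracking these rather than line counts is that they are outcome measures: a flood of AI-generated PRs that raises the change failure rate shows up immediately here.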

“We don’t know what those are yet,” he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That’s the kind of thing that needs to be assessed to determine whether AI helps an organization’s engineering practice.
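The "tokens burned per approved pull request" idea is straightforward to sketch: total tokens spent across all AI-assisted attempts divided by the PRs that were actually approved, so abandoned or rejected work counts against the ratio. A toy example with invented numbers:

```python
# Hypothetical per-PR log: tokens the model consumed across all attempts,
# and whether the PR was ultimately approved and merged.
pr_log = [
    {"pr": 101, "tokens": 48_000, "approved": True},
    {"pr": 102, "tokens": 210_000, "approved": False},  # abandoned after review
    {"pr": 103, "tokens": 95_000, "approved": True},
]

total_tokens = sum(p["tokens"] for p in pr_log)
approved = sum(1 for p in pr_log if p["approved"])
# Failed attempts inflate the cost of each successful change.
tokens_per_approved_pr = total_tokens / approved

print(f"{tokens_per_approved_pr:,.0f} tokens per approved PR")
```

Note how the abandoned PR 102 dominates the cost here: a raw PR count would have hidden it entirely.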

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

“It passed all the unit tests, the shape of the code looks right,” he said. “It’s 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It’s a dumpster fire. Throw it away. All that money you spent on it is worthless.”

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

“Coding works if you measure lines of code and pull requests,” he said. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”

  • pulsewidth@lemmy.world · 9 hours ago

    Great article from a grounded professional. As usual for The Reg, there are some great comments on the article, including this one responding to a user who mentioned that AI image generation seems useful but flawed:

    “that wouldn’t be possible without someone with some talent” Exactly - and that person with talent isn’t getting paid. A very astute person has said that the purpose of AI is to give wealthy people access to skill without giving skilled people access to wealth.

    • TrickDacy@lemmy.world · 8 hours ago

      the purpose of AI is to give wealthy people access to skill without giving skilled people access to wealth.

      Perfectly said.

  • Optional@lemmy.world (OP) · 12 hours ago

    “The other way to look at this is like there’s no free lunch here,” said Smiley. “We know what the limitations of the model are. It’s hard to teach them new facts. It’s hard to reliably retrieve facts. The forward pass through the neural nets is non-deterministic, especially when you have reasoning models that engage an internal monologue to increase the efficiency of next token prediction, meaning you’re going to get a different answer every time, right? That monologue is going to be different.

    “And they have no inductive reasoning capabilities. A model cannot check its own work. It doesn’t know if the answer it gave you is right. Those are foundational problems no one has solved in LLM technology. And you want to tell me that’s not going to manifest in code quality problems? Of course it’s going to manifest.”

    This is from an AI consultant, btw.
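    The non-determinism Smiley describes can be illustrated with a toy decoder: stochastic (temperature) sampling draws from a softmax over next-token logits, so the same prompt state can yield different tokens run to run. The vocabulary and logits below are invented for illustration, not taken from any real model:

    ```python
    import math
    import random

    # Toy next-token distribution over a tiny vocabulary. The vocabulary
    # and logits are made up; a real model has ~100k tokens.
    vocab = ["the", "fix", "bug", "test"]
    logits = [2.0, 1.0, 0.5, 0.1]

    def sample_next(temperature=1.0, rng=random):
        """One stochastic decoding step: softmax over logits, then sample."""
        scaled = [l / temperature for l in logits]
        m = max(scaled)  # subtract max for numeric stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        return rng.choices(vocab, weights=probs, k=1)[0]

    # Different RNG states can pick different tokens at the same step, so a
    # long generation (or a reasoning model's internal monologue) diverges
    # from run to run.
    runs = [sample_next(rng=random.Random(seed)) for seed in range(5)]
    print(runs)
    ```

    Each sampled step compounds: once two runs pick different tokens anywhere in the monologue, everything downstream diverges too.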

  • uuj8za@piefed.social · 10 hours ago

    My boss: more lines of code is more better! Look how many more lines of code AI wrote!