Around the same time, Cloudflare’s chief technology officer Dane Knecht explained in an apologetic X post that a latent bug was responsible.

“In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack,” Knecht wrote, referring to a bug that went undetected in testing and had not previously caused a failure.

  • FauxLiving@lemmy.world
    English
    arrow-up
    39
    ·
    23 hours ago

    If you want a technical breakdown that isn’t “lol AI bad”:

    https://blog.cloudflare.com/18-november-2025-outage/

    Basically, a permission change caused an automated query to return more data than was planned for. The query resulted in a configuration file with a large number of duplicate entries, which was pushed to production. The size of the file went over the preallocated memory limit for a downstream system, which died due to an unhandled error state resulting from the large configuration file. This caused a thread panic leading to the 5xx errors.

    It seems that Crowdstrike isn’t alone this year in the ‘A bad config file nearly kills the Internet’ club.

    • AldinTheMage@ttrpg.network
      English
      arrow-up
      2
      arrow-down
      4
      ·
      18 hours ago

      So the actual outage comes down to pre-allocating memory, but not actually having error handling to gracefully fail if that limit is or will be exceeded… Bad day for whoever shows up on the git blame for that function
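
      A minimal sketch of what "gracefully fail" could look like on the consuming side, in Python for brevity. Everything here is hypothetical: `MAX_FEATURES`, `parse_features`, and `FeatureStore` are illustrative names, the limit value is made up, and none of it is Cloudflare's actual code. The idea is just to reject an oversized config and keep serving the last known-good one instead of crashing.

```python
from dataclasses import dataclass, field

# Hypothetical preallocated capacity; not Cloudflare's real value.
MAX_FEATURES = 200


class ConfigTooLarge(Exception):
    """Raised when a config has more entries than the preallocated capacity."""


def parse_features(lines, limit=MAX_FEATURES):
    """Drop blank and duplicate entries (order preserved), then enforce the limit."""
    unique = list(dict.fromkeys(line.strip() for line in lines if line.strip()))
    if len(unique) > limit:
        raise ConfigTooLarge(f"{len(unique)} entries exceeds limit of {limit}")
    return unique


@dataclass
class FeatureStore:
    """Holds the active feature list and keeps the last known-good one on failure."""

    active: list = field(default_factory=list)
    rejected: int = 0

    def reload(self, lines, limit=MAX_FEATURES):
        """Swap in the new config only if it fits; otherwise keep the old one."""
        try:
            self.active = parse_features(lines, limit)
            return True
        except ConfigTooLarge:
            # Count the rejection (a real system would also alert) and carry on.
            self.rejected += 1
            return False


store = FeatureStore()
store.reload(["a", "b", "a"], limit=3)       # accepted, duplicates collapsed
store.reload(["a", "b", "c", "d"], limit=3)  # rejected, previous config kept
```

      The point is only that exceeding the limit becomes a handled, countable event rather than an unhandled error state. Whether to deduplicate on the consumer side or to validate the file before it is pushed is a separate design choice.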

      • hue2hri19@lemmy.sdf.org
        English
        arrow-up
        11
        ·
        17 hours ago

        This is the wrong take. Git blame only shows who wrote the line. What about the people who reviewed the code?

        • sugar_in_your_tea@sh.itjust.works
          English
          arrow-up
          1
          ·
          6 hours ago

          If you have reasonable practices, git blame will show you the original ticket, a link to the code review, and relevant information about the change.

        • floquant@lemmy.dbzer0.com
          English
          arrow-up
          7
          ·
          edit-2
          17 hours ago

          Plus the guys who are hired to ensure that systems don’t fail even under inexperienced or malicious employees, management who designs and enforces the whole system, etc… “one guy fucked up and needs to be fired” is just a toxic mentality that doesn’t actually address the chain of conditions that led to the situation

          • AldinTheMage@ttrpg.network
            English
            arrow-up
            3
            ·
            14 hours ago

            That should come up in reviews also. Not trying to imply one guy should get fired as a scapegoat, just talking from experience how much it sucks to know your code caused major issues.

  • floquant@lemmy.dbzer0.com
    English
    arrow-up
    4
    ·
    17 hours ago

    Did they though? Aside from the “every outage is a latent bug” angle, from their postmortem it doesn’t seem to me like they tried to blame it on anything but their failure to contain the spread of (and timely diagnose) the issue

        • aeronmelon@lemmy.world
          English
          arrow-up
          14
          ·
          1 day ago

          Fun fact time:

          That’s why they’re called computer bugs.

          In 1947, the Harvard Mark II computer was malfunctioning. Engineers eventually found a dead moth wedged between two relay points, causing a short. Removing it fixed the problem. They saved the moth and it’s on display at a museum to this day.

          The moth was not okay.

          And to be fair, the word bug had been used to describe little problems and glitches before that incident, but this was the first case of a computer bug.

          • FauxLiving@lemmy.world
            English
            arrow-up
            5
            ·
            19 hours ago

            “The moth was not okay.”

            They didn’t tell us this part when they taught it in school #RIP Bug, the OG bug who died to the OG pull request.

    • MonkRome@lemmy.world
      English
      arrow-up
      2
      ·
      5 hours ago

      Anyone using any technology can miss something and end up in the same spot. I think the real takeaway is that there is way too much consolidation of our technology.

    • MagicShel@lemmy.zip
      English
      arrow-up
      101
      arrow-down
      1
      ·
      1 day ago

      Shitty code has been around far longer than AI. I should know, I wrote plenty of it.

        • MagicShel@lemmy.zip
          English
          arrow-up
          26
          ·
          1 day ago

          Shame on them. I mark my career by how long it takes me to regret the code I write. When I was a junior, it was often just a month or two. As I seasoned it became maybe as long as two years. Until finally I don’t regret my code, only the exigencies that prevented me from writing better.

      • foo@feddit.uk
        English
        arrow-up
        2
        ·
        22 hours ago

        But, AI can do the work of 10 of you humans, so it can write 10 times the bugs and deploy them to production 10 times faster. Especially if pesky testers stay out the way instead of finding some of the bugs.

        • FauxLiving@lemmy.world
          English
          arrow-up
          2
          ·
          23 hours ago

          It’s always depressing when you ask the AI to explain your code and then you get banned from OpenAI

          • 123@programming.dev
            English
            arrow-up
            2
            ·
            23 hours ago

            Who didn’t get hit by the fork bug the professor explicitly asked you to watch out for, since it would (back then, with Windows systems being required to use the campus resources) require an admin with Linux access to eliminate?

            It was kind of fun walking in to the tech support area and them asking your login name with no context knowing what the issue was. Must have been a common occurrence that week of the course.

            • FauxLiving@lemmy.world
              English
              arrow-up
              2
              ·
              23 hours ago

              “It was kind of fun walking in to the tech support area and them asking your login name with no context knowing what the issue was.”

              I see this zip bomb was owned by user icpenis, someone track that guy down.

    • renegadespork@lemmy.jelliefrontier.net
      English
      arrow-up
      11
      ·
      1 day ago

      Indirectly, this was. He said the outages were caused by a bug in their recent tool that lets sites block AI crawlers. It’s a relatively new tool released in the last few months, so it makes sense it might be buggy, as the rush to stop the AI DoS attacks has been pressing.

    • foo@feddit.uk
      English
      arrow-up
      3
      ·
      22 hours ago

      They’re laying off testers because they think AI can do it all now.

      • iglou@programming.dev
        English
        arrow-up
        3
        arrow-down
        3
        ·
        24 hours ago

        Obviousness? If you mass layoff your tech staff, you take the risk of more technical failures.

        A smaller staff cannot do the same work as a larger one, and I guarantee you they’re being asked to progress at the same speed. So, the tradeoff is on the quality of the product and the testing, not on the speed of development.

  • DaMummy@lemmy.world
    English
    arrow-up
    16
    arrow-down
    10
    ·
    1 day ago

    Why’s he saying it’s not an attack? Sounds like he’s protesting too much.

    • grumpasaurusrex@lemmy.world
      English
      arrow-up
      11
      ·
      1 day ago

      There’s nothing to be gained from Cloudflare lying about this. It honestly makes them look worse if the outage was caused internally vs if it had been due to an attack