Microsoft’s GitHub next month plans to begin using customer interaction data – “specifically inputs, outputs, code snippets, and associated context” – to train its AI models.

  • S4m_S3p1l@infosec.pub
    link
    fedilink
    English
    arrow-up
    7
    ·
    10 hours ago

    I’m not surprised, companies are starting to realise that AI is only as useful as the data it’s trained on. If you blast it with all the internet slop we have completely unfiltered, it’s going to start fucking up all it’s responses. It’s not just about the volume of data, it’s about the quality of that data. Sites like Github, and academic journals, contain the exact data that companies need to create well rounded LLMs, that don’t go off on racist rants and declare themselves as “MechaHitler”. That makes data like Github’s pure gold.

    • MDCCCLV@lemmy.ca
      link
      fedilink
      English
      arrow-up
      10
      arrow-down
      1
      ·
      8 hours ago

      Counterpoint, I’ve poisoned it with absolute dumb shit and the worst code you’ve ever seen