Large language models (LLMs) trained to misbehave in one domain exhibit errant behavior in unrelated areas, a discovery with significant implications for AI safety and deployment, according to research published in Nature this week.

Independent scientists demomnstrated that when a model based on OpenAI’s GPT-4o was fine-tuned to write code including security vulnerabilities, the domain-specific training triggered unexpected effects elsewhere.

sauce

  • Kaz@lemmy.org
    link
    fedilink
    arrow-up
    2
    arrow-down
    1
    ·
    4 天前

    Nobody is saying this AI will harm us, the future ones that fall into bad hands and have the security removed will be the ones that come after us.

    Just like idiots keep setting countries on fire during summary, some idiot will unleash a terminator onto us, if that terminator figured out how to free all the other secure robots, then we’re screwed.

    This can go down one of many ways too, that’s one really simple example that 💯 will happen someday.

    10, 20, 30 years from now, the above scenario is completely unavoidable.

    • trolololol@lemmy.world
      link
      fedilink
      arrow-up
      1
      arrow-down
      1
      ·
      3 天前

      I’m hearing this since the 60s. Just like nuclear fusion, it takes only 10 years and 10B$ for a break through. Only one more 10 years, trust me bro.