Would you participate?

  • Grimy@lemmy.world
    link
    fedilink
    arrow-up
    4
    ·
    21 days ago

    In a joint study with the UK AI Security Institute and the Alan Turing Institute, we found that as few as 250 malicious documents can produce a “backdoor” vulnerability in a large language model—regardless of model size or training data volume.

    This is the main paper I’m referencing https://www.anthropic.com/research/small-samples-poison .

    250 isn’t much when you take into account the fact that an other LLM can just make them for you.