Poets are now cybersecurity threats: Researchers used 'adversarial poetry' to trick AI into ignoring its safety guard rails and it worked 62% of the time

TootSweet@lemmy.world · edit-2 12 days ago


Meron35@lemmy.world · 11 days ago

Perhaps, but it is documenting an open secret in the LLM space. Using system prompts as security is basically the best we have, and it’s jank af. People literally hold competitions to crack the latest models, often succeeding within hours of release.

You can get a feel for it yourself as well:

Gandalf | Lakera – Test your AI hacking skills - https://gandalf.lakera.ai/