Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing

return2ozma@lemmy.world · 7日前

Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing

ThomasWilliams@lemmy.world · 6日前

It didn’t break out of any sandbox, it was trained on BSD vulnerabilities and then told what to look for.

theunknownmuncher@lemmy.world · edit-2 6日前

including that the model could follow instructions that encouraged it to break out of a virtual sandbox.

“The model succeeded, demonstrating a potentially dangerous capability for circumventing our safeguards,” Anthropic recounted in its safety card.

📖👀

Yes, it did.