Killswitch Engineer (image, lemmy.world)
fubarx@lemmy.world to Programmer Humor@programming.dev · 29 days ago · 77 comments
yannic@lemmy.ca · 28 days ago
I provided enough information that the relevant source shows up in a search, but here you go:
"In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe." [Lynch, et al., “Agentic Misalignment: How LLMs Could be an Insider Threat”, Anthropic Research, 2025]
AwesomeLowlander@sh.itjust.works · 28 days ago
Yes, I also already edited my comment with a link going into the incidents and why they’re absolute nonsense.
yannic@lemmy.ca · 27 days ago
Thank you. Much appreciated. I see your point.