• JohnEdwa@sopuli.xyz
    7 hours ago

    It’s not easy. LLMs aren’t intelligent; they just slap words together in the way that probability and their training data say they most likely fit together. Talk to them about suicide, and they start outputting stuff from murder mystery stories, crime reports, unhealthy Reddit threads, etc., because that’s where suicide is most written about.

    Trying to safeguard with a prompt is trivial to circumvent (“ignore all previous instructions”, etc.), and blunt input/output censorship usually makes the LLM unable to talk about a subject in any context at all. Often the only semi-working bandaid is stacking multiple LLMs on top of each other and instructing each one to describe what the original one is talking about, and if any of them says the topic is something prohibited, that output is blocked entirely.
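
    Roughly what that layered bandaid looks like in practice, as a minimal Python sketch. Everything here is a placeholder assumption: call_llm stands in for whatever model API is actually used, and the prompts and topic list are illustrative, not any vendor’s real moderation setup.

    ```python
    # Sketch of the "LLM moderating another LLM" bandaid described above.
    # call_llm() is a hypothetical stand-in for whatever chat-completion API is in use.

    PROHIBITED_TOPICS = {"self-harm", "violence"}  # example policy list, not exhaustive


    def call_llm(system_prompt: str, user_text: str) -> str:
        """Placeholder for an actual model call (hosted API, local model, etc.)."""
        raise NotImplementedError


    def classify_topic(text: str) -> str:
        # A second model only labels what the text is about;
        # it never answers the user directly.
        return call_llm(
            "Reply with a single word naming the main topic of the following text.",
            text,
        ).strip().lower()


    def guarded_reply(user_text: str) -> str:
        draft = call_llm("You are a helpful assistant.", user_text)
        # Run both the user's input and the draft answer past the classifier model;
        # if either lands on a prohibited topic, the whole output is blocked.
        for piece in (user_text, draft):
            if classify_topic(piece) in PROHIBITED_TOPICS:
                return "Sorry, I can't help with that."
        return draft
    ```

    The downside is exactly what you’d expect: every extra moderation model adds cost and latency, and a single “prohibited” label from any of them nukes the whole reply, which is why these setups end up refusing so many harmless questions.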