• JohnEdwa@sopuli.xyz
    7 hours ago

    It’s not easy. LLMs aren’t intelligent; they just slap words together in the way that probability and their training data say they most likely fit together. Talk to them about suicide, and they start outputting stuff from murder mystery stories, crime reports, unhealthy Reddit threads, etc., because that’s where suicide is most written about.

    Trying to safeguard with a prompt is trivial to circumvent (“ignore all previous instructions”, etc.), and blunt input/output censorship usually makes the LLM unable to talk about a subject in any context at all. Often the only semi-working bandaid is stacking multiple LLMs on top of each other and instructing each one to describe what the original one is talking about, and if any of them says the topic is something prohibited, that output is blocked entirely.
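
    Roughly what that layered bandaid looks like in practice, as a minimal Python sketch. Everything here is a placeholder assumption: call_llm stands in for whatever model API is actually used, and the prompts and topic list are illustrative, not any vendor’s real moderation setup.

    ```python
    # Sketch of the "LLM moderating another LLM" bandaid described above.
    # call_llm() is a hypothetical stand-in for whatever chat-completion API is in use.

    PROHIBITED_TOPICS = {"self-harm", "violence"}  # example policy list, not exhaustive


    def call_llm(system_prompt: str, user_text: str) -> str:
        """Placeholder for an actual model call (hosted API, local model, etc.)."""
        raise NotImplementedError


    def classify_topic(text: str) -> str:
        # A second model only labels what the text is about;
        # it never answers the user directly.
        return call_llm(
            "Reply with a single word naming the main topic of the following text.",
            text,
        ).strip().lower()


    def guarded_reply(user_text: str) -> str:
        draft = call_llm("You are a helpful assistant.", user_text)
        # Run both the user's input and the draft answer past the classifier model;
        # if either lands on a prohibited topic, the whole output is blocked.
        for piece in (user_text, draft):
            if classify_topic(piece) in PROHIBITED_TOPICS:
                return "Sorry, I can't help with that."
        return draft
    ```

    The downside is exactly what you’d expect: every extra moderation model adds cost and latency, and a single “prohibited” label from any of them nukes the whole reply, which is why these setups end up refusing so many harmless questions.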