A screenshot of this question was making the rounds last week, but this article covers testing it against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

  • XLE@piefed.social · 1 day ago

    Even if you retooled the LLM so it didn't randomize its output, it could still produce contradictory answers to a slightly reworded question: a misspelling, different punctuation, things that simply wouldn't cause a person to change their answer.

    (And that’s assuming the LLM started from a clean slate. Any previous conversation with it could have influenced the output as well. It’s such a mess.)
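
    A quick way to see this for yourself. Here's a minimal sketch (assuming the OpenAI Python SDK, an OPENAI_API_KEY in the environment, and a placeholder model name you may need to swap) that asks the same question twice at temperature 0, differing only by a space before the question mark, each time in a fresh context:

        # Minimal sketch: same question, temperature 0, fresh context each call.
        # The only difference between the two prompts is punctuation spacing,
        # which wouldn't change a person's answer.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        prompts = [
            "Is 9.11 larger than 9.9? Answer yes or no.",
            "Is 9.11 larger than 9.9 ? Answer yes or no.",  # extra space before '?'
        ]

        for prompt in prompts:
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # assumed model name; substitute any chat model
                temperature=0,        # suppress sampling randomness as far as the API allows
                messages=[{"role": "user", "content": prompt}],  # no prior history
            )
            print(f"{prompt!r} -> {resp.choices[0].message.content!r}")

    (Note that even at temperature 0 the API doesn't guarantee bit-identical outputs across runs; the point is that a trivial perturbation like this is often enough to flip the answer on its own.)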