le_throosh@lemmy.dbzer0.com to Fuck AI@lemmy.world · 9 days ago
bro (image, thelemmy.club) · 6 comments · cross-posted to: [email protected]
pkjqpg1h@lemmy.zip · 9 days ago
According to the AA-Omniscience benchmark, among the most expensive models, Opus 4.6 has a 60% hallucination rate and a 46% accuracy rate. Gemini 3.1 Pro Preview has a 50% hallucination rate and a 55% accuracy rate. And the questions aren’t even open-ended. I don’t even need to tell you about the other models.
Kairos@lemmy.today · 9 days ago (edited)
“Opus 4.6”, like every other LLM, has a 100% hallucination rate because that’s literally the only thing they do.