Company accidentally spent $500 million on Claude AI in one month after forgetting usage limits

codeinabox@programming.dev · 1 month ago

dan@upvote.au · 1 month ago

Not really. The state-of-the-art models are huge, and you really don’t want to quantize below 4-bit. Even 4-bit is a bit of a stretch… You really need at least 8-bit to get good results with these models when used for coding.

GLM-5.1 needs around 400GB VRAM at 4-bit quantization. Apple aren’t making the Mac Studio with 512GB unified RAM any more, so you’d need something like 5 x Nvidia A100 80GB to run a model like this.

Kimi K2.6 is around the same size.
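
For anyone who wants to check the arithmetic, here is a rough sketch. It takes the 400GB-at-4-bit figure above as given and assumes memory scales linearly with bits per weight (weights only, ignoring KV cache and runtime overhead, so real requirements will be higher). The function names are made up for illustration.

```python
import math

def scale_weight_memory_gb(known_gb, known_bits, target_bits):
    """Scale a weights-only memory figure linearly with bits per weight (assumption)."""
    return known_gb * target_bits / known_bits

def gpus_needed(memory_gb, gpu_gb=80):
    """Minimum number of GPUs whose combined memory covers memory_gb."""
    return math.ceil(memory_gb / gpu_gb)

four_bit_gb = 400  # figure quoted in the comment, not measured here

for bits in (4, 8, 16):
    gb = scale_weight_memory_gb(four_bit_gb, 4, bits)
    print(f"{bits}-bit: ~{gb:.0f} GB, at least {gpus_needed(gb)} x 80GB GPUs")
```

Under those assumptions, 4-bit fits on 5 x 80GB cards with no headroom at all, and 8-bit would take roughly twice that.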

mindbleach@sh.itjust.works · 1 month ago

Distillation works better than quantization, to the point that Qwen recently out-benchmarked its 397B model with a 27B model, two months apart. Arguably the only reason to train comically large models is that doing so is a decent strategy for finding very small models.