It’s much more likely to play out like internet fiber: some of it was needed and used.
Datacenters will always leverage scale, and AI is only economic at 16+ concurrent users, which delivers 3x the total tokens/s of serving a single user. Current rental rates for H200s are below their running costs, and capacity in the US is already too high. Innovations in smaller, faster, cheaper models are delivering significant value on less hardware. Gemini Flash 3.5 is very small and fast, at a much lower cost than the top two US labs' models. DeepSeek v4 has massive cost reductions that will filter down to the rest of the industry, especially context compression, which is what allows more users on a single GPU cluster. Qwen 3.6 does bring size down enough to run models matching the state of the art from 3-4 months ago on consumer hardware, but again the economics favor multi-user service, here at 96 GB of RAM (pro rather than industrial hardware).
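To make the batching point concrete, here's a rough sketch of cost per million tokens when the same GPU serves one user versus a batch of 16 that yields 3x the aggregate tokens/s. The hourly price and single-user speed below are made-up placeholders, not real H200 rates; only the 16-user/3x ratio comes from the post.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, aggregate_tokens_per_s: float) -> float:
    """Dollars per 1M generated tokens for a GPU billed by the hour."""
    tokens_per_hour = aggregate_tokens_per_s * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical placeholder numbers, chosen only for illustration.
GPU_HOURLY_USD = 3.0
SINGLE_USER_TPS = 50.0
CONCURRENT_USERS = 16
BATCHED_TPS = 3 * SINGLE_USER_TPS  # 16 users -> 3x aggregate throughput (per the post)

single = cost_per_million_tokens(GPU_HOURLY_USD, SINGLE_USER_TPS)
batched = cost_per_million_tokens(GPU_HOURLY_USD, BATCHED_TPS)
per_user_tps = BATCHED_TPS / CONCURRENT_USERS  # each user sees a slower stream

print(f"1 user:   ${single:.2f} per 1M tokens")
print(f"16 users: ${batched:.2f} per 1M tokens, {per_user_tps:.2f} tok/s each")
```

Whatever the real prices are, the cost per token drops by the same factor the aggregate throughput rises, while each individual user gets a slower stream. That trade is why serving only pays off at scale.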
MTP (multi-token prediction) and Turboquant are other technologies that increase tokens/s delivered while using less RAM. Software stacks that make better use of GPUs are absorbing token demand growth on their own, even as the exaggerated capacity comes online more slowly than the hardware investment valuations assumed.
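Here's a back-of-envelope sketch of why shrinking the KV cache (context compression or quantization) means more users per box. It uses the standard KV-cache size formula with a generic bits-per-value knob. This is plain KV quantization arithmetic, not Turboquant's actual algorithm, and the model config and 32 GiB KV budget (a 96 GiB machine minus an assumed 64 GiB of weights) are hypothetical.

```python
GIB = 1024 ** 3


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bits_per_value: int) -> int:
    """KV-cache bytes for one user's context (factor 2 = keys + values)."""
    total_bits = 2 * layers * kv_heads * head_dim * context_len * bits_per_value
    return total_bits // 8


def users_that_fit(memory_budget_bytes: int, per_user_bytes: int) -> int:
    """How many full-context users fit in the memory left for KV cache."""
    return memory_budget_bytes // per_user_bytes


# Hypothetical model config and memory budget, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 64, 8, 128, 32_768
KV_BUDGET = 32 * GIB  # assumed: 96 GiB machine minus 64 GiB of weights

for bits in (16, 4):
    per_user = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bits)
    print(f"{bits}-bit KV: {per_user / GIB:.1f} GiB/user, "
          f"{users_that_fit(KV_BUDGET, per_user)} users")
```

With these made-up numbers, 16-bit KV fits 4 users in the budget and 4-bit KV fits 16. That is the same hardware crossing the ~16-user threshold from the post purely through software.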