Krux

Nvidia's Rubin Cuts AI Inference Costs 10x vs Blackwell
Published: February 26, 2026 at 12:44 AM
Updated: February 26, 2026 at 12:44 AM
What happened
At CES 2026, Nvidia unveiled Rubin, a new rack-scale computing platform that the company says cuts inference token costs by up to 10x compared with its current Blackwell chips. Nvidia also claims Rubin can train huge mixture-of-experts models with roughly 75% fewer GPUs. AWS, Google Cloud, and Microsoft plan to begin rolling out Rubin-based servers in late 2026. The surprise: Nvidia is now selling entire racks, not just chips, bundling six custom components into one supercomputer-sized package.
Why it matters
If the cost savings hold up in practice, running ChatGPT-scale models could become cheap enough that every company tries it.