Nvidia's Rubin Cuts AI Inference Costs 10x vs Blackwell

Published: February 26, 2026 at 12:44 AM

Updated: February 26, 2026 at 12:44 AM

What happened

Nvidia unveiled Rubin at CES 2026, a new rack-scale computing platform that slashes inference token costs by up to 10x compared to its current Blackwell chips. The system also trains huge mixture-of-experts models using roughly 75% fewer GPUs. AWS, Google Cloud, and Microsoft will start rolling out Rubin-based servers in late 2026. The surprise: Nvidia is now selling entire racks, not just chips, bundling six custom components into one supercomputer-sized package.
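The article reports only ratios, not absolute figures, but they translate into a quick back-of-envelope comparison. The dollar figure and cluster size below are illustrative assumptions, not numbers from Nvidia:

```python
# Back-of-envelope sketch of the headline claims. The baseline values are
# hypothetical placeholders; only the ratios (10x, ~75%) come from the article.

blackwell_cost_per_m_tokens = 2.00  # hypothetical $/1M tokens on Blackwell
rubin_cost_per_m_tokens = blackwell_cost_per_m_tokens / 10  # "up to 10x" cheaper

blackwell_gpus_for_moe_training = 10_000  # hypothetical cluster size
rubin_gpus_for_moe_training = int(blackwell_gpus_for_moe_training * 0.25)  # ~75% fewer

print(f"Inference: ${rubin_cost_per_m_tokens:.2f}/1M tokens on Rubin "
      f"vs ${blackwell_cost_per_m_tokens:.2f} on Blackwell")
print(f"MoE training: {rubin_gpus_for_moe_training} GPUs on Rubin "
      f"vs {blackwell_gpus_for_moe_training} on Blackwell")
```

Under these assumptions, a workload that costs $2.00 per million tokens today would run for $0.20, and a 10,000-GPU training job would need about 2,500 GPUs.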

Why it matters

If the cost savings hold in production, running ChatGPT-scale models could become cheap enough that far more companies can afford to deploy them.
