Nvidia's Rubin Cuts AI Training Costs by 75%

Published: March 17, 2026 at 12:33 AM

Updated: March 17, 2026 at 12:33 AM

What happened

Nvidia unveiled its Vera Rubin platform at CES 2026, promising to train massive AI models with 75% fewer GPUs than previous generations. Inference drops to one-tenth the cost per token, thanks to faster chip-to-chip links and a new storage layer that manages conversation memory more efficiently. The platform bundles six custom chips into rack-scale supercomputers that will be available through AWS, Google Cloud, and Azure in late 2026. What you get: mixture-of-experts models that once needed hundreds of GPUs can now run on a quarter of the hardware, and chatbots handling million-token conversations won't melt your cloud bill.
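To make the headline numbers concrete, here's a back-of-envelope sketch of what those claims imply for a hypothetical deployment. All figures here (the 512-GPU fleet, the $2.00 per million tokens baseline) are illustrative assumptions, not Nvidia pricing; only the 75% and one-tenth ratios come from the announcement.

```python
def rubin_training_gpus(prior_gpus: int, reduction: float = 0.75) -> int:
    """GPUs needed if Rubin trims the training fleet by `reduction` (75%)."""
    return int(prior_gpus * (1 - reduction))

def rubin_token_cost(prior_cost_per_mtok: float, factor: float = 10.0) -> float:
    """Inference cost per million tokens at one-tenth the prior cost."""
    return prior_cost_per_mtok / factor

# A mixture-of-experts job that hypothetically needed 512 GPUs before:
print(rubin_training_gpus(512))   # 128 GPUs, i.e. a quarter of the hardware

# Serving that hypothetically cost $2.00 per million tokens before:
print(rubin_token_cost(2.00))     # $0.20 per million tokens
```

The "quarter of the hardware" framing and the "75% fewer GPUs" framing are the same claim stated two ways: keeping 25% of the fleet is cutting 75% of it.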

Why it matters

Nvidia is betting the next frontier isn't smarter models but cheaper ones that more companies can actually afford to run.
