Nvidia's New Model Generates 32 Tokens at Once

May 24, 2026

Published: May 24, 2026 at 12:12 AM

Updated: May 24, 2026 at 12:12 AM

What happened

Nvidia released Nemotron-Labs Diffusion, a language model that generates text in 32-token blocks instead of one token at a time. The trick: it drafts a whole block of text at once, then refines every position in that block in parallel, with Nvidia reporting speeds up to 6x faster than conventional autoregressive models. The real flex is switching modes at runtime. Need accuracy? Use autoregressive decoding. Need speed? Switch to diffusion or self-speculation, no code rewrite required. Models range from 3B to 14B parameters, and all are open for experimentation under Nvidia's licenses.
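To make the block idea concrete, here is a toy Python sketch that counts forward passes for one-token-at-a-time decoding versus block-wise parallel decoding. Everything in it is hypothetical: `make_toy_model` is an oracle stand-in for a real network, `generate` and its `mode` argument are not Nemotron-Labs Diffusion's actual API, and the refinement schedule (commit the 8 most confident of the block's masked slots per pass) is an assumption borrowed from common masked-diffusion decoders, not Nvidia's published method. Self-speculation is not modeled.

```python
# Toy sketch only: not Nvidia's implementation or API.
import random
from typing import Callable, Dict, List, Optional, Tuple

MASK = None  # marks a position whose token has not been decided yet

Seq = List[Optional[str]]
Model = Callable[[Seq], Dict[int, Tuple[str, float]]]


def make_toy_model(target: List[str], seed: int = 0) -> Tuple[Model, Dict[str, int]]:
    """Hypothetical stand-in for one network forward pass.

    A real model predicts from context; this toy reads the answer from
    `target` and attaches a pseudo-random confidence, which is enough to
    count how many forward passes each decoding mode needs.
    """
    rng = random.Random(seed)
    stats = {"forward_passes": 0}

    def model(seq: Seq) -> Dict[int, Tuple[str, float]]:
        stats["forward_passes"] += 1
        # One pass scores every masked position at once (the parallel part).
        return {i: (target[i], rng.random()) for i, tok in enumerate(seq) if tok is MASK}

    return model, stats


def decode_autoregressive(model: Model, length: int) -> List[str]:
    out: List[str] = []
    for i in range(length):
        preds = model(out + [MASK])  # one forward pass per token
        out.append(preds[i][0])
    return out


def decode_block_diffusion(model: Model, length: int,
                           block_size: int = 32, per_step: int = 8) -> List[str]:
    seq: Seq = []
    for start in range(0, length, block_size):
        end = min(start + block_size, length)
        seq.extend([MASK] * (end - start))  # draft: the whole block starts masked
        while any(tok is MASK for tok in seq[start:end]):
            preds = model(seq)  # predict every masked slot in the block in parallel
            # Refine: commit only the most confident guesses, keep the rest masked.
            best = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)
            for i, (tok, _conf) in best[:per_step]:
                seq[i] = tok
    return [tok for tok in seq if tok is not MASK]  # no MASK remains at this point


def generate(model: Model, length: int, mode: str = "autoregressive") -> List[str]:
    """Same model, decoding strategy picked at call time (hypothetical API)."""
    if mode == "autoregressive":
        return decode_autoregressive(model, length)
    if mode == "diffusion":
        return decode_block_diffusion(model, length)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    target = [f"tok{i}" for i in range(64)]
    for mode in ("autoregressive", "diffusion"):
        model, stats = make_toy_model(target)
        assert generate(model, len(target), mode=mode) == target
        print(mode, stats["forward_passes"])
```

Running it prints `autoregressive 64` and `diffusion 8`. Fewer passes is not the same as fewer seconds: each diffusion pass scores a whole block, and real models may need more refinement passes to hold quality, so this toy 8x gap says nothing about Nvidia's reported speedup.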

Why it matters

Translation: your AI chatbot could spit out a paragraph before users finish reading the last sentence. The era of watching ChatGPT type one painful word at a time might finally be ending.