Krux

May 24, 2026
Nvidia's New Model Generates 32 Tokens at Once
Published: May 24, 2026 at 12:12 AM
Updated: May 24, 2026 at 12:12 AM
What happened
Nvidia released Nemotron-Labs Diffusion, a language model that generates text in 32-token blocks instead of one token at a time. The trick: it drafts a whole chunk of text at once, then refines every position in that chunk in parallel, reaching speeds up to 6x faster than traditional token-by-token models. The real flex is switching modes at runtime. Need accuracy? Use autoregressive mode. Need speed? Switch to diffusion or self-speculation without rewriting code. The models range from 3B to 14B parameters, and all are open for experimentation under Nvidia licenses.
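To make "draft a block, then refine it in parallel" concrete, here is a minimal Python sketch of confidence-based block decoding in the general style of masked diffusion language models. It is not Nvidia's implementation. The `decode_block` and `make_toy_denoiser` names, the `per_step` commit rule, and the toy confidence scores are all hypothetical stand-ins for a real model's forward pass.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # marks a slot in the block that has not been decoded yet

# Hypothetical interface: given the whole block, propose (token, confidence)
# for every masked slot in one parallel pass; already-decoded slots get None.
Denoiser = Callable[[List[Optional[str]]], List[Optional[Tuple[str, float]]]]


def decode_block(denoiser: Denoiser, block_size: int = 32,
                 per_step: int = 4) -> Tuple[List[Optional[str]], int]:
    """Fill a block of masked slots, committing the `per_step` most confident
    proposals per pass and re-predicting the rest on the next pass."""
    if per_step < 1:
        raise ValueError("per_step must be at least 1")
    block: List[Optional[str]] = [MASK] * block_size
    passes = 0
    while any(tok is MASK for tok in block):
        proposals = denoiser(block)  # one forward pass covers every slot at once
        passes += 1
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        # Rank still-masked slots by the model's confidence, highest first.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            block[i] = proposals[i][0]  # commit; the other slots stay masked
    return block, passes


def make_toy_denoiser(target: List[str]) -> Denoiser:
    """Hypothetical stand-in for a real model: it always proposes the target
    token, but is more confident when neighbouring slots are already decoded."""
    def denoise(block: List[Optional[str]]) -> List[Optional[Tuple[str, float]]]:
        out: List[Optional[Tuple[str, float]]] = []
        for i, tok in enumerate(block):
            if tok is not MASK:
                out.append(None)
                continue
            neighbours = sum(
                1 for j in (i - 1, i + 1)
                if 0 <= j < len(block) and block[j] is not MASK
            )
            # Small position penalty breaks ties in favour of earlier slots.
            out.append((target[i], neighbours - i * 1e-3))
        return out
    return denoise


if __name__ == "__main__":
    target = [f"w{i}" for i in range(32)]
    tokens, passes = decode_block(make_toy_denoiser(target), block_size=32, per_step=4)
    print(tokens == target, passes)  # True 8
```

With `per_step=4`, the 32-token block is filled in 8 model passes instead of the 32 that a one-token-at-a-time decoder needs. That pass count is only an illustration: real speedups, such as the up-to-6x figure cited above, depend on how many tokens a model can commit per pass without hurting output quality.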
Why it matters
Translation: your AI chatbot could spit out a paragraph before users finish reading the last sentence. The era of watching ChatGPT type one painful word at a time might finally be ending.