Krux

February 25, 2026
Inception's Mercury 2 Hits 1,000 Tokens Per Second Using Diffusion
Published: February 25, 2026 at 12:21 AM
Updated: February 25, 2026 at 12:21 AM
100-word summary
Inception launched Mercury 2, the first diffusion-based language reasoning model, available today via API at $0.25 per million input tokens. Unlike traditional chat models that generate one token at a time, Mercury 2 processes multiple tokens in parallel, reaching 1,000 tokens per second with 1.7-second end-to-end latency in benchmarks. The architecture enables real-time voice assistants, live search, and instant code editing at scale. For teams building production chat agents or high-volume loops, this means faster deployment cycles and lower inference costs compared to speed-optimized alternatives.
What happened
Inception launched Mercury 2, which it bills as the first diffusion-based language reasoning model, available today via API at $0.25 per million input tokens. Where traditional chat models generate one token at a time, Mercury 2's diffusion architecture refines many tokens in parallel, reaching 1,000 tokens per second with 1.7-second end-to-end latency in Inception's benchmarks. That throughput puts real-time voice assistants, live search, and instant code editing within reach at scale.
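To make the throughput claim concrete, here is a rough latency comparison between parallel decoding at the reported 1,000 tokens per second and a sequential one-token-at-a-time baseline. The 60 tokens-per-second autoregressive figure is a hypothetical illustration for the sketch, not a measured number from the article:

```python
# Back-of-envelope: seconds to emit a response at a steady decode rate.
# 1,000 tok/s is Mercury 2's reported benchmark throughput; the 60 tok/s
# autoregressive baseline is a hypothetical comparison point.

def generation_time(num_tokens: float, tokens_per_second: float) -> float:
    """Seconds to emit num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second

RESPONSE_TOKENS = 1_000      # a typical long answer
MERCURY_TPS = 1_000          # reported Mercury 2 throughput
AUTOREGRESSIVE_TPS = 60      # hypothetical sequential baseline

print(f"Mercury 2:      {generation_time(RESPONSE_TOKENS, MERCURY_TPS):.1f} s")
print(f"Autoregressive: {generation_time(RESPONSE_TOKENS, AUTOREGRESSIVE_TPS):.1f} s")
```

At these rates a 1,000-token answer takes about 1 second in parallel versus roughly 17 seconds sequentially, which is the gap that matters for voice and live-search use cases.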
Why it matters
For teams shipping production chat agents or high-volume loops, that speed translates to faster deployment cycles and lower inference costs than today's speed-optimized alternatives.
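The cost claim is easy to sanity-check from the published input rate. This sketch estimates the input-token bill at $0.25 per million input tokens; output-token pricing isn't given in the article, so only the input side is covered, and the daily volume is a hypothetical example:

```python
# Input-token cost at the published $0.25 per million input tokens.
# Output pricing is not stated in the article, so this is input-side only.

INPUT_PRICE_PER_MTOK = 0.25  # USD per 1,000,000 input tokens (published rate)

def input_cost(tokens: int) -> float:
    """USD cost for a given number of input tokens."""
    return tokens / 1_000_000 * INPUT_PRICE_PER_MTOK

daily_tokens = 500_000_000   # hypothetical: 500M input tokens per day
print(f"${input_cost(daily_tokens):.2f} per day")
```

At that hypothetical volume the input side comes to $125 a day, which is the kind of arithmetic a team would run before committing a high-volume loop to the API.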