Inception's Mercury 2 Hits 1,000 Tokens Per Second Using Diffusion

February 25, 2026

Published: February 25, 2026 at 12:21 AM

Updated: February 25, 2026 at 12:21 AM

100-word summary

Inception launched Mercury 2, the first diffusion-based language reasoning model, available today via API at $0.25 per million input tokens. Unlike traditional autoregressive chat models that generate one token at a time, Mercury 2 processes multiple tokens in parallel, reaching 1,000 tokens per second with 1.7-second end-to-end latency in benchmarks. The architecture enables real-time voice assistants, live search, and instant code editing at scale. For teams building production chat agents or high-volume agentic loops, this means faster deployment cycles and lower inference costs compared to speed-optimized alternatives.

What happened

Inception launched Mercury 2, the first diffusion-based language reasoning model, available today via API at $0.25 per million input tokens. Unlike traditional autoregressive chat models that generate one token at a time, Mercury 2 processes multiple tokens in parallel, reaching 1,000 tokens per second with 1.7-second end-to-end latency in benchmarks. The architecture enables real-time voice assistants, live search, and instant code editing at scale.
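The parallel-generation claim can be illustrated with a toy step-count model. The block size and refinement count below are illustrative assumptions for the sketch, not published Mercury 2 internals:

```python
# Toy comparison of sequential (autoregressive) decoding versus
# parallel, diffusion-style block generation. All numbers here are
# illustrative assumptions, not measured Mercury 2 parameters.

def autoregressive_steps(num_tokens: int) -> int:
    # One forward pass per token: step count grows linearly with length.
    return num_tokens

def diffusion_steps(num_tokens: int, block_size: int, refinements: int) -> int:
    # Tokens are denoised in parallel blocks: step count grows with the
    # number of blocks times a fixed refinement count, not token count.
    blocks = -(-num_tokens // block_size)  # ceiling division
    return blocks * refinements

tokens = 1024
print(autoregressive_steps(tokens))                          # 1024 sequential passes
print(diffusion_steps(tokens, block_size=256, refinements=8))  # 32 parallel passes
```

Under these assumed settings the diffusion-style decoder needs far fewer sequential passes for the same output length, which is the mechanism behind the throughput numbers quoted above.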

Why it matters

For teams building production chat agents or high-volume agentic loops, this means faster deployment cycles and lower inference costs compared to speed-optimized alternatives.
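The quoted price makes the cost claim easy to sanity-check. The monthly token volume below is a hypothetical workload chosen for illustration, not a figure from the announcement:

```python
# Back-of-envelope cost at the quoted $0.25 per million input tokens.
price_per_million_input = 0.25        # USD, from the announcement
monthly_input_tokens = 2_000_000_000  # assumed workload: 2B input tokens/month

cost = monthly_input_tokens / 1_000_000 * price_per_million_input
print(f"${cost:,.2f}")  # $500.00
```

At that rate, even a multi-billion-token monthly workload stays in the hundreds of dollars for input, which is the scale at which the "lower inference costs" claim would be evaluated against speed-optimized alternatives.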
