OpenAI Drops GPT-5.3-Codex-Spark: 1000 Tokens Per Second

February 16, 2026

Published: February 16, 2026 at 4:47 AM

What happened

OpenAI has released GPT-5.3-Codex-Spark, a coding model that generates more than 1,000 tokens per second. Running on Cerebras' Wafer Scale Engine 3 hardware, it offers a 128k-token context window (text-only for now) and cuts latency by up to 80% through persistent WebSocket connections. Codex-Spark is rolling out today to ChatGPT Pro users as a research preview. On benchmarks it scores 77.3% on Terminal-Bench 2.0, up from 64% for its predecessor, and it completes real-world coding tasks significantly faster. OpenAI envisions two modes of use: long-horizon reasoning and real-time collaboration, with multimodal support coming later.
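The latency claim rests on a familiar idea: keep one connection open across requests instead of paying connection setup on every call. Here is a stdlib-only Python sketch of that idea, using plain asyncio TCP streams as a stand-in for WebSockets (a WebSocket is itself a long-lived TCP connection after its upgrade handshake). The toy server and newline framing below are invented for illustration and are not OpenAI's actual protocol:

```python
import asyncio

async def toy_server(reader, writer):
    # Stand-in "model" endpoint: reads newline-framed prompts off one
    # long-lived connection and writes a reply for each.
    while data := await reader.readline():
        writer.write(b"tokens for: " + data)
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(toy_server, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    replies = []
    async with server:
        # One persistent connection, opened once and reused.
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        for prompt in (b"fix the bug\n", b"add a test\n"):
            writer.write(prompt)          # no new handshake per request
            await writer.drain()
            reply = await reader.readline()
            replies.append(reply.decode().strip())
            print(replies[-1])
        writer.close()
        await writer.wait_closed()
    return replies

replies = asyncio.run(main())
```

The point is that `reader` and `writer` are created once and reused for every prompt; with per-request connections, each round trip would repeat the TCP (and, for a real WebSocket endpoint, TLS and HTTP-upgrade) handshake, which is the overhead the persistent connection amortizes away.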

Why it matters

This positions Codex-Spark as the speed layer for live pair-programming, potentially redefining developer workflows industry-wide.
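As rough intuition for why raw decode speed matters in a live pair-programming loop, here is a back-of-the-envelope Python sketch. The ~1,000 tokens-per-second figure comes from the announcement; the 250 tokens-per-second baseline is an assumed comparison value, not a measured number for any prior model:

```python
# Illustrative arithmetic only: how long streaming a completion takes at
# a steady decode rate. All figures except the ~1,000 tok/s headline
# number are assumptions for comparison.

def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream `tokens` output tokens at a constant rate."""
    return tokens / tokens_per_second

# A 500-token patch at Codex-Spark's reported ~1,000 tok/s:
spark = generation_seconds(500, 1_000)    # 0.5 s
# The same patch at an assumed 250 tok/s baseline:
baseline = generation_seconds(500, 250)   # 2.0 s

print(f"spark: {spark:.1f}s  baseline: {baseline:.1f}s")
```

At sub-second turnarounds the model can answer inside a typical edit-and-compile rhythm, which is what makes a real-time collaboration mode plausible rather than a batch "submit and wait" workflow.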
