Krux

February 16, 2026
OpenAI Drops GPT-5.3-Codex-Spark: 1000 Tokens Per Second
Published: February 16, 2026 at 4:47 AM
What happened
OpenAI has unleashed GPT-5.3-Codex-Spark, a lightning-fast coding model that cranks out over 1,000 tokens per second. Running on Cerebras' Wafer Scale Engine 3 hardware, it packs a 128k context window (text-only for now) and slashes latency by up to 80% through persistent WebSocket connections. Rolling out today to ChatGPT Pro users as a research preview, Codex-Spark posts 77.3% on Terminal-Bench 2.0, up from 64% for its predecessor, while completing real-world coding tasks significantly faster. OpenAI envisions dual modes: long-horizon reasoning and real-time collaboration, with multimodal support to follow.
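To see why persistent WebSocket connections can cut latency so sharply, consider a toy cost model. Only the ~1,000 tokens/sec figure below comes from the announcement; the handshake and round-trip costs are illustrative assumptions, not measured numbers, and the functions are hypothetical, not part of any OpenAI API.

```python
# Toy latency model: per-request HTTP vs. a persistent WebSocket.
# TOKENS_PER_SEC reflects Codex-Spark's reported decode speed;
# HANDSHAKE_MS and ROUND_TRIP_MS are assumed values for illustration.

TOKENS_PER_SEC = 1000   # reported decode speed
HANDSHAKE_MS = 300      # assumed TCP + TLS setup cost per new connection
ROUND_TRIP_MS = 50      # assumed network round trip per request

def per_request_http(n_requests: int, tokens_each: int) -> float:
    """Every request pays connection setup plus a round trip plus decoding."""
    per_req = HANDSHAKE_MS + ROUND_TRIP_MS + tokens_each / TOKENS_PER_SEC * 1000
    return n_requests * per_req

def persistent_websocket(n_requests: int, tokens_each: int) -> float:
    """Handshake is paid once; each message costs only a round trip plus decoding."""
    per_req = ROUND_TRIP_MS + tokens_each / TOKENS_PER_SEC * 1000
    return HANDSHAKE_MS + n_requests * per_req

if __name__ == "__main__":
    http_ms = per_request_http(20, 100)   # 20 short edits of 100 tokens each
    ws_ms = persistent_websocket(20, 100)
    print(f"HTTP:      {http_ms:.0f} ms")   # 9000 ms
    print(f"WebSocket: {ws_ms:.0f} ms")     # 3300 ms
    print(f"Saved:     {1 - ws_ms / http_ms:.0%}")
```

Under these assumed costs, a rapid-fire session of twenty small edits spends most of its HTTP time on connection setup, and the persistent connection recovers the bulk of it, which is the regime where an "up to 80%" latency reduction becomes plausible for chatty, interactive coding sessions.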
Why it matters
This positions Codex-Spark as the speed layer for live pair-programming, potentially redefining developer workflows industry-wide.