Krux

OpenAI's Responses API Jumped to 1,000 Tokens Per Second
Published: April 24, 2026 at 12:20 AM
Updated: April 24, 2026 at 12:20 AM
What happened
OpenAI turned a two-month performance sprint into 15x faster AI agent responses. By switching its Responses API to persistent WebSocket connections, the company pushed throughput from 65 tokens per second to 1,000, with bursts hitting 4,000. The trick: keeping conversation state in memory instead of reprocessing the entire chat history every time an agent calls a tool. Coding agent startups testing the alpha saw task latency drop 40 percent. The speed gain comes from caching previous responses, tool definitions, and tokens on the connection, then validating only new input.
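The caching idea can be sketched in a few lines. This is an illustrative toy, not OpenAI's actual implementation: the class and method names (StatefulConnection, send_turn, and so on) are hypothetical. It contrasts a persistent connection that caches history, tool definitions, and tokens in memory and processes only the new input against a stateless endpoint that reprocesses the full conversation on every request.

```python
class StatefulConnection:
    """Hypothetical sketch of connection-level state caching.
    A persistent WebSocket keeps conversation state in server memory,
    so each turn validates and processes only the new input."""

    def __init__(self):
        self.cached_tokens = []    # history cached on the connection
        self.tool_defs = {}        # tool definitions validated once
        self.tokens_processed = 0  # total work done this session

    def register_tools(self, tools):
        # Tools are sent and validated once, then reused every turn.
        self.tool_defs.update(tools)

    def send_turn(self, new_tokens):
        # Only the delta is processed; cached history is reused as-is.
        self.tokens_processed += len(new_tokens)
        self.cached_tokens.extend(new_tokens)
        return len(self.cached_tokens)


class StatelessEndpoint:
    """Baseline: a plain request/response endpoint that reprocesses
    the full conversation history on every call."""

    def __init__(self):
        self.tokens_processed = 0

    def send_turn(self, full_history):
        self.tokens_processed += len(full_history)
        return len(full_history)


# Three agent turns of 100 tokens each.
turns = [["tok"] * 100 for _ in range(3)]

stateful = StatefulConnection()
stateful.register_tools({"run_tests": {"type": "function"}})
for t in turns:
    stateful.send_turn(t)               # 100 new tokens per turn

stateless = StatelessEndpoint()
history = []
for t in turns:
    history.extend(t)
    stateless.send_turn(list(history))  # 100, then 200, then 300

print(stateful.tokens_processed)   # 300
print(stateless.tokens_processed)  # 600
```

The gap widens with every tool call: the stateless baseline's work grows quadratically with conversation length, while the stateful connection's work stays linear, which is the intuition behind the reported throughput jump.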
Why it matters
Your AI coding assistant can now spit out a function before you've finished reading its last suggestion.