OpenAI's Responses API Jumped to 1,000 Tokens Per Second



Published: April 24, 2026 at 12:20 AM

Updated: April 24, 2026 at 12:20 AM

100-word summary

OpenAI turned a two-month performance sprint into 15x faster AI agent responses. By switching its Responses API to persistent WebSocket connections, the company pushed throughput from 65 tokens per second to 1,000, with bursts hitting 4,000. The trick: keeping conversation state in memory instead of reprocessing the entire chat history every time an agent calls a tool. Coding agent startups testing the alpha saw task latency drop 40 percent. The speed gain comes from caching previous responses, tool definitions, and tokens on the connection, then validating only new input. Your AI coding assistant can now spit out a function before you've finished reading its last suggestion.

What happened

OpenAI spent two months rebuilding the transport layer of its Responses API around persistent WebSocket connections, lifting throughput from 65 tokens per second to 1,000, with bursts reaching 4,000. The core change: conversation state now lives in memory on the connection, so the API no longer reprocesses the entire chat history every time an agent calls a tool. Previous responses, tool definitions, and tokens are cached on the connection, and only the new input is validated on each turn. Coding agent startups testing the alpha reported task latency dropping 40 percent.
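The cost asymmetry is easy to see in miniature. The sketch below is purely illustrative (none of these class or method names are OpenAI's actual API): a stateless server reprocesses the full history on every request, while a connection with cached state only processes each turn's delta, so work per tool call stays flat instead of growing with the conversation.

```python
# Illustrative sketch of connection-scoped state caching, as described in the
# article. All names here are hypothetical, not OpenAI's actual API.

class StatelessServer:
    """Baseline: every request re-sends and reprocesses the whole history."""
    def handle(self, full_history: list[str]) -> int:
        # Cost is proportional to the entire conversation so far.
        return sum(len(msg.split()) for msg in full_history)

class PersistentConnection:
    """Cached state: history and tool definitions live on the connection."""
    def __init__(self, tools: list[str]):
        self.tools = tools            # cached once for the connection's lifetime
        self.history: list[str] = []  # cached conversation state

    def handle(self, new_input: str) -> int:
        # Validate and process only the new input, then append it to the cache.
        processed = len(new_input.split())
        self.history.append(new_input)
        return processed

# A three-turn agent loop: stateless cost grows with history length,
# per-connection cost stays proportional to each turn alone.
turns = ["call search tool", "tool returned four results", "summarize the results"]

stateless, history, stateless_cost = StatelessServer(), [], 0
for t in turns:
    history.append(t)
    stateless_cost += stateless.handle(history)   # 3 + 7 + 10 = 20 "tokens"

conn = PersistentConnection(tools=["search"])
persistent_cost = sum(conn.handle(t) for t in turns)  # 3 + 4 + 3 = 10 "tokens"

print(stateless_cost, persistent_cost)  # 20 10
```

Over long agent sessions with many tool calls the gap widens quadratically, which is why eliminating the per-call reprocessing dominates the reported speedup.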

Why it matters

At 1,000 tokens per second, your AI coding assistant can spit out a complete function before you've finished reading its last suggestion.

Sources