Krux

May 6, 2026
OpenAI Reveals How It Keeps Voice ChatGPT Fast
Published: May 6, 2026 at 12:15 AM
Updated: May 6, 2026 at 12:15 AM
What happened
OpenAI published the architecture behind ChatGPT's voice mode, which now handles 900 million weekly active users. The key insight: splitting WebRTC into edge relays that forward packets and stateful transceivers that handle encryption. This avoids opening a dedicated network socket for every conversation. Traditional voice setups terminate each connection at the backend, creating bottlenecks. OpenAI's design instead uses a tiny shared UDP relay that routes each packet to whichever transceiver owns its session, and geo-steers users to the nearest entry point. The whole stack runs on Kubernetes without special snowflake infrastructure.
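The relay/transceiver split above can be sketched in a few lines. This is a hedged illustration, not OpenAI's actual code: the session table, the bind address, and the assumption that the session key sits where RTP carries its SSRC (bytes 8–12 of the header) are all made up for the example. The point it demonstrates is that the relay stays tiny and stateless: it reads one field, looks up the owning transceiver, and forwards; encryption and per-session state live entirely on the transceiver side.

```python
import socket
import struct

# Hypothetical session table: maps a 4-byte session ID (here, an
# RTP-style SSRC) to the (host, port) of the stateful transceiver
# that owns that session. In a real deployment this mapping would
# be populated by whatever signaling sets the session up.
SESSION_TABLE = {
    0x1A2B3C4D: ("10.0.0.5", 5004),
}


def route(packet: bytes, table: dict):
    """Pick the transceiver that owns this packet's session.

    Assumes the session ID sits in bytes 8-12, where RTP carries
    its SSRC; a production relay would parse the header properly.
    Returns the owner's (host, port), or None if unroutable.
    """
    if len(packet) < 12:
        return None  # too short to carry an RTP-style header
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return table.get(ssrc)


def relay_loop(bind_addr=("0.0.0.0", 3478)):
    """One shared UDP socket for all sessions: receive, look up
    the owner, forward. No per-conversation sockets, no crypto."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)
    while True:
        packet, _src = sock.recvfrom(2048)
        dest = route(packet, SESSION_TABLE)
        if dest is not None:
            sock.sendto(packet, dest)  # decryption happens on the transceiver
```

Because `route` only touches a header field, the relay never needs the session's keys, which is what lets one small shared socket front many conversations.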
Why it matters
If voice AI feels faster than video calls lately, this is why.