Krux

April 11, 2026
Microsoft's New Voice Model Clones Audio in One Second
Published: April 11, 2026 at 12:35 AM
Updated: April 11, 2026 at 12:35 AM
100-word summary
Microsoft just released three in-house AI models that prioritize speed over novelty. MAI-Voice-1 generates 60 seconds of custom voice audio in one second, using just seconds of source material. MAI-Transcribe-1 runs 2.5x faster than Azure's existing transcription tool across 25 languages. MAI-Image-2 doubles image generation speed in Copilot. All three are live now in Microsoft Foundry, the company's enterprise AI platform. The pitch isn't breakthrough capabilities but faster, cheaper alternatives to third-party models, with built-in compliance guardrails that matter more to IT departments than to developers. WPP is already testing them.
What happened
Microsoft just released three in-house AI models that prioritize speed over novelty. MAI-Voice-1 generates 60 seconds of custom voice audio in one second, using just seconds of source material. MAI-Transcribe-1 runs 2.5x faster than Azure's existing transcription tool across 25 languages. MAI-Image-2 doubles image generation speed in Copilot. All three are live now in Microsoft Foundry, the company's enterprise AI platform. The pitch isn't breakthrough capabilities but faster, cheaper alternatives to third-party models, with built-in compliance guardrails that matter more to IT departments than to developers.
Why it matters
WPP is already testing the three models.