Microsoft's New Voice Model Clones Audio in One Second

April 11, 2026

Published: April 11, 2026 at 12:35 AM

Updated: April 11, 2026 at 12:35 AM

100-word summary

Microsoft just released three in-house AI models that prioritize speed over novelty. MAI-Voice-1 generates 60 seconds of custom voice audio in one second flat, using just seconds of source material. MAI-Transcribe-1 runs 2.5x faster than Azure's existing transcription tool across 25 languages. MAI-Image-2 doubles image generation speed in Copilot. All three are live now in Microsoft Foundry, the company's enterprise AI platform. The pitch isn't breakthrough capabilities but faster, cheaper alternatives to third-party models, with built-in compliance guardrails that matter more to IT departments than to developers. WPP is already testing them.

What happened

Microsoft just released three in-house AI models that prioritize speed over novelty. MAI-Voice-1 generates 60 seconds of custom voice audio in one second flat, using just seconds of source material. MAI-Transcribe-1 runs 2.5x faster than Azure's existing transcription tool across 25 languages. MAI-Image-2 doubles image generation speed in Copilot. All three are live now in Microsoft Foundry, the company's enterprise AI platform. The pitch isn't breakthrough capabilities but faster, cheaper alternatives to third-party models, with built-in compliance guardrails that matter more to IT departments than to developers.

Why it matters

WPP, the advertising holding company, is already testing them.