Krux

April 4, 2026
Microsoft Clones Brand Voices From Seconds of Audio
Published: April 4, 2026 at 12:37 AM
Updated: April 4, 2026 at 12:37 AM
What happened
Microsoft released three in-house AI models, all competing on speed. MAI-Voice-1 generates custom brand voices from seconds of audio. MAI-Transcribe-1 transcribes meetings in 25 languages at 2.5x the speed of Microsoft's existing service. MAI-Image-2 creates images twice as fast and will roll out to Bing and PowerPoint soon. All three are live in Microsoft Foundry now, though playground access is limited to the US. WPP is already using the image model.
Why it matters
Microsoft is betting it can undercut OpenAI and Google on price-performance with models it controls rather than merely resells.