Microsoft Clones Brand Voices From Seconds of Audio

April 4, 2026

Published: April 4, 2026 at 12:37 AM

Updated: April 4, 2026 at 12:37 AM

What happened

Microsoft released three in-house AI models, each positioned around speed. MAI-Voice-1 generates custom brand voices from a few seconds of audio. MAI-Transcribe-1 processes meeting transcripts in 25 languages at 2.5x the speed of Microsoft's existing service. MAI-Image-2 creates images twice as fast and is slated to roll into Bing and PowerPoint soon. All three are available now in Microsoft Foundry, with playground access limited to the US. WPP is already using the image model.

Why it matters

Microsoft is betting it can undercut OpenAI and Google on price-performance with models it controls rather than ones it merely resells.
