Mistral's New Voice Model Clones Any Voice in 3 Seconds

Published: March 27, 2026 at 12:42 AM

Updated: March 27, 2026 at 12:42 AM

100-word summary

Mistral AI released Voxtral TTS, an open-weights text-to-speech model that adapts to any voice from just three seconds of audio. The 4B-parameter model responds in 70 milliseconds, fast enough for real-time customer support calls, and supports nine languages. You can run it on your own servers instead of sending voice data to the cloud, which addresses the privacy concerns that have kept some companies from using AI voice agents. The catch: the open weights are licensed for non-commercial use only, so businesses testing the model will need to negotiate separate terms. Through its API, Mistral charges $0.016 per 1,000 characters.
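To put the quoted API price in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the only charge is the $0.016 per 1,000 characters figure cited above, with no minimums, tiers, or rounding rules, none of which the article describes. The function name and constant are made up for illustration and are not part of any Mistral SDK.

```python
from decimal import Decimal

# Price quoted in the article: USD per 1,000 characters of input text.
# Assumption: flat per-character billing, no tiers or minimum charges.
PRICE_PER_1K_CHARS = Decimal("0.016")


def estimate_tts_cost(num_chars: int,
                      price_per_1k: Decimal = PRICE_PER_1K_CHARS) -> Decimal:
    """Return the estimated cost in USD for synthesizing num_chars characters.

    Hypothetical helper for illustration only; it does not call any API.
    """
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return (Decimal(num_chars) / Decimal(1000)) * price_per_1k


# Example: cost of reading one short support greeting aloud.
greeting = "Hello, and thanks for calling support. How can I help you today?"
print(f"{len(greeting)} characters -> ${estimate_tts_cost(len(greeting))}")
```

Under these assumptions, a support line that synthesizes 250,000 characters of replies in a month would come to about $4. Decimal is used instead of float so that small per-character amounts do not pick up binary rounding noise.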

What happened

Why it matters

The speed and voice-cloning capability mean your AI assistant could sound like your CEO explaining product updates in Hindi, French, or Arabic without re-recording.

Sources

Mistral AI Hugging Face