Mistral AI Launches Text-to-Speech Model

Mistral AI is expanding its Voxtral model family with its first text-to-speech model.

The launch comes amid intensifying competition in the fast-growing AI voice market, with Voxtral TTS pitched as an alternative to models from competitors including OpenAI and ElevenLabs.

The Paris-based startup unveiled its new system on Thursday. The 4-billion-parameter model is designed for enterprise deployment across voice assistants, customer support and sales engagement tools.

Unlike many rival offerings, Voxtral TTS has been released with open weights, allowing organizations to run the model on their own infrastructure rather than relying on third-party APIs.

The model supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi and Arabic.

Mistral said the model is lightweight enough to operate on consumer hardware, including laptops, smartphones and edge devices, while maintaining what it describes as “frontier-quality” performance. The company positions this as a key differentiator for enterprises seeking greater control over data, cost and customization.

Another key feature, Mistral said, is voice adaptability. The model can replicate a speaker’s voice using just a few seconds of reference audio, capturing not only tone but also accent, intonation and emotion.

“Our model excels at both contextual understanding and speaker modeling: capturing how a specific person naturally speaks,” Mistral wrote in a blog post. “With its compact size, low cost and latency and easy adaptability, Voxtral TTS gives full control and customization for enterprises looking to own their voice AI stack.”

Voxtral TTS can also perform cross-language voice control, such as generating English speech with a French accent, based on a short prompt.

In human evaluations, Mistral said, Voxtral TTS matched or outperformed competing systems on naturalness, exceeding lower-latency models from ElevenLabs while reaching parity with more advanced offerings in lifelike interaction.

The launch builds on Mistral’s earlier release of speech-to-text models and signals a broader push toward multimodal AI systems. 
