Mistral AI Launches Text-to-Speech Model

Mistral AI is expanding its Voxtral model family with its first text-to-speech model.

The launch comes amid intensifying competition in the fast-growing AI voice market, with Voxtral TTS pitched as an alternative to models from competitors including OpenAI and ElevenLabs.

The Paris-based startup unveiled its new system on Thursday. The 4-billion-parameter model is designed for enterprise deployment across voice assistants, customer support and sales engagement tools.

Unlike many rival offerings, Voxtral TTS has been released with open weights, allowing organizations to run the model on their own infrastructure rather than relying on third-party APIs.

The model supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi and Arabic.

Mistral said the model is lightweight enough to operate on consumer hardware, including laptops, smartphones and edge devices, while maintaining what it describes as “frontier-quality” performance. The company positions this as a key differentiator for enterprises seeking greater control over data, cost and customization.

Another key feature, Mistral said, is voice adaptability. The model can replicate a speaker’s voice using just a few seconds of reference audio, capturing not only tone but also accent, intonation and emotion.

“Our model excels at both contextual understanding and speaker modeling: capturing how a specific person naturally speaks,” Mistral wrote in a blog post. “With its compact size, low cost and latency and easy adaptability, Voxtral TTS gives full control and customization for enterprises looking to own their voice AI stack.”

Voxtral TTS can also perform cross-language voice control, such as generating English speech with a French accent, based on a short prompt.

In human evaluations, Mistral said, Voxtral TTS matched or outperformed competing systems on naturalness: it exceeded ElevenLabs' lower-latency models and achieved parity with more advanced offerings in lifelike interaction.

The launch builds on Mistral’s earlier release of speech-to-text models and signals a broader push toward multimodal AI systems.