Mistral drops new speech-to-text AI models

French AI startup Mistral has released a pair of new speech-to-text models that aim to set fresh benchmarks for speed, privacy and affordability.

The Paris-based vendor earlier this month unveiled Voxtral Mini Transcribe V2 and Voxtral Realtime, both offered under the Voxtral Transcribe 2 umbrella.

According to Mistral, the models constitute a major step forward, offering “state-of-the-art transcription quality, diarization and ultra-low latency”.

The company has high hopes that the tools will prove popular with enterprise customers, with the number of potential applications growing all the time – ranging from virtual assistants to call center automation, and broadcast subtitling to compliance documentation.

The two models target different use cases.

Realtime, as the name suggests, is designed to process live audio, delivering transcriptions with a delay that can be configured to as little as 200 milliseconds. Mistral attributes this to what it describes as a “novel streaming architecture,” which it says gives the model an advantage over approaches that adapt an offline model and process audio in chunks.

The delay can be adjusted to suit the application. At 2.4 seconds, the tool is said to be ideal for subtitling, while at 480 milliseconds the error rate is low enough (around 1-2%, close to offline accuracy) for use in voice agents, according to Mistral.
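The error rates quoted for these models are presumably word error rates (WER), the standard transcription metric: the number of word substitutions, deletions and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The sketch below shows one common way to compute it; the function name and the simple lowercase-and-split normalization are illustrative assumptions, not Mistral's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

By this measure, a 2% error rate means roughly two word-level mistakes per 100 reference words.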

The model is also natively multilingual, working in 13 languages (English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian and Dutch). With only four billion parameters, it can run on local devices such as phones or laptops, which matters particularly for deployments where privacy and security are priorities.

Realtime is available under the Apache 2.0 open-source license on the Hugging Face Hub, or via API at $0.006 per minute.

Mini Transcribe V2, meanwhile, handles batch transcriptions of pre-recorded audio files, offering an array of features that includes comprehensive speaker diarization (with labels and start/end times), context biasing for dedicated topics and domains, and timestamps for specific words. Recordings of up to three hours can be processed in a single request, and the same 13 languages are supported as in Realtime.

What really makes Mini Transcribe V2 stand out, Mistral says, is its affordability: its 4% word error rate on the FLEURS transcription benchmark, at a cost of $0.003 per minute, is claimed to offer the best price-performance of any transcription API.
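To put the per-minute prices in context, the short sketch below estimates API costs for a given volume of audio, along with the number of batch requests needed under the three-hour-per-request limit. The dictionary keys are hypothetical labels chosen for illustration, not official API model identifiers, and the figures are simply the prices quoted in this article.

```python
import math

# Per-minute API prices in USD, as quoted in the article.
# The keys are illustrative labels, not official model identifiers.
PRICE_PER_MINUTE = {
    "realtime": 0.006,
    "mini-transcribe-v2": 0.003,
}

# Batch limit quoted for Mini Transcribe V2: three hours per request.
MAX_MINUTES_PER_REQUEST = 180


def estimate_cost(model: str, audio_minutes: float) -> float:
    """Estimated API cost in USD for the given amount of audio."""
    return round(PRICE_PER_MINUTE[model] * audio_minutes, 4)


def requests_needed(audio_minutes: float) -> int:
    """Minimum number of batch requests needed to cover the audio."""
    return math.ceil(audio_minutes / MAX_MINUTES_PER_REQUEST)


if __name__ == "__main__":
    minutes = 1000  # roughly 16.7 hours of recordings
    print(estimate_cost("mini-transcribe-v2", minutes))
    print(estimate_cost("realtime", minutes))
    print(requests_needed(minutes))
```

At these rates, 1,000 minutes of audio would cost about $3 in batch mode and about $6 through the Realtime API.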

The company is inviting potential customers to try it out in the new Audio Playground in its Mistral Studio, or on its Le Chat assistant.

The release marks another step forward for Mistral, which has emerged as the leading European player in the burgeoning AI landscape, having secured $2 billion in new funding last year.
