With a 2-billion-parameter open source speech model, Cohere is looking to capitalize on the enterprise trend of embedding automatic speech recognition (ASR) into applications.
Cohere Transcribe, introduced on Thursday, is trained on 14 languages, including Chinese, Japanese, Polish, French and Greek. Cohere released the model under the Apache 2.0 license and said the model outperforms alternatives on the Hugging Face Open ASR Leaderboard, including ElevenLabs Scribe and Qwen3. The model will soon be integrated into Cohere’s AI agent orchestration platform, North, according to the company.
Cohere Transcribe is an example of how speech recognition models have evolved. Previously, speech models were built with deep learning techniques such as recurrent neural networks, long short-term memory networks and, later, transformer-based architectures, which struggled to achieve low latency because of model size.
New models such as Transcribe, however, are small enough to be deployed on edge devices. As the technology, infrastructure and capabilities have matured, ASR use cases have expanded, especially in customer service, banking, sales and marketing, which has led to an increase in ASR models from vendors such as IBM and Alibaba.
Even video conferencing company Zoom has joined the competition. In 2025, it introduced AI Companion 3.0, which included a real-time voice translation capability. It later introduced a separate feature that allowed participants to hear exchanges in their own language.
“Speech is always going to be fundamental to AI,” said Lian Jye Su, an analyst at Omdia, a division of Informa TechTarget. “That’s how the whole AI movement started — because humans started to be able to interact with Siri.”
He pointed to a couple of Cohere Transcribe’s features as being noteworthy, including its small size and the company’s decision to make the model open source.
“When it’s open source, you get developers to test it and then they will come back to you if they find the result to be good enough,” Su said. “Then you can obviously commercialize a much better model.” Meta has found success with this business model, influencing others such as Alibaba and Nvidia to follow suit.
“Cohere is trying to copy that,” Su said. But the company is focused on an area where it excels: speech recognition and speech-to-text models, he added.
While Cohere has traditionally focused on text generation, it could find an opportunity in speech recognition, especially as some enterprises look to upgrade from traditional transformer-based speech models to the growing line of small ASR models that can run on edge devices, Su continued.

