Deepgram Launches Nova-3, Enhancing Speech-to-Text Accuracy for Enterprise AI
SAN FRANCISCO, Feb. 12, 2025 -- Deepgram, a leading voice AI platform for enterprise use cases, today announced the launch of Nova-3, its most advanced speech-to-text (STT) model to date. Nova-3 pushes the boundaries of AI-driven transcription, delivering unmatched accuracy in challenging audio environments while providing flexible, self-service customization to tailor results for industry-specific needs.
Trusted by industry leaders like Twilio, Jack in the Box, and Kore.ai, Deepgram’s infrastructure also includes powerful text-to-speech (TTS) and full speech-to-speech (STS) capabilities, offering a comprehensive suite of cloud or self-hosted APIs for seamless voice AI integration.
Its full-featured platform and high-performance runtime include powerful automation and data capabilities—such as synthetic data generation and model curation—along with model hot-swapping and robust integrations, empowering developers to efficiently build and scale voice-enabled applications. With over 450 enterprise customers, Deepgram is powering the fast-growing enterprise voice AI market.
Nova-3 Expands Voice AI for a Broader Range of Enterprise Use Cases
Building on the success of its predecessor, Nova-3 is engineered for real-time use cases, delivering unparalleled accuracy and performance in dynamic environments where traditional solutions often fall short. Unlike generalized models that lack domain-specific precision, Nova-3 leverages an advanced latent space architecture to encode complex speech patterns into a highly efficient representation. This enables superior transcription accuracy, even in noisy or specialized settings, driving improved productivity, customer satisfaction, and cost efficiency. With its expanded capabilities, Nova-3 delivers enhanced accuracy for real-world enterprise challenges such as:
- Adverse acoustic conditions – Accurately transcribes speech in distant, noisy, and multi-speaker scenarios, making it ideal for air traffic control, drive-thrus, and call centers.
- Real-time multilingual support – Enables real-time transcription across multiple languages, the first model of its kind to do so, which suits emergency response, global customer service, and multilingual operations.
- Industry-specific accuracy – Recognizes domain-specific terminology for specialized fields like medical and legal transcription.
- Precision data handling – Ensures accurate numeric recognition for retail, banking, and finance while supporting real-time redaction of sensitive information for compliance and data privacy.
"Nova-3 represents a significant leap forward, extending the frontier of real-time accuracy while once again bending the cost curve—two critical components for enterprise speech-to-speech use cases," said Scott Stephenson, CEO of Deepgram. "By integrating advanced architectural enhancements and extensive training across diverse datasets, we've developed a model that not only meets but exceeds the evolving needs of our clients across various industries."
“Our mission at Kore.ai is to enable organizations to deliver engaging experiences and derive value through advanced AI,” said Peter Wulfraat, Chief Revenue Officer of Kore.ai. “By partnering with Deepgram, we’re helping enterprise call centers transition from outdated systems to modern, AI-driven solutions like AI for Service. A prime example is working with a Fortune 500 healthcare company to replace their legacy IVR with an AI voice agent leveraging Deepgram’s speech-to-text (STT) and text-to-speech (TTS) APIs.”
"Nova-3 created the ability to ostensibly fine-tune the ASR model through just-in-time configuration," said Bill French, Senior AI Engineer at Stream It. "No training, testing, or added costs are required to instrument Nova-3 with critical awareness of domain-specific terms. This is the right architecture for solutions that must embrace unique terminology in voice applications."
"Deepgram's Nova-3 model is a remarkable leap forward in data extraction," said Matt Baker, VP of Engineering at Gladly. "The enhanced contextual information makes transcriptions more actionable, turning data into valuable insights."
Personalize Voice AI with Self-Service Customization
Nova-3 is the industry’s first voice AI model to enable self-service customization, allowing users to fine-tune the model for specialized domains without requiring deep expertise in machine learning. Many conventional models require expensive and time-consuming expert-led customization, delaying deployment and increasing costs. With the addition of Keyterm Prompting, developers can instantly improve transcription accuracy by specifying up to 100 key terms, without waiting for extensive model retraining or customization cycles. This flexibility accelerates deployment, enhances accuracy, and reduces costs, allowing businesses to rapidly unlock value from their voice AI solutions.
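As a rough illustration of how key terms might be attached to a transcription request, the sketch below builds a request URL with one query parameter per term. The endpoint URL, the `model` and `keyterm` parameter names, and the helper `build_listen_url` are assumptions made for illustration and are not stated in this announcement; only the limit of 100 key terms comes from the text above. No network call is made.

```python
from urllib.parse import urlencode

# Assumed endpoint; not stated in the announcement.
LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"
# The announcement cites up to 100 key terms.
MAX_KEYTERMS = 100


def build_listen_url(keyterms, model="nova-3"):
    """Return a request URL carrying one `keyterm` query parameter per term.

    Hypothetical helper: the parameter names are assumptions.
    """
    # Drop empty entries and surrounding whitespace.
    terms = [t.strip() for t in keyterms if t.strip()]
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(
            f"expected at most {MAX_KEYTERMS} key terms, got {len(terms)}"
        )
    params = [("model", model)] + [("keyterm", t) for t in terms]
    return f"{LISTEN_ENDPOINT}?{urlencode(params)}"


# Example with two veterinary terms.
url = build_listen_url(["otitis externa", "carprofen"])
print(url)
```

Because the terms travel with the request rather than being baked into the model, changing the vocabulary would only mean changing the list passed to the helper.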
"We saw a massive jump in accuracy with Nova-3," said Brendan Chan, CTO of Talkatoo. "Previous models recognized only 10% of critical veterinary terms, but with Nova-3 and Keyterm prompting, we're seeing a 625% improvement in key term recognition. The performance is a lot better, and we're excited to get this out to our users."
Benchmarking Excellence: Deepgram Extends Its Lead
Nova-3 sets a new standard for transcription accuracy, significantly widening the gap between itself and competing voice AI providers. Nova-3 outperforms competitors in both batch and streaming use cases, with consistently lower Word Error Rates (WER) that drive superior performance in real-world audio environments, including multilingual scenarios.
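For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence. The lowercasing and whitespace tokenization are simplifying assumptions, and this is not a description of how Deepgram ran its benchmarks.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("the") and one substitution ("light" -> "night")
# against a six-word reference.
wer = word_error_rate("turn left at the next light", "turn left at next night")
print(f"{wer:.1%}")
```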
Batch WER Comparison
Nova-3 achieves a WER of 5.26%, which is 47.4% lower in relative terms than the next-best competitor's 10% WER. This reduced error rate translates to more accurate transcriptions for industries that require high precision, such as healthcare, legal, and finance.
Streaming WER Comparison
In streaming, Nova-3 leads with a WER of 6.84%, which is 54.2% lower in relative terms than the next-best competitor's 14.92% WER. This improved accuracy ensures real-time, reliable transcription for applications such as call centers and virtual assistants, enhancing overall customer experience.
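The 47.4% and 54.2% figures are consistent with a relative reduction in WER, that is, the gap between the two error rates divided by the competitor's error rate. The announcement does not state the formula, so this reading is an assumption, and `relative_wer_reduction` is a hypothetical helper name. A quick check using the WER values quoted above:

```python
def relative_wer_reduction(model_wer, competitor_wer):
    """Percentage by which model_wer is lower than competitor_wer."""
    return (competitor_wer - model_wer) / competitor_wer * 100


# WER values as quoted in the batch and streaming comparisons.
batch = relative_wer_reduction(5.26, 10.0)
streaming = relative_wer_reduction(6.84, 14.92)

print(f"batch: {batch:.1f}%")          # batch: 47.4%
print(f"streaming: {streaming:.1f}%")  # streaming: 54.2%
```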
Multilingual Performance
In multilingual testing, Nova-3 outperforms OpenAI’s Whisper across seven languages, delivering up to 8:1 preference ratios in some languages. Nova-3’s advanced real-time multilingual conversation transcription empowers enterprises to scale globally, delivering reliable, accurate results across multiple languages and enhancing international customer engagement.
These benchmark results underscore Deepgram’s continued lead in transcription accuracy, driving superior outcomes for businesses that rely on speech-to-text and voice AI technologies.
Nova-3 Marks a Major Advancement
Nova-3 represents a breakthrough in AI-driven speech-to-text technology, cementing Deepgram's position at the forefront of voice AI innovation and empowering businesses and developers to build the next generation of enterprise voice AI applications. Deepgram’s focus on continuous model and platform improvements ensures users always have access to the latest advancements, maximizing long-term value. Built to keep customers' cost of goods sold (COGS) low, the platform offers cost-efficiency and seamless updates, helping businesses stay competitive and future-proof as they scale.
For more information about Nova-3 and Deepgram's suite of voice AI infrastructure, visit www.deepgram.com.
About Deepgram
Deepgram is the leading voice AI platform for enterprise use cases, offering speech-to-text (STT), text-to-speech (TTS), and full speech-to-speech (STS) capabilities. 200,000+ developers build with Deepgram’s voice-native foundational models – accessed through cloud APIs or as self-hosted / on-premises APIs – due to its unmatched accuracy, low latency, and pricing. Customers include technology ISVs building voice products or platforms, co-sell partners working with large enterprises, and enterprises solving internal use cases. Having processed over 50,000 years of audio and transcribed over 1 trillion words, Deepgram understands voice better than any other organization in the world.
Source: Deepgram