Soniox Speech AI Achieves Extreme Accuracy

Soniox Inc. has released groundbreaking speech recognition AI, achieving extreme levels of accuracy and unlocking new possibilities in human-machine interaction.

Summary:

  • Soniox launched new foundational AI models for speech recognition, achieving extremely high accuracy rates.
  • Soniox’s new AI models often surpass human performance, delivering more accurate speech recognition and generating properly formatted text.
  • Soniox’s speech recognition AI consistently outperforms OpenAI, Google, and other providers, with accuracy improvements ranging from 24% to 78%, making it a game-changer for voice and speech applications.
  • Soniox also released the Soniox mobile app and Soniox Playground, allowing you to experience the new era of voice AI firsthand.

Engineering Breakthrough:

Foundational AI breakthroughs are challenging to achieve in a startup environment because of the cost and complexity of processing and training large models on internet-scale data. Soniox did not shy away from the challenge: it built its infrastructure from the ground up to efficiently process and train large models on massive amounts of audio and text.

Specifically, Soniox processed over 1 million hours of audio data for training. The entire training process was completed on a single A100 server (8xA100 GPUs) in less than 4 weeks! This achievement in engineering innovation alone saved millions of dollars in processing and training costs.
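To put those figures in perspective, the back-of-the-envelope calculation below estimates the sustained throughput they imply. It is only a rough sketch: it assumes a single pass over the data and the full 4 weeks of wall-clock time, neither of which the announcement states, and the function name is illustrative.

```python
# Figures quoted in the announcement.
AUDIO_HOURS = 1_000_000      # hours of audio processed for training
NUM_GPUS = 8                 # one server with 8 A100 GPUs
WALL_CLOCK_WEEKS = 4         # upper bound ("less than 4 weeks")


def required_throughput(audio_hours, wall_clock_weeks, num_gpus):
    """Return (per-server, per-GPU) audio hours consumed per wall-clock hour.

    Assumption (not from the source): the data is seen exactly once.
    """
    wall_clock_hours = wall_clock_weeks * 7 * 24
    per_server = audio_hours / wall_clock_hours
    per_gpu = per_server / num_gpus
    return per_server, per_gpu


per_server, per_gpu = required_throughput(AUDIO_HOURS, WALL_CLOCK_WEEKS, NUM_GPUS)
print(f"Per server: {per_server:,.0f} audio hours per wall-clock hour")
print(f"Per GPU:    {per_gpu:,.0f} audio hours per wall-clock hour")
```

Under these assumptions, the server would need to consume roughly 1,488 hours of audio per hour of wall-clock time, or about 186 hours per GPU. Multiple passes over the data, or a shorter run, would raise that requirement proportionally.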

Novel AI Models:

Achieving high accuracy with low-latency constraints is one of the most challenging problems in AI today. Why? The AI model has to constantly make decisions (e.g., output words) in real time while dealing with a high level of uncertainty and missing information. This challenge is not limited to speech recognition but extends to robotics, which faces similar issues.

To solve this problem effectively, Soniox had to design new, more efficient neural network architectures and develop new criteria that inherently prioritize low-latency decision-making while still considering accuracy. Although Soniox has been training these models for the past year, the improvements were incremental until the breakthrough moment about 6 months ago.

Path Towards Human-Parity:

In the last year, there have been releases of speech recognition models from Google, Meta, and other companies that support one thousand or more languages. “What all of these approaches fail to address is accuracy. Speech recognition is all about accuracy, period,” said Klemen Simonic, Founder and CEO of Soniox. “Achieving human-parity or superhuman accuracy is of paramount importance, rather than settling for a solution with a misrecognition rate of 20% or higher, which proves to be useless for most applications.”
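The "misrecognition rate" mentioned in the quote is most commonly measured as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. The announcement does not say which metric Soniox uses, so the sketch below is an assumption, and the sentences in it are made-up examples rather than benchmark data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: 2 wrong words out of 10 gives a 20% WER.
reference = "please schedule a follow up call for monday at ten"
hypothesis = "please schedule a follow up call for sunday at tin"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

In this made-up example, a 20% error rate is enough to change both the day and the time of the appointment, which illustrates why such a rate is of little use for most applications.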

Soniox is introducing highly accurate models for nine languages, starting with English and Korean. For many of these languages, this will mark the first introduction of highly accurate speech recognition AI. Soniox is looking forward to collaborations with various companies worldwide, and believes this could represent a breakthrough moment for numerous voice and speech applications.

The benchmark reports are available here: https://soniox.com/benchmarks

Why Does This Matter?

If you are in the call automation business, accurately recognizing every single word during phone calls is of paramount importance; otherwise, automation quickly fails, and the experience deteriorates.

If you are involved in creating documents from audio, such as in the medical and legal industries, then accurately recognizing domain-specific words and properly formatting the text is crucial for making transcriptionists more efficient and saving costs.

Additionally, there is a rising trend in human-machine voice interaction. LLMs currently work with text-based communication. The next step is voice communication with LLMs, which requires extremely accurate speech recognition and very low-latency responses. This has been the missing component, and it is what Soniox brings to the table with its new AI models.