There are various ways to control the technology around us, most prominently touch or a remote. However, voice commands are also slowly gaining prevalence, as they are an easy and intuitive way to get answers to a query, ask smart devices to perform a task and more. The tech, however, is still being perfected, and to advance it a bit further, Google has built a new lightweight system called VoiceFilter-Lite.
VoiceFilter-Lite: Small size, big improvements
Google uses voice recognition in numerous applications such as Google Maps, Assistant, Translate and others. These apps use the VoiceFilter system, which was released back in 2018. That system is notable for achieving a better source-to-distortion ratio (SDR) than conventional approaches, which helps it recognise voices efficiently. While the model works, it runs in the cloud, as it is too demanding for on-device hardware.
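As an illustration, a simplified SDR can be computed as the ratio of clean-signal power to distortion power, expressed in decibels. The sketch below uses invented sample values; production benchmarks such as BSS Eval use a more involved signal decomposition:

```python
import math

def sdr(reference, estimate):
    """Simplified source-to-distortion ratio in dB:
    10 * log10(signal power / distortion power).
    Higher values mean the estimate is closer to the clean signal."""
    signal_power = sum(s * s for s in reference)
    distortion_power = sum((s - e) * (s - e) for s, e in zip(reference, estimate))
    return 10 * math.log10(signal_power / distortion_power)

clean = [0.0, 1.0, 0.5, -0.3, 0.8]
noisy = [0.1, 0.9, 0.6, -0.2, 0.7]  # clean signal with small residual distortion
print(round(sdr(clean, noisy), 2))
```

A speech-enhancement model that raises the SDR of its output has, by this measure, suppressed more of the distortion relative to the wanted signal.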
Google figured it would be faster and more efficient if this system could work on a device, even offline. Hence, VoiceFilter-Lite was created. The lighter system is aimed at on-device usage to significantly improve speech recognition in overlapping speech. It does so by recognising the enrolled voice of a selected speaker, which is essentially a voice match of a registered user. The new system should be able to identify and work with a user's voice even in 'extremely' noisy conditions, and even when an internet connection is unavailable.
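To make the "enrolled voice" idea concrete, here is a toy Python sketch: each audio frame gets a voice embedding, which is compared against the enrolled speaker's embedding, and frames that do not match are masked out. All values and the threshold here are invented for illustration; the real system uses learned speaker embeddings and a neural network to predict the mask, not raw cosine thresholding:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical enrolled voice profile (in practice, a learned speaker embedding).
enrolled = [0.9, 0.1, 0.3]

# Per-frame voice embeddings from a noisy recording (values invented).
frames = [
    [0.88, 0.12, 0.31],  # target speaker
    [0.05, 0.95, 0.10],  # background speaker
    [0.91, 0.08, 0.28],  # target speaker
]

# Mask: 1.0 keeps a frame, 0.0 suppresses it.
mask = [1.0 if cosine_similarity(f, enrolled) > 0.9 else 0.0 for f in frames]
print(mask)  # -> [1.0, 0.0, 1.0]: only frames matching the enrolled voice survive
```

The masked audio is then what gets fed to the speech recogniser, so overlapping speech from other people is filtered out before transcription.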
Google tested a 2.2MB VoiceFilter-Lite model, which it claims delivered a 25.1% improvement in word error rate (WER) on overlapping speech. WER measures the proportion of words a model gets wrong, through substitutions, deletions and insertions, relative to the number of words in a reference statement.
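For reference, WER is typically computed as the word-level edit distance between the model's hypothesis and the reference transcript, divided by the reference length. A minimal Python sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Two substitutions out of six reference words -> WER of 2/6.
print(wer("turn on the living room lights", "turn on the leaving room light"))
```

A "25.1% improvement in WER" therefore means the filtered audio yields roughly a quarter fewer word-level errors than the unfiltered audio on the same test set.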
More languages to be supported soon
With the new VoiceFilter-Lite model, apps could get considerably better at picking up your voice commands. Additionally, it should help with faster query processing since it works offline.
While the new VoiceFilter-Lite seems to be delivering promising results, it currently works only with the English language. Google will be working on adapting the model to other languages as well. Furthermore, Google will try to directly optimise the speech recognition loss during the training of VoiceFilter-Lite. This could potentially help improve speech recognition beyond overlapping speech.
Image credits: Google