How to Improve Multilingual Speech Recognition Performance? In View of Acoustic Modeling

Date : 2021-10-27 View : 2105

With the development of modern technology, cross-cultural communication has become more frequent and code-mixing has become a common phenomenon. People are getting used to mixing different languages in a single sentence, sometimes without even thinking about it.

Code-mixing poses many challenges for the development of automatic speech recognition (ASR) systems. How to build a reliable multilingual speech recognition system has become a hot topic in the industry.

Recognizing the embedded language is the main pain point, and it leaves researchers with two problems. The first is recognizing embedded-language words spoken with a matrix-language accent. The second is balancing cost and effectiveness when developing the multilingual speech recognition model, especially when the embedded language is data-scarce.

In terms of acoustic modeling, there are two main directions to consider: including multilingual datasets in the training data, and applying transfer learning.

In this article, we use the example of developing a Mandarin-English bilingual speech recognition system for real-world music retrieval to illustrate the idea.

Multilingual training datasets for acoustic modeling

When preparing training data, we recommend training the ASR model on a Chinese-English code-mixing dataset in addition to the Chinese and English datasets. Research shows that, compared with using only monolingual Chinese and English training datasets, using Chinese-English code-mixed speech data for phoneme clustering and model training reduces the error rate of the baseline model on Chinese-English mixed speech recognition by 37.93%.
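As a rough illustration of assembling such a training pool, the sketch below sorts transcripts into Chinese, English, and code-mixed groups by checking which scripts appear in each one. The function names, the script-based heuristic, and the sample music-retrieval queries are assumptions made for this example; a production pipeline would more likely rely on language labels supplied with the dataset.

```python
import re

# Assumption: a transcript counts as Chinese if it contains CJK ideographs
# and as English if it contains Latin letters. This is a simple heuristic.
CJK = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")


def classify_utterance(text: str) -> str:
    """Return 'zh', 'en', 'code-mixed', or 'other' for one transcript."""
    has_zh = bool(CJK.search(text))
    has_en = bool(LATIN.search(text))
    if has_zh and has_en:
        return "code-mixed"
    if has_zh:
        return "zh"
    if has_en:
        return "en"
    return "other"


def build_training_pool(transcripts):
    """Group transcripts by language category."""
    pool = {"zh": [], "en": [], "code-mixed": [], "other": []}
    for text in transcripts:
        pool[classify_utterance(text)].append(text)
    return pool


# Hypothetical music-retrieval queries.
transcripts = [
    "播放一首周杰伦的歌",
    "play some jazz music",
    "我想听 Taylor Swift 的 Love Story",
]
pool = build_training_pool(transcripts)
for category, items in pool.items():
    print(category, len(items))
```

Grouping the data this way makes it easy to check how much code-mixed speech is actually present before training, and to adjust the mix if the code-mixed share is too small.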

Global phone set lexicon

Compared with building and training an acoustic model from scratch, transfer learning can quickly achieve good results without requiring a large amount of time and resources. Drawing on transfer learning, we can adopt a global phone set lexicon when building an adapted acoustic model, which reduces the amount of embedded-language data required for training while lowering the word error rate.
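To make the idea of a global phone set lexicon more concrete, here is a minimal sketch that maps language-specific phone labels onto one shared inventory, so that similar sounds in both languages are represented by the same unit. The phone mappings, the tiny lexicons, and the helper name to_global are toy assumptions for illustration only. Tones are ignored, and the mappings are not a linguistically vetted phone set.

```python
# Toy mappings from language-specific phone labels to shared (global) labels.
# Assumption: ARPAbet-style labels for English, pinyin-style labels for Mandarin.
EN_TO_GLOBAL = {"M": "m", "N": "n", "AY": "ai", "TH": "θ"}
ZH_TO_GLOBAL = {"m": "m", "n": "n", "ai": "ai", "i": "i"}


def to_global(lexicon, mapping):
    """Rewrite each pronunciation using the global phone labels."""
    return {word: [mapping[p] for p in phones] for word, phones in lexicon.items()}


# Hypothetical monolingual lexicons (word -> phone sequence).
en_lexicon = {"my": ["M", "AY"], "nine": ["N", "AY", "N"], "thigh": ["TH", "AY"]}
zh_lexicon = {"买": ["m", "ai"], "奶": ["n", "ai"], "你": ["n", "i"]}

en_global = to_global(en_lexicon, EN_TO_GLOBAL)
zh_global = to_global(zh_lexicon, ZH_TO_GLOBAL)

# One merged lexicon over the shared phone inventory.
global_lexicon = {**zh_global, **en_global}

# Phones used by both languages can share acoustic training data.
en_phones = {p for phones in en_global.values() for p in phones}
zh_phones = {p for phones in zh_global.values() for p in phones}
shared = en_phones & zh_phones

print(sorted(shared))
print(sorted(en_phones - zh_phones))
```

In this toy setup, phones that fall in the shared set can be trained on audio from both languages, so only the phones unique to the embedded language depend entirely on embedded-language data.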

For more data insight, contact our data experts (business@magicdatatech.com).
