With the development of modern technology, cross-cultural communication has become more frequent, and code-mixing has become a common phenomenon. People are getting used to mixing different languages in a single sentence, sometimes even instinctively.
Code-mixing poses many challenges for the development of automatic speech recognition (ASR) systems. How to build a reliable multilingual speech recognition system has become a hot topic in the industry.
Recognition of the embedded language (the language inserted into a sentence whose frame is set by the dominant, or matrix, language) is the pain point. Researchers must deal with two problems. The first is recognizing embedded-language speech that is spoken with a matrix-language accent. The second is balancing cost and effectiveness when developing the multilingual speech recognition model, especially when the embedded language is a data-scarce language.
In terms of acoustic modeling, there are two main directions to consider: using multilingual datasets as training data, and applying transfer learning.
In this article, we use the example of developing a Mandarin-English bilingual speech recognition system for real-world music retrieval to illustrate the idea.
Multilingual training data for acoustic modeling
For training data, in addition to a Chinese dataset and an English dataset, a Chinese-English code-mixing dataset is recommended for training the ASR model. Research shows that, compared with using only monolingual Chinese and English training datasets, using Chinese-English code-mixed speech data for phoneme clustering and model training reduces the error rate of the baseline model on Chinese-English mixed speech recognition by 37.93%.
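As a rough illustration of combining the three kinds of data, the sketch below draws a training list from a Mandarin corpus, an English corpus, and a code-mixed corpus according to sampling weights. The function name `mix_corpora`, the utterance IDs, and the weights are all hypothetical and chosen only for illustration; they are not taken from the research mentioned above, and a real pipeline would read utterances from corpus manifests.

```python
import random

def mix_corpora(corpora, weights, n_samples, seed=0):
    """Draw a training list from several corpora according to sampling weights.

    corpora: dict mapping corpus name -> list of utterance IDs
    weights: dict mapping corpus name -> relative sampling weight
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    names = list(corpora)
    # Choose a source corpus for each sample, proportionally to its weight.
    picks = rng.choices(names, weights=[weights[n] for n in names], k=n_samples)
    # Then pick one utterance at random from the chosen corpus.
    return [(name, rng.choice(corpora[name])) for name in picks]

# Hypothetical utterance IDs (assumption, for illustration only).
corpora = {
    "mandarin": ["zh_0001", "zh_0002", "zh_0003"],
    "english": ["en_0001", "en_0002"],
    "code_mixed": ["mix_0001", "mix_0002"],
}
# Illustrative weights only; the right balance depends on the task and data.
weights = {"mandarin": 0.5, "english": 0.2, "code_mixed": 0.3}

training_list = mix_corpora(corpora, weights, n_samples=10)
for corpus_name, utterance_id in training_list:
    print(corpus_name, utterance_id)
```

Sampling by weight rather than simply concatenating the corpora is one possible design choice: it lets a small code-mixed set be seen more often during training without changing the underlying data.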
Global phone set lexicon
Compared with building and training an acoustic model from scratch, transfer learning can quickly achieve good results without requiring large amounts of time and resources. Drawing on transfer learning, we can adopt a global phone set lexicon when building an adapted acoustic model, which reduces the amount of embedded-language data required for training while lowering the word error rate.
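To make the idea of a global phone set lexicon more concrete, here is a minimal sketch that rewrites two per-language lexicons over one shared phone inventory. The tiny lexicons, the phone-mapping tables, and the function names are assumptions made for illustration; they are heavily simplified, not linguistically vetted, and not the mapping used in any particular system.

```python
def build_global_lexicon(lexicons, phone_maps):
    """Merge per-language lexicons into one lexicon over a shared phone set.

    lexicons:   dict lang -> {word: [language-specific phones]}
    phone_maps: dict lang -> {language-specific phone: global phone}
    """
    merged = {}
    for lang, lexicon in lexicons.items():
        mapping = phone_maps[lang]
        for word, phones in lexicon.items():
            # Rewrite each pronunciation using global phone symbols.
            # (A word spelled identically in two languages would be overwritten here.)
            merged[word] = [mapping[p] for p in phones]
    return merged

def shared_phones(phone_maps):
    """Return the global phones that every language maps onto."""
    return set.intersection(*(set(m.values()) for m in phone_maps.values()))

# Hypothetical, heavily simplified lexicons and mappings (assumptions).
lexicons = {
    "mandarin": {"妈": ["m", "a"], "你": ["n", "i"]},
    "english": {"me": ["M", "IY"], "see": ["S", "IY"]},
}
phone_maps = {
    "mandarin": {"m": "m", "a": "a", "n": "n", "i": "i"},
    "english": {"M": "m", "IY": "i", "S": "s"},
}

global_lexicon = build_global_lexicon(lexicons, phone_maps)
common = shared_phones(phone_maps)
print(global_lexicon)
print(sorted(common))
```

In this toy setup, the phones that both languages map onto are the ones an acoustic model trained mostly on matrix-language data has already seen, which is the intuition behind needing less embedded-language data.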
For more data insight, contact our data experts (firstname.lastname@example.org).