Dr. Qingqing Zhang: What is Conversational AI? And How to Build Code-Switching Acoustic Models?

Date: 2022-11-04

Transcript

Hey everyone, I’m Dr. Zhang Qingqing, Founder and CEO of Magic Data Technology. Today I’d like to talk about multilingual conversational AI technology and the construction of the corresponding corpora.

For today’s agenda, I’ll introduce multilingual conversational AI, the conversational datasets used to train these AI systems, and our open-source community, MagicHub.

First of all, one of the major trends in the AI industry right now is the widespread application of conversational AI. We group these applications into five main categories: in-car conversational systems, conversational smart homes, and conversational AI for financial services, for the healthcare industry, and for social networks.

What is conversational AI? It mainly consists of three parts. First, Automatic Speech Recognition (ASR) takes audio speech as input and transcribes it into text, using a series of techniques such as feature extraction, acoustic modeling and language modeling, possibly with the help of a dictionary. Then we use Natural Language Understanding to process the text, and finally we synthesize the generated text back into audio speech. This is why the robot can have conversations with us.
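As a rough illustration of how the three parts chain together, here is a minimal Python sketch. The class name `ConversationalPipeline` and the three toy stage functions are hypothetical placeholders, not a real ASR, NLU or speech synthesis engine; they only show the order in which data flows.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConversationalPipeline:
    """Chains the three stages described in the talk (all stages are assumptions)."""
    asr: Callable[[bytes], str]   # audio in, transcribed text out
    nlu: Callable[[str], str]     # transcribed text in, reply text out
    tts: Callable[[str], bytes]   # reply text in, synthesized audio out

    def respond(self, audio: bytes) -> bytes:
        text = self.asr(audio)     # 1. speech recognition
        reply = self.nlu(text)     # 2. language understanding / reply generation
        return self.tts(reply)     # 3. speech synthesis


# Toy stand-ins: "audio" is just UTF-8 bytes so the sketch runs without models.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_nlu(text: str) -> str:
    return f"echo: {text}"


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = ConversationalPipeline(asr=toy_asr, nlu=toy_nlu, tts=toy_tts)
reply_audio = pipeline.respond("hello".encode("utf-8"))
```

In a real system each stage would be replaced by a trained model; the point here is only the ASR, then understanding, then synthesis ordering.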

Of course, our machines are still not smart enough to communicate with people naturally, and conversational speech interaction remains a major challenge for speech recognition. This is because traditional read-speech data lacks the features of natural conversational speech, such as interruptions, overlapping voices, inversions and hesitations.

Code-switching and code-mixing are also common due to globalization. This means that in daily life, in business settings, and in technical communication within vertical domains, we may use more than one language in the same conversation. For example, we might say: 昨天NBA看了吗？which means “Did you watch the NBA match yesterday?”

These are all common, growing and difficult problems for ASR systems.

More specifically, the key to building a multilingual ASR system is recognizing the embedded non-native language and its pronunciation.

So the question is: how can we improve the performance of bilingual ASR systems in terms of acoustic modeling? In our opinion, first and foremost, we need to model the lexicon and the acoustic information of the mixed languages.

To build a code-mixing lexicon, we mainly need to integrate the original language and the embedded language into one lexicon. A common scheme is to build a lexicon for each language on its own phonemes and then combine the lexicons. This way we can make use of the lexicons at hand, but the method is also computationally expensive for the acoustic models.
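The combine-the-lexicons scheme can be sketched roughly as follows. This is a toy illustration, not Magic Data's actual pipeline: the two small lexicons, the phoneme symbols, and the `zh_` / `en_` language prefixes are all assumptions made for the example.

```python
# Hypothetical toy lexicons: word -> list of phonemes in that language's own set.
mandarin_lexicon = {
    "昨天": ["z", "uo2", "t", "ian1"],
    "看": ["k", "an4"],
}
english_lexicon = {
    "NBA": ["EH1", "N", "B", "IY1", "EY1"],
}


def tag_phonemes(lexicon, lang):
    """Prefix every phoneme with a language tag so the two sets never collide."""
    return {
        word: [f"{lang}_{phone}" for phone in phones]
        for word, phones in lexicon.items()
    }


def combine_lexicons(*tagged_lexicons):
    """Merge lexicons; a word may keep several pronunciations."""
    combined = {}
    for lexicon in tagged_lexicons:
        for word, phones in lexicon.items():
            combined.setdefault(word, []).append(phones)
    return combined


combined = combine_lexicons(
    tag_phonemes(mandarin_lexicon, "zh"),
    tag_phonemes(english_lexicon, "en"),
)

# The acoustic model must now cover the union of both phoneme inventories.
phone_set = sorted(
    {phone for prons in combined.values() for pron in prons for phone in pron}
)
```

Because the phoneme inventories are simply concatenated, the number of units the acoustic model has to learn grows with every added language, which is one way to read the computational cost mentioned above.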

On the other hand, Magic Data would recommend building a united lexicon: first map the phonemes of the different languages to a united phoneme set, then build the lexicon on those united phonemes. The advantage of this method is that the transfer rate becomes more manageable, although expert knowledge or statistical analysis is needed to build the united phonemes.
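A minimal sketch of the united-phoneme idea, under stated assumptions: the mapping table `UNITED_PHONEME_MAP` below is invented for illustration only. In practice, as the talk notes, such a table would come from expert knowledge or statistical analysis, and the symbols here are not a real phoneme standard.

```python
# Hypothetical mapping: (language, language-specific phoneme) -> united phoneme.
UNITED_PHONEME_MAP = {
    ("zh", "k"): "k",
    ("zh", "an4"): "an4",
    ("en", "K"): "k",      # assumed to share a unit with Mandarin "k"
    ("en", "AE1"): "ae",
    ("en", "N"): "n",
}


def build_united_lexicon(lexicons, phoneme_map):
    """Rewrite every pronunciation in terms of the united phoneme set.

    Raises KeyError if a phoneme has no entry in the mapping table.
    A word appearing in two languages would be overwritten by the later one;
    this toy version does not handle that case.
    """
    united = {}
    for lang, lexicon in lexicons.items():
        for word, phones in lexicon.items():
            united[word] = [phoneme_map[(lang, phone)] for phone in phones]
    return united


lexicons = {
    "zh": {"看": ["k", "an4"]},
    "en": {"can": ["K", "AE1", "N"]},
}

united_lexicon = build_united_lexicon(lexicons, UNITED_PHONEME_MAP)
united_phone_set = sorted(
    {phone for phones in united_lexicon.values() for phone in phones}
)
```

In this toy case the five language-specific phonemes collapse to four united ones, because the two "k" sounds share a single unit.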

For instance, here is a result from models trained on our collected datasets. As you can see, compared to the base dictionary, by combining our CMU data, the WER on a 38.3-hour test set is 12.64%, which is significantly lower than the original WER of 30.71%.
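For readers unfamiliar with the metric, WER (word error rate) is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below is a generic textbook implementation, not the scoring tool used for the figures above; the function name and the example sentences are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


example_wer = word_error_rate(
    "did you watch the nba match yesterday",
    "did you watch nba match today",
)
```

For Chinese or code-mixed transcripts, the text would first need to be segmented into scoring units (for example characters for Chinese and words for English); the whitespace split here is only adequate for the English toy example.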

We have also researched multilingual transfer learning techniques. For example, the 37,352-hour Chinese-based model reached a WER of 78.97%, while the 898.7-hour Shanghai dialect adaptive model brought the WER down to 19.10%. A significant WER improvement can also be seen between the English-based model and the French adaptive model, from 95.10% down to 28.07%.
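To put these numbers on a common scale, one can compute the relative WER reduction, (baseline - adapted) / baseline. The short sketch below uses only the four WER figures quoted in the talk; the helper name `relative_wer_reduction` is an assumption.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - adapted) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer * 100.0


# (baseline WER %, adapted WER %) as quoted in the talk.
results = {
    "Chinese-based -> Shanghai dialect adaptive": (78.97, 19.10),
    "English-based -> French adaptive": (95.10, 28.07),
}

reductions = {
    name: relative_wer_reduction(baseline, adapted)
    for name, (baseline, adapted) in results.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative WER reduction")
```

By this measure, the Shanghai dialect adaptation removes roughly three quarters of the base model's errors, and the French adaptation roughly 70%.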

We believe multilingual conversational AI is the trend of the industry, although there are many difficulties. So we call on more AI developers to join our research and rebuild “The Tower of Babel”.
