When HiCar Meets CarPlay: What Enables a Performant In-Vehicle IVR System

Date: 2022-08-05

At this year's Worldwide Developers Conference (WWDC22), Apple announced its latest processor and other technologies, and also demonstrated a newly upgraded version of its in-car connectivity system, CarPlay. With this upgrade, CarPlay is no longer a simple projection of iOS; iOS content can now fill the entire in-car display. He Xiaopeng, chairman and CEO of XPeng, commented on the social media platform Weibo that CarPlay is a very good solution for the current generation of cars, while the next generation of smart-car solutions will require more comprehensive full-stack in-house development and ecosystem building.

In 2019, Huawei released HUAWEI HiCar, a system similar to CarPlay that links Huawei phones, cars, and homes into one connected ecosystem, and Huawei has since partnered with many carmakers. Whether this year's CarPlay upgrade is aimed squarely at Huawei remains unknown. In both CarPlay and HiCar, the dominant mode of interaction is still voice, which lets drivers keep their hands on the steering wheel. An in-vehicle intelligent system mainly covers five areas: multimedia entertainment, vehicle control, intelligent navigation, driving behavior monitoring, and vehicle condition monitoring. Each of these areas involves speech recognition and speech synthesis, and both are technically demanding. The main challenges are the influence of noise in in-vehicle scenes and the high bar for natural, human-like synthetic speech.

Challenge

Vehicle Voice Scene Noise

Driving produces a great deal of unpredictable noise. Wind, wipers, gear shifting, and other driving sounds all reduce the accuracy of speech recognition. The in-vehicle environment also has strict safety requirements: only when recognition is accurate can a voice system avoid interfering with the driver's judgment. Until now, in-vehicle voice interaction has mainly focused on serving the driver. As in-vehicle voice systems develop, however, voice assistants will need to meet the different needs of both drivers and passengers. Drivers tend to make driving-related requests, while passengers may want entertainment. Because there is often more than one person in the vehicle, current in-vehicle voice systems urgently need to solve three problems: recognizing multiple speakers, automatically identifying where the person asking is seated (sound source localization), and giving the correct answer.
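As a rough illustration of sound source localization, the sketch below estimates which of two cabin microphones a sound reached first by brute-force cross-correlation. It assumes a hypothetical two-microphone setup with one microphone near the driver and one near the passenger; the function names `estimate_delay` and `nearer_side` are invented for this example and do not describe how HiCar or CarPlay work.

```python
def estimate_delay(mic_a, mic_b, max_lag):
    """Return the lag, in samples, by which mic_b trails mic_a.

    Tries every lag in [-max_lag, max_lag] and keeps the one with the
    highest cross-correlation score. A positive result means the sound
    reached mic_a first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                score += a * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def nearer_side(lag):
    """Map a lag to the (hypothetical) seat whose microphone heard the sound first."""
    if lag > 0:
        return "driver"      # assumption: mic_a is mounted on the driver side
    if lag < 0:
        return "passenger"   # assumption: mic_b is mounted on the passenger side
    return "center"


# A short burst that arrives at mic_b five samples after mic_a.
burst = [0.0] * 20 + [1.0, -0.6, 0.3] + [0.0] * 20
mic_a = burst + [0.0] * 5
mic_b = [0.0] * 5 + burst
print(nearer_side(estimate_delay(mic_a, mic_b, max_lag=10)))
```

A production system would typically use a larger microphone array and more noise-robust correlation methods; this version only shows the underlying idea of comparing arrival times.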

Synthetic Speech Personification Requirements

Synthesized speech output is the link that most affects the user experience. Because safety is essential while driving, synthesized speech must be accurate, logically clear, and clearly pronounced. Only then can the driver and passengers get useful feedback without compromising safety. In continuous conversations, the system should accurately identify the user's intent and give concise, accurate answers. Overly verbose responses can interfere with the driver's judgment and provoke "road rage" or other safety hazards. Continuous dialogue, extensive ecosystem content services, and more emotionally expressive personal avatars will therefore become an important direction for the in-vehicle voice industry.

Solution

Improving the accuracy of speech recognition and the clarity of synthesized speech in vehicles requires more robust recognition and synthesis models. Adding training data that matches real in-vehicle deployment scenarios exposes the model to more scene variety and improves its ability to recognize speech across those scenarios. Collecting voice data inside vehicles is very difficult, however, so support from a professional data team is needed. Magic Data has provided in-vehicle voice data in multiple languages and mixed languages to many automotive companies and developers of voice interaction systems. Its in-vehicle scene data covers multiple languages, multiple noise environments, and recordings from multiple devices.
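One common way to expose a model to more noise conditions is to mix recorded cabin noise into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below is a minimal illustration of that idea, assuming audio is already loaded as lists of float samples; `rms` and `mix_at_snr` are hypothetical helper names, and this is not a description of Magic Data's own pipeline. Simulated mixing of this kind complements, rather than replaces, speech actually recorded in vehicles.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    The noise clip is looped if it is shorter than the speech clip and
    trimmed if it is longer.
    """
    noise = list(noise)
    if len(noise) < len(speech):
        repeats = len(speech) // len(noise) + 1
        noise = noise * repeats
    noise = noise[:len(speech)]

    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise clip is silent; cannot scale it to a target SNR")

    # SNR(dB) = 20 * log10(speech_rms / noise_rms), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]
```

Calling `mix_at_snr(speech, wiper_noise, 10.0)` with several noise clips (wind, wipers, road noise) and several SNR values yields multiple noisy variants of each clean utterance.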

Singaporean English In-Vehicle Scripted Speech Corpus

In-Vehicle Noise Corpus

Voice interaction is the key technology to improve the in-vehicle user experience. A large amount of in-vehicle speech and sound data is the cornerstone for both HiCar and CarPlay.

For more information, visit www.magicdatatech.com/datasets.
