Nearly one thousand car companies recently attended the Shanghai Auto Show, where electrification and intelligentization had become standard features for many of the exhibitors. Intelligent features such as intelligent cabins, self-driving, and cloud services offer a glimpse of where intelligent cars are heading.
By 2025, the global number of intelligent connected vehicles is expected to approach 74 million, including 28 million in China. The intelligent vehicle industry has entered a golden period of development.
In-vehicle human-computer interaction depends on sufficient computing power and on algorithms for speech recognition, speech synthesis, and natural language processing. Those algorithms, in turn, benefit from large volumes of accurate data that match the target scenario. Just as electricity powers electric cars, data powers AI.
To support the development and optimization of intelligent cars, Magic Data Tech recently added two in-vehicle open-source datasets to MagicHub: the In-Vehicle Noise Datasets and the Mandarin Chinese Scripted Speech Corpus—In-Vehicle Scene.
In-Vehicle Noise Datasets
This open-source dataset contains in-vehicle noise from multiple sources, such as tire noise, engine noise, the radio, and human voices.
Mandarin Chinese Scripted Speech Corpus—In-Vehicle Scene
This open-source dataset contains 5,948 transcribed utterances of scripted Mandarin Chinese speech from ten speakers, focusing on commands and queries typical of in-vehicle scenarios. A notable feature is that two microphones were used during recording: one mounted at the sun visor and one placed near the speaker's mouth, with the speaker seated in a front passenger seat. As a result, each utterance was captured as two synchronized recordings.
MagicHub will continue to provide AI developers with more standardized datasets covering a wider range of dimensions and scenarios.