
About Us

Better data, stronger AI


Magic Data

Magic Data provides high-quality machine learning (ML) training datasets to enterprises and academic institutions engaged in artificial intelligence R&D and application research in automatic speech recognition (ASR), speech synthesis (TTS), natural language processing (NLP), and computer vision (CV).

Magic Data is dedicated to building conversational and read-speech training datasets for ML and has accumulated over 200,000 hours of data for ASR models. It serves top AI companies and Fortune 500 companies around the world, including Microsoft, Nvidia, Qualcomm, Nuance, Cerence, Alibaba Group, Baidu, and Tencent, with datasets in dozens of languages covering HMI, customer service, virtual assistants, machine translation, and many other AI scenarios.

Magic Data is ISO/IEC 27001 & ISO/IEC 27701:2019 accredited and GDPR compliant.

Magic Data Leadership

Dr. ZHANG Qingqing

Founder & CEO

· Former Associate Researcher at IOA, CAS

· Postdoctoral researcher at LIMSI-CNRS

· Fortune “The Most Powerful Women 2021”

· CYZone “Top Female Founder 2021”

· CAS Outstanding Scientific and Technological Achievement Award

· Member of the CCF committees on Acoustics, Automobile, Female Workers, and Standardization



Partner, Sales VP



Data Scientist

CFO & CLO

Kenneth PANG


Embrace limitless opportunity

Awards & Recognition


Press Room


When HiCar Meets CarPlay: What Enables a Performant In-Vehicle IVR System

At WWDC22, this year's Apple developer conference, Apple not only announced its new processor and other technologies but also demonstrated a newly upgraded version of CarPlay, its in-car connectivity system. CarPlay is no longer a simple projection of iOS; iOS can now fill the entire car screen. HE Xiaopeng, chairman and CEO of XPeng, also commented on the social media platform Weibo that CarPlay is a very good solution for this generation of cars, and that next-generation smart car solutions will require more comprehensive full-stack in-house development and ecosystem building.

Data: the Silver Bullet for Code-Switching Speech Recognition

With the development of the Internet and globalization, everyday speech is often mixed with words from other languages, for example: "我的IPAD不能下载APP了,可以陪我去APPLE store修理一下吗?" (My iPad can't download apps anymore; can you come with me to the Apple Store to get it repaired?), "明天就是deadline了,我的paper还没有ready。" (Tomorrow is the deadline, and my paper is not ready.), "老板的schedule需要调整,麻烦你check一下你的email。" (The boss's schedule needs to be adjusted; please check your email.) ...

MagicData R&D Center New Findings: Conversational Datasets Showed Better Performance

  • Compared with open-source speech data, MagicData data with manual annotation and expert proofreading increases the word accuracy rate by 33% under the same model.
  • The MagicData speech corpus covers a variety of speaking styles, and its recording environments are closer to real-world conditions, so it can be used to train a more robust speech recognition model.

How Can Traditional Industries Benefit from AI Transformation

Whether it is personalized short-video recommendation, optimal route planning for takeaway delivery, or face recognition at payment, AI technology driven by algorithms is already widely applied across the consumer Internet industry.

Do Speech Recognition Applications Perform Better than the Human Ear?

In recent years, with the development of artificial intelligence technology, the performance of speech recognition applications has improved significantly. Many companies claim that the accuracy of their speech recognition technology exceeds 98%. Has speech recognition surpassed the human ear? More needs to be discussed before we draw a final conclusion.

Baseline & Training Datasets Are Open Now | ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge (CSSD)

Since its launch on July 4, 2022, the ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge has received more than 40 registrations. On July 24, the committee released the baseline and training datasets to all participants.
