About Us

Better data, stronger AI


Magic Data

Magic Data provides high-quality ML training datasets to enterprises and academic institutions engaged in artificial intelligence R&D and application research in speech recognition (ASR), speech synthesis (TTS), natural language processing (NLP), and computer vision (CV).

Magic Data is dedicated to building conversational and read speech training datasets for ML and has accumulated over 200,000 hours of data for ASR models. It serves top AI companies and Fortune 500 companies around the world, including Microsoft, Nvidia, Qualcomm, Nuance, Cerence, Alibaba Group, Baidu, and Tencent, with datasets in dozens of languages covering HMI, customer service, virtual assistants, machine translation, and many other AI scenarios.

Magic Data is ISO/IEC 27001 & ISO/IEC 27701:2019 accredited and GDPR compliant.

Magic Data Leadership

Dr. ZHANG Qingqing

Founder & CEO

· Former Associate Researcher at IOA, CAS

· Postdoctoral researcher at LIMSI-CNRS

· Fortune “The Most Powerful Women 2021”

· CYZone “Top Female Founder 2021”

· CAS Outstanding Scientific and Technological Achievement Award

· Member of the Acoustics, Automobile, Female Worker, and Standardization committees of CCF

Embrace limitless opportunity

Awards & Recognition

Press Room

Qingqing ZHANG: Conversation Data Promotes AIGC—Training Data of Large-Scale Models

"Training data is technology."

That’s what OpenAI co-founder Ilya Sutskever said in an interview with The Verge. ChatGPT has amazed the world since its release. The stunning performance of GPT-4 makes us believe we have entered a new era in AI.

What makes large models so omniscient? In our opinion, the reason may lie in the data...

This article is a collection of Dr. Qingqing Zhang’s thoughts on data, large models and generative AI.

Baseline & Training Datasets Are Open Now | ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge (CSSD)

Since its launch on July 4, 2022, the ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge has received more than 40 registrations. On July 24, the committee released the baseline and training datasets to all participants.

Integrating ASR with Text Summarizer, Secure Your Leading Position in Web Conferencing Market with Magic Data Multi-Person Spontaneous Meetings Dataset

Online meetings have become a frequently used tool for business and learning. Meeting users' increasingly diverse online conferencing needs poses great challenges for remote work applications, including captioning, real-time machine translation, smart meeting minutes, and other artificial intelligence applications.

Open Dataset | Automobile Cabin Voice Interaction Data Solution

In recent years, the development of artificial intelligence and chip technology, together with the growing popularity of smart cars, has driven new innovations in the automotive industry. A smart car consists of three parts: the Internet of Vehicles, the smart cockpit, and autonomous driving. The smart cockpit is equipped with intelligent, networked in-vehicle software that can interact intelligently with people, roads, and vehicles. It is an important link and key node in the evolution of the human-vehicle relationship from tool to partner.

The Future of Virtual Companionship

Nowadays, more and more young people are buying chat services on e-commerce platforms for virtual company, confiding in a “chat buddy” to communicate and express their feelings. Prices range from tens of yuan for basic companionship to thousands of yuan for a customized "virtual lover". In recent years, virtual companionship services have become a fashionable form of self-healing for young people seeking spiritual comfort and a place to express themselves on the Internet. Many stores on Taobao provide this service, with offerings such as "gentle and cute little sweetheart" and "overbearing dictatorial president fan". As long as you pay, you can find your favorite "buddy".

Will Humans Be Replaced by AI?

AI-generated art has experienced rapid growth in both popularity and accessibility over the past few months, with engines like DALL-E, Midjourney, and Stable Diffusion spurring an influx of AI-generated artwork on online platforms.
