Open-source Dataset

Provide extensive training data for AI research and improve model performance quickly

Chinese Read Speech Recognition Corpus

Chinese, Read Speech, Quiet Indoors, Smartphone

The MAGICDATA Mandarin Chinese Read Speech Corpus was developed by MAGIC DATA TECHNOLOGY Co., Ltd. and is freely published for non-commercial use. The corpus consists of 755 hours of scripted read speech from 1,000 native speakers of Mandarin Chinese as spoken in mainland China.

Data Specification



Recording Environment

Quiet Indoors

Audio duration

755 Hours

Data Content

Common sentences from daily life

Speakers Intro

1,000 native speakers of Mandarin Chinese from different regions

File Format


Recording Equipment


Application Fields

Speech Recognition

Sensitive Items


Copyright Ownership

Magic Data

Samples Download

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Usage Instructions
The user shall observe the following rules when browsing the website and using the data:
1.    The user may use the data free of charge in the following cases, but shall attach the attribution "data referenced to Beijing Magic Data Co., Ltd., “”, 05/2019" and must not infringe the company's other rights to the data:
(1)    Use of the data for personal study and research;
(2)    Use of the data to introduce or comment on a work, or to explain a problem;
(3)    Reporting on current affairs, where citing the data in newspapers, periodicals, radio stations, television stations, etc. is unavoidable;
(4)    Use of the data for school classroom teaching or scientific research by teaching or research personnel;
(5)    Use of published works by state organizations to perform official duties, within a reasonable scope.
2.    When using licensed data, the user shall not use it for commercial purposes and shall not have the right to sell, transfer, or publish it.
3.    The user shall use the data as a whole, may not modify the data content without permission, and may not convert the data format or perform secondary development.
4.    The company does not undertake to correct any inconsistencies or defects that may exist in the data provided. The company is not responsible for any consequences arising from the use of the data.
5.    If the user exceeds the above restrictions on use, the company will take legal action to hold the infringer liable.

Contact information

Name *
Phone Number *
E-mail *
Company Name *
Address *

We will call you to confirm your information and explain how to download the data.