Voice Datasets
Artificial intelligence requires a huge volume of training data. For some AI companies and researchers, this data can be difficult and time-consuming to collect. Open-source data can help mitigate these challenges and accelerate the development of AI.
Japanese Read Speech Recognition Corpus

The Japanese Read Speech Recognition Corpus was developed by MAGICDATA TECHNOLOGY Co., Ltd. The full corpus contains 1,500 hours of speech; a 30-hour subset of scripted read speech has been published free of charge for non-commercial use. The subset was recorded by 37 native speakers from different areas of Japan, including Tokyo, Osaka, and Hokkaido. The corpus is a test set, recorded indoors, and the audio is delivered in PCM format. The recording texts are drawn from daily conversation.

Data Specification
Language : Japanese
Recording Environment : Quiet Indoors
Audio Duration : 30 Hours
Data Content : Common sentences from daily life
Transcription Accuracy : 95%
Speakers Intro : 37 speakers from different areas of Japan (including Tokyo, Osaka, Hokkaido, etc.)
Application Fields : Speech Recognition
Sensitive Items : No
Copyright Ownership : Magic Data
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
Usage Instructions

The user shall observe the following rules when browsing the website and using the data:

  1. The user may use the data free of charge in the following cases, provided that the attribution "data referenced to Beijing Magic Data Co., Ltd., www.magicdatatech.com/" is attached and the company's other rights to the data are not infringed: (1) personal study and research; (2) introducing or commenting on a work, or explaining a problem; (3) unavoidable citation of the data when reporting on current affairs in newspapers, periodicals, radio stations, television stations, etc.; (4) school classroom teaching or scientific research, for use by teaching or scientific research personnel; (5) use by state organizations to perform official duties within a reasonable scope;
  2. When using licensed data, the user shall not use it for commercial purposes and shall not have the right to sell, transfer, or publish it;
  3. The user shall use the data as a whole, and may not modify the data content without permission, and may not convert the data format or perform secondary development.
  4. The company does not undertake to modify certain inconsistencies or defects that may exist in the data provided. The company is not responsible for any consequences caused by the use of the data itself.
  5. If the user exceeds the above restrictions on use, the company will take legal measures to hold the infringer responsible.