Open-source Dataset

Provide extensive training data for AI research and improve model performance quickly

Japanese Read Speech Recognition Corpus

Japanese, Read Speech, Smartphone

The Japanese Read Speech Recognition Corpus was developed by MAGICDATA TECHNOLOGY Co., Ltd. and totals 1,500 hours. A 30-hour subset of scripted read speech has been published free of charge for non-commercial use. Its 37 native speakers come from different areas of Japan, including Tokyo, Osaka, and Hokkaido. The corpus is a test set recorded indoors, and the audio is in PCM format. The recording texts are drawn from daily conversation.

Data Specification

Language

Japanese

Recording Environment

Quiet Indoors

Audio duration

30 Hours

Data Content

Common sentences from daily life

Transcription Accuracy

95%

Speakers Intro

37 speakers from different areas of Japan (including Tokyo, Osaka, Hokkaido, etc.)

Application Fields

Speech Recognition

Sensitive Items

No

Copyright Ownership

Magic Data
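
The duration and format figures in the specification above can be checked after download. The following is a minimal sketch, assuming the audio is delivered as WAV files containing PCM samples; the page states only "PCM formatted", so the container, the sample rate, and the file layout are assumptions, and the function names are hypothetical. It uses only Python's standard `wave` module.

```python
import wave


def wav_duration_seconds(source):
    """Return the duration in seconds of one PCM WAV file.

    `source` is a path string or a binary file-like object.
    The sample rate is read from the file header, not assumed.
    """
    with wave.open(source, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_hours(sources):
    """Sum the durations of several WAV files and convert to hours."""
    return sum(wav_duration_seconds(s) for s in sources) / 3600.0
```

Passing the paths of all downloaded audio files to `total_hours` gives a figure that can be compared against the stated 30 hours. If the files turn out to be headerless raw PCM, `wave` cannot read them, and the sample rate and sample width would have to be obtained from the data provider.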

Samples Download

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

Usage Instructions

The user shall observe the following rules when browsing the website and using the data:

1.    The user may use the data without payment in the following cases, but shall attach the attribution "data referenced to Beijing Magic Data Co., Ltd., “www.imagicdatatech.com/index.php/home/dataopensource/data_info/id/101”, 05/2019" and must not infringe the company's other rights to the data:
(1)    Use of the data for personal study and research;
(2)    Use of the data to introduce or comment on a work, or to explain a problem;
(3)    Unavoidable citation of the data in newspapers, periodicals, radio, television, etc. when reporting on current affairs;
(4)    Use of the data by teaching or scientific research personnel for school classroom teaching or scientific research;
(5)    Use of the data by state organizations to perform official duties, within a reasonable scope.
2.    When using the licensed data, the user shall not use it for commercial purposes and has no right to sell, transfer, or publish it.
3.    The user shall use the data as a whole, may not modify the data content without permission, and may not convert the data format or perform secondary development.
4.    The company does not undertake to correct any inconsistencies or defects that may exist in the data provided, and is not responsible for any consequences arising from use of the data.
5.    If the user exceeds the above restrictions on use, the company will take legal measures to hold the infringer liable.

Contact information

We will call you to confirm your information and provide the method to download.

Telephone: +86-10-82527250