Artificial intelligence requires a huge volume of data for training. For some AI companies and researchers, that data can be difficult and time-consuming to collect. Open-source data can help mitigate these challenges and boost the development of AI.
The Japanese Read Speech Recognition Corpus was developed by MAGICDATA TECHNOLOGY Co., Ltd. and totals 1,500 hours. A 30-hour subset of scripted read speech data was prepared and freely published for non-commercial use. Its 37 native speakers come from different areas, including Tokyo, Osaka, and Hokkaido. The corpus is a test set, recorded indoors, and the output is in PCM format. The recording texts are drawn from daily conversation.
The MAGICDATA Kid Voice TTS Corpus in Mandarin Chinese was recorded by a four-year-old Chinese girl born in Beijing, China. For this release, we published 15 minutes of speech data from the corpus for non-commercial use.
The MAGICDATA Mandarin Chinese Read Speech Corpus was developed by MAGIC DATA TECHNOLOGY Co., Ltd. and freely published for non-commercial use. The corpus consists of 755 hours of scripted read speech data from 1,000 native speakers of Mandarin Chinese as spoken in mainland China.