New Updates on MagicHub.io – Over 190 Hours of Free-to-Download Datasets!

Date: 2021-06-22

Conversations with AI have become increasingly common in our daily lives, and state-of-the-art AI technology has brought us much convenience and happiness. The explosive development of conversational AI has led to many applications. One of them is LaMDA, recently introduced by Google, which can hold a conversation on any given topic. According to IDC, a market research company, China's conversational AI market will reach 1.86 billion USD by 2023, with a compound annual growth rate (CAGR) of 34.0% for 2019-2023.

A new batch of datasets for conversational AI has been released on MagicHub.io, our open-source community. Let’s take a quick look.

Mandarin Chinese Conversational Speech Corpus – Web Meeting: This open-source dataset consists of 5.2 hours of transcribed Mandarin Chinese conversational speech recorded in web meetings between laptops and mobile phones. Click here to download.

Zhengzhou Dialect Conversational Speech Corpus: This open-source dataset consists of 4 hours of transcribed Zhengzhou dialect conversational speech on selected topics. Click here to download.

In addition, the English & Czech Telephone Conversation Data from Vystadial, developed for training acoustic models for automatic speech recognition in spoken dialogue systems, can also be downloaded from our community. Click here to download.

In addition to the datasets for conversational AI, there are also several scripted speech corpora.

German Scripted Speech Corpus – Command and Query: This open-source dataset consists of 0.71 hours of transcribed German scripted speech focusing on commands and queries, comprising 597 utterances contributed by ten speakers. Click here to download.

Zhengzhou Dialect Scripted Speech Corpus – Daily Use Sentence: This open-source dataset consists of 5 hours of transcribed Zhengzhou dialect scripted speech focusing on daily use sentences, comprising 5,132 utterances contributed by ten speakers. Click here to download.

Finally, the Chinese-English Parallel Corpus, which consists of a hundred finance-related daily use sentences translated from Chinese to English, may also deserve your attention. Click here to download.

To date, more than 40 datasets have been released on MagicHub.io. To download more datasets, please visit https://magichub.io/category/datasets/.

We will continue to release more datasets in the community. Stay tuned!
