21 Jun 22
New Updates on MagicHub.io – Over 190 Hours of Free-to-Download Datasets!

Conversations with AI have become more and more common in our daily lives, and state-of-the-art AI technology has brought us a great deal of convenience and enjoyment. The explosive development of conversational AI has led to many AI applications. One of them is LaMDA, recently announced by Google, which can hold a conversation on virtually any topic. According to IDC, a market research company, China's conversational AI market will reach 1.86 billion USD by 2023, with a compound annual growth rate (CAGR) of 34.0% from 2019 to 2023.

A batch of new datasets for conversational AI has just been released on MagicHub.io, our open-source community. Let's take a quick look.

Mandarin Chinese Conversational Speech Corpus – Web Meeting
This open-source dataset consists of 5.2 hours of transcribed Mandarin Chinese conversational speech recorded during web meetings held on laptops and mobile phones. Click here to download.

Zhengzhou Dialect Conversational Speech Corpus
This open-source dataset consists of 4 hours of transcribed conversational speech in the Zhengzhou dialect on a set of given topics. Click here to download.

In addition, the English and Czech telephone conversation data from Vystadial, developed for training acoustic models for automatic speech recognition in spoken dialogue systems, is available for download from our community. Click here to download.

Beyond datasets for conversational AI, there are also some scripted speech corpora.

German Scripted Speech Corpus – Command and Query
This open-source dataset consists of 0.71 hours of transcribed German scripted speech focusing on commands and queries, comprising 597 utterances contributed by ten speakers. Click here to download.

Zhengzhou Dialect Scripted Speech Corpus – Daily Use Sentence
This open-source dataset consists of 5 hours of transcribed scripted speech in the Zhengzhou dialect focusing on daily-use sentences, comprising 5,132 utterances contributed by ten speakers. Click here to download.

Last but not least, the Chinese-English Parallel Corpus may also deserve your attention. It consists of one hundred finance-related daily-use sentences translated from Chinese into English. Click here to download.

To date, more than 40 datasets have been released on MagicHub.io. To download more datasets, please visit https://magichub.io/category/datasets/.

We will continue to release more datasets in the community. Stay tuned!
