21 Jun 22
New Updates on MagicHub.io – Free to Download Datasets for over 190 Hours!

Conversations with AI have become increasingly common in daily life, and state-of-the-art AI technology has brought us much convenience and enjoyment. The explosive development of conversational AI has led to many applications, one of which is LaMDA, recently unveiled by Google, which can hold a conversation on any given topic. According to IDC, a market research company, China's conversational AI market will reach USD 1.86 billion by 2023, with a compound annual growth rate (CAGR) of 34.0% for 2019-2023.

A new batch of datasets for conversational AI has been released on MagicHub.io, our open-source community. Let's take a quick look.

Mandarin Chinese Conversational Speech Corpus – Web Meeting: This open-source dataset consists of 5.2 hours of transcribed Mandarin Chinese conversational speech recorded in web meetings between laptops and mobile phones. Click here to download.

Zhengzhou Dialect Conversational Speech Corpus: This open-source dataset consists of 4 hours of transcribed Zhengzhou dialect conversational speech on selected topics. Click here to download.

In addition, the English & Czech Telephone Conversation Data from Vystadial, developed for training acoustic models for automatic speech recognition in spoken dialogue systems, can also be downloaded from our community. Click here to download.

In addition to the datasets for conversational AI, there are also some scripted speech corpora.

German Scripted Speech Corpus – Command and Query: This open-source dataset consists of 0.71 hours of transcribed German scripted speech focusing on commands and queries, comprising 597 utterances contributed by ten speakers. Click here to download.

Zhengzhou Dialect Scripted Speech Corpus – Daily Use Sentence: This open-source dataset consists of 5 hours of transcribed Zhengzhou dialect scripted speech focusing on daily-use sentences, comprising 5,132 utterances contributed by ten speakers. Click here to download.

Finally, the Chinese-English Parallel Corpus, which consists of one hundred finance-related daily-use sentences translated from Chinese to English, may also deserve your attention. Click here to download.

To date, more than 40 datasets have been released on MagicHub.io. To download more datasets, please visit https://magichub.io/category/datasets/.

We will continue to release more datasets to the community. Stay tuned!
