21
May
17
New Arrival in MagicHub! Get Hundreds of Hours of Datasets for Free!

A new round of technological revolution and industrial transformation is under way, driving the rapid evolution of digital technologies such as artificial intelligence and moving society toward an intelligent era. According to the White Paper on the Core Technology Industry of Artificial Intelligence, released by the China Academy of Information and Communications Technology in April this year, AI now covers the basic elements of social operation and has improved overall operating efficiency. In the future, AI will be as ubiquitous as water and electricity, disrupting and transforming every industry.

Data plays an important role in supporting the development of artificial intelligence: AI models need massive amounts of data for training and optimization. Among the three core elements of AI (data, algorithms, and computing power), the focus is shifting from algorithms to data. Data sets the upper bound for machine learning, and more accurate models can be trained only when developers value data. Andrew Ng, a leading machine-learning expert, argues that machine learning will evolve rapidly if more emphasis is placed on data rather than on models.

755 Hours of Mandarin Chinese Speech Corpora Newly Arrived on MagicHub

To drive the development of artificial intelligence technology, Magic Data Tech launched the open-source community MagicHub (https://magichub.io/), which releases large amounts of data for developers around the world. Magic Data Tech has now released a 755-hour Mandarin Chinese speech corpus in the community. The corpus was previously open-sourced on OpenSLR and is now linked from the community for free download.

The 755-hour Mandarin Chinese speech corpus, which has a total duration of 10566.9 hours, has been instrumental in Exploring Methods for the Automatic Detection of Errors in Manual Transcription, a research project by the Center for Language and Speech Processing at Johns Hopkins University.

Indonesian and Malay Conversational Speech Corpora

Magic Data Tech has recently released Indonesian and Malay conversational speech corpora in the MagicHub community, providing developers with high-quality conversational AI training data.

The Indonesian conversational speech corpus contains free conversations among more than 800 native Indonesian speakers, recorded in indoor environments. Five hours of the Indonesian conversational speech corpus are open source on MagicHub and available for download.

The Malay conversational speech corpus captures free conversations among nearly 700 Malaysians, recorded in indoor environments. Five hours of the Malay conversational speech corpus are open source on MagicHub.io and available for download.

Adhering to the spirit of "share, innovate and grow," MagicHub provides open-source conversational AI training data for the industry. Magic Data Tech has so far released more than 30 open-source datasets (nearly 1,000 hours) in the community, including speech corpora in English, Spanish, Italian, Korean, Japanese, and Chinese (Mandarin, Cantonese, Sichuan dialect, Shanghai dialect, and others), as well as an in-vehicle noise dataset, a lexicon, and more. We also welcome data owners to release datasets on MagicHub and help build a better open-source ecosystem.
