21
May
17
New Arrivals on MagicHub! Get Hundreds of Hours of Datasets for Free!

Today, a new round of technological revolution and industrial transformation is under way. It is driving the rapid evolution of digital technologies, with artificial intelligence at the forefront, and humanity is entering an intelligent society. According to the White Paper on the Core Technology Industry of Artificial Intelligence, released by the China Academy of Information and Communications Technology (CAICT) in April this year, AI has permeated all the basic elements of how society operates and has improved overall operating efficiency. In the future, AI will be as ubiquitous as water and electricity, disrupting and transforming every industry.

Data plays an important role in the development of artificial intelligence: AI models need massive amounts of data for training and optimization. Among the three core elements of AI (data, algorithms and computing power), the focus is shifting from algorithms to data. Data sets the upper bound on what machine learning can achieve, and more accurate models can be trained only when developers give data the attention it deserves. Andrew Ng, a leading machine-learning expert, argues that machine learning will advance rapidly if more emphasis is placed on data rather than on models.

755 Hours of Mandarin Chinese Speech Corpus Newly Arrive on MagicHub

To drive the development of artificial intelligence technology, Magic Data Tech launched the open-source community MagicHub (https://magichub.io/), which releases large amounts of data for developers around the world. Magic Data Tech has now added a 755-hour Mandarin Chinese speech corpus to the community. The corpus was previously open-sourced on OpenSLR and is now linked from MagicHub for free download.

The 755-hour Mandarin Chinese speech corpus, drawn from a larger corpus with a total duration of 10,566.9 hours, has been instrumental in "Exploring Methods for the Automatic Detection of Errors in Manual Transcription", a research project by the Center for Language and Speech Processing at Johns Hopkins University.

Indonesian and Malay Conversational Speech Corpora

Magic Data Tech has also recently released Indonesian and Malay conversational speech corpora in the MagicHub community, providing developers with high-quality conversational AI training data.

The Indonesian conversational speech corpus contains free conversations among more than 800 native Indonesian speakers, recorded in indoor environments. Five hours of this corpus are open-sourced on MagicHub.

The Malay conversational speech corpus captures free conversations among nearly 700 Malaysians, recorded in indoor environments. Five hours of this corpus are open-sourced on MagicHub.

Adhering to the spirit of "share, innovate and grow," MagicHub provides open-source conversational AI training data for the industry. Magic Data Tech has so far released more than 30 open-source datasets (nearly 1,000 hours) in the community, including English, Spanish, Italian, Korean, Japanese and Chinese (Mandarin, Cantonese, Sichuan and Shanghai dialects, etc.) corpora, an in-vehicle noise dataset, lexicons and more. We also welcome data owners to release their datasets on MagicHub and help build a better open-source ecosystem.
