
Open-source Data Community MagicHub Officially Launched!

Date: 2021-04-15

In 1969, Unix's source code was released to the Unix community, initiating the first "open-source act" in human history.

In 1991, the Linux kernel was released.

In 1998, Netscape Communications released the source code for its Communicator suite, an event that gave rise to the term "open source."

In 2005, Git, a source-code management system, came out, giving rise to hosted Git code repositories.


Open source has taken deep root in the internet ecosystem and changed the patterns of the internet industry. The evolution of open source is itself a magnificent part of internet history.

Since the concept of artificial intelligence (AI) was put forward at the Dartmouth Summer Research Project, the field has gone through countless ups and downs. Meanwhile, numerous new technologies, including the internet, big data, cloud computing, and 5G, have emerged and come to play increasingly important roles.

AI has opened a new era, and open source has come along with it: platforms for machine learning keep emerging. Generation after generation, developers contribute their intelligence to the evolution of AI in the spirit of openness, freedom, and cooperation.

An increasing number of governments, NGOs, companies, academic institutions, and individuals have released their image, text, and audio data to the public, giving rise to platforms such as Kaggle, UCI, OpenML, ImageNet, and OpenSLR. Data has become a core driver of AI development.

Launch of the MagicHub open-source community

MagicHub was launched on April 15. Its founder, Magic Data, holds a leading position in the volume of conversational speech data and has become the first company to release open-source datasets on an independent website, a move that might change the way users obtain data.

Daniel Povey, the father of the speech recognition toolkit Kaldi, together with more than ten AI developers, cheered the launch of the MagicHub community.

Massive, diverse datasets are released on MagicHub. The datasets are subdivided along multiple dimensions, offering AI engineers a more efficient way to find datasets for their various AI models so that they can devote more energy to algorithm optimization.

Magic Data welcomes all data producers and discoverers to join and release datasets on MagicHub. Together, we can build a better open-source ecosystem. Please contact us if you are interested.

Home page of MagicHub

MagicHub has released more than 30 open-source datasets, including Mandarin Chinese, English, and Shanghai dialect (Wu Chinese) conversational speech, NLP text corpora, TTS corpora, and lexicons. Wherever possible, datasets are categorized by language, scenario, and industry.

We will keep releasing high-quality datasets and more content on MagicHub, and we always appreciate your comments, shares, and any other form of support. Let's make MagicHub a better place for inspiration and the spirit of sharing together.

