Open-source Data Community MagicHub Officially Launched!

Date: 2021-04-15

In 1969, Unix was created at Bell Labs, and its source code was later shared with the Unix community, often regarded as one of the earliest “open-source acts” in computing history.

In 1991, the Linux kernel was released.

In 1998, Netscape Communications released the source code for its Communicator suite, an event that prompted the coining of the term “open source.”

In 2005, the source-code management system Git was released, which later gave rise to hosted Git code repositories.

Open source has become deeply rooted in the internet ecosystem and has changed the patterns of the internet industry. The evolution of open source is itself a magnificent part of internet history.

Since the concept of artificial intelligence (AI) was put forward at the Dartmouth Summer Research Project, the field has gone through countless ups and downs. Meanwhile, the internet, big data, cloud computing, 5G, and numerous other new technologies have emerged and played increasingly important roles.

AI has opened a new era, and open source has risen with it. Machine learning platforms keep emerging. Developers, generation after generation, contribute their intelligence to the evolution of AI in the spirit of openness, freedom, and cooperation.

An increasing number of governments, NGOs, companies, academic institutions, and individuals are releasing their image, text, and audio data to the public, forming platforms such as Kaggle, UCI, OpenML, ImageNet, and OpenSLR. Data has become a core driver of AI development.

Launch of the MagicHub open-source community

Accordingly, MagicHub was launched on April 15. Its founder, Magic Data, holds a leading position in the volume of conversational speech data and has become the first company to release open-source datasets on an independent website, which might change the way users get data.

Daniel Povey, the father of the speech recognition toolkit Kaldi, together with more than ten AI developers, cheered the launch of the MagicHub community.

Massive, diverse datasets are released on MagicHub.io. The datasets are categorized along multiple dimensions, offering AI engineers a more efficient way to find datasets for their various AI models and leaving them more energy for algorithm optimization.

Magic Data welcomes all data producers and discoverers to join and release datasets on MagicHub. Together, we can build a better open-source ecosystem. Please contact us if you are interested.

Home page of MagicHub.io

MagicHub has released more than 30 open-source datasets, including Mandarin Chinese, English, and Shanghai Dialect (Wu Chinese) conversational speech, NLP text corpora, TTS corpora, and lexicons. Where possible, the datasets are categorized by language, scenario, and industry.

We will keep releasing high-quality datasets and more content on MagicHub, and we always appreciate your comments, shares, or any other form of support. Together, let’s make MagicHub a better place for inspiration and the spirit of sharing.
