A new round of technological revolution and industrial transformation is under way, driving the rapid evolution of digital technologies represented by artificial intelligence (AI), and society is becoming increasingly intelligent. According to the White Paper on the Core Technology Industry of Artificial Intelligence, released by the China Academy of Information and Communications Technology (CAICT) in April this year, AI now covers all the basic elements of social operation and has improved overall operating efficiency. In the future, AI will be as ubiquitous as water and electricity, disrupting and transforming every industry.
Data plays an important role in supporting the development of artificial intelligence, because AI models need massive amounts of data for training and optimization. Among the three core elements of AI (data, algorithms, and computing power), the focus is shifting from algorithms to data. Data sets the upper bound for machine learning: only when developers value data can more accurate models be trained. Andrew Ng, a leading machine-learning expert, argues that machine learning will evolve rapidly if more emphasis is placed on data rather than on models.
755-Hour Mandarin Chinese Speech Corpus Arrives on MagicHub
To drive the development of artificial intelligence technology, Magic Data Tech launched the open-source community MagicHub (https://magichub.io/), which releases large amounts of data to developers around the world. Magic Data Tech has now released a 755-hour Mandarin Chinese speech corpus in the community. The corpus was previously open-sourced on OpenSLR and is now linked from the community for free download. Click here to download.
The 755-hour Mandarin Chinese speech corpus, which has a total duration of 10566.9 hours, was instrumental in Exploring Methods for the Automatic Detection of Errors in Manual Transcription, a research project by the Center for Language and Speech Processing at Johns Hopkins University.
Indonesian and Malay Conversational Speech Corpora
Magic Data Tech recently released Indonesian and Malay conversational speech corpora in the MagicHub community, providing developers with high-quality conversational AI training data.
The Indonesian conversational speech corpus contains free conversations among more than 800 native Indonesian speakers, recorded in indoor environments. Five hours of the Indonesian conversational speech corpus are open-sourced on MagicHub. Click here to download.
The Malay conversational speech corpus captures free conversations among nearly 700 Malaysians, recorded in indoor environments. Five hours of the Malay conversational speech corpus are open-sourced on MagicHub. Click here to download.
Adhering to the spirit of "share, innovate and grow," MagicHub provides open-source conversational AI training data for the industry. Magic Data Tech has so far released more than 30 open-source datasets (nearly 1,000 hours) in the community, including speech corpora in English, Spanish, Italian, Korean, Japanese, and Chinese (Mandarin, Cantonese, and the Sichuan and Shanghai dialects, among others), as well as an in-vehicle noise dataset, lexicons, and more. We also welcome data owners to release their datasets on MagicHub and help build a better open-source ecosystem.
We are proud to announce Magic Data Tech has been named the “Best Supplier of Alibaba Cloud 2021”.
On May 20, 2021, Intel published the 5th issue of its AI 100 Acceleration Program list at the 2021 Shenzhen (International) Artificial Intelligence Exhibition, and Magic Data Tech was selected for the program on the strength of its innovation capabilities.
Recently, nearly one thousand car companies attended the Shanghai Auto Show, where electrification and intelligent features were standard for many of them. Intelligent functions such as smart cabins, self-driving, and cloud services depict the future of intelligent cars.
Recently, the third CIO Summit of the Bank of China (BOC) was held in Shanghai, bringing together executives and CIOs from the financial, technology, and Internet sectors to share ideas on the digital transformation of banking under the theme "Banking Era 4.0: Go all in on Digital Transformation".
In 1969, the Unix source code was released to the Unix community, initiating the first “open-source act” in human history.