Earlier this year, Magic Data Tech completed its Series B funding round, raising tens of millions of RMB. With this financing, Magic Data Tech, a global provider of AI datasets, will expand its business and scale and provide better voice data packages and services to the AI market.
The funding, provided by Vantron Capital, will be used to build a global open-source community (magichub.io) for AI developers, to develop datasets for conversational AI, and to research and develop a SaaS platform for data collection and annotation.
Ever since its establishment, Magic Data Tech has been providing clients worldwide with professional AI data solutions, covering data processing solution design, training/testing datasets, data labeling, and the private deployment of data processing systems, among others.
In recent years, breakthroughs have been made in AI technologies and applications, and the AI data service industry has been expanding. According to iResearch, a Chinese consultancy, China's data service market is expected to exceed 10 billion yuan in 2025, with a compound annual growth rate of 21.8%. The global market for data services is expected to exceed $50 billion by 2025.
Data is the foundation layer of the AI industry, the cornerstone of AI development, and one of the three basic factors driving the rise of AI. The quantity, quality, security, and compliance of data are crucial to the development of AI. Currently, Magic Data Tech provides a large number of high-quality, reliable training datasets and services to clients and the AI market, including more than 130,000 hours of read, conversational, and other speech corpora. These cover more than 50 languages, from widely used languages such as English and Japanese to Malay, Thai, Indonesian, and others.
Figure: Languages and proportion
Our services and products meet the ISO 9001, ISO 27001, and CMMI Level 3 criteria, and in particular ISO/IEC 27701:2019, the international standard for privacy information management.
Voice data processing is critical to AI and indispensable to human-computer interaction. Vantron Capital focuses on investment in technology-related fields such as AI and big data. Magic Data Tech has been dedicated to the voice data industry for years. With elite team members from CAS, Microsoft, Intel, IBM, and KPMG, it has strong technical capabilities and an experienced management team.
Commenting on the investment, Xiaoyan ZHANG, partner at Vantron Capital, said: “AI is connected by algorithms and data. As the basis of AI, data is an asset of high value. Voice is more complicated and more difficult to process than image and text. Magic Data Tech provides effective and high-quality data services to the AI industry and clients. We are very pleased to invest again in the Magic Data team led by Qingqing ZHANG. We are thrilled to be a companion on their way to excellence.”
Previously, Magic Data Tech received Pre-Series A funding from Future Capital in 2017, Series A funding from Ceyuan Ventures and Plum Ventures in 2018, and investment from Ceyuan Ventures and Fuzhuo Investment in 2019.
As AI research and development advances in both depth and breadth, demand for structured data is growing explosively. Meanwhile, the data labeling industry is undergoing decentralization: production of structured data is shifting from large-scale third-party data processing centers to scattered data end users.
INTERSPEECH is an annual global conference organized by the International Speech Communication Association (ISCA). INTERSPEECH 2021, held from August 30 to September 3, 2021, takes place in hybrid form: participants can join the conference online or in person in Brno, Czech Republic.
Annotator® 5.0, an independently developed data annotation system, was officially launched by Magic Data Tech on July 8, 2021, at the World Artificial Intelligence Conference (WAIC) 2021, a global gathering for exchanging AI innovation ideas, technologies, and applications, held in Shanghai from July 7 to July 10.
Collecting and sorting data has always been a time-consuming and tedious task for AI developers and researchers. Here we list 20 sites where high-quality data is available for free, in the hope of helping you find the right dataset for your AI model more easily.
A batch of datasets for conversational AI has been newly released on MagicHub.io, our open-source community. Let’s have a quick look.