Baseline & Training Datasets Are Open Now | ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge (CSSD)

Date: 2022-07-25

Since its launch on July 4, 2022, the ISCSLP 2022 Conversational Short-phrase Speaker Diarization Challenge has received more than 40 registrations. On July 24, the committee released the baseline and training datasets to all participants.

Dataset

The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz. The dialogs in MagicData-RAMC are classified into 15 diversified domains and tagged with topic labels, ranging from science and technology to ordinary life. Accurate transcription and precise speaker voice activity timestamps are manually labeled for each sample. Speakers' detailed information is also provided. As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc. Please refer to MAGICDATA RAMC.

Baseline

We use VBHMM x-vectors (a.k.a. VBx), trained on VoxCeleb Data (openslr-49) and CN-Celeb Corpus (openslr-82), as the baseline system. X-vector embeddings are extracted with a ResNet; agglomerative hierarchical clustering followed by variational Bayes HMM resegmentation then produces the final result. Please refer to MAGICDATA RAMC.
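To make the clustering step more concrete, here is a minimal sketch of agglomerative hierarchical clustering over segment embeddings. It is not the baseline code: the toy 2-D vectors stand in for real x-vectors, the cosine distance, average linkage, and the `threshold` value are all assumptions chosen for illustration, and the variational Bayes HMM resegmentation stage is omitted entirely.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ahc(embeddings, threshold):
    """Average-linkage AHC (illustrative). Returns one cluster label per embedding.

    Merging stops when the closest pair of clusters is farther apart
    than `threshold` (a hypothetical stopping value, not a tuned one).
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        dists = [cosine_distance(embeddings[i], embeddings[j])
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), best = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Toy "embeddings": two segments per speaker.
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(ahc(toy_embeddings, threshold=0.3))  # [0, 0, 1, 1]
```

In a VBx-style pipeline the labels from a step like this would serve only as an initialization, which the resegmentation stage then refines.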

Rules

All participants should adhere to the following rules:

DATA: Only MAGICDATA RAMC, VoxCeleb Data (openslr 49) and CN-Celeb Corpus (openslr 82) may be used. Data augmentation may be applied to the training sets, and two noise datasets, MUSAN (openslr 17) and RIRNoise (openslr 28), are allowed.
Any non-compliant use of the Test dataset is strictly prohibited, including but not limited to using the Test dataset to fine-tune or train the model. Here, the Test dataset refers to the CSSD-Test set, which will be released on September 8, 2022. Use of the MagicData-RAMC Dev and Test sets is not prohibited.
Multi-system fusion is allowed. However, fusing systems with the same structure is not encouraged.
All models should be trained on the allowed datasets. Specifically, pre-trained models built with other datasets (including unlabeled data) are not allowed in this challenge.
The right of final interpretation belongs to the organizer. In case of special circumstances, the organizer will coordinate the interpretation.

Scoring tool

We adopt Conversational-DER (CDER) to evaluate speaker diarization systems. In real conversations, a short utterance can carry vital information. An evaluation based on time duration poorly reflects recognition performance on such short segments. Our basic idea is that, for each speaker, every type of mistake should be reflected equally in the final evaluation metric, regardless of the length of the spoken sentence. We therefore evaluate the speaker diarization system at the sentence (utterance) level in conversational scenarios. Please refer to CDER METRIC (https://github.com/MagicHub-io/CDER_Metric).
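The difference between duration-weighted and utterance-level scoring can be illustrated with a small sketch. This is a simplified illustration of the idea only, not the official CDER implementation (see the repository above for that): the function names are hypothetical, hypothesis speaker labels are assumed to be already mapped to reference labels, and an utterance is counted as wrong whenever the hypothesis speaker with the largest overlap differs from the reference speaker.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def dominant_speaker(utt, hypothesis):
    """Hypothesis speaker covering most of a reference utterance, or None."""
    start, end, _ = utt
    totals = {}
    for h_start, h_end, h_spk in hypothesis:
        dur = overlap(start, end, h_start, h_end)
        if dur > 0:
            totals[h_spk] = totals.get(h_spk, 0.0) + dur
    return max(totals, key=totals.get) if totals else None


def utterance_error_rate(reference, hypothesis):
    """Fraction of reference utterances whose dominant speaker is wrong."""
    wrong = sum(1 for utt in reference
                if dominant_speaker(utt, hypothesis) != utt[2])
    return wrong / len(reference)


def duration_error_rate(reference, hypothesis):
    """Same mistakes, but weighted by utterance duration."""
    wrong = sum(utt[1] - utt[0] for utt in reference
                if dominant_speaker(utt, hypothesis) != utt[2])
    total = sum(utt[1] - utt[0] for utt in reference)
    return wrong / total


# Toy conversation: speaker B gives a short 0.5 s reply that the system misses.
reference = [(0.0, 10.0, "A"), (10.0, 10.5, "B"), (10.5, 20.0, "A")]
hypothesis = [(0.0, 20.0, "A")]

print(utterance_error_rate(reference, hypothesis))  # one of three utterances is wrong
print(duration_error_rate(reference, hypothesis))   # only 0.5 s of 20 s is wrong
```

In this toy case the missed short reply accounts for a third of the utterances but only a small fraction of the total duration, which is the kind of gap an utterance-level metric is meant to expose.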

Registration for the competition remains open. Check it out and join here: Join Competition.

If you have any questions, please contact us. You can open an issue on GitHub or email open@magicdatatech.com.

Best of luck!
