This dataset is designed for training speech recognition models that understand spoken Korean more accurately and support more natural interaction. It features diverse real-life dialogues with high transcription accuracy. Key phonological changes such as liaison and batchim (final-consonant) assimilation are carefully annotated. Complete sentences and emotion-aware punctuation help models capture Korean speech patterns and the intent conveyed by sentence endings.
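To illustrate what liaison means in Korean, here is a minimal Python sketch that moves a simple final consonant (batchim) onto a following syllable that begins with the silent initial ㅇ. It is only an illustration of the phonological rule, not the dataset's annotation format, which the listing does not specify. The function names are hypothetical, and the sketch deliberately ignores double batchim, ㅎ, palatalization (for example 같이), and word-boundary neutralization.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable (가)
NUM_SYLLABLES = 11172     # size of the Hangul Syllables block
NUM_VOWELS = 21
NUM_TAILS = 28            # 27 final consonants plus "no final"
SILENT_LEAD = 11          # index of the silent initial consonant ㅇ

# Final-consonant index -> initial-consonant index, simple consonants only.
# Double batchim, ㅇ and ㅎ are left out on purpose (simplifying assumption).
TAIL_TO_LEAD = {
    1: 0, 2: 1, 4: 2, 7: 3, 8: 5, 16: 6, 17: 7,
    19: 9, 20: 10, 22: 12, 23: 14, 24: 15, 25: 16, 26: 17,
}


def decompose(ch):
    """Return (lead, vowel, tail) indices, or None if ch is not a Hangul syllable."""
    offset = ord(ch) - HANGUL_BASE
    if not 0 <= offset < NUM_SYLLABLES:
        return None
    lead, rest = divmod(offset, NUM_VOWELS * NUM_TAILS)
    vowel, tail = divmod(rest, NUM_TAILS)
    return lead, vowel, tail


def compose(lead, vowel, tail):
    """Build a precomposed Hangul syllable from its three indices."""
    return chr(HANGUL_BASE + (lead * NUM_VOWELS + vowel) * NUM_TAILS + tail)


def apply_liaison(text):
    """Carry a simple batchim over to a following syllable that starts with ㅇ."""
    chars = list(text)
    for i in range(len(chars) - 1):
        cur = decompose(chars[i])
        nxt = decompose(chars[i + 1])
        if cur is None or nxt is None:
            continue
        lead, vowel, tail = cur
        next_lead, next_vowel, next_tail = nxt
        if tail in TAIL_TO_LEAD and next_lead == SILENT_LEAD:
            chars[i] = compose(lead, vowel, 0)  # drop the final consonant
            chars[i + 1] = compose(TAIL_TO_LEAD[tail], next_vowel, next_tail)
    return "".join(chars)


print(apply_liaison("음악이"))  # written form 음악이, pronounced 으마기
```

A transcript that records only the written form (음악이) hides this shift from the model, which is why explicit annotation of such changes can matter for speech recognition training.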
Language
Korean
Data Style
spontaneous
Bit Depth
16-bit
Channel
1
Total Audio Duration
10,000+ hours
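As a quick sanity check against the specifications above, the following Python sketch verifies that an audio file uses 16-bit samples and a single channel, and reports its duration. It assumes the audio is delivered as WAV files, which the listing does not state; the function names are hypothetical. The sample rate is not listed, so it is read from the file rather than checked.

```python
import wave

EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # single channel (mono)


def matches_listed_format(path):
    """Return True if the WAV file is 16-bit and single-channel."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def duration_seconds(path):
    """Return the clip length in seconds, using the file's own sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Summing `duration_seconds` over all files and dividing by 3600 gives the total in hours, which can be compared with the listed figure.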
ISO/IEC 27001 & ISO/IEC 27701:2019 compliant
Audio, text, image, and video multi-modal data
Conversational, scripted, and spontaneous data covering extensive domains
Expert-assured quality