MDT-NLP-F024 Mandarin Chinese Text Normalization Text Corpus

Language

zh-CN

Number of Utterances

100,736

Data Content

Text Normalization

File Format

TXT

Field of Application

NLP

Data Sensitive Items

None

Copyright Owner

Magic Data
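
This page does not document how records are laid out inside the TXT files. Assuming they follow the Sample display (an utterance ID line, the raw sentence, then the normalized sentence, with a blank line between records) and are UTF-8 encoded, a minimal Python loader might look like the sketch below. `Record` and `read_records` are hypothetical names, not part of any Magic Data tooling.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class Record:
    """One normalization pair (hypothetical structure mirroring the page's Sample)."""
    utt_id: str      # utterance ID, e.g. "100001"
    raw: str         # sentence containing Arabic numerals and symbols
    normalized: str  # the same sentence with numbers written out in Chinese characters


def _to_record(block: List[str]) -> Record:
    # ASSUMPTION: each record is exactly three non-empty lines: ID, raw, normalized.
    if len(block) != 3:
        raise ValueError(f"expected 3 lines per record, got {len(block)}: {block!r}")
    return Record(*block)


def read_records(f: TextIO) -> Iterator[Record]:
    """Yield records from a text stream in which blank lines separate records."""
    block: List[str] = []
    for line in f:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            yield _to_record(block)
            block = []
    if block:  # the last record may not be followed by a blank line
        yield _to_record(block)


if __name__ == "__main__":
    demo = "100001\n组合1.63秒。\n组合一点六三秒。\n\n100002\n改写为3:0。\n改写为三比零。\n"
    for rec in read_records(io.StringIO(demo)):
        print(rec.utt_id, rec.raw, "->", rec.normalized)
```

For a real file, pass `open(path, encoding="utf-8")` instead of the in-memory demo string; the strict three-line check makes a layout mismatch fail loudly rather than silently pairing the wrong lines.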

Sample

100001
另一队中国组合由邵奕俊担任舵手,最终排名第十四,落后冠军组合1.63秒。
另一队中国组合由邵奕俊担任舵手,最终排名第十四,落后冠军组合一点六三秒。

100002
第二局比赛中国队攻势不减,侯宇阳在23分33秒时将比分改写为3:0。
第二局比赛中国队攻势不减,侯宇阳在二十三分三十三秒时将比分改写为三比零。

100003
上半场比赛双方打成10-10平,这是超级碗历史上第四次半场分数持平。
上半场比赛双方打成十比十平,这是超级碗历史上第四次半场分数持平。
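
The samples show the kinds of conversions the corpus covers: a decimal read as 一点六三, minutes and seconds read as ordinary cardinal numbers (二十三分三十三秒), and scores such as 3:0 and 10-10 read with 比. The Python sketch below reproduces only those patterns with regular expressions. It is not Magic Data's normalization method, `normalize` and `int_to_zh` are hypothetical names, and the rule that treats every `a:b` or `a-b` pair as a score is an assumption that would mangle dates, ranges, and clock times.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def _below_10k(n: int) -> str:
    """Read 1 <= n <= 9999, inserting a single 零 for any internal run of zeros."""
    s = str(n)
    out = []
    zero_pending = False
    for i, ch in enumerate(s):
        d = int(ch)
        pos = len(s) - 1 - i  # 3 = thousands, 2 = hundreds, 1 = tens, 0 = ones
        if d == 0:
            zero_pending = True
            continue
        if zero_pending and out:
            out.append("零")
        zero_pending = False
        out.append(DIGITS[d] + UNITS[pos])
    return "".join(out)


def int_to_zh(n: int) -> str:
    """Read a non-negative integer below 10**8 as a Mandarin cardinal number."""
    if not 0 <= n < 10**8:
        raise ValueError("this sketch only handles 0 <= n < 100,000,000")
    if n == 0:
        return "零"
    high, low = divmod(n, 10000)
    if high:
        text = _below_10k(high) + "万"
        if low:
            if low < 1000:  # gap below the 万 group is read as 零, e.g. 10005 -> 一万零五
                text += "零"
            text += _below_10k(low)
    else:
        text = _below_10k(low)
    if text.startswith("一十"):  # a leading 10-19 drops its 一: 十四, not 一十四
        text = text[1:]
    return text


def normalize(text: str) -> str:
    """Rewrite Arabic-numeral expressions as spoken Mandarin (rules inferred from the samples only)."""
    # 1. ASSUMPTION: any a:b or a-b pair is a score, read with 比 ("to").
    text = re.sub(
        r"(\d+)\s*[:：-]\s*(\d+)",
        lambda m: int_to_zh(int(m.group(1))) + "比" + int_to_zh(int(m.group(2))),
        text,
    )
    # 2. Decimals: integer part as a number, fractional part digit by digit.
    text = re.sub(
        r"(\d+)\.(\d+)",
        lambda m: int_to_zh(int(m.group(1))) + "点" + "".join(DIGITS[int(c)] for c in m.group(2)),
        text,
    )
    # 3. Any remaining integer is read as a cardinal number.
    return re.sub(r"\d+", lambda m: int_to_zh(int(m.group())), text)
```

For example, `normalize("侯宇阳在23分33秒时将比分改写为3:0。")` returns `侯宇阳在二十三分三十三秒时将比分改写为三比零。`. Years read digit by digit (二零二一), phone numbers, or the choice between 两 and 二 would need context-aware rules beyond this sketch.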

Related Datasets

MDT-NLP-F022 English Medical Customer Service Text Corpus

MDT-NLP-F019 Minnan Text Corpus

MDT-NLP-F008 Japanese Smart Home C&C Text Corpus

[Open-Source]

MDT-NLP-A011 Italian Chatting Corpus

MDT-NLP-A004 French Chatting Corpus

MDT-NLP-B001 Chinese Conversation Text Corpus

Why Magic Data Datasets

Full Compliance

ISO/IEC 27001 & ISO/IEC 27701:2019 compliant

Multiple Dimensions

Audio, text, image, and video multi-modal data

Extensive Scope

Conversational, scripted, and spontaneous data covering extensive domains

High Accuracy

High-quality results assured by domain expertise
