Code-switched Text Data Augmentation for Chinese-English Mixed Speech Recognition

Chung Ting Lee, Teng Hui Wang, Kai Wen Liang, Phuong Thi Le, Yung Hui Li, Jia Ching Wang

Research output: Chapter in book/report › Conference paper › Peer-reviewed

Abstract

Code-switching is a common mode of language expression in which two or more languages are used interchangeably within a conversation. Code-switching speech recognition is still limited by the shortage of code-switched text for training, which degrades system performance. This paper uses a neural network to train a generator that produces code-switched text, expanding the training corpus in order to improve recognition accuracy on mixed Chinese-English speech. Our method trains a BERT-BiLSTM-CRF model on the Chinese and English text of the SEAME corpus and uses it to predict code-switching positions, generating sentences that match the characteristics of this corpus. Experimental results show that the proposed method outperforms the other methods compared.
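The abstract names the model but gives no decoding or generation details. As a rough illustration only, the sketch assumes the tagger labels each word of a word-segmented Chinese sentence as either kept in Chinese (`O`) or switched to English (`S`). A linear-chain CRF selects the best tag sequence by Viterbi decoding over per-word emission scores (such as a BiLSTM layer would output) and tag-transition scores, and the words tagged `S` are then replaced from a bilingual lexicon. The tag set, the scores, the lexicon, and the helper names `viterbi_decode` and `augment` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical tag set (assumption, not from the paper):
# index 0 = "O" (keep the Chinese word), index 1 = "S" (switch to English).
TAGS = ["O", "S"]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]: score of tag j at word t (e.g. from a BiLSTM layer).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best path score ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Follow the backpointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path


def augment(words: Sequence[str], tags: Sequence[int],
            lexicon: Dict[str, str]) -> str:
    """Replace words tagged "S" with their English entry from a (hypothetical) lexicon."""
    out = []
    for word, tag in zip(words, tags):
        if TAGS[tag] == "S" and word in lexicon:
            out.append(lexicon[word])
        else:
            out.append(word)  # keep the word if untagged or missing from the lexicon
    return " ".join(out)  # assumes space-separated, word-segmented text


if __name__ == "__main__":
    words = ["我", "明天", "有", "会议"]
    lexicon = {"明天": "tomorrow", "会议": "meeting"}
    emissions = [[2.0, 0.0], [1.0, 0.5], [2.0, 0.0], [0.0, 3.0]]
    transitions = [[0.5, 0.0], [0.0, 0.5]]
    tags = viterbi_decode(emissions, transitions)
    print(tags, augment(words, tags, lexicon))
```

With these made-up scores, which favor `S` only on the last word, the decoder returns `[0, 0, 0, 1]` and the generated sentence is `我 明天 有 meeting`. In a full system the emission scores would come from the BERT and BiLSTM layers rather than being written by hand.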

Original language: English
Host publication title: GCCE 2022 - 2022 IEEE 11th Global Conference on Consumer Electronics
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 922-923
Number of pages: 2
ISBN (electronic): 9781665492324
Publication status: Published - 2022
Event: 11th IEEE Global Conference on Consumer Electronics, GCCE 2022 - Osaka, Japan
Duration: 18 Oct 2022 to 21 Oct 2022

Publication series

Name: GCCE 2022 - 2022 IEEE 11th Global Conference on Consumer Electronics

Conference

Conference: 11th IEEE Global Conference on Consumer Electronics, GCCE 2022
Country/Territory: Japan
City: Osaka
Period: 18/10/22 to 21/10/22
