Unsupervised deep learning is a research topic currently attracting strong interest from the international academic community, because it can exploit unlabeled data to train models more effectively. As the topic is fairly new, research on speech recognition based on unsupervised deep learning is still immature. The purpose of this project is to study how to use unsupervised deep learning to recognize code-switching speech. The project plan is divided into three stages: research on unsupervised acoustic models and training techniques for pre-training; research on multilingual speech recognition models based on unsupervised pre-training; and research on code-switching speech recognition based on noisy student training and multilingual speech recognition.

In the first year of this project, unsupervised acoustic models and training techniques for pre-training are developed. We propose a training objective that integrates autoregressive predictive coding (APC) and contrastive predictive coding (CPC): the future frames predicted by APC serve as the future information for the CPC comparison. The aim is to reduce the acoustic model's dependence on future information and thereby improve its robustness.

In the second year of this project, a multilingual speech recognition model based on unsupervised pre-training is developed. In addition to using the unsupervised acoustic model from the first year as the pre-trained model, we propose a separate training method based on multi-task learning for multilingual speech recognition. The learning objectives are phoneme recognition and token-level language identification. A multilingual phoneme-to-grapheme model then converts the phonemes according to the identified language to obtain the output for each language.

In the third year of this project, we plan to develop code-switching speech recognition technology based on noisy student training and multilingual speech recognition.
Using a small code-switching speech recognition corpus, we start from the multilingual speech recognition model established in the second year and apply noisy student training to build the final code-switching speech recognition model. Because language models play an important role in noisy student training, we additionally develop attention-weight-based methods and a technique for automatically generating large-scale code-switching text. This supports both code-switching language model training and noisy student training, and is intended to yield better recognition results.
| Field | Value |
|---|---|
| Status | Finished |
| Effective start/end date | 1/08/21 → 31/07/22 |