Code-Switching Speech Synthesis Based on Self-Supervised Learning and Domain Adaptive Speaker Encoder

Yi Xing Lin, Cheng Hsun Pai, Phuong Thi Le, Bima Prihasto, Chien Ling Huang, Jia Ching Wang

Research output: Chapter in book/report/conference proceeding › Conference paper › Peer-reviewed

1 citation (Scopus)

Abstract

Recently, end-to-end speech synthesis models based on deep learning have made great progress in speech quality and have gradually replaced traditional speech synthesis methods as the mainstream approach. However, these models still struggle to synthesize highly natural speech. To address this problem, we introduce self-supervised learning and frame-level domain adversarial training into a speaker encoder trained on the speaker verification task, so that speaker vectors from different languages keep a consistent distribution in the speaker space, which improves speech synthesis performance. In addition, we adopt a non-autoregressive speech synthesis model to avoid the unnatural speaking rate caused by cross-lingual speech synthesis. We first show that, on a mixed-language dataset combining LibriTTS and AISHELL3, the speaker encoder trained on self-supervised representations achieves a 4.968% absolute EER reduction on the speaker verification task compared with one using conventional MFCC features, indicating that self-supervised representations generalize better to domain-complex datasets. We then obtain MOS scores of 3.635 for speech naturalness and 3.675 for speaker similarity on the code-switching speech synthesis task. Our approach removes the need, common in previous work, for multiple monolingual encoders to model linguistic information, and adds frame-level domain adversarial training to optimize the speaker vectors in the speaker feature space for code-switching speech synthesis.
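The abstract reports speaker-verification results as equal error rate (EER): the error rate at the decision threshold where the false acceptance rate equals the false rejection rate. An "absolute" reduction of 4.968% therefore means the EER dropped by 4.968 percentage points. The following is a minimal sketch of how EER can be approximated from trial scores, assuming higher scores mean "same speaker"; the function name `compute_eer` and the simple threshold sweep are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the equal error rate (EER) of speaker-verification trials.

    scores: similarity scores; higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns the EER as a fraction in [0, 1].
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer = float("inf"), 1.0
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(scores)):
        # False acceptance: non-target trials accepted (score >= t).
        far = sum(s >= t for s in nontargets) / len(nontargets)
        # False rejection: target trials rejected (score < t).
        frr = sum(s < t for s in targets) / len(targets)
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest; report their mean.
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    trial_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    trial_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(compute_eer(trial_scores, trial_labels))  # 0.25
```

Sweeping only the observed scores gives a coarse estimate; evaluation toolkits typically interpolate along the ROC curve, which can shift the reported value slightly on small trial lists.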

Original language: English
Host publication title: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9781728163277
Publication status: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 June 2023 to 10 June 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23
