Zero-Shot Voice Conversion Based on Speaker Embedding Domain Generalization

Yi Xing Lin, Chun Hsiang Cheng, Phuong Thi Le, Bing Jhih Huang, Liao Chu-Xin, Chien Lin Huang, Jia Ching Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

Abstract

In this paper, a zero-shot voice conversion framework is constructed by effectively decoupling the semantic and speaker features in speech. The proposed method uses a pre-trained wav2vec 2.0 model to extract semantic features from source speakers and a WavLM model to extract speaker features from target speakers. We propose the Robust-MAML model to map the speaker features of the target speaker into a domain-generalized space, making them directly applicable to any unregistered speaker domain. Finally, through transfer learning, the speech synthesis model FastSpeech2 integrates the semantic features and the domain-generalized speaker features to synthesize the target speaker's voice. Experimental results show that the proposed method outperforms common baseline systems in both naturalness and speaker similarity.
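The data flow described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the simple statistics stand in for the wav2vec 2.0 and WavLM encoders, a plain linear map stands in for the Robust-MAML mapping, and concatenation stands in for however the synthesis model actually combines the two feature streams. It only shows the shape of the idea: frame-level content from the source, one utterance-level speaker vector from the target, and a conditioned sequence handed to a decoder.

```python
from typing import List

Vector = List[float]


def content_features(wave: List[float], hop: int = 4) -> List[Vector]:
    """Toy stand-in for a semantic encoder: one feature vector per frame."""
    frames = [wave[i:i + hop] for i in range(0, len(wave) - hop + 1, hop)]
    return [[sum(f) / hop, max(f) - min(f)] for f in frames]


def speaker_embedding(wave: List[float]) -> Vector:
    """Toy stand-in for a speaker encoder: one vector per utterance."""
    mean = sum(wave) / len(wave)
    var = sum((x - mean) ** 2 for x in wave) / len(wave)
    return [mean, var]


def generalize(embedding: Vector, weights: List[Vector]) -> Vector:
    """Toy stand-in for the learned mapping into a generalized speaker space
    (assumed here to be a plain linear map)."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


def condition(frames: List[Vector], speaker: Vector) -> List[Vector]:
    """Attach the same speaker vector to every content frame
    (concatenation is an assumption, not the paper's stated method)."""
    return [frame + speaker for frame in frames]


source = [0.0, 1.0, 0.0, -1.0, 0.5, 0.5, -0.5, -0.5]  # source utterance
target = [2.0, 2.0, 4.0, 4.0]                          # unseen target speaker
identity = [[1.0, 0.0], [0.0, 1.0]]                    # placeholder weights

decoder_input = condition(
    content_features(source),
    generalize(speaker_embedding(target), identity),
)
print(decoder_input)
```

In this sketch the content frames carry no information about the target utterance and the speaker vector carries no frame-level detail, which is the decoupling the abstract refers to; a real system would replace each stand-in with the corresponding trained model.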

Original language: English
Title of host publication: Proceedings - 2023 RIVF International Conference on Computing and Communication Technologies, RIVF 2023
Editors: Vo Nguyen Quoc Bao, Le Hai Chau
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 585-589
Number of pages: 5
ISBN (Electronic): 9798350315844
Publication status: Published - 2023
Event: 2023 RIVF International Conference on Computing and Communication Technologies, RIVF 2023 - Hanoi, Viet Nam
Duration: 23 Dec 2023 to 25 Dec 2023

Publication series

Name: Proceedings - 2023 RIVF International Conference on Computing and Communication Technologies, RIVF 2023

Conference

Conference: 2023 RIVF International Conference on Computing and Communication Technologies, RIVF 2023
Country/Territory: Viet Nam
City: Hanoi
Period: 23/12/23 to 25/12/23
