CA-Wav2Lip: Coordinate Attention-based Speech to Lip Synthesis in the Wild

Kuan Chien Wang, Jie Zhang, Jingquan Huang, Qi Li, Min Te Sun, Kazuya Sakai, Wei Shinn Ku

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 citation (Scopus)

Abstract

With the growing consumption of online visual content, there is an urgent need for video translation in order to reach a wider audience around the world. However, material produced by direct translation and dubbing fails to create a natural audio-visual experience, since the translated speech and the lip movements are often out of sync. To improve the viewing experience, an accurate automatic lip-movement synchronization system is necessary. To improve the accuracy and visual quality of speech-to-lip generation, this research proposes two techniques: embedding attention mechanisms in convolution layers, and deploying SSIM as the loss function in the visual quality discriminator. The proposed system, along with several others, is tested on three audiovisual datasets. The results show that our proposed methods outperform state-of-the-art speech-to-lip synthesis in both the accuracy and the visual quality of audio-lip synchronization generation.
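The second proposed technique, using SSIM as a loss term in the visual quality discriminator, can be sketched as follows. This is a minimal illustration of the standard SSIM formula and a 1 − SSIM loss, not the paper's actual implementation: the function names are hypothetical, and SSIM is computed globally over the image rather than with the usual sliding Gaussian window.

```python
import numpy as np

def ssim(x, y, data_range=1.0):
    """Simplified global SSIM between two single-channel images in [0, data_range].

    Standard constants C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2 stabilize the
    ratio when means/variances are near zero.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def ssim_loss(generated, reference):
    """Visual-quality loss term: SSIM is 1 for identical images, so 1 - SSIM
    is 0 at the optimum and grows as structural similarity degrades."""
    return 1.0 - ssim(generated, reference)
```

Because SSIM compares local luminance, contrast, and structure rather than raw pixel differences, using 1 − SSIM in the discriminator's objective penalizes structural artifacts that an L1/L2 loss would under-weight.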

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Smart Computing, SMARTCOMP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-8
Number of pages: 8
ISBN (electronic): 9798350322811
DOIs
Publication status: Published - 2023
Event: 9th IEEE International Conference on Smart Computing, SMARTCOMP 2023 - Nashville, United States
Duration: 26 Jun 2023 – 29 Jun 2023

Publication series

Name: Proceedings - 2023 IEEE International Conference on Smart Computing, SMARTCOMP 2023

Conference

Conference: 9th IEEE International Conference on Smart Computing, SMARTCOMP 2023
Country/Territory: United States
City: Nashville
Period: 26/06/23 – 29/06/23

Fingerprint

Dive into the research topics of 'CA-Wav2Lip: Coordinate Attention-based Speech to Lip Synthesis in the Wild'. Together they form a unique fingerprint.
