CA-Wav2Lip: Coordinate Attention-based Speech to Lip Synthesis in the Wild

Kuan Chien Wang, Jie Zhang, Jingquan Huang, Qi Li, Min Te Sun, Kazuya Sakai, Wei Shinn Ku

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

With the growing consumption of online video content, there is an urgent need for video translation in order to reach a wider audience around the world. However, material that is directly translated and dubbed fails to create a natural audio-visual experience, since the translated speech and the lip movements are often out of sync. To improve the viewing experience, an accurate automatic lip-movement synchronization system is necessary. To improve the accuracy and visual quality of speech-to-lip generation, this research proposes two techniques: embedding attention mechanisms in convolution layers, and deploying SSIM as the loss function in the visual quality discriminator. The proposed system, along with several others, is tested on three audiovisual datasets. The results show that our proposed methods outperform state-of-the-art speech-to-lip synthesis in both the accuracy and the visual quality of audio-lip synchronization generation.
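The abstract's second technique uses SSIM as the loss in the visual quality discriminator. As a hedged illustration only (not the paper's implementation), the sketch below computes a simplified SSIM from whole-image statistics in NumPy: the standard metric averages over local windows, while this version uses global means and variances with the usual constants C1 = (0.01·L)² and C2 = (0.03·L)², and turns the similarity into a training loss as 1 − SSIM.

```python
import numpy as np

def ssim(x, y, data_range=1.0):
    """Simplified global SSIM between two images.

    The standard metric is computed over sliding local windows and
    averaged; whole-image statistics are used here for brevity.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def ssim_loss(generated, reference):
    # SSIM is 1 for identical images, so 1 - SSIM is a natural loss.
    return 1.0 - ssim(generated, reference)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
print(ssim_loss(img, img))        # near zero for identical images
print(ssim_loss(img, 1.0 - img))  # large for structurally dissimilar images
```

Because SSIM equals 1 only for identical images, the loss is near 0 for a perfect reconstruction and grows as structural similarity drops, which is what makes it usable as a discriminator objective.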

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Smart Computing, SMARTCOMP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-8
Number of pages: 8
ISBN (Electronic): 9798350322811
DOIs
State: Published - 2023
Event: 9th IEEE International Conference on Smart Computing, SMARTCOMP 2023 - Nashville, United States
Duration: 26 Jun 2023 - 29 Jun 2023

Publication series

Name: Proceedings - 2023 IEEE International Conference on Smart Computing, SMARTCOMP 2023

Conference

Conference: 9th IEEE International Conference on Smart Computing, SMARTCOMP 2023
Country/Territory: United States
City: Nashville
Period: 26/06/23 - 29/06/23

Keywords

  • channel attention
  • lip synthesis
  • spatial attention
  • talking face generation
