Lip-based visual speech recognition system

Aufaclav Zatu Kusuma Frisky, Chien Yao Wang, Andri Santoso, Jia Ching Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations


This paper proposes a system for visual speech recognition. The system recognizes visual lip movements by applying video content analysis techniques. Using spatiotemporal feature descriptors, we extract features from video containing visual lip information. A preprocessing step removes noise and enhances image contrast in every frame of the video. The extracted features are used to build a dictionary for a kernel sparse representation classifier (K-SRC) in the classification step. We adopt non-negative matrix factorization (NMF) to reduce the dimensionality of the extracted features. We evaluated the system on the AVLetters and AVLetters2 datasets, using the same configuration as previous works. On AVLetters, the system achieves accuracies of 67.13%, 45.37%, and 63.12% in the semi-speaker-dependent, speaker-independent, and speaker-dependent settings, respectively. On AVLetters2, it achieves 89.02% in the speaker-dependent case and 25.9% in the speaker-independent case. These results show that the proposed method outperforms other methods evaluated under the same configuration.
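The NMF dimensionality-reduction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how NMF is configured, so everything here is an assumption for illustration: the function name `nmf`, the features-by-samples layout of `V`, the target `rank`, and the choice of Lee-Seung multiplicative updates for the Frobenius objective. Under these assumptions, each column of `H` is a reduced, non-negative representation of one sample.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (features x samples) as W @ H.

    Hypothetical helper, not the authors' implementation. Uses the
    standard multiplicative update rules for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Random strictly positive initialisation keeps the updates well defined.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Element-wise updates; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Stand-in for non-negative descriptors: 64 features, 40 samples.
    V = rng.random((64, 40))
    W, H = nmf(V, rank=8)
    print("reduced representation shape:", H.shape)
    print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

In a pipeline like the one described, the columns of `H` (or projections of new samples onto `W`) could serve as lower-dimensional inputs to the classifier. How the paper actually connects NMF to the K-SRC dictionary is not stated in the abstract.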

Original language: English
Title of host publication: ICCST 2015 - The 49th Annual IEEE International Carnahan Conference on Security Technology
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9781479986910
State: Published - 21 Jan 2016
Event: 49th Annual IEEE International Carnahan Conference on Security Technology, ICCST 2015 - Taipei, Taiwan
Duration: 21 Sep 2015 to 24 Sep 2015

Publication series

Name: Proceedings - International Carnahan Conference on Security Technology
ISSN (Print): 1071-6572


Conference: 49th Annual IEEE International Carnahan Conference on Security Technology, ICCST 2015


  • kernel sparse representation classifier
  • non-negative matrix factorization
  • spatiotemporal descriptor
  • visual speech recognition

