Audio-visual speech enhancement using hierarchical extreme learning machine

Tassadaq Hussain, Yu Tsao, Hsin Min Wang, Jia Ching Wang, Sabato Marco Siniscalchi, Wen Hung Liao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation


Recently, the hierarchical extreme learning machine (HELM) model has been used for speech enhancement (SE) and has demonstrated promising performance, especially when the amount of training data is limited and the system cannot support heavy computation. Building on the success of the audio-only HELM-based system, termed AHELM, we propose a novel audio-visual HELM-based SE system, termed AVHELM, that integrates audio and visual information to confront the problem of unseen non-stationary noise at low SNR levels and thereby attain improved SE performance. The experimental results demonstrate that AVHELM yields satisfactory enhancement performance with a limited amount of training data and outperforms AHELM in terms of three standardized objective measures under matched and mismatched testing conditions, confirming the effectiveness of incorporating visual information into the HELM-based SE system.
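The abstract does not describe the AVHELM architecture in detail, so the following is only a rough illustration of the general idea: a basic single-hidden-layer extreme learning machine (random, fixed hidden weights and a closed-form ridge solution for the output weights) trained to map fused audio-visual features to clean audio features. It is not the authors' hierarchical model. The fusion by simple concatenation, the feature dimensions, the function names `train_elm` and `predict_elm`, and the synthetic data are all assumptions made for the sketch.

```python
import numpy as np


def train_elm(X, Y, n_hidden=64, reg=1e-2, seed=0):
    """Fit a basic ELM regressor (hypothetical helper, not from the paper).

    Hidden weights are drawn at random and never trained; only the
    output weights are solved in closed form by ridge regression.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden activations
    # Ridge solution: beta = (H^T H + reg * I)^-1 H^T Y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta


def predict_elm(model, X):
    """Apply a trained ELM to new input features."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


# Synthetic stand-ins for per-frame features (assumed shapes, not real data).
rng = np.random.default_rng(1)
n_frames, n_audio, n_visual = 200, 20, 8
clean = rng.standard_normal((n_frames, n_audio))                # target audio features
noisy = clean + 0.5 * rng.standard_normal((n_frames, n_audio))  # noisy audio features
visual = rng.standard_normal((n_frames, n_visual))              # visual (e.g. lip) features

# Assumed early fusion: concatenate audio and visual features per frame.
fused = np.concatenate([noisy, visual], axis=1)

model = train_elm(fused, clean)
enhanced = predict_elm(model, fused)
print(enhanced.shape)
```

Because the hidden layer is fixed and only a linear system is solved, training cost stays low, which is consistent with the abstract's emphasis on limited training data and light computation. A hierarchical variant would stack several such layers; how the paper does that is not stated here.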

Original language: English
Title of host publication: EUSIPCO 2019 - 27th European Signal Processing Conference
Publisher: European Signal Processing Conference, EUSIPCO
ISBN (Electronic): 9789082797039
State: Published - Sep 2019
Event: 27th European Signal Processing Conference, EUSIPCO 2019 - A Coruna, Spain
Duration: 2 Sep 2019 to 6 Sep 2019

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491


Conference: 27th European Signal Processing Conference, EUSIPCO 2019
City: A Coruna


  • Audio-Visual
  • Hierarchical Extreme Learning Machine
  • Multi-Modal
  • Speech Enhancement

