Incorporating local environment information with ensemble neural networks to robust automatic speech recognition

Chia Yung Hsu, Ryandhimas E. Zezario, Jia Ching Wang, Chin Wen Ho, Xugang Lu, Yu Tsao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes an ensemble neural network (ENN) framework for robust automatic speech recognition (ASR). The proposed ENN framework is divided into offline and online phases. In the offline phase, the ENN framework first applies an environment clustering technique to partition the training data into several subsets, where each subset characterizes specific local information of the entire acoustic space. Next, each subset of training data is used to train an NN acoustic model. Finally, the entire set of training data is used to estimate a gating function, which determines the most suitable NN acoustic model for a given input utterance. In the online phase, given a test utterance, the gating function selects the NN acoustic model used to perform speech recognition. Because local environment information is incorporated, ENN can effectively determine the NN acoustic model that best matches the testing condition. The proposed framework was evaluated on the Aurora-2 task. Experimental results show that the proposed ENN framework provides a relative word error rate reduction of 5.35% (from 5.05% to 4.78%) compared to the baseline.
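The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering technique or gating function the authors use, so this sketch assumes plain k-means over utterance-level feature vectors and a nearest-centroid gate; the names `cluster_environments`, `train_ensemble`, `recognize`, and the `train_model` callback are hypothetical and stand in for real acoustic-model training. The last helper reproduces the relative word error rate arithmetic quoted above.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v (the assumed gating rule)."""
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], v))

def cluster_environments(utt_vectors, k, iters=20):
    """Assumed stand-in for environment clustering: plain k-means."""
    # Deterministic initialisation: evenly spaced utterances (a simplification).
    step = len(utt_vectors) // k
    centroids = [utt_vectors[i * step] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in utt_vectors:
            groups[nearest(centroids, v)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_vector(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def train_ensemble(utt_vectors, k, train_model):
    """Offline phase: partition the training data, train one model per subset."""
    centroids = cluster_environments(utt_vectors, k)
    subsets = [[] for _ in range(k)]
    for i, v in enumerate(utt_vectors):
        subsets[nearest(centroids, v)].append(i)
    models = [train_model(indices) for indices in subsets]
    return centroids, models

def recognize(utt_vector, centroids, models):
    """Online phase: the gate selects one model, which then decodes."""
    return models[nearest(centroids, utt_vector)](utt_vector)

def relative_reduction(baseline, proposed):
    """Relative reduction in percent, e.g. for word error rates."""
    return 100.0 * (baseline - proposed) / baseline

def toy_trainer(indices):
    """Hypothetical trainer: the 'model' just reports its training-subset size."""
    size = len(indices)
    return lambda utt_vector: size

# Toy data: two well-separated "environments" in 2-D.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids, models = train_ensemble(data, 2, toy_trainer)
print(recognize([0.1, 0.1], centroids, models))
print(round(relative_reduction(5.05, 4.78), 2))  # 5.35
```

In this sketch the gate makes a hard selection of a single model per utterance, matching the abstract's description of choosing one NN acoustic model; a soft weighting across models would be a different design and is not implied by the source.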

Original language: English
Title of host publication: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Editors: Hsin-Min Wang, Qingzhi Hou, Yuan Wei, Tan Lee, Jianguo Wei, Lei Xie, Hui Feng, Jianwu Dang
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509042937
State: Published - 2 May 2017
Event: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 - Tianjin, China
Duration: 17 Oct 2016 → 20 Oct 2016

Publication series

Name: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016

Conference

Conference: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Country/Territory: China
City: Tianjin
Period: 17/10/16 → 20/10/16

Keywords

  • Ensemble neural network
  • Environment clustering
  • Mixture of local experts
  • Robust ASR
