Neural Network Training Combining Environment Clustering and Mixture-of-Experts Systems for Robust Speech Recognition

Translated title of the contribution: Neural network training combines environment clustering with expert hybrid systems in robust speech recognition (automatic recognition using neural networks, the analytics model with environment and blend of experts, in)

Chia Yung Hsu, Jia Ching Wang, Yu Tsao

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-review

Abstract

Automatic speech recognition (ASR) with neural network (NN)-based acoustic models (AMs) has recently achieved significant improvements. However, mismatches between training and testing conditions, including speaker and speaking-environment mismatches, still limit the applicability of ASR. This paper proposes a novel approach, termed EC-MOE, that combines the environment clustering (EC) and mixture of experts (MOE) algorithms to enhance the robustness of ASR against such mismatches. In the offline phase, we split the entire training set into several subsets, each characterizing a specific speaker and speaking environment, and use each subset to train an NN-based AM. In the online phase, a Gaussian mixture model (GMM) gate determines the optimal output from the multiple NN-based AMs to produce the final recognition result. We evaluated EC-MOE on the Aurora 2 continuous digit speech recognition task. Compared with a baseline system that uses a single NN-based AM for recognition, EC-MOE achieves a relative word error rate (WER) reduction of 5.9% (from 5.25% to 4.94%).
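The abstract describes the EC-MOE pipeline only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes: (1) utterance-level feature vectors are grouped with plain k-means as a stand-in for the paper's environment clustering step; (2) each cluster is modeled by a single diagonal Gaussian, and the posterior responsibilities of these Gaussians act as the GMM gate weights; (3) the per-cluster NN-based acoustic models are replaced by hypothetical stand-in callables that return class posteriors. All names (`kmeans`, `DiagGaussian`, `GMMGate`, `ec_moe_posteriors`, `relative_wer_reduction`) are hypothetical. The last helper checks the abstract's arithmetic: (5.25 - 4.94) / 5.25 is about 5.9%.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means (assumed stand-in for environment clustering); returns a cluster id per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


class DiagGaussian:
    """Diagonal-covariance Gaussian fitted by maximum likelihood, with a variance floor."""

    def __init__(self, points: Sequence[Vector], floor: float = 1e-3):
        n = len(points)
        self.mean = [sum(col) / n for col in zip(*points)]
        self.var = [max(sum((x - m) ** 2 for x in col) / n, floor)
                    for col, m in zip(zip(*points), self.mean)]

    def log_pdf(self, x: Vector) -> float:
        return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, self.mean, self.var))


class GMMGate:
    """One Gaussian component per environment cluster; gate weights are p(cluster | x)."""

    def __init__(self, points: Sequence[Vector], labels: Sequence[int], k: int):
        self.components, self.log_priors = [], []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                raise ValueError(f"cluster {c} is empty")
            self.components.append(DiagGaussian(members))
            self.log_priors.append(math.log(len(members) / len(points)))

    def weights(self, x: Vector) -> List[float]:
        scores = [lp + g.log_pdf(x) for lp, g in zip(self.log_priors, self.components)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]


def ec_moe_posteriors(x: Vector, gate: GMMGate,
                      experts: Sequence[Callable[[Vector], Vector]],
                      hard: bool = False) -> Vector:
    """Combine expert outputs: weighted sum (soft) or the single best-scoring expert (hard)."""
    w = gate.weights(x)
    if hard:
        best = max(range(len(w)), key=w.__getitem__)
        return experts[best](x)
    outs = [expert(x) for expert in experts]
    return [sum(wk * o[j] for wk, o in zip(w, outs)) for j in range(len(outs[0]))]


def relative_wer_reduction(baseline: float, proposed: float) -> float:
    return (baseline - proposed) / baseline


# Demo on synthetic features from two hypothetical "environments".
rng = random.Random(1)
train = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
         + [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(50)])
labels = kmeans(train, k=2)
gate = GMMGate(train, labels, k=2)
# Hypothetical stand-in experts returning fixed posteriors over three classes.
experts = [lambda x: [0.7, 0.2, 0.1], lambda x: [0.1, 0.3, 0.6]]
print(gate.weights([0.2, -0.1]))
print(ec_moe_posteriors([0.2, -0.1], gate, experts))
print(f"relative WER reduction: {relative_wer_reduction(5.25, 4.94):.1%}")
```

The soft combination weights every expert by its gate posterior, while `hard=True` follows the abstract's wording more literally by picking the single expert whose cluster best explains the input. Which of the two the paper uses is not stated in the abstract.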

Original language: Chinese (Traditional)
Title of host publication: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Editors: Sin-Horng Chen, Hsin-Min Wang, Jen-Tzung Chien, Hung-Yu Kao, Wen-Whei Chang, Yih-Ru Wang, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 136-147
Number of pages: 12
ISBN (Electronic): 9789573079286
State: Published, 1 Oct 2015
Event: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015, Hsinchu, Taiwan
Duration: 1 Oct 2015 to 2 Oct 2015

Publication series

Name: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Volume: 2015-January

Conference

Conference: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Country/Territory: Taiwan
City: Hsinchu
Period: 1/10/15 to 2/10/15
