Exploiting glottal and prosodic information for robust speaker verification

Yuan Fu Liao, Zhi Ren Zeng, Zi He Chen, Yau Tarng Juang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, three levels of speaker cues, namely glottal, prosodic and spectral information, are integrated to build a robust speaker verification system. The main goal is to resist channel and handset distortions. In particular, the dynamic behavior of the normalized amplitude quotient (NAQ) and of prosodic feature contours is modeled using Gaussian mixture models (GMMs) and two latent prosody analysis (LPA)-based approaches, respectively. The proposed methods are evaluated on the standard one-speaker detection task of the 2001 NIST Speaker Recognition Evaluation Corpus, where only one 2-minute training utterance and one 30-second trial utterance (on average) are available per speaker. Experimental results show that the proposed approach improves the equal error rates (EERs) of the maximum a posteriori (MAP)-adapted GMM and GMM+T-norm baselines from 12.4% and 9.5% to 10.3% and 8.3%, respectively, and finally to 7.8%.
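For readers unfamiliar with the quantities named in the abstract, the sketch below illustrates two of them in Python. It is not code from the paper: the function names are hypothetical, and it assumes a per-cycle glottal flow estimate (e.g., obtained by inverse filtering) and pre-computed verification scores. NAQ is defined as f_ac / (d_peak * T), where f_ac is the peak-to-peak amplitude of the glottal flow, d_peak is the magnitude of the negative peak of its derivative, and T is the fundamental period; the EER is the operating point where the false-acceptance and false-rejection rates are equal.

```python
import numpy as np

def naq(glottal_flow, fs, f0):
    """Normalized amplitude quotient for one glottal cycle (illustrative).

    glottal_flow: sampled glottal flow over one cycle (1-D array)
    fs: sampling rate in Hz
    f0: fundamental frequency in Hz (period T = 1/f0)
    """
    # Peak-to-peak (AC) amplitude of the glottal flow pulse.
    f_ac = glottal_flow.max() - glottal_flow.min()
    # Magnitude of the negative peak of the flow derivative (glottal closure).
    d_peak = -(np.diff(glottal_flow) * fs).min()
    # NAQ = f_ac / (d_peak * T) = f_ac * f0 / d_peak.
    return f_ac * f0 / d_peak

def eer(target_scores, impostor_scores):
    """Equal error rate: point where false accepts and false rejects balance."""
    target = np.asarray(target_scores)
    impostor = np.asarray(impostor_scores)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False-acceptance and false-rejection rates at each candidate threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(target < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```

The MAP-adapted GMM and T-norm score normalization steps are standard speaker-verification machinery and are omitted from this sketch.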

Original language: English
Title of host publication: 3rd International Conference on Speech Prosody 2006
Editors: R. Hoffmann, H. Mixdorff
Publisher: International Speech Communication Association
ISBN (Electronic): 9780000000002
State: Published - 2006
Event: 3rd International Conference on Speech Prosody, SP 2006 - Dresden, Germany
Duration: 2 May 2006 – 5 May 2006

Publication series

Name: Proceedings of the International Conference on Speech Prosody
ISSN (Print): 2333-2042

Conference

Conference: 3rd International Conference on Speech Prosody, SP 2006
Country/Territory: Germany
City: Dresden
Period: 2/05/06 – 5/05/06
