Joint prosodic and spectral modeling for robust speaker verification

Yuan Fu Liao, Wen Chieh Chang, Zong You Xie, Ding Yun Zeng, Yau Tarng Juang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper proposes a joint prosodic and spectral modeling framework, in place of traditional score-domain fusion approaches, to alleviate the mismatch caused by channel, handset, and ambient noise. The basic idea is to embed the hierarchical structure of speech prosody into an ergodic HMM (EHMM), so that the EHMM's states and state transition probabilities model the prosodic states and their transitions, while its state-dependent observation distributions model the prosodic and spectral features. Experiments on the standard single-speaker detection task of the NIST 2001 Speaker Recognition Evaluation (NIST-SRE 2001) showed that the proposed approach not only outperformed the spectral-feature baseline (8.04% vs. 8.64% equal error rate, EER) but also performed slightly better than the score-domain fusion approach (8.44%).
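To make the EHMM idea concrete, the following is a minimal sketch of how a fully connected HMM could score a sequence of joint feature vectors with the forward algorithm. It is not the authors' implementation: the diagonal-Gaussian observation model, the concatenation of prosodic and spectral features into one vector per frame, and all function and parameter names (`ehmm_log_likelihood`, `log_init`, `log_trans`, `means`, `variances`) are assumptions made for illustration only.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x (assumed emission model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def ehmm_log_likelihood(frames, log_init, log_trans, means, variances):
    """Forward algorithm for an ergodic HMM (every state can reach every state).

    frames:        list of joint feature vectors, one per frame
                   (assumed: prosodic and spectral features concatenated).
    log_init[i]:   log probability of starting in state i.
    log_trans[i][j]: log probability of moving from state i to state j.
    means[i], variances[i]: diagonal Gaussian parameters of state i.
    Returns the log-likelihood of the whole frame sequence.
    """
    n_states = len(log_init)
    # Initialisation: start in each state and emit the first frame.
    alpha = [
        log_init[i] + log_gaussian_diag(frames[0], means[i], variances[i])
        for i in range(n_states)
    ]
    # Recursion: sum over all predecessor states, then emit the next frame.
    for x in frames[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_gaussian_diag(x, means[j], variances[j])
            for j in range(n_states)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)
```

Under these assumptions, a verification score could be formed as the difference between the log-likelihood of a test utterance under a claimed speaker's model and under a background model, with the accept/reject threshold tuned on held-out data.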

Original language: English
Title of host publication: Proceedings of the 4th International Conference on Speech Prosody, SP 2008
Publisher: International Speech Communications Association
Pages: 143-146
Number of pages: 4
ISBN (Print): 9780616220030
State: Published - 2008
Event: 4th International Conference on Speech Prosody 2008, SP 2008 - Campinas, Brazil
Duration: 6 May 2008 to 9 May 2008

Publication series

Name: Proceedings of the 4th International Conference on Speech Prosody, SP 2008

Conference

Conference: 4th International Conference on Speech Prosody 2008, SP 2008
Country/Territory: Brazil
City: Campinas
Period: 6/05/08 to 9/05/08
