Joint prosodic and spectral modeling for robust speaker verification

Yuan Fu Liao, Wen Chieh Chang, Zong You Xie, Ding Yun Zeng, Yau Tarng Juang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review


In this paper, a joint prosodic and spectral modeling framework is proposed as an alternative to traditional score-domain fusion, in order to alleviate the problem of mismatched channel, handset, and ambient-noise conditions. The basic idea is to embed the hierarchical structure of speech prosody into an ergodic HMM (EHMM): the EHMM's states, state transition probabilities, and state-dependent observation distributions model the prosodic states, the transitions between prosodic states, and the prosodic/spectral features, respectively. Experiments on the standard single-speaker detection task of the NIST 2001 Speaker Recognition Evaluation (NIST-SRE 2001) showed that the proposed approach not only outperformed the spectral-feature baseline (8.04% vs. 8.64% equal error rate, EER) but also slightly outperformed the score-domain fusion approach (8.44%).
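The abstract does not give the model's details, so the following is only a minimal sketch of how a fully connected (ergodic) HMM could score a sequence of joint feature vectors with the forward algorithm. The diagonal-Gaussian state distributions, the frame-level concatenation of prosodic and spectral features, and every function and parameter name here are illustrative assumptions, not the authors' implementation.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def ehmm_log_likelihood(frames, log_pi, log_A, means, variances):
    """Forward algorithm (log domain) for an ergodic HMM.

    frames:    list of joint feature vectors; here assumed to be prosodic
               and spectral features concatenated per frame (assumption)
    log_pi:    log initial probabilities, one per hypothetical prosodic state
    log_A:     log_A[i][j] = log P(state j at t+1 | state i at t); every
               entry may be finite because the topology is fully connected
    means, variances: per-state diagonal Gaussian parameters (assumption)
    """
    n = len(log_pi)
    # Initialisation: start in state j and emit the first frame.
    alpha = [
        log_pi[j] + log_gauss_diag(frames[0], means[j], variances[j])
        for j in range(n)
    ]
    # Recursion: sum over all predecessor states, then emit the next frame.
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
            + log_gauss_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    # Termination: total log-likelihood of the whole sequence.
    return logsumexp(alpha)
```

Under these assumptions, a verification score could be formed as the difference between the log-likelihood under a claimed speaker's model and that under a background model, with the decision threshold tuned on held-out data.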

Original language: English
Title of host publication: Proceedings of the 4th International Conference on Speech Prosody, SP 2008
Publisher: International Speech Communications Association
Number of pages: 4
ISBN (Print): 9780616220030
State: Published - 2008
Event: 4th International Conference on Speech Prosody 2008, SP 2008 - Campinas, Brazil
Duration: 6 May 2008 to 9 May 2008

Publication series: Proceedings of the 4th International Conference on Speech Prosody, SP 2008
Conference: 4th International Conference on Speech Prosody 2008, SP 2008

