Eigen-prosody analysis for robust speaker recognition under mismatch handset environment

Zi He Chen, Yuan Fu Liao, Yau Tarng Juang

Research output: Contribution to conference › Paper › peer-review

2 Scopus citations

Abstract

Most speaker recognition systems use only low-level, short-term spectral features and ignore high-level, long-term information such as prosody and speaking style. This paper presents a novel eigen-prosody analysis (EPA) approach that captures a speaker's long-term prosodic information for robust speaker recognition under mismatched environments. It converts the prosodic feature contours of a speaker's speech into sequences of prosody symbols, and thereby transforms speaker recognition into a task similar to full-text document retrieval. Experimental results on the well-known HTIMIT database show that, even when only a small amount of training/test data is available, a remarkable improvement can be achieved: a relative error rate reduction of about 28.7% compared with the GMM/cepstral mean subtraction (CMS) baseline.
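As a rough illustration of the idea in the abstract, the following minimal sketch treats each speaker as a "document" of prosody symbols and matches a test utterance in a reduced eigen-space. The abstract does not specify the prosodic features, the symbol inventory, or the exact decomposition, so everything here is an assumption made for illustration: pitch deltas quantized with fixed bin edges, bigram counts as "terms", and a truncated SVD in the style of latent semantic analysis. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np

N_SYMBOLS = 4  # assumed size of the prosody symbol inventory


def contour_to_symbols(contour):
    """Quantize frame-to-frame pitch changes into symbols 0..3 (assumed scheme)."""
    deltas = np.diff(np.asarray(contour, dtype=float))
    edges = np.array([-5.0, 0.0, 5.0])  # assumed bin edges, not from the paper
    return np.digitize(deltas, edges)


def bigram_counts(symbols):
    """Count symbol bigrams, the 'terms' of a speaker's prosody document."""
    vec = np.zeros(N_SYMBOLS * N_SYMBOLS)
    for a, b in zip(symbols[:-1], symbols[1:]):
        vec[a * N_SYMBOLS + b] += 1
    return vec


def fit_eigen_space(term_by_speaker, rank):
    """Keep the leading left singular vectors of the term-by-speaker matrix."""
    u, _, _ = np.linalg.svd(term_by_speaker, full_matrices=False)
    return u[:, :rank]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def identify(basis, speaker_vecs, test_vec):
    """Return the index of the enrolled speaker closest to the test vector."""
    test_proj = basis.T @ test_vec
    scores = [cosine(basis.T @ v, test_proj) for v in speaker_vecs]
    return int(np.argmax(scores))


# Toy enrollment: one speaker with rising pitch, one with falling pitch.
rising = np.arange(0, 100, 10)
falling = rising[::-1]
enrolled = [bigram_counts(contour_to_symbols(c)) for c in (rising, falling)]
basis = fit_eigen_space(np.stack(enrolled, axis=1), rank=2)
best = identify(basis, enrolled, bigram_counts(contour_to_symbols(np.arange(50, 200, 10))))
```

In this toy setup a new rising contour is matched to the first enrolled speaker. A real system would use richer prosodic features and far more data than this sketch assumes.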

Original language: English
Pages: 1421-1424
Number of pages: 4
State: Published - 2004
Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
Duration: 4 Oct 2004 to 8 Oct 2004

Conference

Conference: 8th International Conference on Spoken Language Processing, ICSLP 2004
Country/Territory: Korea, Republic of
City: Jeju, Jeju Island
Period: 4/10/04 to 8/10/04
