Abstract
Most speaker recognition systems utilize only low-level short-term spectral features and ignore high-level long-term information, such as prosody and speaking style. This paper presents a novel eigen-prosody analysis (EPA) approach to capture long-term prosodic information of a speaker for robust speaker recognition under mismatch environment. It converts the prosodic feature contours of a speaker's speech into sequences of prosody symbols, and then transforms the speaker recognition problem into a full text document retrieval-similar task. Experimental results on the well-known HTIMIT database have shown that, even only few training/test data is available, a remarkable improvement, about 28.7% relative error rate reduction comparing with the GMM/cepstral mean subtraction (CMS) baseline, could be achieved.
Original language | English |
---|---|
Pages | 1421-1424 |
Number of pages | 4 |
State | Published - 2004 |
Event | 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of Duration: 4 Oct 2004 → 8 Oct 2004 |
Conference
Conference | 8th International Conference on Spoken Language Processing, ICSLP 2004 |
---|---|
Country/Territory | Korea, Republic of |
City | Jeju, Jeju Island |
Period | 4/10/04 → 8/10/04 |