Abstract
Most speaker recognition systems use only low-level, short-term spectral features and ignore high-level, long-term information such as prosody and speaking style. This paper presents a novel eigen-prosody analysis (EPA) approach that captures a speaker's long-term prosodic information for robust speaker recognition in mismatched environments. It converts the prosodic feature contours of a speaker's speech into sequences of prosody symbols, and then transforms speaker recognition into a task similar to full-text document retrieval. Experimental results on the well-known HTIMIT database show that, even when only a small amount of training/test data is available, a remarkable improvement can be achieved: about 28.7% relative error rate reduction compared with the GMM/cepstral mean subtraction (CMS) baseline.
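The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the prosodic feature is an F0 (pitch) contour, that symbols come from a three-way quantization of the log-F0 slope (fall, level, rise), that each speaker's "document" is a normalized bigram histogram of those symbols, and that the eigen-analysis is a truncated SVD with cosine scoring, as in latent semantic retrieval. All function names, thresholds, and dimensions are hypothetical.

```python
import numpy as np

def contour_to_symbols(f0, edges=(-0.02, 0.02)):
    """Quantize log-F0 slope into symbols: 0 = fall, 1 = level, 2 = rise.
    The feature and the thresholds are assumptions for illustration."""
    slope = np.diff(np.log(np.asarray(f0, dtype=float)))
    return np.digitize(slope, np.asarray(edges))

def bigram_counts(symbols, n_symbols=3):
    """Normalized histogram of adjacent symbol pairs (the speaker's 'document')."""
    counts = np.zeros(n_symbols * n_symbols)
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a * n_symbols + b] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

def build_space(train_vectors, k=2):
    """Truncated SVD of the (features x speakers) matrix.
    Returns the k-dimensional basis and each speaker's coordinates in it."""
    X = np.stack(train_vectors, axis=1)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]
    return U_k, U_k.T @ X

def identify(test_vector, U_k, speaker_coords):
    """Project a test vector and return the index of the closest speaker (cosine)."""
    q = U_k.T @ test_vector
    num = speaker_coords.T @ q
    den = np.linalg.norm(speaker_coords, axis=0) * np.linalg.norm(q) + 1e-12
    return int(np.argmax(num / den))

# Toy data: speaker 0 has rising contours, speaker 1 has falling contours.
rising = 100 * 1.05 ** np.arange(20)
falling = 200 * 0.95 ** np.arange(20)
train = [bigram_counts(contour_to_symbols(c)) for c in (rising, falling)]
U_k, coords = build_space(train, k=2)

probe = bigram_counts(contour_to_symbols(120 * 1.04 ** np.arange(15)))
print(identify(probe, U_k, coords))
```

In this toy setup the probe contour rises throughout, so it should be matched to the first training speaker. A real system would need many more symbols, longer contexts, and real prosodic features; the sketch only shows how symbol sequences can be scored like documents in a reduced space.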
Original language | English |
---|---|
Pages | 1421-1424 |
Number of pages | 4 |
Publication status | Published - 2004 |
Event | 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of. Duration: 4 Oct 2004 → 8 Oct 2004 |
Conference
Conference | 8th International Conference on Spoken Language Processing, ICSLP 2004 |
---|---|
Country/Territory | Korea, Republic of |
City | Jeju, Jeju Island |
Period | 4 Oct 2004 → 8 Oct 2004 |