Latent prosody analysis for robust speaker identification

Yuan Fu Liao, Zi He Chen, Yau Tarng Juang

Research output: Contribution to journal › Article › Peer-review

5 Scopus citations

Abstract

Handsets that are not seen in the training phase (unseen handsets) are a significant source of performance degradation for speaker identification (SID) applications in the telecommunication environment. In this paper, a novel latent prosody analysis (LPA) approach is proposed that automatically extracts the most discriminative prosodic cues to assist conventional spectral feature-based SID. The LPA approach transforms the SID problem into a task resembling full-text document retrieval via 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experimental results on the phonetically balanced, read-speech handset-TIMIT (HTIMIT) database demonstrated that fusing the LPA prosodic feature-based SID systems with maximum-likelihood a priori handset knowledge interpolation (ML-AKI) spectral feature-based SID outperformed both the pitch and energy Gaussian mixture model (Pitch-GMM) and the bigram of prosodic states (Bigram) counterparts, both when all handsets were counted and when only unseen handsets were counted.
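The three-stage retrieval idea in the abstract (tokenize prosodic contours, analyze them in a latent space, retrieve the closest speaker) can be illustrated with a toy sketch. This is not the authors' implementation: the segment-slope tokenizer, the token names, the function names, and the demo counts are all hypothetical, and plain latent semantic analysis via truncated SVD stands in for the latent analysis step (the keywords also list probabilistic LSA, which is not shown). Each training speaker's tokens form one "document", a test utterance is folded into the latent space, and speakers are ranked by cosine similarity.

```python
import numpy as np


def tokenize_contour(pitch, energy, seg_len=5):
    """HYPOTHETICAL tokenizer: cut the pitch/energy contours into fixed-length
    segments, label each segment by the sign of its pitch and energy change,
    and emit bigrams of adjacent labels as the 'words' of a prosodic document."""
    labels = []
    for start in range(0, len(pitch) - seg_len + 1, seg_len):
        end = start + seg_len - 1
        dp = pitch[end] - pitch[start]
        de = energy[end] - energy[start]
        labels.append(("P+" if dp >= 0 else "P-") + ("E+" if de >= 0 else "E-"))
    return [f"{a}_{b}" for a, b in zip(labels, labels[1:])]


def term_document_matrix(docs):
    """Count matrix X (terms x documents); each doc is one speaker's token list."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            X[index[tok], j] += 1
    return X, index


def fit_lsa(X, k):
    """Plain LSA (a stand-in, not the paper's exact method): keep the top-k
    left singular vectors of X as the latent prosody basis."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]


def retrieve(Uk, X, index, query_tokens):
    """Rank training speakers by cosine similarity to the query in latent space."""
    q = np.zeros(len(index))
    for tok in query_tokens:
        if tok in index:  # tokens never seen in training are ignored
            q[index[tok]] += 1
    D = Uk.T @ X   # k x n_speakers: training documents in latent space
    qk = Uk.T @ q  # k: folded-in query
    norms = np.linalg.norm(D, axis=0) * np.linalg.norm(qk) + 1e-12
    sims = (D.T @ qk) / norms
    return np.argsort(-sims), sims


# Toy demo with made-up token counts (not data from the paper).
speakers = ["spk_a", "spk_b", "spk_c"]
train_docs = [
    ["P+E+_P+E-"] * 6 + ["P+E-_P-E-"] * 2 + ["P+E+_P+E+"] * 2,
    ["P-E-_P-E+"] * 6 + ["P-E+_P+E+"] * 2 + ["P+E+_P+E+"] * 2,
    ["P+E+_P+E+"] * 5 + ["P-E-_P-E-"] * 3,
]
X, index = term_document_matrix(train_docs)
Uk = fit_lsa(X, k=2)
test_tokens = ["P-E-_P-E+"] * 3 + ["P-E+_P+E+"] + ["P+E-_P+E-"]  # last token unseen
order, sims = retrieve(Uk, X, index, test_tokens)
print(speakers[order[0]])  # spk_b
```

Projecting both the training documents and the query with the same basis keeps the cosine scores invariant to the arbitrary signs that SVD may assign to singular vectors. The fusion with ML-AKI spectral scores, which the abstract credits for the reported gains, is not modeled here.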

Original language: English
Article number: 4276757
Pages (from-to): 1870-1883
Number of pages: 14
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 15
Issue number: 6
State: Published - Aug 2007

Keywords

  • Latent prosody analysis
  • Latent semantic analysis
  • Probabilistic latent semantic analysis
  • Speaker identification
  • Speaker recognition
  • Speech prosody
