Latent prosody analysis for robust speaker identification

Yuan Fu Liao, Zi He Chen, Yau Tarng Juang

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Handsets not seen during training (unseen handsets) are a significant source of performance degradation for speaker identification (SID) in telecommunication environments. This paper proposes a novel latent prosody analysis (LPA) approach that automatically extracts the most discriminative prosodic cues to assist conventional spectral feature-based SID. The LPA approach transforms the SID problem into a task resembling full-text document retrieval via 1) prosodic contour tokenization, 2) latent prosody analysis, and 3) speaker retrieval. Experimental results on the phonetically balanced, read-speech handset-TIMIT (HTIMIT) database demonstrated that fusing the LPA prosodic feature-based SID system with the maximum-likelihood a priori handset knowledge interpolation (ML-AKI) spectral feature-based SID system outperformed both the pitch and energy Gaussian mixture model (Pitch-GMM) and the prosodic-state bigram (Bigram) counterparts, whether all handsets or only unseen handsets were counted.
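The three steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the slope-threshold tokenizer, the use of token bigrams as "terms", the SVD-based (latent semantic analysis style) projection, and all function names and parameters below are assumptions chosen only to show how a retrieval-style pipeline over prosodic tokens might fit together.

```python
import numpy as np

N_SYMBOLS = 3  # assumed token alphabet: 0 = fall, 1 = flat, 2 = rise


def tokenize(contour, thresh=0.5):
    """Step 1 (assumed scheme): label each frame-to-frame pitch change."""
    delta = np.diff(np.asarray(contour, dtype=float))
    tokens = np.ones(len(delta), dtype=int)   # default: flat
    tokens[delta > thresh] = 2                # rise
    tokens[delta < -thresh] = 0               # fall
    return tokens


def bigram_counts(tokens):
    """Count token bigrams; these play the role of 'terms' in a document."""
    counts = np.zeros(N_SYMBOLS * N_SYMBOLS)
    for a, b in zip(tokens[:-1], tokens[1:]):
        counts[a * N_SYMBOLS + b] += 1
    return counts


def fit_latent_space(term_doc, k):
    """Step 2 (LSA-style assumption): truncated SVD of the term-by-speaker matrix."""
    U, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k]


def retrieve(query_counts, term_doc, U_k):
    """Step 3: return the index of the speaker closest in cosine similarity."""
    q = U_k.T @ query_counts          # project the query into the latent space
    docs = U_k.T @ term_doc           # one projected column per enrolled speaker
    eps = 1e-12                       # guards against division by zero
    sims = (docs.T @ q) / (np.linalg.norm(docs, axis=0) * np.linalg.norm(q) + eps)
    return int(np.argmax(sims))


def identify(contour, term_doc, U_k):
    """Run the full tokenization, projection, and retrieval chain on one contour."""
    return retrieve(bigram_counts(tokenize(contour)), term_doc, U_k)
```

In this sketch each enrolled speaker contributes one column of bigram counts to `term_doc`, and a test utterance is treated as a query document. A real system would need a far richer tokenization and a principled fusion with spectral scores, neither of which is shown here.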

Original language: English
Article number: 4276757
Pages (from-to): 1870-1883
Number of pages: 14
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 6
State: Published - Aug 2007


  • Latent prosody analysis
  • Latent semantic analysis
  • Probabilistic latent semantic analysis
  • Speaker identification
  • Speaker recognition
  • Speech prosody

