Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Jia Ching Wang, Chien Yao Wang, Yu Hao Chin, Yu Ting Liu, En Ting Chen, Pao Chi Chang

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

This paper proposes a speaker recognition system using acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed by rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to produce the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are used as the speaker classifiers. To evaluate the proposed system, experiments were conducted on a 36-speaker recognition task. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rate by 3.85% on clean speech and by 18.49% on noisy speech. The experimental results demonstrate the effectiveness of STRF-based features for speaker recognition.
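The three-step feature pipeline in the abstract (scale energies, logarithm, DCT) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be an already computed log-frequency (auditory) spectrogram with a hypothetical `channels_per_octave` of 24; only the spectral (scale) axis is filtered, so the rate axis is ignored; the scale filter shape `(w/wc)^2 * exp(1 - (w/wc)^2)`, the list of scales, the orthonormal DCT-II, and the plain-concatenation fusion with MFCCs in `combined_feature` are all assumptions chosen for illustration. Function names are hypothetical.

```python
import numpy as np


def dct2_ortho(x):
    """Orthonormal DCT-II along the last axis, written with NumPy only."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # basis[k, m]
    weights = np.full(n, np.sqrt(2.0 / n))
    weights[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * weights


def scale_energies(spec, scales, channels_per_octave=24):
    """Energy of a (frames x channels) log-frequency spectrogram in each scale band.

    Each frame is filtered along the frequency axis with a bandpass centred at
    `wc` cycles/octave. The transfer function (w/wc)^2 * exp(1 - (w/wc)^2) is an
    ASSUMED Gaussian-derivative shape, not taken from the paper.
    """
    spec = np.asarray(spec, dtype=float)
    n_ch = spec.shape[1]
    nfft = 2 * n_ch                                   # zero-pad to limit circular wrap-around
    spectrum = np.fft.rfft(spec, n=nfft, axis=1)      # ripple spectrum of every frame
    omega = np.fft.rfftfreq(nfft, d=1.0 / channels_per_octave)  # in cycles/octave
    energies = []
    for wc in scales:
        ratio = omega / wc
        h = ratio ** 2 * np.exp(1.0 - ratio ** 2)     # gain 1 at omega == wc, 0 at DC
        filtered = np.fft.irfft(spectrum * h, n=nfft, axis=1)[:, :n_ch]
        energies.append(np.sum(filtered ** 2))        # summed over frames and channels
    return np.array(energies)


def strf_feature(spec, scales=(0.25, 0.5, 1, 2, 4, 8), n_coeffs=6,
                 channels_per_octave=24, eps=1e-10):
    """Scale energies -> log -> DCT, keeping the first `n_coeffs` coefficients."""
    e = scale_energies(spec, scales, channels_per_octave)
    return dct2_ortho(np.log(e + eps))[:n_coeffs]


def combined_feature(strf_vec, mfcc_vec):
    """HYPOTHETICAL fusion: plain concatenation of STRF and MFCC vectors."""
    return np.concatenate([np.ravel(strf_vec), np.ravel(mfcc_vec)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.random((100, 128))        # random stand-in for an auditory spectrogram
    feat = strf_feature(spec)
    mfcc = np.zeros(13)                  # placeholder for a real MFCC vector
    print(feat.shape, combined_feature(feat, mfcc).shape)
```

The DCT at the end decorrelates the log scale energies in the same way the final DCT in MFCC extraction decorrelates log Mel filterbank energies, which is one reason the two feature types can be stacked into a single vector for an SVM classifier.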

Original language: English
Pages (from-to): 4055-4068
Number of pages: 14
Journal: Multimedia Tools and Applications
Volume: 76
Issue number: 3
DOIs
State: Published - 1 Feb 2017

Keywords

  • Feature extraction
  • STRF
  • Speaker authentication
  • Speaker recognition
