Spectral-temporal receptive fields and MFCC balanced feature extraction for robust speaker recognition

Jia Ching Wang, Chien Yao Wang, Yu Hao Chin, Yu Ting Liu, En Ting Chen, Pao Chi Chang

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)


This paper proposes a speaker recognition system that uses acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is derived from physiological models of the mammalian auditory system in the spectral-temporal domain. With the STRF, a signal is expressed in terms of rate (in Hz) and scale (in cycles/octave), which specify the temporal response and the spectral response, respectively. The proposed STRF-based feature is computed in three steps. First, the energy of each scale is calculated from the STRF representation. A logarithmic operation is then applied to the scale energies. Finally, a discrete cosine transform is applied to generate the proposed STRF feature. This paper also presents a feature set that combines the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs). Support vector machines (SVMs) are adopted as the speaker classifiers. To evaluate the performance of the proposed speaker recognition system, experiments on 36-speaker recognition were conducted. Compared with the MFCC baseline, the proposed feature set increases the speaker recognition rate by 3.85% on clean speech and 18.49% on noisy speech. The experimental results demonstrate the effectiveness of adopting the STRF-based feature in speaker recognition.
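The three feature-extraction steps in the abstract (per-scale energy, logarithm, discrete cosine transform) can be sketched as below. This is a minimal illustration, not the authors' implementation: the abstract does not describe how the STRF output is stored, so the `strf_frame[scale][rate]` layout, the function names, the `eps` floor inside the logarithm, and the choice of an unnormalized DCT-II are all assumptions.

```python
import math


def scale_energies(strf_frame):
    """Energy per scale for one time frame.

    Assumed (hypothetical) layout: strf_frame[s][r] is the STRF response
    magnitude at scale index s and rate index r.
    """
    return [sum(v * v for v in row) for row in strf_frame]


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_pts * (n + 0.5) * k) for n in range(n_pts))
        for k in range(n_pts)
    ]


def strf_feature(strf_frame, eps=1e-10):
    """Scale energies -> log -> DCT, following the order given in the abstract.

    eps is an assumed small floor that keeps log() defined for zero energy.
    """
    energies = scale_energies(strf_frame)
    log_energies = [math.log(e + eps) for e in energies]
    return dct_ii(log_energies)


# Toy frame with 4 scales and 2 rates; the values are made up for illustration.
frame = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
feature = strf_feature(frame)
```

Under these assumptions, the combined feature set mentioned in the abstract would be obtained by concatenating this vector with the MFCC vector of the same frame before passing it to the SVM classifier; the abstract does not state how the two parts are weighted or balanced.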

Pages (from-to): 4055-4068
Journal: Multimedia Tools and Applications
Publication status: Published - 1 Feb 2017

