TY - JOUR
T1 - GMM-Based Speaker Verification System with Hardware MFCC in SoC Design
AU - Tsai, Tsung Han
AU - Wang, Chiao Li
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
PY - 2024/6
Y1 - 2024/6
AB - In recent years, speaker verification has been extensively explored and its effectiveness has improved significantly. It analyzes the voiceprint characteristics of speakers and uses the differences in those characteristics between speakers for verification. In this paper, we propose a text-dependent speaker verification system and a hardware implementation of its feature extraction. The proposed speaker verification system includes two phases: enrollment and verification. In the enrollment phase, the speaker provides appropriate speech, such as continuous number strings, sentences, or phrases, for building the speaker models in the system. In the verification phase, the speech to be verified is scored against the enrolled speaker models, and the similarity between the speech and the models is used to make the verification decision. We further design the whole system as a system-on-a-chip (SoC). We implement the Mel-frequency cepstral coefficients (MFCC) pre-processing module on an FPGA and implement the lightweight post-processing models, such as the Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM), in software. A piece of speech data can be processed in 53.6 ms, which meets the real-time requirement. The proposed speaker verification system has a 93.3% accuracy rate. The overall architecture consumes only 4.26 W on the Xilinx ZCU104. Moreover, the proposed MFCC chip was implemented in a TSMC 90 nm process; the gate count is 276k, and the power consumption is 41.15 mW at 1 V with a 200 MHz operating frequency.
KW - FPGA
KW - Gaussian Mixture Model
KW - Hidden Markov Model
KW - Mel-frequency cepstral coefficients
KW - Speaker verification
UR - http://www.scopus.com/inward/record.url?scp=85179734680&partnerID=8YFLogxK
U2 - 10.1007/s11042-023-17561-6
DO - 10.1007/s11042-023-17561-6
M3 - Journal article
AN - SCOPUS:85179734680
SN - 1380-7501
VL - 83
SP - 56991
EP - 57010
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 19
ER -