TY - JOUR
T1 - Speech emotion verification using emotion variance modeling and discriminant scale-frequency maps
AU - Wang, Jia-Ching
AU - Chin, Yu-Hao
AU - Chen, Bo-Wei
AU - Lin, Chang-Hong
AU - Wu, Chung-Hsien
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2015/10/1
Y1 - 2015/10/1
AB - This paper develops an approach to speech-based emotion verification based on emotion variance modeling and discriminant scale-frequency maps. The proposed system consists of two parts: feature extraction and emotion verification. In the first part, for each sound frame, important atoms are selected from a Gabor dictionary by using the matching pursuit algorithm. The scale, frequency, and magnitude of the atoms are extracted to construct a nonuniform scale-frequency map, which supports auditory discriminability through the analysis of critical bands. Next, sparse representation is used to transform the scale-frequency maps into sparse coefficients, enhancing robustness against emotion variance and improving error tolerance. In the second part, emotion verification, two scores are calculated. A novel sparse representation verification approach based on Gaussian-modeled residual errors is proposed to generate the first score from the sparse coefficients. Such a classifier can minimize emotion variance and improve recognition accuracy. The second score is calculated by using the emotional agreement index (EAI) from the same coefficients. The two scores are combined to obtain the final detection result. Experiments conducted on an emotional speech database indicate that the proposed approach achieves an average equal error rate (EER) as low as 6.61%. A comparison among different approaches reveals that the proposed method outperforms the others, confirming its feasibility.
KW - Emotional speech recognition
KW - Gaussian-modeled residual error
KW - Sparse representation
KW - Scale-frequency map
UR - http://www.scopus.com/inward/record.url?scp=84933508905&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2015.2438535
DO - 10.1109/TASLP.2015.2438535
M3 - Journal article
AN - SCOPUS:84933508905
SN - 2329-9290
VL - 23
SP - 1552
EP - 1562
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 10
M1 - 7114224
ER -