TY - JOUR
T1 - EMIX
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
AU - Dang, An
AU - Vu, Toan H.
AU - Dinh Nguyen, Le
AU - Wang, Jia Ching
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In the last few years, many deep learning (DL) models have been developed to improve the accuracy of speech emotion recognition (SER). However, because SER datasets are difficult and expensive to collect, they are generally small, so DL models are prone to overfitting and their performance is limited. In this paper, we introduce EMix, a novel data augmentation (DA) method for the SER problem that is simple but effective. The method creates new data by mixing pairs of selected samples from the original data. The generated mixtures will be noisier or less ambiguous than their constituent samples. To verify the effectiveness of the proposed DA method, we develop a transformer-based network for the SER task and experiment with two public datasets, IEMOCAP and CREMA-D. The experimental results demonstrate the superiority of EMix over other DA methods. In comparison with state-of-the-art methods, our approach shows competitive performance.
AB - In the last few years, many deep learning (DL) models have been developed to improve the accuracy of speech emotion recognition (SER). However, because SER datasets are difficult and expensive to collect, they are generally small, so DL models are prone to overfitting and their performance is limited. In this paper, we introduce EMix, a novel data augmentation (DA) method for the SER problem that is simple but effective. The method creates new data by mixing pairs of selected samples from the original data. The generated mixtures will be noisier or less ambiguous than their constituent samples. To verify the effectiveness of the proposed DA method, we develop a transformer-based network for the SER task and experiment with two public datasets, IEMOCAP and CREMA-D. The experimental results demonstrate the superiority of EMix over other DA methods. In comparison with state-of-the-art methods, our approach shows competitive performance.
KW - data augmentation
KW - EMix
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85180561699&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10096789
DO - 10.1109/ICASSP49357.2023.10096789
M3 - Conference paper
AN - SCOPUS:85180561699
SN - 1520-6149
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Y2 - 4 June 2023 through 10 June 2023
ER -