EMIX: A Data Augmentation Method for Speech Emotion Recognition

An Dang, Toan H. Vu, Le Dinh Nguyen, Jia Ching Wang

Research output: Contribution to journal › Conference article › peer-review



In the last few years, many deep learning (DL) models have been developed to improve the accuracy of speech emotion recognition (SER). However, SER datasets are generally small because they are difficult and expensive to collect, so DL models are prone to overfitting and their performance is limited. In this paper, we introduce EMix, a novel data augmentation (DA) method for the SER problem that is simple but effective. The method creates new data by mixing pairs of selected samples from the original data. The generated mixtures will be noisier or less ambiguous than their constituent samples. To verify the effectiveness of the proposed DA method, we develop a transformer-based network for the SER task and experiment on two public datasets, IEMOCAP and Crema-D. The experimental results demonstrate the superiority of EMix over other DA methods. In comparison with state-of-the-art methods, our approach shows competitive performance.
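The abstract says that new data are created by mixing pairs of samples but does not give the selection rule or the mixing weights. The following is a minimal sketch of a generic mixup-style blend, not the EMix procedure itself. The function name `mix_pair`, the weight `lam`, truncation to the shorter clip, and the use of soft (blended) labels are all assumptions made for illustration.

```python
def mix_pair(wave_a, label_a, wave_b, label_b, lam=0.7):
    """Blend two waveforms and their one-hot labels with weight lam.

    Hypothetical helper: a generic mixup-style blend, assumed here
    because the abstract does not specify how EMix mixes samples.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Assumption: align the two clips by truncating to the shorter one.
    n = min(len(wave_a), len(wave_b))
    mixed_wave = [lam * a + (1.0 - lam) * b
                  for a, b in zip(wave_a[:n], wave_b[:n])]
    # Assumption: the label of the mixture is the same convex combination.
    mixed_label = [lam * p + (1.0 - lam) * q
                   for p, q in zip(label_a, label_b)]
    return mixed_wave, mixed_label


# Toy example: two short "waveforms" and four emotion classes.
wave_a = [0.0, 1.0, -1.0, 0.5]
wave_b = [1.0, 1.0, 1.0]
label_a = [1.0, 0.0, 0.0, 0.0]
label_b = [0.0, 0.0, 1.0, 0.0]
mixed_wave, mixed_label = mix_pair(wave_a, label_a, wave_b, label_b, lam=0.5)
```

In practice, such a blend would be applied to real audio arrays or spectrogram features during training, and the choice of which pairs to mix would follow the paper's selection criterion rather than the arbitrary pairing shown here.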

