EMIX: A Data Augmentation Method for Speech Emotion Recognition

An Dang, Toan H. Vu, Le Dinh Nguyen, Jia Ching Wang

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

In the last few years, many deep learning (DL) models have been developed to improve the accuracy of speech emotion recognition (SER). However, because SER datasets are generally small, owing to the difficulty and expense of collecting them, DL models are prone to overfitting, which limits their performance. In this paper, we introduce a novel data augmentation (DA) method for the SER problem, namely EMix, which is simple but effective. The method creates new data by mixing pairs of selected samples from the original data. The generated mixtures will be noisier or less ambiguous than their constituent samples. To verify the effectiveness of the proposed DA method, we develop a transformer-based network for the SER task and run experiments on two public datasets, IEMOCAP and Crema-D. The experimental results demonstrate the superiority of EMix over other DA methods. In comparison with state-of-the-art methods, our approach shows competitive performance.
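
The abstract says only that EMix mixes pairs of selected samples. It does not state how pairs are chosen or how the mixing weight is set. As a rough illustration of the general idea, the sketch below shows a generic mixup-style augmentation, which may differ from EMix itself. The names `mix_pair` and `augment`, the uniform random pairing, and the Beta(0.4, 0.4) mixing weight are all assumptions made for this example and are not taken from the paper.

```python
import random


def mix_pair(x_a, x_b, y_a, y_b, lam):
    """Convex combination of two equal-length feature vectors and their one-hot labels."""
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("paired samples must have matching shapes")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def augment(features, labels, num_new, rng=None):
    """Create num_new mixed samples from randomly chosen pairs.

    Assumption: pairs are drawn uniformly at random. The paper's own
    selection rule is not described in the abstract.
    """
    rng = rng or random.Random(0)
    mixed = []
    for _ in range(num_new):
        i, j = rng.sample(range(len(features)), 2)  # two distinct indices
        lam = rng.betavariate(0.4, 0.4)  # assumed mixup-style weight in [0, 1]
        mixed.append(mix_pair(features[i], features[j], labels[i], labels[j], lam))
    return mixed


if __name__ == "__main__":
    # Toy data: three 4-dimensional "utterance features" with one-hot emotion labels.
    feats = [[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6], [0.5, 0.5, 0.5, 0.5]]
    labs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    for x, y in augment(feats, labs, num_new=2):
        print(x, y)
```

Because the labels are mixed with the same weight as the features, each generated sample carries a soft label whose entries still sum to one, so it can be trained with a cross-entropy loss over soft targets.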

Keywords

  • data augmentation
  • EMix
  • Speech emotion recognition
