Content-based singer classification on compressed domain audio data

Tsung Han Tsai, Yu Siang Huang, Pei Yun Liu, De Ming Chen

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

In this paper, we propose a singer identification approach that automatically identifies the singer of an unknown MP3 audio recording. Unlike previous work on singer identification in the MP3 compressed domain, we use Mel-frequency cepstral coefficients (MFCC) as the feature instead of modified discrete cosine transform (MDCT) coefficients. Although MFCC is widely used in music classification and speaker recognition, it cannot be obtained directly from compressed music data such as MP3. We therefore introduce a modified method for calculating the MFCC vector in the MP3 compressed domain. The distribution of the MFCC vectors is described with a Gaussian mixture model (GMM). To find the nearest singer, we use maximum likelihood classification (MLC) to assign each input MFCC vector to its nearest group. The experimental results verify the feasibility of the proposed approach.
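
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each singer already has a trained GMM with diagonal covariances and that MFCC vectors have already been extracted. The abstract does not detail the compressed-domain MFCC computation, so that step is omitted. The function names, the toy two-dimensional "MFCC" values, and the decision to sum log-likelihoods over all frames of a clip are illustrative assumptions.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a sequence of feature vectors under one GMM.

    gmm is a list of (weight, mean, var) components (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, mean, var)
                for w, mean, var in gmm]
        peak = max(comp)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total

def classify_singer(frames, singer_models):
    """Return the singer whose GMM gives the highest likelihood, plus all scores."""
    scores = {name: gmm_log_likelihood(frames, gmm)
              for name, gmm in singer_models.items()}
    return max(scores, key=scores.get), scores

# Toy, made-up models and features (2-D instead of real MFCC dimensions).
singer_models = {
    "singer_a": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 1.0], [0.5, 0.5])],
    "singer_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [0.5, 0.5])],
}
frames = [[0.2, -0.1], [0.9, 1.1], [0.1, 0.3]]

best, scores = classify_singer(frames, singer_models)
print(best)  # singer_a
```

In practice the mixture parameters would be estimated from each singer's training vectors, typically with the expectation-maximization algorithm, before this decision rule is applied.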

Original language: English
Pages (from-to): 1489-1509
Number of pages: 21
Journal: Multimedia Tools and Applications
Volume: 74
Issue number: 4
DOIs
State: Published - Feb 2014

Keywords

  • GMM
  • MDCT
  • MFCC
  • MP3
