Deep learning of chroma representation for cover song identification in compression domain

Jiunn Tsair Fang, Yu Ruey Chang, Pao Chi Chang

Research output: Contribution to journal › Article › peer-review



Methods for identifying a cover song typically compare the similarity of chroma features between the query song and each song in the data set, but these pairwise comparisons are time-consuming. In addition, to save disk space, most songs in the data set are stored in a compressed format. Therefore, to eliminate some decoding procedures, this study extracted music information directly from the modified discrete cosine transform (MDCT) coefficients of advanced audio coding (AAC) and mapped these coefficients to 12-dimensional chroma features. The chroma features were segmented to preserve the melodies, and each segment was learned by a sparse autoencoder, a deep learning architecture of artificial neural networks, which transformed the chroma features into an intermediate representation of reduced dimension. Experimental results on the covers80 data set showed that the mean reciprocal rank increased to 0.5 and the matching time was reduced by over 94% compared with traditional approaches.
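The core idea of folding transform-domain coefficients into a 12-dimensional chroma vector can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the 1024-coefficient AAC long window, the bin-to-frequency formula, the A4 = 440 Hz reference, and the function name are all assumptions made for the example.

```python
import numpy as np

def mdct_bins_to_chroma(mdct_mags, sample_rate=44100, n_bins=1024, f_ref=440.0):
    """Fold MDCT coefficient magnitudes into a 12-dimensional chroma vector.

    Each bin's centre frequency is mapped to a pitch class (0 = A) by
    log-frequency quantisation relative to f_ref; magnitudes of bins that
    share a pitch class are accumulated, collapsing octave information.
    """
    chroma = np.zeros(12)
    # Assumed mapping: MDCT bin k has centre frequency (k + 0.5) * sr / (2 * n_bins)
    freqs = (np.arange(n_bins) + 0.5) * sample_rate / (2 * n_bins)
    valid = freqs > 27.5  # discard bins below A0, where pitch is ill-defined
    pitch = np.round(12 * np.log2(freqs[valid] / f_ref)).astype(int)
    np.add.at(chroma, pitch % 12, mdct_mags[valid])
    # normalise so loudness differences do not dominate similarity matching
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma
```

Sequences of such vectors, computed per AAC frame and segmented, would then form the input that the sparse autoencoder compresses into the intermediate representation used for matching.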

Original language: English
Pages (from-to): 887-902
Number of pages: 16
Journal: Multidimensional Systems and Signal Processing
Issue number: 3
State: Published - 1 Jul 2018


Keywords:
  • Advanced audio coding
  • Cover song
  • Descriptor
  • Music retrieval
  • Sparse autoencoder


