An enhanced direct chord transformation for music retrieval in the AAC transform domain with window switching

Tai Ming Chang, Chia Bin Hsieh, Pao Chi Chang

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

With the explosive growth in the number of music albums produced, retrieving music information has become a critical aspect of managing music data. Extracting frequency parameters directly from compressed files to represent music greatly improves processing speed when working with a large database. In this study, we focused on advanced audio coding (AAC) files. We analyzed the disparity in frequency expression between the discrete Fourier transform and the discrete cosine transform, considered the frequency resolution to select an appropriate frequency range, and developed a direct chroma feature-transformation method in the AAC transform domain. An added challenge in using AAC files directly is long/short window switching; ignoring it may result in inaccurate frequency mapping and inefficient information retrieval. For short windows in particular, we propose a peak-competition method to enhance pitch information that is free of ambiguous frequency components when combining the eight subframes. Moreover, for chroma feature segmentation, we propose a simple dynamic-segmentation method to replace the complex computation of beat tracking. Our experimental results show that the proposed method increased the Top-1 accuracy rate by approximately 7% over previously described transform-domain methods and performed nearly as effectively as state-of-the-art waveform-domain approaches.
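
As a rough illustration of the general idea of mapping transform-domain coefficients to a chroma vector, the following minimal Python sketch folds the energy of one frame of MDCT coefficients into 12 pitch classes. It is not the authors' method: it does not implement the peak-competition or dynamic-segmentation steps, and the function names, the 44.1 kHz sample rate, and the 100 to 2000 Hz frequency range are all assumptions chosen for the example. The only facts relied on are that MDCT bin k of an N-coefficient frame is centered at (k + 0.5) · fs / (2N) and that AAC uses 1024 coefficients per long window and 128 per short-window subframe.

```python
import math


def mdct_bin_frequency(k, num_coeffs, sample_rate):
    """Center frequency in Hz of MDCT bin k in a frame of num_coeffs coefficients."""
    return (k + 0.5) * sample_rate / (2.0 * num_coeffs)


def pitch_class(freq_hz, a4_hz=440.0):
    """Pitch class 0..11 (0 = C) of the equal-tempered note nearest to freq_hz."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12


def chroma_from_mdct(coeffs, sample_rate=44100, f_min=100.0, f_max=2000.0):
    """Fold squared MDCT coefficients into a normalized 12-bin chroma vector.

    f_min and f_max are hypothetical limits for this sketch; the source does
    not state the frequency range it selected.
    """
    num_coeffs = len(coeffs)
    chroma = [0.0] * 12
    for k, value in enumerate(coeffs):
        freq = mdct_bin_frequency(k, num_coeffs, sample_rate)
        if f_min <= freq <= f_max:
            chroma[pitch_class(freq)] += value * value
    total = sum(chroma)
    return [v / total for v in chroma] if total > 0 else chroma


# Long window: 1024 coefficients, bin spacing of about 21.5 Hz at 44.1 kHz.
long_frame = [0.0] * 1024
long_frame[20] = 1.0  # bin centered near 441 Hz, close to A4
chroma = chroma_from_mdct(long_frame)
```

The same function applied to a 128-coefficient short-window subframe has a bin spacing of about 172 Hz at 44.1 kHz, so adjacent semitones in the lower octaves fall into the same bin. This coarse resolution is one reason short windows need separate handling when features are extracted directly from the transform domain.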

Original language: English
Pages (from-to): 7921-7942
Number of pages: 22
Journal: Multimedia Tools and Applications
Volume: 74
Issue number: 18
State: Published - 28 Sep 2015

Keywords

  • AAC
  • Audio coding
  • Chroma feature
  • Music information retrieval
  • Transform domain
