Abstract
Computationally modeling the affective content of music has been intensively studied in recent years because of its wide applications in music retrieval and recommendation. Although significant progress has been made, this task remains challenging due to the difficulty of properly characterizing the emotion of a music piece. Music emotion perceived by people is subjective by nature, which complicates both the collection of emotion annotations and the development of predictive models. Instead of assuming that people can reach a consensus on the emotion of music, in this work we propose a novel machine learning approach that characterizes music emotion as a probability distribution in the valence-arousal (VA) emotion space, which not only tackles the subjectivity but also precisely describes the emotions of a music piece. Specifically, we represent the emotion of a music piece as a probability density function (PDF) in the VA space, obtained via kernel density estimation from human annotations. To associate emotion with the audio features extracted from music pieces, we learn combination coefficients by optimizing objective functions of the audio features, and then predict the emotion of an unseen piece by linearly combining the PDFs of the training pieces with these coefficients. Several algorithms for learning the coefficients are studied. Evaluations on the NTUMIR and MediaEval 2013 datasets validate the effectiveness of the proposed methods in predicting probability distributions of emotion from audio features. We also demonstrate how to use the proposed approach in emotion-based music retrieval.
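To make the idea concrete, below is a minimal Python sketch of the two core steps described in the abstract: estimating a per-piece emotion PDF in the VA space with kernel density estimation, and predicting the distribution of an unseen piece as a linear combination of the training PDFs. The annotation values and the combination coefficients are hypothetical placeholders; the paper's actual coefficient-learning algorithms (driven by audio features) are not reproduced here.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical annotations: each training clip has several (valence, arousal)
# ratings in [-1, 1] collected from different listeners.
rng = np.random.default_rng(0)
train_annotations = [rng.normal(loc=c, scale=0.15, size=(20, 2))
                     for c in ([0.5, 0.6], [-0.4, 0.2], [0.1, -0.5])]

# Step 1: represent each training clip's emotion as a PDF in the VA space
# via kernel density estimation over its annotations.
train_kdes = [gaussian_kde(a.T) for a in train_annotations]

# Evaluate each PDF on a grid over the VA plane for later combination.
v, a = np.meshgrid(np.linspace(-1, 1, 50), np.linspace(-1, 1, 50))
grid = np.vstack([v.ravel(), a.ravel()])
train_pdfs = np.array([kde(grid) for kde in train_kdes])  # shape (n_train, 2500)

# Step 2: for an unseen clip, assume some learner has produced combination
# coefficients from its audio features (placeholder values here). Keeping the
# coefficients non-negative and summing to one ensures the result is a PDF.
coeffs = np.array([0.7, 0.2, 0.1])
predicted_pdf = coeffs @ train_pdfs  # linear combination of training PDFs

print(predicted_pdf.reshape(50, 50).max())  # peak density of the predicted distribution
```

The predicted grid values can then be compared against a ground-truth annotation PDF (e.g., with a distribution distance) or used directly for emotion-based retrieval, as described in the abstract.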
| Original language | English |
| --- | --- |
| Article number | 7745959 |
| Pages (from-to) | 541-549 |
| Number of pages | 9 |
| Journal | IEEE Transactions on Affective Computing |
| Volume | 9 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 1 Oct 2018 |