Abstract
This paper describes our work for the "Emotion in Music" task of MediaEval 2015. The goal of the task is to predict the affective content of a song, expressed as time-continuous valence and arousal values. We adopt a deep recurrent neural network (DRNN) to predict valence and arousal at each moment of a song, and the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm is used to update the weights during back-propagation. The DRNN takes the targets of previous time segments into account when predicting the target of the current segment; such time-aware prediction is believed to outperform conventional machine learning models. After comparing it with our own feature set, we finally use the baseline feature set, which was adopted by last year's winning entry. A 10-fold cross-validation is used for the internal experiments, in which the system achieves r values of -0.5904 for valence and 0.4195 for arousal, with root-mean-squared errors (RMSE) of 0.4054 and 0.3804, respectively. On the evaluation dataset, the system achieves r values of -0.0103±0.3420 for valence and 0.3417±0.2501 for arousal, with RMSEs of 0.3359±0.1614 and 0.2555±0.1255, respectively.
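To make the training setup concrete, below is a minimal sketch of a recurrent regression model fitted with L-BFGS, in the spirit of the approach described above. It is not the authors' implementation: the feature dimensionality `F`, hidden size `H`, sequence length `T`, and the synthetic data are assumptions for illustration, and SciPy's finite-difference gradients stand in for full back-propagation through time.

```python
# Sketch only: a tiny RNN regressor for per-segment valence/arousal,
# trained with L-BFGS. Dimensions and data are illustrative, not the
# authors' actual feature set or network configuration.
import numpy as np
from scipy.optimize import minimize

F, H, T = 4, 8, 20   # feature size, hidden units, segments per clip (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(T, F))          # per-segment audio features (synthetic)
Y = rng.uniform(-1, 1, size=(T, 2))  # valence/arousal targets in [-1, 1] (synthetic)

# Flat parameter vector holding W_in (H,F), W_rec (H,H), W_out (2,H).
sizes = [(H, F), (H, H), (2, H)]
n_params = sum(h * w for h, w in sizes)
offsets = np.cumsum([h * w for h, w in sizes])[:-1]

def unpack(theta):
    return [p.reshape(s) for p, s in zip(np.split(theta, offsets), sizes)]

def forward(theta, X):
    # Each step feeds the previous hidden state forward, so the prediction
    # at segment t depends on all earlier segments of the song.
    W_in, W_rec, W_out = unpack(theta)
    h = np.zeros(H)
    preds = []
    for x in X:
        h = np.tanh(W_in @ x + W_rec @ h)
        preds.append(np.tanh(W_out @ h))   # squash outputs into [-1, 1]
    return np.array(preds)

def loss(theta):
    return np.mean((forward(theta, X) - Y) ** 2)

# L-BFGS-B minimises the mean-squared error over all parameters at once;
# SciPy falls back to finite-difference gradients when none are supplied.
res = minimize(loss, rng.normal(scale=0.1, size=n_params),
               method="L-BFGS-B", options={"maxiter": 200})
print(f"training RMSE after L-BFGS: {np.sqrt(loss(res.x)):.4f}")
```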
Original language | English
---|---
Journal | CEUR Workshop Proceedings
Volume | 1436
Publication status | Published - 2015
Event | Multimedia Benchmark Workshop, MediaEval 2015, Wurzen, Germany. Duration: 14 Sep 2015 → 15 Sep 2015