Audio scenes are often composed of a variety of sound events from different sources, and their content exhibits wide variation in both the frequency and time domains. Convolutional neural networks (CNNs) provide an effective way to extract spatial information from multidimensional data such as images, audio, and video, and they can learn hierarchical representations from the time-frequency features of audio signals. In this paper, we develop a convolutional neural network and employ multi-scale, multi-feature extraction methods for acoustic scene classification. We conduct experiments on the TUT Acoustic Scenes 2016 dataset. The results show that multi-scale, multi-feature extraction significantly improves system performance: our approach achieves an accuracy of 85.9%, outperforming the baseline by a large margin of 8.7 percentage points.
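The abstract does not specify the exact feature pipeline, but the idea of multi-scale extraction from a time-frequency representation can be sketched as follows. This is a minimal, hypothetical illustration in pure Python: it pools each frequency band of a spectrogram-like matrix at several temporal scales and concatenates the summaries; the paper's actual network and features may differ.

```python
def multi_scale_features(spec, scales=(1, 2, 4)):
    """Sketch of multi-scale feature extraction (illustrative only).

    spec: list of frequency bands, each a list of per-frame magnitudes.
    For each scale s, average-pool each band over windows of s frames,
    then summarize the pooled band by its mean, yielding one value per
    (band, scale) pair.
    """
    feats = []
    for s in scales:
        for band in spec:
            # Trim trailing frames so the band divides evenly into windows.
            t = len(band) - len(band) % s
            windows = [sum(band[i:i + s]) / s for i in range(0, t, s)]
            # One summary value per band at this scale.
            feats.append(sum(windows) / len(windows))
    return feats

# Toy "spectrogram": 3 frequency bands x 8 time frames.
spec = [[float(i + j) for j in range(8)] for i in range(3)]
features = multi_scale_features(spec)
print(len(features))  # 3 bands x 3 scales = 9 feature values
```

In practice such per-scale representations would feed a CNN rather than be flattened this way; the sketch only shows how pooling at multiple temporal scales yields complementary views of the same signal.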