Sound Event Recognition Using Auditory-Receptive-Field Binary Pattern and Hierarchical-Diving Deep Belief Network

Chien Yao Wang, Jia Ching Wang, Andri Santoso, Chin Chin Chiang, Chung Hsien Wu

Research output: Contribution to journal › Article › peer-review

29 Citations (Scopus)

Abstract

Automatic sound event recognition (SER) has recently attracted renewed interest. Although practical SER systems have many useful applications in everyday life, SER is challenging owing to the variations among sounds and noises in real-world environments. This paper presents a novel feature extraction and classification method for SER. An audio-visual descriptor, called the auditory-receptive-field binary pattern, is designed based on the spectrogram image feature, the cepstral features, and the human auditory receptive field model. The extracted features are then fed into a classifier to perform event classification. The proposed classifier, called the hierarchical-diving deep belief network, is a deep neural network system that hierarchically learns discriminative characteristics, from the physical feature representation to the abstract concept. The performance of the proposed system was verified in several experiments under various conditions. Using the RWCP dataset, the proposed system achieved a recognition rate of 99.27% for real-world sound data in 105 categories. The system is also robust under noisy conditions, achieving a 95.06% recognition rate at a signal-to-noise ratio of 0 dB. Using the TUT sound event dataset, the proposed system achieved error rates of 0.81 and 0.73 in sound event detection in home and residential area scenes. The experimental results show that the proposed system outperformed the other systems in this field.

Original language: English
Pages (from-to): 1336-1351
Number of pages: 16
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 26
Issue number: 8
Publication status: Published - Aug 2018
