Sound Event Recognition Using Auditory-Receptive-Field Binary Pattern and Hierarchical-Diving Deep Belief Network

Chien Yao Wang, Jia Ching Wang, Andri Santoso, Chin Chin Chiang, Chung Hsien Wu

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


Automatic sound event recognition (SER) has recently attracted renewed interest. Although a practical SER system has many useful applications in everyday life, SER is challenging owing to the variations among sounds and noises in real-world environments. This paper presents a novel feature extraction and classification method to address the SER problem. An audio-visual descriptor, called the auditory-receptive-field binary pattern, is designed based on the spectrogram image feature, the cepstral features, and the human auditory receptive field model. The extracted features are then fed into a classifier to perform event classification. The proposed classifier, called the hierarchical-diving deep belief network, is a deep neural network system that hierarchically learns discriminative characteristics, from the physical feature representation to the abstract concept. The performance of the proposed system was verified in several experiments under various conditions. On the RWCP dataset, the proposed system achieved a recognition rate of 99.27% for real-world sound data in 105 categories. The system is also robust under noisy conditions, achieving a 95.06% recognition rate at a signal-to-noise ratio of 0 dB. On the TUT sound event dataset, the proposed system achieved error rates of 0.81 and 0.73 for sound event detection in the home and residential area scenes, respectively. The experimental results show that the proposed system outperformed the other systems in this field.
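To make the 0 dB signal-to-noise ratio condition in the abstract concrete, the following is a minimal sketch of how a clean recording can be mixed with noise at a chosen SNR. It is not taken from the paper: the function names (`mean_power`, `mix_at_snr`, `measure_snr_db`), the 440 Hz test tone, and the Gaussian noise are all illustrative assumptions, and the noise types and mixing procedure actually used by the authors are not stated in the abstract.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise ratio equals `snr_db`, then add it.

    SNR in dB is 10 * log10(P_signal / P_noise), so the required noise power
    is P_signal / 10 ** (snr_db / 10).
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise


def measure_snr_db(signal, noise):
    """Measured SNR in dB between a signal and a noise sequence."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


random.seed(0)
# Hypothetical test data: one second of a 440 Hz tone sampled at 16 kHz,
# plus Gaussian noise. Neither is drawn from the RWCP or TUT datasets.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

mixture, scaled_noise = mix_at_snr(tone, noise, snr_db=0.0)
```

At 0 dB the scaled noise carries the same average power as the signal, which is why this setting is generally regarded as a demanding test condition for a recognizer.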

Original language: English
Pages (from-to): 1336-1351
Number of pages: 16
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Issue number: 8
State: Published - Aug 2018


  • Auditory receptive fields binary patterns
  • environmental sound
  • hierarchical diving deep belief network
  • spectrogram image feature

