TY - JOUR
T1 - Local Wavelet Acoustic Pattern
T2 - A Novel Time-Frequency Descriptor for Birdsong Recognition
AU - Hsu, Sheng Bin
AU - Lee, Chang Hsing
AU - Chang, Pei Chun
AU - Han, Chin Chuan
AU - Fan, Kuo Chin
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12
Y1 - 2018/12
N2 - Investigating the identity, distribution, and evolution of bird species is important for both biodiversity assessment and environmental conservation. The discrete wavelet transform (DWT) has been widely exploited to extract time-frequency features for acoustic signal analysis. Traditional approaches usually compute statistical measures (e.g., maximum, mean, standard deviation) of the DWT coefficients in each subband independently to yield the feature descriptor, without considering the intersubband correlation. A new acoustic descriptor, called the local wavelet acoustic pattern (LWAP), is proposed to characterize the correlation of the DWT coefficients in different subbands for birdsong recognition. First, we divide a variable-length birdsong segment into a number of fixed-duration texture windows. For each texture window, several LWAP descriptors are extracted. The vector of locally aggregated descriptors (VLAD) is then used to aggregate the set of LWAP descriptors into a single VLAD vector. Finally, principal component analysis (PCA) plus linear discriminant analysis (LDA) are employed to reduce the feature dimensionality for classification purposes. Experiments on two birdsong datasets show that the proposed LWAP descriptor outperforms other local descriptors, including linear predictive coding cepstral coefficients, Mel-frequency cepstral coefficients, perceptual linear prediction cepstral coefficients, chroma features, and prosody features. Furthermore, the proposed LWAP descriptor, followed by VLAD encoding, PCA plus LDA feature extraction, and a simple distance-based classifier, yields promising results that are competitive with those obtained by the state-of-the-art convolutional neural networks.
KW - Birdsong recognition
KW - discrete wavelet transform (DWT)
KW - vector of locally aggregated descriptors (VLAD)
UR - http://www.scopus.com/inward/record.url?scp=85046824169&partnerID=8YFLogxK
U2 - 10.1109/TMM.2018.2834866
DO - 10.1109/TMM.2018.2834866
M3 - Journal article
AN - SCOPUS:85046824169
SN - 1520-9210
VL - 20
SP - 3187
EP - 3199
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 12
M1 - 8356630
ER -