Sound Event Localization and Detection Based on Time-Frequency Separable Convolutional Compression Network

Shih Tsung Yang, Fong Ci Jhou, Jia Ching Wang, Pao Chi Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

This work proposes a Time-Frequency Separable Convolutional Compression Network (TFSCCN) as a system architecture for sound event localization and detection. It uses 1-D convolution kernels of different shapes to extract time and frequency features separately, and reduces the number of model parameters by controlling how the channel count grows and shrinks through the network. In addition, the model incorporates multi-head self-attention (MHSA) to capture both global and local information in time-series features, and uses a dual-branch tracking technique to effectively localize and detect overlapping sound events of the same or different classes.
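The core idea of the time-frequency separable convolution described above is to replace a full 2-D kernel over the spectrogram with two 1-D kernels, one along time and one along frequency, which cuts the parameter count per kernel from k_t × k_f to k_t + k_f. The following is a minimal numpy sketch of that factorization on a toy spectrogram; it is an illustrative assumption, not the authors' TFSCCN implementation, and the kernel values are arbitrary.

```python
import numpy as np

def conv1d_along(x, kernel, axis):
    """Valid-mode 1-D correlation of a 2-D feature map along one axis."""
    x = np.moveaxis(x, axis, 0)
    k = len(kernel)
    out = np.stack([np.tensordot(kernel, x[i:i + k], axes=(0, 0))
                    for i in range(x.shape[0] - k + 1)])
    return np.moveaxis(out, 0, axis)

# toy log-mel spectrogram: 8 time frames x 6 frequency bins
spec = np.arange(48, dtype=float).reshape(8, 6)

k_time = np.array([1.0, 0.0, -1.0])   # 3x1 kernel along the time axis
k_freq = np.array([0.5, 0.5, 0.5])    # 1x3 kernel along the frequency axis

# apply the two 1-D convolutions in sequence (time first, then frequency)
feat = conv1d_along(conv1d_along(spec, k_time, axis=0), k_freq, axis=1)

# separable pair: 3 + 3 = 6 weights, versus 3 * 3 = 9 for a full 2-D kernel
print(feat.shape)  # valid-mode output: (6, 4)
```

The parameter saving compounds across channels: a layer with C_in × C_out kernel pairs saves C_in × C_out × (k_t·k_f − k_t − k_f) weights relative to full 2-D convolutions, which is the kind of compression the abstract attributes to TFSCCN.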

Original language: English
Title of host publication: 2021 IEEE 10th Global Conference on Consumer Electronics, GCCE 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 432-433
Number of pages: 2
ISBN (Electronic): 9781665436762
DOIs
State: Published - 2021
Event: 10th IEEE Global Conference on Consumer Electronics, GCCE 2021 - Kyoto, Japan
Duration: 12 Oct 2021 - 15 Oct 2021

Publication series

Name: 2021 IEEE 10th Global Conference on Consumer Electronics, GCCE 2021

Conference

Conference: 10th IEEE Global Conference on Consumer Electronics, GCCE 2021
Country/Territory: Japan
City: Kyoto
Period: 12/10/21 - 15/10/21

Keywords

  • dual-branch tracking
  • multi-head self-attention
  • sound event localization and detection
  • time-frequency separable convolutional compression network
