Selinet: A Lightweight Model for Single Channel Speech Separation

Ha Minh Tan, Duc Quang Vu, Jia Ching Wang

Research output: Contribution to journal › Conference article › peer-review


Abstract

Time-domain speech separation methods based on deep learning have achieved impressive performance. However, computational complexity, model size, and performance remain challenges for deployment on real-time, low-resource devices. In this paper, we introduce a lightweight yet effective network for speech separation, named SeliNet. SeliNet is a one-dimensional convolutional architecture that employs bottleneck modules and atrous temporal pyramid pooling. In the bottleneck modules, depth-wise separable convolution significantly reduces model size and computational cost, while squeeze-and-excitation uses a context vector to interact with the entire hidden-state vector. In addition, atrous temporal pyramid pooling captures long temporal sequences of various lengths and extracts context at different fields of view. This allows SeliNet to achieve impressive performance while maintaining a small computational cost and model size.
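The parameter savings from depth-wise separable convolution mentioned in the abstract can be sketched with a simple count. The sketch below uses illustrative channel and kernel sizes (assumptions, not values from the paper): a standard 1-D convolution needs one kernel per (input, output) channel pair, whereas the separable version splits this into a per-channel depth-wise stage plus a 1x1 point-wise stage.

```python
def conv1d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    # Standard 1-D convolution: one length-`kernel` filter
    # for every (input channel, output channel) pair.
    return in_ch * out_ch * kernel

def depthwise_separable_params(in_ch: int, out_ch: int, kernel: int) -> int:
    # Depth-wise stage: one length-`kernel` filter per input channel.
    depthwise = in_ch * kernel
    # Point-wise stage: a 1x1 convolution that mixes channels.
    pointwise = in_ch * out_ch
    return depthwise + pointwise

# Illustrative sizes (assumed for this sketch, not taken from SeliNet).
in_ch, out_ch, kernel = 256, 256, 3
standard = conv1d_params(in_ch, out_ch, kernel)
separable = depthwise_separable_params(in_ch, out_ch, kernel)
print(standard, separable, round(standard / separable, 2))
# → 196608 66304 2.97
```

With these assumed sizes the separable form uses roughly a third of the parameters, and the gap widens as the kernel size or channel count grows, which is why this decomposition is a common choice in lightweight architectures.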

