Selinet: A Lightweight Model for Single Channel Speech Separation

Ha Minh Tan, Duc Quang Vu, Jia Ching Wang

Research output: Contribution to journal › Conference paper › Peer-reviewed

4 Citations (Scopus)


Time-domain speech separation methods based on deep learning have achieved impressive performance. However, computational complexity, model size, and performance remain challenges for deployment on real-time, low-resource devices. In this paper, we introduce a lightweight yet effective network for speech separation, namely SeliNet. SeliNet is a one-dimensional convolutional architecture that employs bottleneck modules and atrous temporal pyramid pooling. In the bottleneck modules, depth-wise separable convolution significantly reduces the model size and computational cost, while squeeze-and-excitation uses a context vector to interact with the entire hidden-state vector. Specifically, the atrous temporal pyramid pooling recognizes long temporal sequences of various lengths and extracts context at different fields of view. This enables SeliNet to obtain impressive performance while maintaining a small computational cost and model size.
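The parameter savings from depth-wise separable convolution mentioned in the abstract can be illustrated with a simple count. This is a sketch; the kernel size and channel counts below are illustrative assumptions, not values from the paper:

```python
def standard_conv1d_params(in_ch, out_ch, k):
    # Standard 1-D convolution: weight tensor of shape
    # (out_ch, in_ch, k); bias terms omitted for simplicity.
    return out_ch * in_ch * k

def depthwise_separable_conv1d_params(in_ch, out_ch, k):
    # Depth-wise step: one length-k filter per input channel,
    # followed by a point-wise (1x1) convolution mixing channels.
    return in_ch * k + in_ch * out_ch

# Hypothetical layer with 256 input/output channels, kernel size 3.
std = standard_conv1d_params(256, 256, 3)              # 196,608 weights
sep = depthwise_separable_conv1d_params(256, 256, 3)   # 66,304 weights
print(std, sep, round(std / sep, 1))                   # roughly 3x fewer
```

As the channel count grows, the ratio approaches the kernel size k, which is why the substitution shrinks both model size and multiply-accumulate cost.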

