TY - JOUR
T1 - Selinet
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
AU - Tan, Ha Minh
AU - Vu, Duc Quang
AU - Wang, Jia Ching
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The time-domain speech separation methods adopting deep learning have obtained impressive performance. However, the computational complexity, model size, and performance are still the challenges for the implementation on real-time low-resource devices. In this paper, we introduce a lightweight yet effective network for speech separation, namely SeliNet. The SeliNet is the one-dimensional convolutional architecture that employs bottleneck modules, and atrous temporal pyramid pooling. In bottleneck modules, the depth-wise separable convolution significantly decreases the model size and computational cost meanwhile the squeeze excitation uses a context vector to interact with the entire hidden state vector. Specifically, the atrous temporal pyramid pooling recognizes long-time sequences of various lengths and extracts context at different field-of-views. This helps SeliNet to obtain impressive performance while still maintaining the small computational cost and model size.
AB - The time-domain speech separation methods adopting deep learning have obtained impressive performance. However, the computational complexity, model size, and performance are still the challenges for the implementation on real-time low-resource devices. In this paper, we introduce a lightweight yet effective network for speech separation, namely SeliNet. The SeliNet is the one-dimensional convolutional architecture that employs bottleneck modules, and atrous temporal pyramid pooling. In bottleneck modules, the depth-wise separable convolution significantly decreases the model size and computational cost meanwhile the squeeze excitation uses a context vector to interact with the entire hidden state vector. Specifically, the atrous temporal pyramid pooling recognizes long-time sequences of various lengths and extracts context at different field-of-views. This helps SeliNet to obtain impressive performance while still maintaining the small computational cost and model size.
UR - http://www.scopus.com/inward/record.url?scp=85180419148&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10097121
DO - 10.1109/ICASSP49357.2023.10097121
M3 - 會議論文
AN - SCOPUS:85180419148
SN - 1520-6149
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Y2 - 4 June 2023 through 10 June 2023
ER -