Single Channel Speech Separation using Enhanced Learning on Embedding Features

Ha Minh Tan, Jia Ching Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

Speech separation is used in many important applications, such as automatic speech recognition, mobile phones, hearing aids, and human-machine interaction. In particular, deep neural networks have shown great potential for speech and music separation in recent years. In this paper, we propose a discriminative learning model to solve the single-channel speech separation problem. First, deep clustering (DC) is trained to produce embedding features; these features are then used as the input to a deep neural network that directly isolates the component sources. The proposed model achieves 10.06 dB SDR, 16.50 dB SIR, 11.48 dB SAR, 9.06 dB SI-SNRi, 88% STOI, and 2.03 PESQ on the TSP dataset.
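The pipeline described in the abstract, embedding features per time-frequency bin followed by a stage that assigns bins to sources, can be sketched at inference time roughly as follows. This is only an illustrative sketch: the toy dimensions, the fabricated embeddings (standing in for the output of a trained DC network), and the minimal k-means routine are assumptions for demonstration, not the authors' implementation, which instead feeds the embeddings to a second deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration; the paper uses an STFT front end).
T, F, D, n_src = 50, 64, 20, 2  # time frames, freq bins, embedding dim, sources

# Stand-in for embeddings from a trained DC network: one D-dim vector per
# time-frequency bin. We fabricate two well-separated clusters so that the
# clustering step below has something meaningful to recover.
centers = rng.normal(size=(n_src, D)) * 3.0
labels_true = rng.integers(0, n_src, size=T * F)
V = centers[labels_true] + 0.1 * rng.normal(size=(T * F, D))

def kmeans(V, k, iters=20):
    """Minimal k-means over embedding vectors (the clustering step often
    used at DC inference time to turn embeddings into bin assignments)."""
    centroids = V[np.random.default_rng(1).choice(len(V), k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(V[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = V[assign == j].mean(axis=0)
    return assign

assign = kmeans(V, n_src)

# One binary time-frequency mask per estimated source.
masks = np.stack([(assign == j).reshape(T, F).astype(float)
                  for j in range(n_src)])

mix_mag = np.abs(rng.normal(size=(T, F)))  # stand-in mixture magnitude spectrogram
est_sources = masks * mix_mag[None]        # masked spectrogram per source

print(masks.sum(axis=0).min())  # every bin goes to exactly one source -> 1.0
```

The binary masks partition the mixture's time-frequency bins among the sources; the paper's contribution is to replace this hard clustering/masking stage with a second deep network that takes the embeddings as input and estimates the sources directly.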

Original language: English
Title of host publication: 2021 IEEE 10th Global Conference on Consumer Electronics, GCCE 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 430-431
Number of pages: 2
ISBN (Electronic): 9781665436762
DOIs
State: Published - 2021
Event: 10th IEEE Global Conference on Consumer Electronics, GCCE 2021 - Kyoto, Japan
Duration: 12 Oct 2021 - 15 Oct 2021

Publication series

Name: 2021 IEEE 10th Global Conference on Consumer Electronics, GCCE 2021

Conference

Conference: 10th IEEE Global Conference on Consumer Electronics, GCCE 2021
Country/Territory: Japan
City: Kyoto
Period: 12/10/21 - 15/10/21

Keywords

  • Supervised speech separation
  • deep clustering
  • monaural source separation
  • speaker separation

