Discriminative training of complex-valued deep recurrent neural network for singing voice separation

Yuan Shan Lee, Kuo Yu, Sih Huei Chen, Jia Ching Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

Deep neural networks (DNNs) have performed impressively in processing multimedia signals. Most DNN-based approaches were developed to handle real-valued data; very few have been designed for complex-valued data, even though such data are essential for processing many types of multimedia signals. Accordingly, this work presents a complex-valued deep recurrent neural network (C-DRNN) for singing voice separation. The C-DRNN operates in the complex-valued short-time discrete Fourier transform (STFT) domain, and a key aspect of the model is that its activations and weights are complex-valued. The goal is to reconstruct the singing voice and the background music from a mixed signal. For error back-propagation, ℂℝ-calculus is used to compute the complex-valued gradients of the objective function. To reinforce model regularity, two constraints are incorporated into the objective function of the C-DRNN. The first is an additional masking layer that ensures that the sum of the separated sources equals the input mixture. The second is a discriminative term that preserves the mutual difference between the two separated sources. Finally, the proposed method is evaluated on a singing voice separation task using the MIR-1K dataset. Experimental results demonstrate that the proposed method outperforms state-of-the-art DNN-based methods.
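
The two constraints in the objective can be illustrated with a minimal sketch in pure Python. It assumes a magnitude-ratio soft mask, m1 = |ŷ1| / (|ŷ1| + |ŷ2|) and m2 = 1 - m1, applied to the complex mixture spectrum, and a discriminative objective of the form ||s1 - y1||² + ||s2 - y2||² - γ(||s1 - y2||² + ||s2 - y1||²). The function names, the mask form, the toy spectra and the weight gamma are illustrative assumptions, not the paper's exact definitions. Because the two masks sum to one in every bin, the masked outputs add up to the mixture by construction.

```python
# Hypothetical sketch: mask form, names and gamma are assumptions, not the paper's definitions.

def masking_layer(y1_hat, y2_hat, mixture, eps=1e-12):
    """Split a complex mixture spectrum into two sources whose sum is the mixture.

    Assumed soft mask per frequency bin: m1 = |y1_hat| / (|y1_hat| + |y2_hat|),
    m2 = 1 - m1. Since m1 + m2 = 1, s1 + s2 reproduces the mixture.
    """
    s1, s2 = [], []
    for a, b, z in zip(y1_hat, y2_hat, mixture):
        m1 = abs(a) / (abs(a) + abs(b) + eps)  # eps avoids 0/0 in silent bins
        s1.append(m1 * z)
        s2.append((1.0 - m1) * z)
    return s1, s2


def sq_err(x, y):
    """Sum over bins of the squared complex magnitude |x - y|^2."""
    return sum(abs(a - b) ** 2 for a, b in zip(x, y))


def discriminative_loss(s1, s2, y1, y2, gamma=0.05):
    """Fit each estimate to its own target and penalize similarity to the other one."""
    fit = sq_err(s1, y1) + sq_err(s2, y2)
    confusion = sq_err(s1, y2) + sq_err(s2, y1)
    return fit - gamma * confusion


# Toy 3-bin spectra standing in for one STFT frame (made-up values).
voice = [1 + 1j, 0.5j, 0.2 + 0j]
music = [0.3 + 0j, 1 - 1j, 2 + 0j]
mixture = [v + m for v, m in zip(voice, music)]

# Pretend raw network outputs before the masking layer.
y1_hat = [0.9 + 1j, 0.4j, 0.3 + 0j]
y2_hat = [0.2 + 0j, 1 - 0.8j, 1.8 + 0j]

s1, s2 = masking_layer(y1_hat, y2_hat, mixture)
print(max(abs(a + b - z) for a, b, z in zip(s1, s2, mixture)))  # close to 0
print(discriminative_loss(s1, s2, voice, music))
```

A real-valued mask like this one reuses the mixture phase in both outputs; it is shown only because it is the simplest form that meets the sum constraint.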
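
ℂℝ-calculus treats a real-valued objective L(w) of a complex weight w as a function of both w and its conjugate w*. The conjugate (Wirtinger) derivative ∂L/∂w* = ½(∂L/∂a + j∂L/∂b), with w = a + jb, points in the direction of steepest ascent, so a descent step takes the form w ← w - μ∂L/∂w*. The sketch below checks this on a toy one-weight model L(w) = |wx - d|², whose conjugate derivative is (wx - d)x*, against finite differences of the real and imaginary parts, and then runs a few descent steps. The model, the function names and the step size μ = 0.1 are hypothetical; this is not the C-DRNN's back-propagation, only an illustration of the kind of gradient ℂℝ-calculus produces.

```python
# Toy check of a CR-calculus (Wirtinger) gradient; model and step size are hypothetical.

def loss(w, x, d):
    """Real-valued squared error of a one-weight complex model: |w x - d|^2."""
    return abs(w * x - d) ** 2


def grad_conj(w, x, d):
    """Analytic conjugate Wirtinger derivative dL/dw* = (w x - d) * conj(x)."""
    return (w * x - d) * x.conjugate()


def grad_numeric(w, x, d, h=1e-6):
    """Same derivative from real partials: 0.5 * (dL/da + 1j * dL/db), w = a + 1j b."""
    dL_da = (loss(w + h, x, d) - loss(w - h, x, d)) / (2 * h)
    dL_db = (loss(w + 1j * h, x, d) - loss(w - 1j * h, x, d)) / (2 * h)
    return 0.5 * (dL_da + 1j * dL_db)


x, d = 1 + 2j, 3 - 1j  # made-up input and target
w0 = 0.5 - 0.5j        # arbitrary starting weight
print(grad_conj(w0, x, d), grad_numeric(w0, x, d))  # should agree closely

mu = 0.1
w = w0
for _ in range(100):
    w -= mu * grad_conj(w, x, d)  # steepest descent along -dL/dw*
print(w, d / x)                   # w approaches the exact solution d / x
```

Stepping along the conjugate derivative is what makes the update a descent direction for a real-valued loss: for real L, ∂L/∂w is the complex conjugate of ∂L/∂w*, so using it instead would flip the sign of the imaginary part of the step.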

Original language: English
Title of host publication: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 1327-1335
Number of pages: 9
ISBN (Electronic): 9781450349062
State: Published - 23 Oct 2017
Event: 25th ACM International Conference on Multimedia, MM 2017 - Mountain View, United States
Duration: 23 Oct 2017 to 27 Oct 2017

Publication series

Name: MM 2017 - Proceedings of the 2017 ACM Multimedia Conference

Conference

Conference: 25th ACM International Conference on Multimedia, MM 2017
Country/Territory: United States
City: Mountain View
Period: 23/10/17 to 27/10/17

Keywords

  • Deep neural networks
  • Phase information
  • Singing voice separation
