Ensemble and Multimodal Learning for Pathological Voice Classification

Whenty Ariyanti, Tassadaq Hussain, Jia Ching Wang, Chi Tei Wang, Shih Hau Fang, Yu Tsao

Research output: Contribution to journal › Article › peer-review

1 Scopus citations


Voice disorders are among the most common medical conditions in modern society, especially for people with occupational voice demands. In this paper, we investigate a stacked ensemble learning method that classifies pathological voice disorders by combining acoustic signals and medical records. In the proposed ensemble learning framework, stacked support vector machines (SVMs) form a set of weak classifiers, and a deep neural network (DNN) acts as a meta-learner. Acoustic features and medical records are combined to attain better classification performance, exploiting the high complexity of the meta-learner. Results show that the proposed approach significantly outperforms individual SVM and DNN classifiers and improves on the two-stage DNN-based fusion classifier. The proposed approach achieved 89.83% accuracy and 85.84% unweighted average recall in a three-disorder classification task, confirming the effectiveness of ensemble learning for pathological voice classification.
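
The abstract reports both accuracy and unweighted average recall (UAR). UAR is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has, whereas accuracy can be dominated by the largest class. The following is a minimal sketch of the two metrics, not the authors' evaluation code: the function names and the toy labels "A", "B", "C" are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class contributes equally."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[label] / count for label, count in support.items()]
    return sum(recalls) / len(recalls)


# Made-up, imbalanced toy labels for three hypothetical classes
# (an assumption for illustration; NOT data from the paper).
y_true = ["A"] * 8 + ["B"] * 1 + ["C"] * 1
y_pred = ["A"] * 8 + ["A"] * 1 + ["C"] * 1

acc = accuracy(y_true, y_pred)                   # 9 of 10 samples correct
uar = unweighted_average_recall(y_true, y_pred)  # recalls 1, 0, 1 averaged
print(f"accuracy = {acc:.4f}, UAR = {uar:.4f}")
```

In this toy case the single missed sample from the small class barely lowers accuracy but cuts UAR by a third, which is why UAR is a useful companion metric when disorder classes are imbalanced.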

Original language: English
Journal: IEEE Sensors Letters
State: Accepted/In press - 2021


  • Acoustics
  • Medical diagnostic imaging
  • Neoplasms
  • Pathology
  • Stacking
  • Standards
  • Support vector machines
  • acoustic signal
  • binary classification
  • ensemble learning
  • pathological voice

