A dynamic time warping approach for handling class imbalanced medical datasets with missing values: A case study of protein localization site prediction

Ling Chien Hung, Ya Han Hu, Chih Fong Tsai, Min Wei Huang

Research output: Contribution to journal › Article › peer-review

Abstract

Class imbalanced medical datasets, such as those used for cancer prediction, contain unequal numbers of samples in different classes. The resulting skewed class distribution makes it very difficult for a classifier to distinguish between the minority (e.g., cancer) and majority (e.g., non-cancer) classes. Related studies in the literature have proposed different types of solutions for the class imbalance problem, including data level, algorithmic level, and cost-sensitive learning approaches. However, none of these solutions considers the issue of missing attribute values in class imbalanced medical datasets, especially in the minority class. Missing value imputation, in which statistical or machine learning techniques produce estimates to replace the missing values, is commonly used when constructing models. However, existing imputation methods require a certain number of observed data to produce their estimates. Their major challenge is that the amount of complete data (with no missing values) in the minority class is very limited. In this paper, we propose a novel approach, namely Dynamic Time Warping-based Imputation (DTWI), to handle class imbalanced datasets with missing values. Based on the similarity measurement technique of DTW, all of the data (with or without missing values) in the minority class can be used for missing value imputation. The experimental results based on 10 different class imbalanced medical datasets show that when the missing rates in the minority classes are smaller than 30%, DTWI performs similarly to the baseline K-NN imputation method and better than the mean/mode imputation and case deletion methods. When the missing rates are larger than 30%, DTWI significantly outperforms the other techniques.
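The abstract does not spell out the DTWI procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It shows the classic DTW distance and one hypothetical way to use it for imputation: because DTW can compare sequences of different lengths, two minority-class records can be compared on their observed values even when both are incomplete. The function names (`dtw_distance`, `observed`, `impute_minority`), the use of `None` for a missing value, and the nearest-donor fill rule are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences, with |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def observed(row):
    """Return the observed (non-missing) values of a record, in attribute order."""
    return [v for v in row if v is not None]


def impute_minority(rows):
    """Hypothetical fill rule (an assumption, not the paper's algorithm):
    copy each missing attribute from the DTW-nearest record that has it observed."""
    imputed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        own = observed(row)
        for k, value in enumerate(row):
            if value is not None or not own:
                continue
            # Donors may themselves be incomplete; only attribute k must be present.
            candidates = [
                (dtw_distance(own, observed(other)), j)
                for j, other in enumerate(rows)
                if j != i and other[k] is not None
            ]
            if candidates:
                _, best = min(candidates)
                new_row[k] = rows[best][k]
        imputed.append(new_row)
    return imputed


# Toy minority-class records; None marks a missing attribute value.
minority = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [9.0, None, 8.0],
]
filled = impute_minority(minority)
```

In this toy example the third record borrows its missing value from the first record, which is itself incomplete. This illustrates the abstract's point that records with missing values can still contribute to imputation, whereas methods that need complete records would have only one usable donor here.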

Original language: English
Article number: 116437
Journal: Expert Systems with Applications
Volume: 192
State: Published - 15 Apr 2022

Keywords

  • Class imbalance
  • Data mining
  • Dynamic time warping
  • Machine learning
  • Missing value imputation
