A dynamic time warping approach for handling class imbalanced medical datasets with missing values: A case study of protein localization site prediction

Ling Chien Hung, Ya Han Hu, Chih Fong Tsai, Min Wei Huang

Research output: Contribution to journal, Journal article, Peer-reviewed

Abstract

Class imbalanced medical datasets, such as those used for cancer prediction, contain unequal numbers of samples in different classes. The resulting skewed class distribution makes it very difficult for a classifier to distinguish between the minority (i.e. cancer) and majority (i.e. non-cancer) classes. Related studies in the literature have proposed different types of solutions for the class imbalance problem, including data level, algorithmic level, and cost-sensitive learning approaches. However, none of these solutions has considered the issue of missing attribute values in class imbalanced medical datasets, especially in the minority class. Missing value imputation, in which statistical or machine learning techniques produce estimates to replace the missing values, is commonly used before model construction. However, existing imputation methods require a certain number of observed data to produce their estimates. The major challenge for them is that the amount of complete data (with no missing values) in the minority class is very limited. In this paper, we propose a novel approach, namely Dynamic Time Warping-based Imputation (DTWI), to handle class imbalanced datasets with missing values. Based on the similarity measurement technique of DTW, all of the data (with or without missing values) in the minority class can be used for missing value imputation. The experimental results based on 10 different class imbalanced medical datasets show that when the missing rates in the minority classes are smaller than 30%, DTWI performs similarly to the baseline K-NN imputation method and better than the mean/mode imputation and case deletion methods. When the missing rates are larger than 30%, DTWI significantly outperforms the other techniques.
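The abstract does not spell out how DTWI turns DTW similarity into imputed values, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes numeric attributes, treats each minority-class record as a sequence of its observed values (missing entries, marked `None`, are dropped), and copies the missing attribute from the DTW-nearest record that has that attribute observed. The names `dtw_distance`, `observed` and `dtw_impute` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with |x - y| as the local cost.

    Works for sequences of different lengths, which is what allows
    records with missing entries to be compared at all.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def observed(row):
    """Keep only the observed (non-None) values of a record, in order."""
    return [v for v in row if v is not None]


def dtw_impute(minority_rows):
    """Fill each missing entry from the DTW-nearest donor record.

    ASSUMPTION: this nearest-donor rule is illustrative only; the paper's
    actual imputation rule is not given in the abstract.
    """
    imputed = [list(r) for r in minority_rows]
    for i, row in enumerate(minority_rows):
        for k, value in enumerate(row):
            if value is not None:
                continue
            # Any other record with attribute k observed may act as donor,
            # even if that record has missing values elsewhere.
            donors = [
                (dtw_distance(observed(row), observed(other)), j)
                for j, other in enumerate(minority_rows)
                if j != i and other[k] is not None
            ]
            if donors:
                _, best = min(donors)
                imputed[i][k] = minority_rows[best][k]
    return imputed


rows = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [9.0, 8.0, 7.0],
    [None, 8.0, 7.0],
]
result = dtw_impute(rows)
```

In this toy input the last record is itself incomplete, yet it still takes part as a donor candidate when the first record is imputed. That is the property the abstract highlights: records with missing values are not discarded from the comparison. A conventional K-NN imputer would typically need a distance defined over co-observed attributes instead.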

Original language: English
Article number: 116437
Journal: Expert Systems with Applications
Volume: 192
Publication status: Published - 15 Apr 2022
