TY - JOUR
T1 - A dynamic time warping approach for handling class imbalanced medical datasets with missing values
T2 - A case study of protein localization site prediction
AU - Hung, Ling Chien
AU - Hu, Ya Han
AU - Tsai, Chih Fong
AU - Huang, Min Wei
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2022/4/15
Y1 - 2022/4/15
N2 - Class-imbalanced medical datasets, such as those used for cancer prediction, contain unequal numbers of samples in different classes. The resulting skewed class distribution makes it very difficult for a classifier to distinguish between the minority (i.e. cancer) and majority (i.e. non-cancer) classes. Related studies have proposed different types of solutions for the class imbalance problem, including data-level, algorithm-level, and cost-sensitive learning approaches. However, none of these solutions considers the missing attribute values that occur in class-imbalanced medical datasets, especially in the minority class. Missing value imputation, in which statistical or machine learning techniques produce estimates to replace the missing values, is commonly used when constructing models. However, existing imputation methods require a certain amount of observed data to produce their estimates, and the amount of complete data (with no missing values) in the minority class is very limited. In this paper, we propose a novel approach, Dynamic Time Warping-based Imputation (DTWI), to handle class-imbalanced datasets with missing values. Based on the DTW similarity measure, all of the data in the minority class, with or without missing values, can be used for missing value imputation. Experimental results on 10 class-imbalanced medical datasets show that when the missing rate in the minority class is below 30%, DTWI performs similarly to the baseline K-NN imputation method and better than the mean/mode imputation and case deletion methods. When the missing rate exceeds 30%, DTWI significantly outperforms the other techniques.
KW - Class imbalance
KW - Data mining
KW - Dynamic time warping
KW - Machine learning
KW - Missing value imputation
UR - http://www.scopus.com/inward/record.url?scp=85121968022&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2021.116437
DO - 10.1016/j.eswa.2021.116437
M3 - Journal article
AN - SCOPUS:85121968022
SN - 0957-4174
VL - 192
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 116437
ER -