The feature selection effect on missing value imputation of medical datasets

Chia Hui Liu, Chih Fong Tsai, Kuen Liang Sue, Min Wei Huang

Research output: Contribution to journal › Article › peer-review

12 Scopus citations


In practice, many medical domain datasets are incomplete: a proportion of their records have missing attribute values. Missing value imputation can be performed to address this problem. To impute missing values, some of the observed data (i.e., complete data) are generally used as the reference or training set, and statistical and machine learning techniques are then employed to produce estimates that replace the missing values. Since a collected dataset usually contains a certain number of feature dimensions, performing feature selection is useful for better pattern recognition. The aim of this paper is therefore to examine the effect of performing feature selection on missing value imputation of medical datasets. Experiments are carried out on five medical domain datasets of varying feature dimensionality. In addition, three different types of feature selection methods and imputation techniques are employed for comparison. The results show that combining feature selection and imputation is a better choice for many medical datasets. However, the feature selection algorithm should be chosen carefully to produce the best result. In particular, the genetic algorithm and information gain models are suitable for lower dimensional datasets, whereas the decision tree model is a better choice for higher dimensional datasets.
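
To make the idea of combining feature selection with imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible ordering (rank features by information gain on the complete records, keep the top k, then fill the remaining gaps by k-nearest-neighbour averaging). The function names, the median-split discretization, the unscaled Euclidean distance, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one numeric feature, binarized at its upper median (assumed discretization)."""
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


def select_features(rows, labels, k):
    """Rank features by information gain on the complete rows and return the top-k column indices."""
    complete = [(r, y) for r, y in zip(rows, labels) if None not in r]
    ys = [y for _, y in complete]
    gains = [information_gain([r[j] for r, _ in complete], ys) for j in range(len(rows[0]))]
    ranked = sorted(range(len(gains)), key=lambda j: -gains[j])
    return sorted(ranked[:k])


def knn_impute(rows, features, n_neighbors=2):
    """Keep only `features`, then replace each None with the mean over the nearest complete rows."""
    reduced = [[r[j] for j in features] for r in rows]
    complete = [r for r in reduced if None not in r]
    imputed = []
    for row in reduced:
        observed = [j for j, v in enumerate(row) if v is not None]

        def distance(ref):
            # Euclidean distance over the columns observed in the incomplete row (no scaling applied).
            return math.sqrt(sum((row[j] - ref[j]) ** 2 for j in observed))

        nearest = sorted(complete, key=distance)[:n_neighbors]
        imputed.append([
            v if v is not None else sum(n[j] for n in nearest) / len(nearest)
            for j, v in enumerate(row)
        ])
    return imputed


# Toy data (hypothetical): three features, the last one uninformative; None marks a missing value.
rows = [
    [1.0, 10.0, 5.0],
    [1.2, 11.0, 3.0],
    [3.0, 30.0, 5.0],
    [3.2, 31.0, 3.0],
    [1.1, None, 4.0],
    [3.1, None, 4.0],
]
labels = [0, 0, 1, 1, 0, 1]

selected = select_features(rows, labels, k=2)
imputed = knn_impute(rows, selected, n_neighbors=2)
print(selected)
print(imputed)
```

In this sketch the uninformative third column is dropped before imputation, so it cannot distort the neighbour search. Whether selection helps on real data depends on the dataset and the selection method, as the abstract notes.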

Original language: English
Article number: 2344
Journal: Applied Sciences (Switzerland)
Issue number: 7
State: Published - 1 Apr 2020


  • Data mining
  • Feature selection
  • Imputation
  • Medical datasets
  • Missing values

