Instance selection in medical datasets: A divide-and-conquer framework

Min Wei Huang, Chih Fong Tsai, Wei Chao Lin

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Instance selection is an important problem in medical data mining. It selects representative data samples from a given training set and filters out unrepresentative (or noisy) ones. This reduces the size of the training set, which then requires less storage space. In addition, when the instance selection algorithm is carefully chosen, the reduced training set contains less noisy data, and classifiers trained on it usually perform better than classifiers trained without instance selection. Many instance selection algorithms have been proposed in the literature. However, different algorithms tend to use different criteria to identify noisy data, which makes it difficult to find the best algorithm across domain datasets. In other words, an algorithm may outperform the others on some domain datasets but underperform them on others. Instead of developing a novel algorithm that performs better than most existing ones, this paper introduces a divide-and-conquer based instance selection (DCIS) framework that aims to improve the performance of each specific instance selection algorithm per se. Two well-known algorithms, DROP3 and IB3, are used as baselines, and various small- and large-scale medical datasets are used in the experiments. The results show that when DROP3 and IB3 perform instance selection within the DCIS framework, the k-NN and SVM classifiers perform better than they do with the DROP3 and IB3 baselines, respectively.
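The abstract does not say how DCIS partitions the training set or how it merges the per-partition results, so the following is only a minimal sketch of the general divide-and-conquer idea under assumed choices, not the authors' method. It assumes a random split into `n_parts` chunks, plain concatenation as the merge step, and Wilson's Edited Nearest Neighbor (ENN) rule as a simple stand-in for DROP3 or IB3. All function and parameter names are hypothetical.

```python
import random
from collections import Counter


def knn_label(point, data, k):
    """Majority label among the k nearest neighbours of `point` in `data`."""
    nearest = sorted(
        data,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(point, item[0])),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def enn_select(data, k=3):
    """Stand-in base selector (Wilson's ENN, NOT DROP3 or IB3).

    Keeps an instance only if the majority label of its k nearest
    neighbours (excluding itself) matches its own label.
    """
    kept = []
    for i, (x, y) in enumerate(data):
        others = data[:i] + data[i + 1:]
        if not others or knn_label(x, others, k) == y:
            kept.append((x, y))
    return kept


def divide_and_conquer_select(data, n_parts=2, base_selector=enn_select, seed=0):
    """Assumed divide-and-conquer wrapper around a base instance selector.

    Divide:  shuffle and split the training set into n_parts chunks.
    Conquer: run the base selector on each chunk independently.
    Combine: concatenate the instances kept in each chunk.
    """
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    selected = []
    for part in parts:
        selected.extend(base_selector(part))
    return selected
```

Each training instance is a `(features, label)` pair, for example `((0.1,), 0)`. Passing the selector as `base_selector` mirrors the abstract's framing of DCIS as a wrapper that leaves the underlying algorithm unchanged; a real DROP3 or IB3 implementation could be supplied in place of `enn_select`.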

Original language: English
Article number: 106957
Journal: Computers and Electrical Engineering
Volume: 90
State: Published - Mar 2021

Keywords

  • Data reduction
  • Divide-and-conquer
  • Instance selection
  • Machine learning
  • Medical data mining
