Instance selection in medical datasets: A divide-and-conquer framework

Min Wei Huang, Chih Fong Tsai, Wei Chao Lin

Research output: Contribution to journal › Article › peer-review

7 citations (Scopus)


Instance selection is an important problem in medical data mining. It focuses on selecting representative data samples from a given training set while filtering out unrepresentative (or noisy) samples. This reduces the size of the training set, which then requires less storage space. In addition, when the instance selection algorithm is carefully chosen, reducing the training set so that it contains fewer noisy samples usually makes classifiers perform better than those trained without instance selection. Many instance selection algorithms have been proposed in the literature. However, different algorithms tend to use different criteria to identify noisy data, which makes it difficult to find the best algorithm for datasets from different domains. In other words, an algorithm may outperform the others on some domain datasets but underperform them on others. Instead of developing a novel algorithm that performs better than most other algorithms, this paper introduces a divide-and-conquer-based instance selection (DCIS) framework that aims to improve the performance of each specific instance selection algorithm per se. Two well-known algorithms, DROP3 and IB3, are used as the baselines, and various small- and large-scale medical datasets are used in the experiments. Our results show that when DROP3 and IB3 perform instance selection within the DCIS framework, the performance of the k-NN and SVM classifiers improves over that obtained with the DROP3 and IB3 baselines, respectively.
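To make the idea concrete, the following is a minimal Python sketch of instance selection wrapped in a divide-and-conquer step. It is an illustration under stated assumptions, not the authors' DCIS procedure: the abstract does not say how the training set is divided or how the partial results are combined, so the sketch assumes disjoint random chunks whose selected samples are simply merged. It also swaps in Wilson's Edited Nearest Neighbor rule as a simple stand-in for DROP3 and IB3. The names `enn_select` and `divide_and_conquer_select` are hypothetical.

```python
import random
from collections import Counter


def enn_select(X, y, k=3):
    """Edited Nearest Neighbor (stand-in for DROP3/IB3): keep a sample only
    if the majority label among its k nearest neighbours matches its own.
    Returns the indices of the kept samples."""
    keep = []
    for i, xi in enumerate(X):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
            if j != i
        )
        if not dists:
            # A lone sample has no neighbours to contradict it; keep it.
            keep.append(i)
            continue
        votes = Counter(y[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return keep


def divide_and_conquer_select(X, y, selector, n_parts=2, seed=0):
    """ASSUMPTION: divide = disjoint random chunks, conquer = run the
    selector on each chunk, combine = union of the kept indices.
    The paper's actual division strategy may differ."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    kept = []
    for p in range(n_parts):
        part = idx[p::n_parts]  # every n_parts-th shuffled index
        local = selector([X[i] for i in part], [y[i] for i in part])
        kept.extend(part[j] for j in local)  # map back to original indices
    return sorted(kept)


# Toy data: two well-separated clusters plus one mislabelled sample (index 8).
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (10, 10), (10, 11), (11, 10), (11, 11),
     (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

baseline = enn_select(X, y, k=3)
divided = divide_and_conquer_select(X, y, enn_select, n_parts=2)
```

On the toy data, the baseline selector drops the mislabelled sample because all of its nearest neighbours carry the other label. Any selector with the same signature (lists of samples and labels in, kept indices out) could be passed to `divide_and_conquer_select`, which mirrors the abstract's point that the framework wraps an existing algorithm rather than replacing it.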

Journal: Computers and Electrical Engineering
Publication status: Published - Mar 2021

