Towards high dimensional instance selection: An evolutionary approach

Chih Fong Tsai, Zong Yao Chen

Research output: Contribution to journal, Article, peer-review

24 Scopus citations

Abstract

Data reduction is an important data pre-processing step in the KDD process. It can be approached by applying instance selection algorithms to filter out unrepresentative or noisy data from a given (training) dataset. However, the performance of instance selection on very high dimensional data has not yet been fully examined. In this paper, we introduce a novel efficient genetic algorithm (EGA), which incorporates the notion of "biological evolution" into the evolutionary process: after long-term evolution, individuals find the most efficient way to allocate resources and evolve. The experimental study is based on four very high dimensional datasets ranging from 200 to 18,236 dimensions, and EGA is compared with four state-of-the-art algorithms: IB3, DROP3, ICF, and GA. The experimental results show that, with EGA, the k-NN and SVM classifiers achieve the classification performance closest to that of the baseline classifiers trained without instance selection. In particular, EGA outperforms the four algorithms in terms of average classification accuracy. Moreover, EGA produces the largest reduction rates (tied with GA) and requires relatively less computational time than the other four algorithms.
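The abstract does not describe how EGA allocates resources, so the sketch below does not reproduce it. As a hedged illustration of the general idea only, here is a minimal conventional GA for instance selection, of the kind EGA is compared against: each chromosome is a bit mask over the training set (True = keep the instance), and fitness is assumed to be a weighted sum of leave-one-out 1-NN accuracy and the reduction rate 1 - |S|/|T|. The names `fitness` and `ga_instance_selection`, the weight `alpha`, binary tournament selection, one-point crossover, bit-flip mutation, and all parameter values are illustrative assumptions, not the authors' settings.

```python
import random
from typing import List, Sequence

# Hypothetical sketch: a conventional GA for instance selection, NOT the paper's EGA.
Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (sufficient for nearest-neighbour ranking)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fitness(mask: List[bool], X: List[Vector], y: List[int], alpha: float = 0.5) -> float:
    """Assumed fitness: alpha * LOO 1-NN accuracy + (1 - alpha) * reduction rate."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0
    correct = 0
    for j, x in enumerate(X):
        # Leave-one-out: an instance may not be its own nearest neighbour.
        refs = [i for i in selected if i != j]
        if refs:
            nearest = min(refs, key=lambda i: sq_dist(X[i], x))
            correct += int(y[nearest] == y[j])
    accuracy = correct / len(X)
    reduction = 1.0 - len(selected) / len(X)
    return alpha * accuracy + (1.0 - alpha) * reduction


def ga_instance_selection(X: List[Vector], y: List[int], pop_size: int = 20,
                          generations: int = 30, p_mut: float = 0.02,
                          alpha: float = 0.5, seed: int = 0) -> List[bool]:
    """Plain generational GA over bit masks (one bit per training instance)."""
    rng = random.Random(seed)
    n = len(X)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(m, X, y, alpha), m) for m in pop]

        def tournament() -> List[bool]:
            # Binary tournament: keep the fitter of two random individuals.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        # Elitism: copy the best individual unchanged into the next generation.
        new_pop = [list(max(scored, key=lambda t: t[0])[1])]
        while len(new_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=lambda m: fitness(m, X, y, alpha))


if __name__ == "__main__":
    # Toy data: two well-separated 2-D clusters.
    X = [(i * 0.1, 0.0) for i in range(10)] + [(10 + i * 0.1, 0.0) for i in range(10)]
    y = [0] * 10 + [1] * 10
    mask = ga_instance_selection(X, y)
    print(sum(mask), "of", len(X), "instances kept")
```

The kept instances (bits set to True) would then form the reduced training set for a k-NN or SVM classifier. Each fitness evaluation here costs O(n²d) distance computations for n instances of dimension d, which illustrates why computational time becomes a concern on very high dimensional data.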

Original language: English
Pages (from-to): 79-92
Number of pages: 14
Journal: Decision Support Systems
Volume: 61
Issue number: 1
State: Published - May 2014

Keywords

  • Data mining
  • Data reduction
  • Genetic algorithms
  • High dimensional data
  • Instance selection
  • Machine learning
