Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches

Chih Fong Tsai, Ya Ting Sung

Research output: Contribution to journal › Article › peer-review

83 Citations (Scopus)

Abstract

Feature selection in high dimension, low sample size (HDLSS) data is always an important data pre-processing task. In the literature, the concept of ensemble learning has been applied to improve single feature selection methods, the so-called ensemble feature selection techniques. The most widely used approach is to combine multiple feature selection methods and their selection results via some sort of aggregation function in a parallel manner. Another ensemble strategy is based on the serial combination approach where the selection results of the first feature selection stage are used as input for the second stage of feature selection to produce the final output. The aim of this paper is to fully explore the performance of parallel and serial combination approaches for ensemble feature selection over HDLSS data. In particular, we strive to answer two research questions: whether parallel and serial based ensemble feature selection can outperform single feature selection and which combination approach is the better choice for ensemble feature selection. The experimental results based on comparing nine parallel and nine serial combinations, as well as three single baseline feature selection methods, including principal component analysis (PCA), genetic algorithm (GA), and C4.5 decision tree, show that ensemble feature selection performs better than single feature selection in terms of classification accuracy. However, there are no significant differences in performance between the single best baseline method (i.e. GA) and the top three parallel and serial combinations. On the other hand, the serial combination approach produces the largest feature reduction rate.
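The two combination strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two simple filter selectors (`variance_selector`, `mean_gap_selector`), the toy data, and the choice of union and intersection as aggregation functions are all assumptions made for illustration, standing in for the PCA, GA, and C4.5 baselines used in the paper.

```python
from statistics import mean, pvariance


def top_k_by_score(scores, k):
    """Return the indices of the k highest-scoring features as a set."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def variance_selector(X, y, features, k):
    """Hypothetical stand-in selector: keep the k highest-variance features."""
    scores = {j: pvariance([row[j] for row in X]) for j in features}
    return top_k_by_score(scores, k)


def mean_gap_selector(X, y, features, k):
    """Hypothetical stand-in selector: keep the k features whose class means
    differ most (binary labels 0/1 assumed)."""
    scores = {}
    for j in features:
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores[j] = abs(mean(pos) - mean(neg))
    return top_k_by_score(scores, k)


def parallel_combination(selectors, X, y, k, mode="union"):
    """Run every selector on the full feature set, then aggregate the subsets."""
    all_features = range(len(X[0]))
    subsets = [select(X, y, all_features, k) for select in selectors]
    if mode == "union":
        return set.union(*subsets)
    if mode == "intersection":
        return set.intersection(*subsets)
    raise ValueError(f"unknown aggregation mode: {mode}")


def serial_combination(first, second, X, y, k_first, k_second):
    """Feed the features kept by the first selector into the second selector."""
    stage_one = first(X, y, range(len(X[0])), k_first)
    return second(X, y, stage_one, k_second)


# Toy data (assumed): 4 samples, 5 features, binary labels.
X = [
    [0.0, 5.0, 1.0, 10.0, 0.2],
    [0.1, 5.1, 9.0, 0.0, 0.1],
    [1.0, 5.0, 1.2, 10.0, 0.3],
    [0.9, 5.1, 9.1, 0.0, 0.2],
]
y = [0, 0, 1, 1]

selectors = [variance_selector, mean_gap_selector]
parallel_union = parallel_combination(selectors, X, y, k=2, mode="union")
parallel_intersection = parallel_combination(selectors, X, y, k=2, mode="intersection")
serial_result = serial_combination(
    variance_selector, mean_gap_selector, X, y, k_first=3, k_second=1
)

print("parallel (union):       ", sorted(parallel_union))
print("parallel (intersection):", sorted(parallel_intersection))
print("serial (variance -> gap):", sorted(serial_result))
```

In this sketch the serial pipeline can only narrow the subset at each stage, which is consistent with the abstract's observation that the serial approach yields the largest feature reduction rate; the parallel union, by contrast, can retain more features than any single selector.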

Original language: English
Article number: 106097
Journal: Knowledge-Based Systems
Volume: 203
Publication status: Published - 5 Sep 2020
