Towards hybrid over- and under-sampling combination methods for class imbalanced datasets: an experimental study

Cian Lin, Chih Fong Tsai, Wei Chao Lin

Research output: Contribution to journal, journal article, peer-reviewed

15 citations (Scopus)

Abstract

The skewed class distributions of many class imbalanced domain datasets often make it difficult for machine learning techniques to construct effective models. In such cases, data re-sampling techniques, such as under-sampling the majority class and over-sampling the minority class, are usually employed. In the related literature, some studies have shown that hybrid combinations of under- and over-sampling methods in different orders can produce better results. However, each study compares only with either under- or over-sampling methods to reach its final conclusion. Therefore, the research objective of this paper is to find out which order of combining under- and over-sampling methods performs better. Experiments are conducted on 44 datasets from different domains using three over-sampling algorithms (SMOTE, CTGAN, and TAN) and three under-sampling (i.e. instance selection) algorithms (IB3, DROP3, and GA). The results show that if the under-sampling algorithm is chosen carefully, i.e. IB3, no significant performance improvement is obtained by adding a further over-sampling step. Furthermore, with the IB3 algorithm, it is better to perform instance selection first and over-sampling second rather than the reverse order; this allows the random forest classifier to achieve the highest AUC.
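
To make the notion of "combination order" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary problem, uses random majority-class under-sampling as a stand-in for the instance selection algorithms (IB3, DROP3, GA), and uses a simplified SMOTE-style interpolation as a stand-in for the over-sampling algorithms. The function names, the toy data, and the parameters `keep_fraction` and `k` are all hypothetical. It only shows that the two orders yield different training sets; it does not reproduce any result from the paper.

```python
import random
from collections import Counter


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def undersample_majority(X, y, majority, keep_fraction, rng):
    """Randomly keep a fraction of the majority class.

    Assumption: a simple stand-in for instance selection (IB3, DROP3, GA).
    """
    maj_idx = [i for i, label in enumerate(y) if label == majority]
    n_keep = max(1, int(len(maj_idx) * keep_fraction))
    keep = set(rng.sample(maj_idx, n_keep))
    idx = [i for i, label in enumerate(y) if label != majority or i in keep]
    return [X[i] for i in idx], [y[i] for i in idx]


def oversample_minority(X, y, minority, rng, k=3):
    """Add SMOTE-style synthetic minority points until the classes balance.

    Assumption: binary labels and at least two minority instances.
    """
    pool = [x for x, label in zip(X, y) if label == minority]
    n_new = max(0, (len(y) - len(pool)) - len(pool))
    synthetic = []
    for _ in range(n_new):
        base_i = rng.randrange(len(pool))
        base = pool[base_i]
        others = [p for j, p in enumerate(pool) if j != base_i]
        # Pick one of the k nearest minority neighbours of the base point.
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        # Interpolate between the base point and the chosen neighbour.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return X + synthetic, y + [minority] * len(synthetic)


rng = random.Random(0)
# Toy imbalanced data: 90 majority (label 0) and 10 minority (label 1) points.
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]
y = [0] * 90 + [1] * 10

# Order A: under-sample first, then over-sample.
Xa, ya = undersample_majority(X, y, 0, 0.5, rng)
Xa, ya = oversample_minority(Xa, ya, 1, rng)

# Order B: over-sample first, then under-sample.
Xb, yb = oversample_minority(X, y, 1, rng)
Xb, yb = undersample_majority(Xb, yb, 0, 0.5, rng)

counts_a = dict(Counter(ya))  # 45 majority, 45 minority
counts_b = dict(Counter(yb))  # 45 majority, 90 minority
print(counts_a, counts_b)
```

With these stand-ins, order A balances the minority class against an already reduced majority class, while order B generates synthetic points against the full majority class before reducing it, so the two orders give a classifier different class ratios and different synthetic instances. In an experiment of the kind described above, each resulting training set would then be used to fit a classifier such as a random forest and compared by AUC on held-out data.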

Original language: English
Pages (from-to): 845-863
Number of pages: 19
Journal: Artificial Intelligence Review
Volume: 56
Issue number: 2
Publication status: Published - Feb 2023
