TY - JOUR
T1 - Combining multiple data resampling methods and classifier ensembles for better financial distress prediction
T2 - homogeneous and heterogeneous approaches
AU - Hu, Ya Han
AU - Tsai, Chih Fong
AU - Wang, Pei Ting
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/10
Y1 - 2025/10
AB - Financial distress prediction (FDP) is a critical task for financial institutions and is typically framed as a class imbalance learning problem. To address this challenge, this paper proposes two ensemble-based strategies: the homogeneous and heterogeneous approaches, which combine multiple data re-sampling algorithms to generate diverse re-balanced training sets for classifier construction. Experimental results on seven FDP datasets demonstrate that the heterogeneous approach, which integrates under-, over-, and hybrid sampling methods with their optimal imbalance ratio settings, achieves superior performance in terms of AUC, particularly when applied with the LightGBM and XGBoost classifiers. Regarding Type I error, the heterogeneous combinations consistently outperform the homogeneous and other baseline approaches across various classifiers. The generalizability of the proposed methods is further validated using 37 additional class-imbalanced datasets from different domains, where the heterogeneous approach again shows the most robust performance. These findings suggest that the proposed models can serve as effective decision support tools for financial institutions to enhance credit risk evaluation and lending strategies. From a policy perspective, adopting such predictive frameworks can improve financial stability by reducing exposure to high-risk loans and enabling more accurate early warning systems for economic distress.
KW - Class imbalance learning
KW - Classifier ensemble
KW - Data re-sampling
KW - Data science
KW - Financial distress prediction
UR - https://www.scopus.com/pages/publications/105009494837
U2 - 10.1007/s10479-025-06706-5
DO - 10.1007/s10479-025-06706-5
M3 - Journal article
AN - SCOPUS:105009494837
SN - 0254-5330
VL - 353
SP - 793
EP - 814
JO - Annals of Operations Research
JF - Annals of Operations Research
IS - 2
ER -