Traditional default prediction models rely mainly on financial data. However, financial data on small and medium-sized enterprises (SMEs) are difficult to obtain, and even when they are available, their opacity may hinder analysis. Traditional prediction models therefore encounter serious problems when used to predict SME defaults. In this paper, a novel prediction framework that uses only external public credit data is proposed. The external public credit data include SMEs' basic information (BI), credit information from the government (CIG), and court verdict information (CVI), all of which can be collected from publicly accessible websites. Records on 15,605 sample companies were collected from approximately 300,000 companies. Among the sampled companies, 8,183 have defaulted. The empirical data were used to construct prediction models with logistic regression, the classification and regression tree (CART) model, and LightGBM. The best results were an accuracy of 0.87 and an area under the receiver operating characteristic curve (AUC) of 0.92. The results show that a model using only external credit data has significant predictive ability, and that CIG variables offer the best predictive capacity.
- Credit risk
- Default prediction
- External credit data
- Small and medium-sized enterprises (SMEs)
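
The two evaluation metrics reported in the abstract, accuracy and AUC, can be illustrated with a minimal sketch. The function names, the 0.5 decision threshold, and the labels and scores below are assumptions made up for illustration; they are not the paper's data or code. AUC is computed here in its pairwise (Mann-Whitney) form: the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of companies whose thresholded score matches the true label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (defaulter, non-defaulter) pairs in which the
    defaulter has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = default) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(accuracy(labels, scores))  # 4 of 6 correct at the 0.5 threshold
print(roc_auc(labels, scores))   # 8 of 9 pairs ranked correctly
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank the companies. For context, 8,183 defaulters among 15,605 sampled companies is about 52%, so a trivial classifier that labels every company a defaulter would reach an accuracy of roughly 0.52 on this sample.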