TY - JOUR
T1 - SVM and SVM ensembles in breast cancer prediction
AU - Huang, Min Wei
AU - Chen, Chih Wen
AU - Lin, Wei Chao
AU - Ke, Shih Wen
AU - Tsai, Chih Fong
N1 - Publisher Copyright:
© 2017 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2017/1
Y1 - 2017/1
N2 - Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
UR - http://www.scopus.com/inward/record.url?scp=85009223759&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0161501
DO - 10.1371/journal.pone.0161501
M3 - Journal article
C2 - 28060807
AN - SCOPUS:85009223759
SN - 1932-6203
VL - 12
JO - PLoS ONE
JF - PLoS ONE
IS - 1
M1 - 0161501
ER -