TY - GEN
T1 - Machine Learning Algorithms for ccRCC Data Analysis
AU - Tsai, Hui Yu
AU - Lee, Wei Chi
AU - Shih, Chang Xing
AU - Liu, Shao Hung
AU - Chang, Hui Yin
AU - Wu, Hui Ching
AU - Tseng, Ming Hseng
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Malignant tumors are one of the top ten causes of death worldwide. Clear cell renal cell carcinoma (ccRCC) is the most common type of renal cell carcinoma. Discovering latent factors through genetic testing helps diagnose the disease. Due to the complexity of proteomics, we used machine learning models to classify the disease and to select the proteins with important features, in order to reduce irrelevant detection. We used different machine learning methods, including logistic regression (LR), support vector classification (SVC), and random forest classifier (RFC), for feature selection, combined with different classification algorithms, e.g., SVC, XGBoost, and Multilayer Perceptron, to predict the disease. Differences among the machine learning methods were compared to improve classification accuracy and present key features. The ccRCC dataset with 11,817 proteins and 194 patients was used for model development. According to the evaluation, LR and RFC showed a larger area under the receiver operating characteristic curve (AUC) than SVC. On the ccRCC dataset, the LR feature selection method achieved an AUC of 0.995, while the RFC feature selection method achieved an AUC of up to 0.996. Comparing different classification algorithms with the RFC feature selection method, RFC, XGBoost, and LR achieved AUCs of 0.996, 0.995, and 0.992, respectively, with LR scoring the lowest. However, the LR classifier combined with the LR feature selection method achieved an AUC of 0.995. These results demonstrate that different machine learning algorithms must be matched with different feature selection methods to classify data.
AB - Malignant tumors are one of the top ten causes of death worldwide. Clear cell renal cell carcinoma (ccRCC) is the most common type of renal cell carcinoma. Discovering latent factors through genetic testing helps diagnose the disease. Due to the complexity of proteomics, we used machine learning models to classify the disease and to select the proteins with important features, in order to reduce irrelevant detection. We used different machine learning methods, including logistic regression (LR), support vector classification (SVC), and random forest classifier (RFC), for feature selection, combined with different classification algorithms, e.g., SVC, XGBoost, and Multilayer Perceptron, to predict the disease. Differences among the machine learning methods were compared to improve classification accuracy and present key features. The ccRCC dataset with 11,817 proteins and 194 patients was used for model development. According to the evaluation, LR and RFC showed a larger area under the receiver operating characteristic curve (AUC) than SVC. On the ccRCC dataset, the LR feature selection method achieved an AUC of 0.995, while the RFC feature selection method achieved an AUC of up to 0.996. Comparing different classification algorithms with the RFC feature selection method, RFC, XGBoost, and LR achieved AUCs of 0.996, 0.995, and 0.992, respectively, with LR scoring the lowest. However, the LR classifier combined with the LR feature selection method achieved an AUC of 0.995. These results demonstrate that different machine learning algorithms must be matched with different feature selection methods to classify data.
KW - Data Analysis
KW - Machine Learning
KW - ccRCC
UR - http://www.scopus.com/inward/record.url?scp=85143134063&partnerID=8YFLogxK
U2 - 10.1109/ECBIOS54627.2022.9945034
DO - 10.1109/ECBIOS54627.2022.9945034
M3 - Conference contribution
AN - SCOPUS:85143134063
T3 - Proceedings of the 2022 IEEE 4th Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability, ECBIOS 2022
SP - 203
EP - 206
BT - Proceedings of the 2022 IEEE 4th Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability, ECBIOS 2022
A2 - Meen, Teen-Hang
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability, ECBIOS 2022
Y2 - 27 May 2022 through 29 May 2022
ER -