1
|
Qu Z, Wang Y, Guo D, He G, Sui C, Duan Y, Zhang X, Meng H, Lan L, Liu X. Comparison of deep learning models to traditional Cox regression in predicting survival of colon cancer: Based on the SEER database. J Gastroenterol Hepatol 2024; 39:1816-1826. [PMID: 38725241 DOI: 10.1111/jgh.16598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 04/08/2024] [Accepted: 04/21/2024] [Indexed: 10/01/2024]
Abstract
BACKGROUND AND AIM In this study, a deep learning algorithm was used to predict the survival rate of colon cancer (CC) patients, and its performance was compared with that of traditional Cox regression. METHODS In this population-based cohort study, we used the characteristics of patients diagnosed with CC between 2010 and 2015 from the Surveillance, Epidemiology and End Results (SEER) database. The population was randomized into a training set (n = 10 596, 70%) and a test set (n = 4536, 30%). Brier scores, the area under the receiver operating characteristic curve (AUC), and calibration curves were used to compare the performance of the three most popular deep learning models, namely artificial neural networks (ANN), deep neural networks (DNN), and long short-term memory (LSTM) neural networks, with that of the Cox proportional hazards (CPH) model. RESULTS In the independent test set, the Brier values of ANN, DNN, LSTM and CPH were 0.155, 0.149, 0.148, and 0.170, respectively. The AUC values were 0.906 (95% confidence interval [CI] 0.897-0.916), 0.908 (95% CI 0.899-0.918), 0.910 (95% CI 0.901-0.919), and 0.793 (95% CI 0.769-0.816), respectively. Deep learning showed more promising results than CPH in predicting CC-specific survival. CONCLUSIONS Deep learning showed potential advantages over traditional CPH models in terms of prognostic assessment and treatment recommendations. LSTM exhibited optimal predictive accuracy and can provide reliable information on individual survival and treatment recommendations for CC patients.
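As a rough illustration of the two headline metrics in this abstract, the sketch below computes a Brier score and an AUC in plain Python. It is not the authors' code: the function names and the toy labels and probabilities are invented for illustration, and the sketch treats the outcome as a simple binary label, ignoring the censoring that a survival analysis would normally have to handle.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = event, 0 = no event.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

brier = brier_score(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(f"Brier = {brier:.3f}, AUC = {auc:.3f}")
```

A lower Brier score and a higher AUC both indicate better predictions, which is the direction of the reported comparison (for example, LSTM at 0.148 and 0.910 versus CPH at 0.170 and 0.793).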
Affiliation(s)
- Zihan Qu
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Yashan Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Dingjie Guo
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Guangliang He
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Chuanying Sui
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Yuqing Duan
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Xin Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Hengyu Meng
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Linwei Lan
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Xin Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
|
2
|
Ayyoubzadeh SM, Ahmadi M, Yazdipour AB, Ghorbani‐Bidkorpeh F, Ahmadi M. Prediction of ovarian cancer using artificial intelligence tools. Health Sci Rep 2024; 7:e2203. [PMID: 38946777 PMCID: PMC11211920 DOI: 10.1002/hsr2.2203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 06/05/2024] [Accepted: 06/10/2024] [Indexed: 07/02/2024] Open
Abstract
Purpose Ovarian cancer is a common type of cancer and a leading cause of death in women. Therefore, accurate and fast prediction of ovarian tumors is crucial. One appropriate and precise way to predict and diagnose this cancer is to build a model based on artificial intelligence methods. These methods provide a tool for predicting ovarian cancer according to the characteristics and conditions of each person. Method In this study, a data set comprising 171 records of benign ovarian tumors and 178 records of ovarian cancer was analyzed. The data set contains the patients' blood test results and tumor markers. After data preprocessing, including removing outliers and replacing missing values, the weight of the effective factors was determined using information gain and the Gini index. In the next step, predictive models were created using random forest (RF), support vector machine (SVM), decision tree (DT), and artificial neural network (ANN) models. The performance of these models was evaluated with 10-fold cross-validation using specificity, sensitivity, accuracy, and the area under the receiver operating characteristic curve. Finally, by comparing the performance of the models, the best predictive model of ovarian cancer was selected. Results The most important predictive factors were HE4, CA125, and NEU. The RF model was identified as the best predictive model, with an accuracy of more than 86%. The predictive accuracy of the DT, SVM, and ANN models was estimated as 82.91%, 85.25%, and 79.35%, respectively. Various artificial intelligence (AI) tools can be used with high accuracy and sensitivity in predicting ovarian cancer. Conclusion Therefore, the use of these tools can help specialists and patients with early, easier, and less expensive diagnosis of ovarian cancer. 
Future studies can leverage AI to integrate image data with serum biomarkers, thereby facilitating the creation of novel models and advancing the diagnosis and treatment of ovarian cancer.
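The abstract weights candidate factors with information gain and the Gini index. A minimal sketch of both measures follows, assuming the standard definitions over class counts. The parent counts (171 benign, 178 cancer) come from the abstract; the split into two child groups is invented purely for illustration and does not correspond to any marker threshold reported by the authors.

```python
from math import log2


def gini(counts):
    """Gini impurity of a node given per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (bits) of a node given per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Class counts reported in the abstract: [benign, cancer].
parent = [171, 178]

# Hypothetical split on some marker threshold (counts invented for illustration).
children = [[150, 40], [21, 138]]

print(f"Gini(parent) = {gini(parent):.4f}")
print(f"Information gain of the hypothetical split = {information_gain(parent, children):.4f}")
```

Because the two classes are almost balanced, the parent impurity sits just below the two-class maximum of 0.5, so any informative marker has room to produce a visible gain.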
Affiliation(s)
- Seyed Mohammad Ayyoubzadeh
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Health Information Management Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Marjan Ahmadi
- Department of Obstetrics and Gynecology, Tehran University of Medical Sciences, Tehran, Iran
- Alireza Banaye Yazdipour
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Students' Scientific Research Center (SSRC), Tehran University of Medical Sciences, Tehran, Iran
- Department of Health Information Technology, School of Paramedical and Rehabilitation Sciences, Mashhad University of Medical Sciences, Mashhad, Iran
- Fatemeh Ghorbani‐Bidkorpeh
- Department of Pharmaceutics and Pharmaceutical Nanotechnology, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mahnaz Ahmadi
- Medical Nanotechnology and Tissue Engineering Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
3
|
Sun J, Shao S, Wan H, Wu X, Feng J, Gao Q, Qu W, Xie L. Prediction models for postoperative recurrence of non-lactating mastitis based on machine learning. BMC Med Inform Decis Mak 2024; 24:106. [PMID: 38649879 PMCID: PMC11036744 DOI: 10.1186/s12911-024-02499-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 04/03/2024] [Indexed: 04/25/2024] Open
Abstract
OBJECTIVES This study aims to build a machine learning (ML) model to predict the probability of postoperative recurrence of non-lactating mastitis (NLM) using the Random Forest (RF) and XGBoost algorithms. Such a model can help identify the risk of NLM recurrence and guide clinical treatment planning. METHODS This study was conducted on inpatients who were admitted to the Mammary Department of Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine between July 2019 and December 2021. Inpatient follow-up was completed in December 2022. Ten features were selected in this study to build the ML model: age, body mass index (BMI), number of abortions, presence of inverted nipples, extent of breast mass, white blood cell count (WBC), neutrophil to lymphocyte ratio (NLR), albumin-globulin ratio (AGR), triglyceride (TG), and presence of intraoperative discharge. We used two ML approaches (RF and XGBoost) to build models and predict the NLM recurrence risk of female patients. A total of 258 patients were randomly divided into a training set and a test set in a 75%-25% proportion. Model performance was evaluated based on Accuracy, Precision, Recall, F1-score and AUC. The Shapley Additive Explanations (SHAP) method was used to interpret the model. RESULTS There were 48 (18.6%) NLM patients who experienced recurrence during the follow-up period. Ten features were selected in this study to build the ML model. For the RF model, BMI is the most important influencing factor, and for the XGBoost model it is intraoperative discharge. The results of tenfold cross-validation suggest that both the RF model and the XGBoost model have good predictive performance, but the XGBoost model performs better than the RF model in our study. The trends of SHAP values of all features in our models are consistent with the trends of these features' clinical presentation. 
The inclusion of these ten features in the model is necessary to build practical prediction models for recurrence. CONCLUSIONS The results of tenfold cross-validation and SHAP values suggest that the models have predictive ability. The trend of SHAP values provides auxiliary validation of our models and gives them more clinical significance.
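The evaluation metrics named in this abstract can all be derived from a confusion matrix. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the function name is made up, and the first set of counts is hypothetical. The second call uses the cohort figures from the abstract (48 recurrences among 258 patients) to show what a trivial "never recurs" rule would score.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical confusion matrix for a held-out test set (counts invented).
model = classification_metrics(tp=8, fp=4, fn=4, tn=49)

# Baseline that predicts "no recurrence" for every patient in the cohort:
# 48 recurrences are missed, 210 non-recurrences are correct.
baseline = classification_metrics(tp=0, fp=0, fn=48, tn=258 - 48)

for name, scores in (("model", model), ("baseline", baseline)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

With recurrence in only 18.6% of patients, the baseline reaches high accuracy while its recall is zero, which is one reason to report Precision, Recall and F1-score alongside Accuracy.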
Affiliation(s)
- Jiaye Sun
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Shijun Shao
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Hua Wan
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China.
- Xueqing Wu
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China.
- Jiamei Feng
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Qingqian Gao
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Wenchao Qu
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Lu Xie
- Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
|
4
|
Nopour R. Screening ovarian cancer by using risk factors: machine learning assists. Biomed Eng Online 2024; 23:18. [PMID: 38347611 PMCID: PMC10863117 DOI: 10.1186/s12938-024-01219-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/06/2024] [Indexed: 02/15/2024] Open
Abstract
BACKGROUND AND AIM Ovarian cancer (OC) is a prevalent and aggressive malignancy that poses a significant public health challenge. The lack of preventive strategies for OC increases morbidity, mortality, and other negative consequences. Screening for OC through risk prediction could be a powerful preventive strategy, but it has not received much attention. This study therefore aimed to use machine learning approaches as predictive assistance solutions to screen groups at high risk of OC for practical preventive purposes. MATERIALS AND METHODS In this data-driven, retrospective study, we used data on 1516 women with suspected OC from one centralized database covering six clinical settings in Sari City from 2015 to 2019. Six machine learning (ML) algorithms, namely XG-Boost, Random Forest (RF), J-48, support vector machine (SVM), K-nearest neighbor (KNN), and artificial neural network (ANN), were used to construct prediction models for OC. To choose the best model for predicting OC, we compared the prediction models using the area under the receiver operating characteristic curve (AU-ROC). RESULTS The experimental results showed that XG-Boost, with AU-ROC = 0.93 (95% CI: 0.91-0.95), was the best-performing model for predicting OC. CONCLUSIONS ML approaches possess significant predictive efficiency and interoperability to achieve powerful preventive strategies by screening groups at high risk of OC.
Affiliation(s)
- Raoof Nopour
- Department of Health Information Management, Student Research Committee, School of Health Management and Information Sciences Branch, Iran University of Medical Sciences, Tehran, Iran.
|
5
|
Lin L, Ding L, Fu Z, Zhang L. Machine learning-based models for prediction of the risk of stroke in coronary artery disease patients receiving coronary revascularization. PLoS One 2024; 19:e0296402. [PMID: 38330052 PMCID: PMC10852291 DOI: 10.1371/journal.pone.0296402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 12/12/2023] [Indexed: 02/10/2024] Open
Abstract
BACKGROUND To construct several prediction models for the risk of stroke in coronary artery disease (CAD) patients receiving coronary revascularization based on machine learning methods. METHODS In total, 5757 CAD patients receiving coronary revascularization who were admitted to the ICU, drawn from the Medical Information Mart for Intensive Care IV (MIMIC-IV), were included in this cohort study. All the data were randomly split into a training set (n = 4029) and a testing set (n = 1728) at a ratio of 7:3. Pearson correlation analysis and a least absolute shrinkage and selection operator (LASSO) regression model were applied for feature screening. Variables with Pearson correlation coefficient<9 were included, and the regression coefficients were set to 0. Features more closely related to the outcome were selected through 10-fold cross-validation, and features with non-zero coefficients were retained and included in the final model. The predictive value of the models was evaluated by sensitivity, specificity, area under the curve (AUC), and accuracy, each with a 95% confidence interval (CI). RESULTS The Catboost model presented the best predictive performance, with an AUC of 0.831 (95% CI: 0.811-0.851) in the training set and 0.760 (95% CI: 0.722-0.798) in the testing set. The AUC of the logistic regression model was 0.789 (95% CI: 0.764-0.814) in the training set and 0.731 (95% CI: 0.686-0.776) in the testing set. The DeLong test showed that the predictive value of the Catboost model was significantly higher than that of the logistic regression model (P<0.05). The Charlson Comorbidity Index (CCI) was the most important variable associated with the risk of stroke in CAD patients receiving coronary revascularization. CONCLUSION The Catboost model was the optimal model for predicting the risk of stroke in CAD patients receiving coronary revascularization, and it might provide a tool to quickly identify CAD patients at high risk of postoperative stroke.
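The 7:3 random split and the Pearson correlation used for feature screening can be sketched in plain Python as below. This is an illustration only: the function names, the fixed seed, and the choice to round the training size down are assumptions, and the sketch does not attempt the LASSO step or the correlation threshold, which the abstract does not state clearly.

```python
import random
from math import sqrt


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it at the training fraction (rounded down)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# 5757 patient indices, as in the abstract; the indices stand in for real records.
train, test = train_test_split(range(5757))
print(len(train), len(test))

# Hypothetical feature columns, perfectly correlated by construction.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Under the round-down assumption, the split sizes agree with the reported n = 4029 and n = 1728. Highly correlated feature pairs, such as the toy columns above, are the kind a correlation screen is meant to flag before model fitting.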
Affiliation(s)
- Lulu Lin
- Department of Neurology, The Second Hospital of Dalian Medical University, Dalian, Liaoning, China
- Li Ding
- Department of Neurology, The Second Hospital of Dalian Medical University, Dalian, Liaoning, China
- Zhongguo Fu
- Department of Neurology, Shenyang First People’s Hospital, Shenyang, Liaoning, China
- Lijiao Zhang
- Department of Cardiology, The Second Hospital of Dalian Medical University, Dalian, Liaoning, China
|
6
|
Babaei Rikan S, Sorayaie Azar A, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Wiil UK. Survival prediction of glioblastoma patients using modern deep learning and machine learning techniques. Sci Rep 2024; 14:2371. [PMID: 38287149 PMCID: PMC10824760 DOI: 10.1038/s41598-024-53006-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 01/25/2024] [Indexed: 01/31/2024] Open
Abstract
In this study, we utilized data from the Surveillance, Epidemiology, and End Results (SEER) database to predict survival outcomes of glioblastoma patients. To assess dataset skewness and detect feature importance, we applied Pearson's second coefficient of skewness and the Ordinary Least Squares method, respectively. Using two sampling strategies, holdout and five-fold cross-validation, we developed five machine learning (ML) models alongside a feed-forward deep neural network (DNN) for the multiclass classification and regression prediction of glioblastoma patient survival. After balancing the classification and regression datasets, we obtained 46,340 and 28,573 samples, respectively. Shapley additive explanations (SHAP) were then used to explain the decision-making process of the best model. In both classification and regression tasks, as well as across holdout and cross-validation sampling strategies, the DNN consistently outperformed the ML models. Notably, the accuracies were 90.25% and 90.22% for holdout and five-fold cross-validation, respectively, while the corresponding R² values were 0.6565 and 0.6622. SHAP analysis identified age at diagnosis as the most influential feature in the DNN's survival predictions. These findings suggest that the DNN holds promise as a practical auxiliary tool for clinicians, aiding them in optimal decision-making concerning the treatment and care trajectories for glioblastoma patients.
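Two quantities in this abstract have compact textbook definitions: Pearson's second coefficient of skewness, 3 × (mean − median) / standard deviation, and the coefficient of determination R². The sketch below implements both with the Python standard library. The function names and the survival times are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which variant was used.

```python
from statistics import mean, median, stdev


def pearson_second_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sample std dev."""
    return 3 * (mean(values) - median(values)) / stdev(values)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical survival times in months (not SEER data), with one long survivor.
survival_months = [2, 4, 5, 6, 8, 10, 35]

skew = pearson_second_skewness(survival_months)
print(f"skewness = {skew:.3f}")  # positive: the long right tail pulls the mean above the median

# A model that always predicts the mean has R^2 of exactly 0 on this data.
constant_prediction = [mean(survival_months)] * len(survival_months)
print(r_squared(survival_months, constant_prediction))
```

Read against this definition, the reported R² values of about 0.66 mean the DNN explains roughly two thirds of the variance in the regression target relative to a constant-mean predictor.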
Affiliation(s)
- Amin Naemi
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Habibollah Pirnejad
- Erasmus School of Health Policy and Management (ESHPM), Erasmus University Rotterdam, Rotterdam, The Netherlands.
- Patient Safety Research Center, Clinical Research Institute, Urmia University of Medical Sciences, Urmia, Iran.
- Uffe Kock Wiil
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
|
7
|
Collins GS, Whittle R, Bullock GS, Logullo P, Dhiman P, de Beyer JA, Riley RD, Schlussel MM. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. J Clin Epidemiol 2024; 165:111199. [PMID: 37898461 DOI: 10.1016/j.jclinepi.2023.10.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/02/2023] [Revised: 10/06/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]
Abstract
OBJECTIVE To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine learning methods in the field of oncology. STUDY DESIGN AND SETTING We conducted a systematic review, searching the MEDLINE database between December 1, 2022, and December 31, 2022, for studies developing a multivariable prognostic model using machine learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted open science practices. RESULTS We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported availability of a study protocol, and only one study was registered. Funding statements and conflicts of interest statements were common. Thirty-five studies (76%) provided data sharing statements, with 21 (46%) indicating data were available on request to the authors and seven declaring data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: eight studies (18%) mentioned using a reporting guideline, with 4 (10%) using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis statement, 1 (2%) using Minimum Information About Clinical Artificial Intelligence Modeling and Consolidated Standards Of Reporting Trials-Artificial Intelligence, 1 (2%) using Strengthening The Reporting Of Observational Studies In Epidemiology, 1 (2%) using Standards for Reporting Diagnostic Accuracy Studies, and 1 (2%) using Transparent Reporting of Evaluations with Nonrandomized Designs. 
CONCLUSION The adoption of open science principles in oncology studies developing prognostic models using machine learning methods is poor. Guidance and an increased awareness of benefits and best practices of open science are needed for prediction research in oncology.
Affiliation(s)
- Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom.
- Rebecca Whittle
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA; Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, United Kingdom
- Patricia Logullo
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Jennifer A de Beyer
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
- Michael M Schlussel
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
|
8
|
Yang Z, Zhou D, Huang J. Identifying Explainable Machine Learning Models and a Novel SFRP2 + Fibroblast Signature as Predictors for Precision Medicine in Ovarian Cancer. Int J Mol Sci 2023; 24:16942. [PMID: 38069266 PMCID: PMC10706905 DOI: 10.3390/ijms242316942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/18/2023] Open
Abstract
Ovarian cancer (OC) is a type of malignant tumor with a consistently high mortality rate. The diagnosis of early-stage OC and identification of functional subsets in the tumor microenvironment are essential to the development of patient management strategies. However, the development of robust models remains unsatisfactory. We aimed to utilize artificial intelligence and single-cell analysis to address this issue. Two independent datasets were screened from the Gene Expression Omnibus (GEO) database and processed to obtain overlapping differentially expressed genes (DEGs) in stage II-IV vs. stage I diseases. Three explainable machine learning algorithms were integrated to construct models that could determine the tumor stage and extract important characteristic genes as diagnostic biomarkers. Correlations between cancer-associated fibroblast (CAF) infiltration and characteristic gene expression were analyzed using TIMER2.0 and their relationship with survival rates was comprehensively explored via the Kaplan-Meier plotter (KM-plotter) online database. The specific expression of characteristic genes in fibroblast subsets was investigated through single-cell analysis. A novel fibroblast subset signature was explored to predict immune checkpoint inhibitor (ICI) response and oncogene mutation through Tumor Immune Dysfunction and Exclusion (TIDE) and artificial neural network algorithms, respectively. We found that Support Vector Machine-Shapley Additive Explanations (SVM-SHAP), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) successfully diagnosed early-stage OC (stage I). The area under the receiver operating characteristic curves (AUCs) of these models exceeded 0.990. Their overlapping characteristic gene, secreted frizzled-related protein 2 (SFRP2), was a risk factor that affected the overall survival of OC patients with stage II-IV disease (log-rank test: p < 0.01) and was specifically expressed in a fibroblast subset. 
Finally, the SFRP2+ fibroblast signature served as a novel predictor in evaluating ICI response and exploring pan-cancer tumor protein P53 (TP53) mutation (AUC = 0.853, 95% confidence interval [CI]: 0.829-0.877). In conclusion, the models based on SVM-SHAP, XGBoost, and RF enabled the early detection of OC for clinical decision making, and SFRP2+ fibroblast signature used in diagnostic models can inform OC treatment selection and offer pan-cancer TP53 mutation detection.
Affiliation(s)
- Jun Huang
- School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China
|
9
|
Zuo D, Yang L, Jin Y, Qi H, Liu Y, Ren L. Machine learning-based models for the prediction of breast cancer recurrence risk. BMC Med Inform Decis Mak 2023; 23:276. [PMID: 38031071 PMCID: PMC10688055 DOI: 10.1186/s12911-023-02377-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 11/17/2023] [Indexed: 12/01/2023] Open
Abstract
Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for improving prognosis. This study aimed to compare different machine learning algorithms to select the best model for predicting breast cancer recurrence. The prediction model was developed using eleven different machine learning (ML) algorithms: logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML model was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. The results showed that, compared with the other 10 algorithms, the AdaBoost algorithm had the best performance in predicting breast cancer recurrence, and it was adopted to establish the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset for predicting breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based breast cancer recurrence prediction model more interpretable for clinicians. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer.
Affiliation(s)
- Duo Zuo
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Lexin Yang
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Yu Jin
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- Tongji University Cancer Center, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Huan Qi
- China Mobile Group Tianjin Company Limited, Tianjin, 300308, China
- Yahui Liu
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Li Ren
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China.
- National Clinical Research Center for Cancer, Tianjin, 300060, China.
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China.
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China.
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China.
|