1
Shen M, Zhang Y, Zhan R, Du T, Shen P, Lu X, Liu S, Guo R, Shen X. Predicting the risk of cardiovascular disease in adults exposed to heavy metals: Interpretable machine learning. Ecotoxicology and Environmental Safety 2024; 290:117570. [PMID: 39721423 DOI: 10.1016/j.ecoenv.2024.117570] [Received: 09/05/2024] [Revised: 12/16/2024] [Accepted: 12/17/2024] [Indexed: 12/28/2024]
Abstract
Machine learning exhibits excellent performance in terms of predictive power. We aimed to construct an interpretable machine learning model utilizing National Health and Nutrition Examination Survey data to investigate the relationship between heavy metal exposure and cardiovascular disease (CVD). A total of 4600 adults were included in the analysis. The Least Absolute Shrinkage and Selection Operator regression method was employed to select relevant feature variables. Subsequently, six machine learning models were constructed: random forest, decision tree, gradient boosting decision tree, k-nearest neighbor, support vector machine, and AdaBoost. Feature importance analysis, partial dependence plots, and Shapley additive explanations were integrated to enhance the interpretability of the CVD prediction model. Among all models, the random forest exhibited the best performance, with an accuracy of 90%, an area under the curve of 0.85, and an F1 score of 0.86. Urine cadmium (Cd), blood lead (Pb), urine thallium (Tl), and urine tungsten (W) were identified as the most significant predictors of CVD, with importance scores of 0.062, 0.057, 0.051, and 0.050, respectively. At the overall level, higher levels of urine Cd, blood Pb, and urine W were associated with an increased risk of CVD, whereas a lower level of urine Tl was linked to a reduced CVD risk. Additionally, the analysis of synergistic effects revealed that Cd was the predominant determinant of CVD risk. The random forest-based CVD prediction model demonstrated excellent predictive power and provided valuable insights for personalized patient care and optimal resource allocation in populations exposed to heavy metals.
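The abstract above reports accuracy and an F1 score. As a minimal sketch of how these two metrics are derived from a confusion matrix, the following uses invented toy labels (not study data); the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented toy labels: 3 participants with CVD, 7 without.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With imbalanced classes, accuracy can stay high while F1 drops, which is one reason abstracts of this kind tend to report both.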
Affiliation(s)
- Meiyue Shen
  - Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
- Yine Zhang
  - Ningxia Center for Disease Control and Prevention, Yinchuan, China
- Tingwei Du
  - Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
- Peixuan Shen
  - Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
- Xiaochuan Lu
  - Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
- Shengnan Liu
  - Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China; Ningxia Center for Disease Control and Prevention, Yinchuan, China; Qingdao Haici Hospital, Qingdao 266033, China
- Rongrong Guo
  - Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
- Xiaoli Shen
  - Department of Epidemiology and Health Statistics, School of Public Health, Qingdao University, Qingdao 266071, China
2
Lee N, Jeon K, Park MJ, Song W, Jeong S. Predicting survival in patients with SARS-CoV-2 based on cytokines and soluble immune checkpoint regulators. Front Cell Infect Microbiol 2024; 14:1397297. [PMID: 39654974 PMCID: PMC11625743 DOI: 10.3389/fcimb.2024.1397297] [Received: 03/07/2024] [Accepted: 10/31/2024] [Indexed: 12/12/2024] Open
Abstract
Background Coronavirus disease 2019 (COVID-19) has been widespread for over four years and has progressed to an endemic stage. Accordingly, the evaluation of host immunity in infected patients and the development of markers for prognostic prediction in the early stages have been emphasized. Soluble immune checkpoints (sICs), which regulate T cell activity, have been reported as promising biomarkers of viral infections. Methods In this study, quantitative values of 17 sICs and 16 cytokines (CKs) were measured using the Luminex multiplex assay. A total of 148 serum samples from 100 patients with COVID-19 were collected, and the levels were compared between survivors and non-survivors and between pneumonic and non-pneumonic groups. The impact of these markers on overall survival was analyzed using a machine learning algorithm. Results sICs, including sCD27, sCD40, herpes virus entry mediator (sHVEM), T-cell immunoglobulin and mucin-domain containing-3 (sTIM-3), and Toll-like receptor 2 (sTLR-2), and CKs, including chemokine CC motif ligand 2 (CCL2), interleukin-6 (IL-6), IL-8, IL-10, IL-13, granulocyte-macrophage colony-stimulating factor (GM-CSF), and tumor necrosis factor-α (TNF-α), were statistically significantly increased in the non-survivors compared with the survivors. IL-6 showed the highest area under the receiver-operating curve (0.844, 95% CI = 0.751-0.913) to discriminate non-survival, with a sensitivity of 78.9% and specificity of 82.4%. In Kaplan-Meier analysis, patients with procalcitonin over 0.25 ng/mL, C-reactive protein (CRP) over 41.0 mg/dL, neutrophil-to-lymphocyte ratio over 18.97, sCD27 over 3828.8 pg/mL, sCD40 over 1283.6 pg/mL, and IL-6 over 21.6 pg/mL showed poor survival (log-rank test). In the decision tree analysis, IL-6, sTIM-3, and sCD40 levels had a strong impact on survival. Moreover, IL-6, CD40, and CRP levels were important to predict the probability of 90-d mortality using the SHapley Additive exPlanations method.
Conclusion sICs and CKs, especially IL-6, sCD27, sCD40, and sTIM-3 are expected to be useful in predicting patient outcomes when used in combination with existing markers.
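The results above summarize a single marker by its area under the ROC curve and by sensitivity and specificity at a cutoff. The sketch below shows one common way to compute these quantities. The marker values are invented for illustration; only the 21.6 pg/mL cutoff is taken from the abstract (where it appears in the Kaplan-Meier analysis), and the function names are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case scores higher than a
    negative case, with ties counted as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sens_spec(scores_pos, scores_neg, cutoff):
    """Sensitivity and specificity when values above the cutoff are called positive."""
    sensitivity = sum(s > cutoff for s in scores_pos) / len(scores_pos)
    specificity = sum(s <= cutoff for s in scores_neg) / len(scores_neg)
    return sensitivity, specificity


# Invented IL-6 values in pg/mL, not data from the study.
non_survivors = [35.0, 80.0, 22.0, 15.0, 120.0]
survivors = [5.0, 12.0, 30.0, 8.0, 18.0, 3.0]

auc = roc_auc(non_survivors, survivors)
sens, spec = sens_spec(non_survivors, survivors, cutoff=21.6)
print(auc, sens, spec)
```

The pairwise form is quadratic in the sample size, which is acceptable for a cohort of this scale; rank-based implementations are preferable for large data sets.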
Affiliation(s)
- Nuri Lee
  - Department of Laboratory Medicine, Hallym University College of Medicine, Kangnam Sacred Heart Hospital, Seoul, Republic of Korea
- Kibum Jeon
  - Department of Laboratory Medicine, Hallym University College of Medicine, Hangang Sacred Heart Hospital, Seoul, Republic of Korea
- Min-Jeong Park
  - Department of Laboratory Medicine, Hallym University College of Medicine, Kangnam Sacred Heart Hospital, Seoul, Republic of Korea
- Wonkeun Song
  - Department of Laboratory Medicine, Hallym University College of Medicine, Kangnam Sacred Heart Hospital, Seoul, Republic of Korea
- Seri Jeong
  - Department of Laboratory Medicine, Hallym University College of Medicine, Kangnam Sacred Heart Hospital, Seoul, Republic of Korea
3
Sorayaie Azar A, Samimi T, Tavassoli G, Naemi A, Rahimi B, Hadianfard Z, Wiil UK, Nazarbaghi S, Bagherzadeh Mohasefi J, Lotfnezhad Afshar H. Predicting stroke severity of patients using interpretable machine learning algorithms. Eur J Med Res 2024; 29:547. [PMID: 39538301 PMCID: PMC11562860 DOI: 10.1186/s40001-024-02147-1] [Received: 03/25/2024] [Accepted: 11/05/2024] [Indexed: 11/16/2024] Open
Abstract
BACKGROUND Stroke is a significant global health concern, ranking as the second leading cause of death and placing a substantial financial burden on healthcare systems, particularly in low- and middle-income countries. Timely evaluation of stroke severity is crucial for predicting clinical outcomes, with standard assessment tools being the Rapid Arterial Occlusion Evaluation (RACE) and the National Institutes of Health Stroke Scale (NIHSS). This study aims to utilize Machine Learning (ML) algorithms to predict stroke severity using these two distinct scales. METHODS We conducted this study using two datasets collected from hospitals in Urmia, Iran, corresponding to stroke severity assessments based on RACE and NIHSS. Seven ML algorithms were applied, including K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Hyperparameter tuning was performed using grid search to optimize model performance, and SHapley Additive Explanations (SHAP) were used to interpret the contribution of individual features. RESULTS Among the models, the RF achieved the highest performance, with accuracies of 92.68% for the RACE dataset and 91.19% for the NIHSS dataset. The Area Under the Curve (AUC) was 92.02% and 97.86% for the RACE and NIHSS datasets, respectively. The SHAP analysis identified triglyceride levels, length of hospital stay, and age as critical predictors of stroke severity. CONCLUSIONS This study is the first to apply ML models to the RACE and NIHSS scales for predicting stroke severity. The use of SHAP enhances the interpretability of the models, increasing clinicians' trust in these ML algorithms. The best-performing ML model can be a valuable tool for assisting medical professionals in predicting stroke severity in clinical settings.
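SHAP, used above to interpret feature contributions, is built on Shapley values. The following is a minimal sketch of the exact Shapley computation for one prediction, replacing "absent" features by a single baseline value. The toy scoring function, feature values, and the name `shapley_values` are assumptions for illustration and are not the study's model; SHAP libraries use faster approximations and background data sets rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: the average marginal contribution of each
    feature over all orderings in which features are switched from the
    baseline value to the instance value."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)            # start with every feature at baseline
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # reveal one feature of the instance
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    # Hypothetical severity score with an interaction term (illustration only).
    return 0.02 * x["age"] + 0.5 * x["triglycerides"] + 0.01 * x["age"] * x["stay_days"]


baseline = {"age": 60, "triglycerides": 1.5, "stay_days": 5}
patient = {"age": 70, "triglycerides": 2.5, "stay_days": 9}
phi = shapley_values(toy_model, patient, baseline)
print(phi)
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that makes per-feature SHAP plots additive.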
Affiliation(s)
- Amir Sorayaie Azar
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
  - Department of Computer Engineering, Urmia University, Urmia, Iran
- Tahereh Samimi
  - Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran
  - Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran
- Ghanbar Tavassoli
  - Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran
  - Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran
  - Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran
- Amin Naemi
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Bahlol Rahimi
  - Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran
  - Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran
- Zahra Hadianfard
  - Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran
- Uffe Kock Wiil
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Surena Nazarbaghi
  - Department of Neurology, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
- Jamshid Bagherzadeh Mohasefi
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
  - Department of Computer Engineering, Urmia University, Urmia, Iran
- Hadi Lotfnezhad Afshar
  - Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran
  - Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran
4
Park SW, Park YL, Lee EG, Chae H, Park P, Choi DW, Choi YH, Hwang J, Ahn S, Kim K, Kim WJ, Kong SY, Jung SY, Kim HJ. Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning. Cancers (Basel) 2024; 16:3799. [PMID: 39594754 PMCID: PMC11592669 DOI: 10.3390/cancers16223799] [Received: 09/19/2024] [Revised: 11/06/2024] [Accepted: 11/09/2024] [Indexed: 11/28/2024] Open
Abstract
Background/Objectives: Breast cancer is the most common cancer in women worldwide, requiring strategic efforts to reduce its mortality. This study aimed to develop a predictive classification model for breast cancer mortality using real-world data, including various clinical features. Methods: A total of 11,286 patients with breast cancer from the National Cancer Center were included in this study. The mortality rate of the total sample was approximately 6.2%. Propensity score matching was used to reduce bias. Several machine learning models, including extreme gradient boosting, were applied to 31 clinical features. To enhance model interpretability, we used the SHapley Additive exPlanations method. ML analyses were also performed on the samples, excluding patients who developed other cancers after breast cancer. Results: Among the ML models, the XGB model exhibited the highest discriminatory power, with an area under the curve of 0.8722 and a specificity of 0.9472. Key predictors of the mortality classification model included occurrence in other organs, age at diagnosis, N stage, T stage, curative radiation treatment, and Ki-67(%). Even after excluding patients who developed other cancers after breast cancer, the XGB model remained the best-performing, with an AUC of 0.8518 and a specificity of 0.9766. Additionally, the top predictors from SHAP were similar to the results for the overall sample. Conclusions: Our models provided excellent predictions of breast cancer mortality using real-world data from South Korea. Explainable artificial intelligence, such as SHAP, validated the clinical applicability and interpretability of these models.
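The abstract states that propensity score matching was used to reduce bias but does not describe the matching scheme. One common variant, sketched here purely as an assumption, is greedy 1:1 nearest-neighbour matching without replacement and with a caliper. The propensity scores, identifiers, caliper value, and the name `match_nearest` are all hypothetical.

```python
def match_nearest(cases, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement. A pair is kept only if the score difference
    does not exceed the caliper."""
    available = dict(controls)  # control id -> propensity score
    pairs = []
    for case_id, case_score in sorted(cases.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        best = min(available, key=lambda cid: abs(available[cid] - case_score))
        if abs(available[best] - case_score) <= caliper:
            pairs.append((case_id, best))
            del available[best]  # each control is used at most once
    return pairs


# Hypothetical propensity scores; in practice these would come from a
# model such as logistic regression fitted on baseline covariates.
cases = {"d1": 0.30, "d2": 0.62, "d3": 0.90}
controls = {"a1": 0.28, "a2": 0.33, "a3": 0.60, "a4": 0.10}
pairs = match_nearest(cases, controls)
print(pairs)
```

In this toy run the third case has no control within the caliper and is left unmatched, which illustrates how matching trades sample size for comparability.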
Affiliation(s)
- Sang Won Park
  - Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
  - Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
- Ye-Lin Park
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
- Eun-Gyeong Lee
  - Department of Surgery, Center of Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea
- Heejung Chae
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
  - Department of Medical Oncology, Center for Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea
- Phillip Park
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
- Dong-Woo Choi
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
- Yeon Ho Choi
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
- Juyeon Hwang
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
- Seohyun Ahn
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
- Keunkyun Kim
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
- Woo Jin Kim
  - Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
  - Department of Internal Medicine, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea
  - Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
- Sun-Young Kong
  - Targeted Therapy Branch, Research Institute, National Cancer Center, Goyang 10408, Republic of Korea
  - Department of Laboratory Medicine, Hospital, National Cancer Center, Goyang 10408, Republic of Korea
- So-Youn Jung
  - Department of Surgery, Center of Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea
- Hyun-Jin Kim
  - Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
5
El-Latif EIA, El-Dosuky M, Darwish A, Hassanien AE. A deep learning approach for ovarian cancer detection and classification based on fuzzy deep learning. Sci Rep 2024; 14:26463. [PMID: 39488573 PMCID: PMC11531531 DOI: 10.1038/s41598-024-75830-2] [Received: 02/29/2024] [Accepted: 10/08/2024] [Indexed: 11/04/2024] Open
Abstract
Different oncologists make their own decisions about the detection and classification of the type of ovarian cancer from histopathological whole slide images. However, it is necessary to have an automated system that is more accurate and standardized for decision-making, which is essential for early detection of ovarian cancer. To help doctors, an automated system for the detection and classification of ovarian cancer is proposed. The model starts by extracting the main features from the histopathology images using the ResNet-50 model. Then, recursive feature elimination based on a decision tree is introduced to remove unnecessary features produced during feature extraction. The Adam optimizer was used to optimize the network's weights during training. Finally, deep learning and fuzzy logic are combined to classify the images of ovarian cancer. The dataset consists of 288 hematoxylin and eosin (H&E) stained whole slide images (WSIs) with clinical information from 78 patients. The H&E-stained WSIs, including 162 effective and 126 invalid WSIs, were obtained from different tissue blocks of post-treatment specimens. In the experiments, the system diagnosed ovarian cancer with an accuracy of 98.99%, sensitivity of 99%, specificity of 98.96%, and F1-score of 98.99%. These promising results indicate the potential of fuzzy deep-learning classifiers for predicting ovarian cancer.
Affiliation(s)
- Mohamed El-Dosuky
  - Computer Science Department, Arab East Colleges, Riyadh, Saudi Arabia
  - Computer Science Department, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
- Ashraf Darwish
  - Faculty of Science, Helwan University, Cairo, Egypt
  - Scientific Research school of Egypt (SRSEG), Cairo, Egypt
- Aboul Ella Hassanien
  - Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt
  - Scientific Research school of Egypt (SRSEG), Cairo, Egypt
6
Qu Z, Wang Y, Guo D, He G, Sui C, Duan Y, Zhang X, Meng H, Lan L, Liu X. Comparison of deep learning models to traditional Cox regression in predicting survival of colon cancer: Based on the SEER database. J Gastroenterol Hepatol 2024; 39:1816-1826. [PMID: 38725241 DOI: 10.1111/jgh.16598] [Received: 07/09/2023] [Revised: 04/08/2024] [Accepted: 04/21/2024] [Indexed: 10/01/2024]
Abstract
BACKGROUND AND AIM In this study, a deep learning algorithm was used to predict the survival rate of colon cancer (CC) patients, and its performance was compared with that of traditional Cox regression. METHODS In this population-based cohort study, we used the characteristics of patients diagnosed with CC between 2010 and 2015 from the Surveillance, Epidemiology and End Results (SEER) database. The population was randomized into a training set (n = 10 596, 70%) and a test set (n = 4536, 30%). Brier scores, the area under the receiver operating characteristic curve (AUC), and calibration curves were used to compare the performance of the three most popular deep learning models, namely, artificial neural networks (ANN), deep neural networks (DNN), and long short-term memory (LSTM) neural networks, with that of the Cox proportional hazard (CPH) model. RESULTS In the independent test set, the Brier values of ANN, DNN, LSTM and CPH were 0.155, 0.149, 0.148, and 0.170, respectively. The AUC values were 0.906 (95% confidence interval [CI] 0.897-0.916), 0.908 (95% CI 0.899-0.918), 0.910 (95% CI 0.901-0.919), and 0.793 (95% CI 0.769-0.816), respectively. Deep learning showed superior results to CPH in predicting CC-specific survival. CONCLUSIONS Deep learning showed potential advantages over traditional CPH models in terms of prognostic assessment and treatment recommendations. LSTM exhibited the best predictive accuracy and can provide reliable information on individual survival and treatment recommendations for CC patients.
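The Brier values reported above measure the mean squared difference between predicted probabilities and observed outcomes, so lower is better. A minimal sketch for binary outcomes follows; the outcome and risk values are invented, and this plain version ignores censoring, which survival analyses typically handle with a weighted form of the score.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Invented outcomes (1 = died) and predicted risks, not study data.
died = [1, 0, 0, 1, 0]
predicted_risk = [0.9, 0.2, 0.1, 0.6, 0.4]

score = brier_score(died, predicted_risk)
# A model that always predicts 0.5 scores 0.25, a useful reference point.
reference = brier_score(died, [0.5] * len(died))
print(score, reference)
```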
Affiliation(s)
- Zihan Qu
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Yashan Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Dingjie Guo
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Guangliang He
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Chuanying Sui
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Yuqing Duan
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Xin Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Hengyu Meng
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Linwei Lan
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
- Xin Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, China
7
Ayyoubzadeh SM, Ahmadi M, Yazdipour AB, Ghorbani-Bidkorpeh F, Ahmadi M. Prediction of ovarian cancer using artificial intelligence tools. Health Sci Rep 2024; 7:e2203. [PMID: 38946777 PMCID: PMC11211920 DOI: 10.1002/hsr2.2203] [Received: 01/29/2024] [Revised: 06/05/2024] [Accepted: 06/10/2024] [Indexed: 07/02/2024] Open
Abstract
Purpose Ovarian cancer is a common type of cancer and a leading cause of death in women. Therefore, accurate and fast prediction of ovarian tumors is crucial. One appropriate and precise method for predicting and diagnosing this cancer is to build a model based on artificial intelligence methods. These methods provide a tool for predicting ovarian cancer according to the characteristics and conditions of each person. Method In this study, a data set comprising 171 records of benign ovarian tumors and 178 records of ovarian cancer was analyzed. The data set contains the blood test results and tumor markers of the patients. After data preprocessing, including removing outliers and replacing missing values, the weight of the effective factors was determined using the information gain and Gini indices. In the next step, predictive models were created using random forest (RF), support vector machine (SVM), decision tree (DT), and artificial neural network (ANN) models. The performance of these models was evaluated by 10-fold cross-validation using specificity, sensitivity, accuracy, and the area under the receiver operating characteristic curve. Finally, by comparing the performance of the models, the best predictive model of ovarian cancer was selected. Results The most important predictive factors were HE4, CA125, and NEU. The RF model was identified as the best predictive model, with an accuracy of more than 86%. The predictive accuracy of the DT, SVM, and ANN models was estimated as 82.91%, 85.25%, and 79.35%, respectively. Various artificial intelligence (AI) tools can be used with high accuracy and sensitivity in predicting ovarian cancer. Conclusion Therefore, the use of these tools can help specialists and patients with early, easier, and less expensive diagnosis of ovarian cancer. Future studies can leverage AI to integrate image data with serum biomarkers, thereby facilitating the creation of novel models and advancing the diagnosis and treatment of ovarian cancer.
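The method above weights candidate factors by information gain and the Gini index. As a rough sketch of how both criteria score a single threshold split on one marker, the example below uses invented marker values and a hypothetical threshold; the function names are illustrative and do not come from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity of the parent minus the size-weighted impurity of the two
    children obtained by splitting at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * impurity(part) for part in (left, right) if part)
    return impurity(labels) - weighted


# Invented marker values and labels, not data from the study.
he4 = [40, 55, 60, 70, 150, 300, 420, 65]
tumour = ["benign", "benign", "benign", "benign", "cancer", "cancer", "cancer", "cancer"]

info_gain = split_gain(he4, tumour, threshold=100, impurity=entropy)
gini_gain = split_gain(he4, tumour, threshold=100, impurity=gini)
print(info_gain, gini_gain)
```

Ranking features by the best achievable gain over all thresholds gives one simple notion of feature weight; tree ensembles aggregate the same quantity across many splits.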
Affiliation(s)
- Seyed Mohammad Ayyoubzadeh
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
  - Health Information Management Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Marjan Ahmadi
  - Department of Obstetrics and Gynecology, Tehran University of Medical Sciences, Tehran, Iran
- Alireza Banaye Yazdipour
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
  - Students' Scientific Research Center (SSRC), Tehran University of Medical Sciences, Tehran, Iran
  - Department of Health Information Technology, School of Paramedical and Rehabilitation Sciences, Mashhad University of Medical Sciences, Mashhad, Iran
- Fatemeh Ghorbani-Bidkorpeh
  - Department of Pharmaceutics and Pharmaceutical Nanotechnology, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mahnaz Ahmadi
  - Medical Nanotechnology and Tissue Engineering Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
8
Sun J, Shao S, Wan H, Wu X, Feng J, Gao Q, Qu W, Xie L. Prediction models for postoperative recurrence of non-lactating mastitis based on machine learning. BMC Med Inform Decis Mak 2024; 24:106. [PMID: 38649879 PMCID: PMC11036744 DOI: 10.1186/s12911-024-02499-y] [Received: 10/11/2023] [Accepted: 04/03/2024] [Indexed: 04/25/2024] Open
Abstract
OBJECTIVES This study aims to build a machine learning (ML) model to predict the probability of postoperative recurrence of non-lactating mastitis (NLM) using the Random Forest (RF) and XGBoost algorithms. Such a model can help identify the risk of NLM recurrence and guide clinical treatment planning. METHODS This study was conducted on inpatients who were admitted to the Mammary Department of Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine between July 2019 and December 2021. Inpatient follow-up was completed in December 2022. Ten features were selected to build the ML model: age, body mass index (BMI), number of abortions, presence of inverted nipples, extent of breast mass, white blood cell count (WBC), neutrophil to lymphocyte ratio (NLR), albumin-globulin ratio (AGR), triglyceride (TG), and presence of intraoperative discharge. We used two ML approaches (RF and XGBoost) to build models and predict the NLM recurrence risk of female patients. In total, 258 patients were randomly divided into a training set and a test set in a 75%-25% proportion. Model performance was evaluated based on accuracy, precision, recall, F1-score, and AUC. The Shapley Additive Explanations (SHAP) method was used to interpret the model. RESULTS There were 48 (18.6%) NLM patients who experienced recurrence during the follow-up period. For the RF model, BMI was the most important factor, and for the XGBoost model it was intraoperative discharge. The results of tenfold cross-validation suggest that both the RF model and the XGBoost model have good predictive performance, but the XGBoost model performed better than the RF model in our study. The trends of the SHAP values of all features in our models are consistent with the trends of these features' clinical presentation. The inclusion of these ten features in the model is necessary to build practical prediction models for recurrence. CONCLUSIONS The results of tenfold cross-validation and SHAP values suggest that the models have predictive ability. The trend of the SHAP values provides auxiliary validation of our models and gives them greater clinical significance.
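The tenfold cross-validation mentioned above can be sketched as an index-splitting routine: every patient is held out exactly once, and each model is trained on the remaining nine folds. The sample size of 258 is taken from the abstract; the seed, the shuffling scheme, and the name `k_fold_indices` are assumptions, and this simple version does not stratify by recurrence status.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]    # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(258, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

With an 18.6% recurrence rate, a stratified split that preserves the class ratio in each fold would likely give more stable estimates than the plain shuffle shown here.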
Affiliation(s)
- Jiaye Sun
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Shijun Shao
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Hua Wan
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Xueqing Wu
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Jiamei Feng
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Qingqian Gao
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Wenchao Qu
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
- Lu Xie
  - Department of Mammary, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, 200021, Shanghai, China
9
Nopour R. Screening ovarian cancer by using risk factors: machine learning assists. Biomed Eng Online 2024; 23:18. [PMID: 38347611 PMCID: PMC10863117 DOI: 10.1186/s12938-024-01219-x] [Received: 09/12/2023] [Accepted: 02/06/2024] [Indexed: 02/15/2024] Open
Abstract
BACKGROUND AND AIM Ovarian cancer (OC) is a prevalent and aggressive malignancy that poses a significant public health challenge. The lack of preventive strategies for OC increases morbidity, mortality, and other negative consequences. Screening for OC through risk prediction could be a powerful preventive strategy, but it has not received much attention. This study therefore aimed to leverage machine learning approaches as predictive assistance solutions to screen groups at high risk of OC and achieve practical preventive purposes. MATERIALS AND METHODS As this study is data-driven and retrospective in nature, we used data from 1516 women with suspected OC from one concentrated database belonging to six clinical settings in Sari City from 2015 to 2019. Six machine learning (ML) algorithms, including XG-Boost, Random Forest (RF), J-48, support vector machine (SVM), K-nearest neighbor (KNN), and artificial neural network (ANN), were used to construct prediction models for OC. To choose the best model for predicting OC, we compared the prediction models using the area under the receiver operating characteristic curve (AU-ROC). RESULTS Current experimental results revealed that XG-Boost, with AU-ROC = 0.93 (95% CI = [0.91-0.95]), was the best-performing model for predicting OC. CONCLUSIONS ML approaches possess significant predictive efficiency and interoperability for achieving powerful preventive strategies through screening of groups at high risk of OC.
Affiliation(s)
- Raoof Nopour
- Department of Health Information Management, Student Research Committee, School of Health Management and Information Sciences Branch, Iran University of Medical Sciences, Tehran, Iran.
10
Lin L, Ding L, Fu Z, Zhang L. Machine learning-based models for prediction of the risk of stroke in coronary artery disease patients receiving coronary revascularization. PLoS One 2024; 19:e0296402. [PMID: 38330052 PMCID: PMC10852291 DOI: 10.1371/journal.pone.0296402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 12/12/2023] [Indexed: 02/10/2024] Open
Abstract
BACKGROUND To construct several prediction models for the risk of stroke in coronary artery disease (CAD) patients receiving coronary revascularization based on machine learning methods.
METHODS In total, 5757 CAD patients receiving coronary revascularization who were admitted to the ICU in the Medical Information Mart for Intensive Care IV (MIMIC-IV) were included in this cohort study. The data were randomly split into a training set (n = 4029) and a testing set (n = 1728) at a ratio of 7:3. Pearson correlation analysis and a least absolute shrinkage and selection operator (LASSO) regression model were applied for feature screening. Variables with Pearson correlation coefficient<9 were included, and the regression coefficients were set to 0. Features more closely related to the outcome were selected using 10-fold cross-validation, and features with nonzero coefficients were retained and included in the final model. The predictive value of the models was evaluated by sensitivity, specificity, area under the curve (AUC), accuracy, and 95% confidence interval (CI).
RESULTS The CatBoost model presented the best predictive performance, with an AUC of 0.831 (95%CI: 0.811-0.851) in the training set and 0.760 (95%CI: 0.722-0.798) in the testing set. The AUC of the logistic regression model was 0.789 (95%CI: 0.764-0.814) in the training set and 0.731 (95%CI: 0.686-0.776) in the testing set. The DeLong test showed that the predictive value of the CatBoost model was significantly higher than that of the logistic regression model (P<0.05). The Charlson Comorbidity Index (CCI) was the most important variable associated with the risk of stroke in CAD patients receiving coronary revascularization.
CONCLUSION The CatBoost model was the optimal model for predicting the risk of stroke in CAD patients receiving coronary revascularization, and might provide a tool to quickly identify CAD patients at high risk of postoperative stroke.
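The 7:3 split reported above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the rounding rule (test set size rounded up) and the helper names `split_sizes` and `random_split` are assumptions, since the abstract does not say how the split was implemented. Under that assumed rule the reported set sizes are reproduced.

```python
import math
import random


def split_sizes(n_total, test_fraction):
    """Return (n_train, n_test), rounding the test set size up (assumed rule)."""
    n_test = math.ceil(test_fraction * n_total)
    return n_total - n_test, n_test


def random_split(n_total, test_fraction, seed=0):
    """Shuffle record indices and cut them into training and testing lists."""
    n_train, _ = split_sizes(n_total, test_fraction)
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    return indices[:n_train], indices[n_train:]


n_train, n_test = split_sizes(5757, 0.3)
train_idx, test_idx = random_split(5757, 0.3)
print(n_train, n_test)  # 4029 1728
```

Fixing the seed keeps the partition reproducible, which matters when models are compared on the same testing set.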
Affiliation(s)
- Lulu Lin
- Department of Neurology, The Second Hospital of Dalian Medical University, Dalian, Liaoning, China
- Li Ding
- Department of Neurology, The Second Hospital of Dalian Medical University, Dalian, Liaoning, China
- Zhongguo Fu
- Department of Neurology, Shenyang First People’s Hospital, Shenyang, Liaoning, China
- Lijiao Zhang
- Department of Cardiology, The Second Hospital of Dalian Medical University, Dalian, Liaoning, China
11
Babaei Rikan S, Sorayaie Azar A, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Wiil UK. Survival prediction of glioblastoma patients using modern deep learning and machine learning techniques. Sci Rep 2024; 14:2371. [PMID: 38287149 PMCID: PMC10824760 DOI: 10.1038/s41598-024-53006-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 01/25/2024] [Indexed: 01/31/2024] Open
Abstract
In this study, we used data from the Surveillance, Epidemiology, and End Results (SEER) database to predict survival outcomes of glioblastoma patients. To assess dataset skewness and detect feature importance, we applied Pearson's second coefficient test of skewness and the Ordinary Least Squares method, respectively. Using two sampling strategies, holdout and five-fold cross-validation, we developed five machine learning (ML) models alongside a feed-forward deep neural network (DNN) for the multiclass classification and regression prediction of glioblastoma patient survival. After balancing the classification and regression datasets, we obtained 46,340 and 28,573 samples, respectively. Shapley additive explanations (SHAP) were then used to explain the decision-making process of the best model. In both classification and regression tasks, and across both holdout and cross-validation sampling strategies, the DNN consistently outperformed the ML models. Notably, the accuracies were 90.25% and 90.22% for holdout and five-fold cross-validation, respectively, while the corresponding R² values were 0.6565 and 0.6622. SHAP analysis identified age at diagnosis as the most influential feature in the DNN's survival predictions. These findings suggest that the DNN holds promise as a practical auxiliary tool for clinicians, aiding them in optimal decision-making concerning the treatment and care trajectories for glioblastoma patients.
Affiliation(s)
- Amin Naemi
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Habibollah Pirnejad
- Erasmus School of Health Policy and Management (ESHPM), Erasmus University Rotterdam, Rotterdam, The Netherlands.
- Patient Safety Research Center, Clinical Research Institute, Urmia University of Medical Sciences, Urmia, Iran.
- Uffe Kock Wiil
- SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
12
Collins GS, Whittle R, Bullock GS, Logullo P, Dhiman P, de Beyer JA, Riley RD, Schlussel MM. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. J Clin Epidemiol 2024; 165:111199. [PMID: 37898461 DOI: 10.1016/j.jclinepi.2023.10.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/02/2023] [Revised: 10/06/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]
Abstract
OBJECTIVE To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine learning methods in the field of oncology.
STUDY DESIGN AND SETTING We conducted a systematic review, searching the MEDLINE database between December 1, 2022, and December 31, 2022, for studies developing a multivariable prognostic model using machine learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted open science practices.
RESULTS We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported availability of a study protocol, and only one study was registered. Funding statements and conflicts of interest statements were common. Thirty-five studies (76%) provided data sharing statements, with 21 (46%) indicating data were available on request to the authors and seven declaring data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: eight studies (18%) mentioned using a reporting guideline, with 4 (10%) using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis statement, 1 (2%) using Minimum Information About Clinical Artificial Intelligence Modeling and Consolidated Standards Of Reporting Trials-Artificial Intelligence, 1 (2%) using Strengthening The Reporting Of Observational Studies In Epidemiology, 1 (2%) using Standards for Reporting Diagnostic Accuracy Studies, and 1 (2%) using Transparent Reporting of Evaluations with Nonrandomized Designs.
CONCLUSION The adoption of open science principles in oncology studies developing prognostic models using machine learning methods is poor. Guidance and an increased awareness of benefits and best practices of open science are needed for prediction research in oncology.
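The headline proportions in this abstract follow from the stated counts out of 46 publications. The short sketch below recomputes a few of them; the dictionary keys are descriptive labels introduced here for illustration, and the sketch covers only the data, code, and model availability counts.

```python
TOTAL_STUDIES = 46  # publications identified in the review

# Counts as stated in the abstract; the keys are labels chosen for this sketch.
counts = {
    "data sharing statement": 35,
    "data available on request": 21,
    "data shared": 2,
    "code sharing statement": 12,
    "model usable in practice": 11,
}

# Percentage of the 46 studies, rounded to the nearest whole number.
percentages = {
    label: round(100 * count / TOTAL_STUDIES) for label, count in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{TOTAL_STUDIES} = {pct}%")
```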
Affiliation(s)
- Gary S Collins
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom.
- Rebecca Whittle
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Garrett S Bullock
- Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, NC, USA; Centre for Sport, Exercise and Osteoarthritis Research Versus Arthritis, University of Oxford, Oxford, United Kingdom
- Patricia Logullo
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Paula Dhiman
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Jennifer A de Beyer
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
- Richard D Riley
- Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, United Kingdom
- Michael M Schlussel
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Centre for Statistics in Medicine, University of Oxford, Oxford, United Kingdom
13
Yang Z, Zhou D, Huang J. Identifying Explainable Machine Learning Models and a Novel SFRP2 + Fibroblast Signature as Predictors for Precision Medicine in Ovarian Cancer. Int J Mol Sci 2023; 24:16942. [PMID: 38069266 PMCID: PMC10706905 DOI: 10.3390/ijms242316942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/18/2023] Open
Abstract
Ovarian cancer (OC) is a type of malignant tumor with a consistently high mortality rate. The diagnosis of early-stage OC and identification of functional subsets in the tumor microenvironment are essential to the development of patient management strategies. However, the development of robust models remains unsatisfactory. We aimed to utilize artificial intelligence and single-cell analysis to address this issue. Two independent datasets were screened from the Gene Expression Omnibus (GEO) database and processed to obtain overlapping differentially expressed genes (DEGs) in stage II-IV vs. stage I diseases. Three explainable machine learning algorithms were integrated to construct models that could determine the tumor stage and extract important characteristic genes as diagnostic biomarkers. Correlations between cancer-associated fibroblast (CAF) infiltration and characteristic gene expression were analyzed using TIMER2.0 and their relationship with survival rates was comprehensively explored via the Kaplan-Meier plotter (KM-plotter) online database. The specific expression of characteristic genes in fibroblast subsets was investigated through single-cell analysis. A novel fibroblast subset signature was explored to predict immune checkpoint inhibitor (ICI) response and oncogene mutation through Tumor Immune Dysfunction and Exclusion (TIDE) and artificial neural network algorithms, respectively. We found that Support Vector Machine-Shapley Additive Explanations (SVM-SHAP), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) successfully diagnosed early-stage OC (stage I). The area under the receiver operating characteristic curves (AUCs) of these models exceeded 0.990. Their overlapping characteristic gene, secreted frizzled-related protein 2 (SFRP2), was a risk factor that affected the overall survival of OC patients with stage II-IV disease (log-rank test: p < 0.01) and was specifically expressed in a fibroblast subset. Finally, the SFRP2+ fibroblast signature served as a novel predictor in evaluating ICI response and exploring pan-cancer tumor protein P53 (TP53) mutation (AUC = 0.853, 95% confidence interval [CI]: 0.829-0.877). In conclusion, the models based on SVM-SHAP, XGBoost, and RF enabled the early detection of OC for clinical decision making, and SFRP2+ fibroblast signature used in diagnostic models can inform OC treatment selection and offer pan-cancer TP53 mutation detection.
Affiliation(s)
- Jun Huang
- School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China
14
Zuo D, Yang L, Jin Y, Qi H, Liu Y, Ren L. Machine learning-based models for the prediction of breast cancer recurrence risk. BMC Med Inform Decis Mak 2023; 23:276. [PMID: 38031071 PMCID: PMC10688055 DOI: 10.1186/s12911-023-02377-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 11/17/2023] [Indexed: 12/01/2023] Open
Abstract
Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for improving prognosis. This study aimed to compare different machine learning algorithms to select the best model for predicting breast cancer recurrence. The prediction model was developed using eleven machine learning (ML) algorithms: logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML model was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. The AdaBoost algorithm had the best performance in predicting breast cancer recurrence among the eleven algorithms and was adopted to establish the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset for predicting breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based breast cancer recurrence prediction model more interpretable for clinicians. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer.
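For readers less familiar with the evaluation metrics listed in this abstract, the sketch below shows how sensitivity, specificity, PPV, NPV, accuracy, and F1 score follow from the four cells of a binary confusion matrix. The function name and the counts are hypothetical toy values, not results from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),       # recall among true recurrences
        "specificity": tn / (tn + fp),       # recall among non-recurrences
        "ppv": tp / (tp + fp),               # positive predictive value
        "npv": tn / (tn + fn),               # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of PPV and sensitivity
    }


# Toy counts (made up): 60 recurrences and 140 non-recurrences.
toy_metrics = binary_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in toy_metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced outcomes such as recurrence, accuracy alone can look favorable while sensitivity stays low, which is one reason several of these metrics are usually reported together.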
Affiliation(s)
- Duo Zuo
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Lexin Yang
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Yu Jin
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- Tongji University Cancer Center, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Huan Qi
- China Mobile Group Tianjin Company Limited, Tianjin, 300308, China
- Yahui Liu
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
- Li Ren
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China.
- National Clinical Research Center for Cancer, Tianjin, 300060, China.
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China.
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China.
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China.