Reference Citation Analysis

For: Sorayaie Azar A, Babaei Rikan S, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Bagherzadeh Mohasefi M, Wiil UK. Application of machine learning techniques for predicting survival in ovarian cancer. BMC Med Inform Decis Mak 2022;22:345. [PMID: 36585641 PMCID: PMC9801354 DOI: 10.1186/s12911-022-02087-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/20/2022] [Accepted: 12/15/2022] [Indexed: 12/31/2022]

Cited by Other Article(s)

Shen M, Zhang Y, Zhan R, Du T, Shen P, Lu X, Liu S, Guo R, Shen X. Predicting the risk of cardiovascular disease in adults exposed to heavy metals: Interpretable machine learning. Ecotoxicol Environ Saf 2024;290:117570. [PMID: 39721423 DOI: 10.1016/j.ecoenv.2024.117570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 12/16/2024] [Accepted: 12/17/2024] [Indexed: 12/28/2024]

Lee N, Jeon K, Park MJ, Song W, Jeong S. Predicting survival in patients with SARS-CoV-2 based on cytokines and soluble immune checkpoint regulators. Front Cell Infect Microbiol 2024;14:1397297. [PMID: 39654974 PMCID: PMC11625743 DOI: 10.3389/fcimb.2024.1397297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 10/31/2024] [Indexed: 12/12/2024]

Abstract

Background

Coronavirus disease 2019 (COVID-19) has been widespread for over four years and has progressed to an endemic stage. Accordingly, the evaluation of host immunity in infected patients and the development of markers for prognostic prediction in the early stages have been emphasized. Soluble immune checkpoints (sICs), which regulate T cell activity, have been reported as promising biomarkers of viral infections.

Methods

In this study, 17 sICs and 16 cytokines (CKs) were quantified using the Luminex multiplex assay. A total of 148 serum samples from 100 patients with COVID-19 were collected, and marker levels were compared between survivors and non-survivors and between pneumonic and non-pneumonic groups. The impact of these markers on overall survival was analyzed using a machine learning algorithm.

Results

sICs, including sCD27, sCD40, herpes virus entry mediator (sHVEM), T-cell immunoglobulin and mucin-domain containing-3 (sTIM-3), and Toll-like receptor 2 (sTLR-2), and CKs, including chemokine CC motif ligand 2 (CCL2), interleukin-6 (IL-6), IL-8, IL-10, IL-13, granulocyte-macrophage colony-stimulating factor (GM-CSF), and tumor necrosis factor-α (TNF-α), were statistically significantly higher in non-survivors than in survivors. IL-6 showed the highest area under the receiver operating characteristic curve (0.844, 95% CI = 0.751-0.913) for discriminating non-survival, with a sensitivity of 78.9% and a specificity of 82.4%. In Kaplan-Meier analysis, patients with procalcitonin over 0.25 ng/mL, C-reactive protein (CRP) over 41.0 mg/dL, neutrophil-to-lymphocyte ratio over 18.97, sCD27 over 3828.8 pg/mL, sCD40 over 1283.6 pg/mL, and IL-6 over 21.6 pg/mL showed poor survival (log-rank test). In the decision tree analysis, IL-6, sTIM-3, and sCD40 levels had a strong impact on survival. Moreover, IL-6, CD40, and CRP levels were important for predicting the probability of 90-day mortality using the SHapley Additive exPlanations method.
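The ROC and cutoff figures above can be illustrated with a short, dependency-free Python sketch. It computes the AUC as the probability that a randomly chosen non-survivor has a higher marker value than a randomly chosen survivor (the Mann-Whitney formulation), and sensitivity and specificity at a fixed cutoff. The IL-6 values and outcomes are invented, and the 21.6 pg/mL cutoff is borrowed from the Kaplan-Meier analysis only as an example threshold; this is not the authors' analysis code.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec_at_cutoff(scores: Sequence[float], labels: Sequence[int],
                        cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score > cutoff' is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical IL-6 levels (pg/mL) and outcomes (1 = non-survivor, 0 = survivor).
il6 = [5.0, 12.0, 30.0, 45.0, 8.0, 18.0, 25.0, 22.0]
died = [0, 0, 1, 1, 0, 1, 0, 1]

auc = roc_auc(il6, died)  # 14 of 16 positive/negative pairs are ranked correctly
sensitivity, specificity = sens_spec_at_cutoff(il6, died, 21.6)
print(auc, sensitivity, specificity)  # prints 0.875 0.75 0.75
```

The pairwise loop is quadratic in the number of patients, which is fine for a few hundred samples; larger data sets would normally use a rank-based formula or a library routine.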

Conclusion

sICs and CKs, especially IL-6, sCD27, sCD40, and sTIM-3, are expected to be useful for predicting patient outcomes when used in combination with existing markers.

Sorayaie Azar A, Samimi T, Tavassoli G, Naemi A, Rahimi B, Hadianfard Z, Wiil UK, Nazarbaghi S, Bagherzadeh Mohasefi J, Lotfnezhad Afshar H. Predicting stroke severity of patients using interpretable machine learning algorithms. Eur J Med Res 2024;29:547. [PMID: 39538301 PMCID: PMC11562860 DOI: 10.1186/s40001-024-02147-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 11/05/2024] [Indexed: 11/16/2024]

Abstract

BACKGROUND

Stroke is a significant global health concern, ranking as the second leading cause of death and placing a substantial financial burden on healthcare systems, particularly in low- and middle-income countries. Timely evaluation of stroke severity is crucial for predicting clinical outcomes, with standard assessment tools being the Rapid Arterial Occlusion Evaluation (RACE) and the National Institutes of Health Stroke Scale (NIHSS). This study aims to utilize Machine Learning (ML) algorithms to predict stroke severity using these two distinct scales.

METHODS

We conducted this study using two datasets collected from hospitals in Urmia, Iran, corresponding to stroke severity assessments based on RACE and NIHSS. Seven ML algorithms were applied, including K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Hyperparameter tuning was performed using grid search to optimize model performance, and SHapley Additive Explanations (SHAP) were used to interpret the contribution of individual features.
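Grid search simply evaluates every combination of candidate hyperparameters and keeps the best-scoring one. The sketch below assumes a hand-written k-nearest-neighbour classifier, a hypothetical two-parameter grid (`k` and the Minkowski order `p`), and a single validation split; the study's actual grids, models, and validation scheme are not given here, and in practice a library utility such as scikit-learn's `GridSearchCV` would normally be used.

```python
import itertools
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[int],
                x: Sequence[float], k: int, p: int) -> int:
    """Majority vote among the k nearest training rows (Minkowski distance of order p)."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)) ** (1 / p), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def grid_search(train_X, train_y, val_X, val_y,
                grid: Dict[str, List[int]]) -> Tuple[Dict[str, int], float]:
    """Evaluate every parameter combination in `grid`; keep the best validation accuracy."""
    names = sorted(grid)
    best_params, best_acc = {}, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        preds = [knn_predict(train_X, train_y, x, **params) for x in val_X]
        acc = sum(pred == truth for pred, truth in zip(preds, val_y)) / len(val_y)
        if acc > best_acc:  # ties keep the first combination tried
            best_params, best_acc = params, acc
    return best_params, best_acc


# Invented one-feature data with two well-separated classes.
train_X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
train_y = [0, 0, 0, 1, 1, 1]
val_X, val_y = [[1.5], [10.5]], [0, 1]
best, acc = grid_search(train_X, train_y, val_X, val_y, {"k": [1, 3, 5], "p": [1, 2]})
print(best, acc)  # prints {'k': 1, 'p': 1} 1.0
```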

RESULTS

Among the models, the RF achieved the highest performance, with accuracies of 92.68% for the RACE dataset and 91.19% for the NIHSS dataset. The Area Under the Curve (AUC) was 92.02% and 97.86% for the RACE and NIHSS datasets, respectively. The SHAP analysis identified triglyceride levels, length of hospital stay, and age as critical predictors of stroke severity.
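SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory. For a handful of features the values can be computed exactly by enumerating every coalition of features, as in the sketch below. It assumes a hypothetical scoring function over (triglycerides, length of stay, age) and represents "absent" features by one baseline patient; the SHAP library instead averages over background data and uses faster model-specific algorithms (for example TreeSHAP for random forests), so this illustrates the definition rather than reproducing the study's analysis.

```python
import itertools
import math
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical severity score over (triglycerides, length of stay, age), with one interaction.
def toy_model(z: Sequence[float]) -> float:
    return 0.01 * z[0] + 0.2 * z[1] + 0.03 * z[2] + 0.001 * z[1] * z[2]


patient = [180.0, 7.0, 65.0]
reference = [120.0, 4.0, 55.0]
phi = shapley_values(toy_model, patient, reference)
# Efficiency property: the contributions sum to f(patient) - f(reference).
print(phi, sum(phi), toy_model(patient) - toy_model(reference))
```

Exact enumeration costs on the order of 2^n model calls per feature, which is why practical SHAP implementations rely on approximations or tree-specific algorithms.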

CONCLUSIONS

This study is the first to apply ML models to the RACE and NIHSS scales for predicting stroke severity. The use of SHAP enhances the interpretability of the models, increasing clinicians' trust in these ML algorithms. The best-performing ML model can be a valuable tool for assisting medical professionals in predicting stroke severity in clinical settings.

Affiliation(s)

Amir Sorayaie Azar: SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark; Department of Computer Engineering, Urmia University, Urmia, Iran
Tahereh Samimi: Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran; Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran
Ghanbar Tavassoli: Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran; Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran; Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran
Amin Naemi: SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
Bahlol Rahimi: Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran; Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran
Zahra Hadianfard: Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran
Uffe Kock Wiil: SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
Surena Nazarbaghi: Department of Neurology, School of Medicine, Urmia University of Medical Sciences, Urmia, Iran
Jamshid Bagherzadeh Mohasefi: SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark; Department of Computer Engineering, Urmia University, Urmia, Iran
Hadi Lotfnezhad Afshar: Department of Health Information Technology, Urmia University of Medical Sciences, Urmia, Iran; Health and Biomedical Informatics Research Center, Urmia University of Medical Sciences, Urmia, Iran

Park SW, Park YL, Lee EG, Chae H, Park P, Choi DW, Choi YH, Hwang J, Ahn S, Kim K, Kim WJ, Kong SY, Jung SY, Kim HJ. Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning. Cancers (Basel) 2024;16:3799. [PMID: 39594754 PMCID: PMC11592669 DOI: 10.3390/cancers16223799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 11/06/2024] [Accepted: 11/09/2024] [Indexed: 11/28/2024]

Affiliation(s)

Sang Won Park: Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Ye-Lin Park: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
Eun-Gyeong Lee: Department of Surgery, Center of Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea
Heejung Chae: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea; Department of Medical Oncology, Center for Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea
Phillip Park: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
Dong-Woo Choi: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
Yeon Ho Choi: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
Juyeon Hwang: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
Seohyun Ahn: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
Keunkyun Kim: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea
Woo Jin Kim: Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea; Department of Internal Medicine, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea; Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea
Sun-Young Kong: Targeted Therapy Branch, Research Institute, National Cancer Center, Goyang 10408, Republic of Korea; Department of Laboratory Medicine, Hospital, National Cancer Center, Goyang 10408, Republic of Korea
So-Youn Jung: Department of Surgery, Center of Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea
Hyun-Jin Kim: Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea

El-Latif EIA, El-Dosuky M, Darwish A, Hassanien AE. A deep learning approach for ovarian cancer detection and classification based on fuzzy deep learning. Sci Rep 2024;14:26463. [PMID: 39488573 PMCID: PMC11531531 DOI: 10.1038/s41598-024-75830-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 10/08/2024] [Indexed: 11/04/2024]

Qu Z, Wang Y, Guo D, He G, Sui C, Duan Y, Zhang X, Meng H, Lan L, Liu X. Comparison of deep learning models to traditional Cox regression in predicting survival of colon cancer: Based on the SEER database. J Gastroenterol Hepatol 2024;39:1816-1826. [PMID: 38725241 DOI: 10.1111/jgh.16598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 04/08/2024] [Accepted: 04/21/2024] [Indexed: 10/01/2024]

Ayyoubzadeh SM, Ahmadi M, Yazdipour AB, Ghorbani‐Bidkorpeh F, Ahmadi M. Prediction of ovarian cancer using artificial intelligence tools. Health Sci Rep 2024;7:e2203. [PMID: 38946777 PMCID: PMC11211920 DOI: 10.1002/hsr2.2203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 06/05/2024] [Accepted: 06/10/2024] [Indexed: 07/02/2024]

Abstract

Purpose

Ovarian cancer is a common type of cancer and a leading cause of death in women. Therefore, accurate and fast prediction of ovarian tumors is crucial. One of the appropriate and precise methods for predicting and diagnosing this cancer is to build a model based on artificial intelligence methods. These methods provide a tool for predicting ovarian cancer according to the characteristics and conditions of each person.

Method

In this study, a data set containing 171 records of benign ovarian tumors and 178 records of ovarian cancer was analyzed. The data set contains the patients' blood test results and tumor markers. After data preprocessing, including removing outliers and replacing missing values, the weights of the effective factors were determined using the information gain and Gini indices. Predictive models were then created using random forest (RF), support vector machine (SVM), decision tree (DT), and artificial neural network (ANN) algorithms. Model performance was evaluated with 10-fold cross-validation in terms of specificity, sensitivity, accuracy, and the area under the receiver operating characteristic curve. Finally, the models were compared and the best predictive model of ovarian cancer was selected.
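Information gain and the Gini index both measure how much splitting on a feature reduces the impurity of the class labels. The minimal sketch below scores a single threshold split on one feature; the HE4-like values and labels are invented, and how the authors aggregated such scores into feature weights is not specified in the abstract, so this only illustrates the two criteria.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: probability that two draws (with replacement) have different labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(feature: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[float, float]:
    """(information gain, Gini decrease) for the split 'feature <= threshold'."""
    left = [y for v, y in zip(feature, labels) if v <= threshold]
    right = [y for v, y in zip(feature, labels) if v > threshold]
    n = len(labels)

    def weighted(impurity) -> float:
        # child impurities weighted by the fraction of samples falling in each child
        return sum(len(part) / n * impurity(part) for part in (left, right) if part)

    return entropy(labels) - weighted(entropy), gini(labels) - weighted(gini)


# Hypothetical marker values with labels 1 = cancer, 0 = benign.
he4 = [40.0, 55.0, 60.0, 90.0, 150.0, 210.0]
label = [0, 0, 0, 1, 1, 1]
print(split_scores(he4, label, 75.0))  # prints (1.0, 0.5): a perfect split
```

A threshold that leaves every sample on one side scores zero on both criteria, so features whose best split still scores low would receive little weight.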

Results

The most important predictive factors were HE4, CA125, and NEU. The RF model was identified as the best predictive model, with an accuracy of more than 86%. The predictive accuracy of DT, SVM, and ANN models was estimated as 82.91%, 85.25%, and 79.35%, respectively. Various artificial intelligence (AI) tools can be used with high accuracy and sensitivity in predicting ovarian cancer.

Conclusion

These tools can therefore support earlier, easier, and less expensive diagnosis of ovarian cancer for specialists and patients. Future studies can leverage AI to integrate image data with serum biomarkers, thereby facilitating the creation of novel models and advancing the diagnosis and treatment of ovarian cancer.

Sun J, Shao S, Wan H, Wu X, Feng J, Gao Q, Qu W, Xie L. Prediction models for postoperative recurrence of non-lactating mastitis based on machine learning. BMC Med Inform Decis Mak 2024;24:106. [PMID: 38649879 PMCID: PMC11036744 DOI: 10.1186/s12911-024-02499-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 04/03/2024] [Indexed: 04/25/2024]

Abstract

OBJECTIVES

This study aims to build a machine learning (ML) model to predict the recurrence probability of postoperative non-lactating mastitis (NLM) using the Random Forest (RF) and XGBoost algorithms. Such a model could help identify patients at risk of NLM recurrence and guide clinical treatment planning.

METHODS

This study was conducted on inpatients admitted to the Mammary Department of Shuguang Hospital, affiliated with Shanghai University of Traditional Chinese Medicine, between July 2019 and December 2021. Follow-up data were collected until December 2022. Ten features were selected to build the ML model: age, body mass index (BMI), number of abortions, presence of inverted nipples, extent of breast mass, white blood cell count (WBC), neutrophil-to-lymphocyte ratio (NLR), albumin-globulin ratio (AGR), triglyceride (TG), and presence of intraoperative discharge. We used two ML approaches (RF and XGBoost) to build models and predict the NLM recurrence risk of female patients. In total, 258 patients were randomly divided into a training set and a test set in a 75%:25% ratio. Model performance was evaluated based on accuracy, precision, recall, F1-score, and AUC. The Shapley Additive Explanations (SHAP) method was used to interpret the model.
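The 75%:25% hold-out split and the reported classification metrics can be sketched in plain Python. The split function, its seed, and the example predictions are hypothetical; the shuffle is not stratified by outcome, although a stratified split is a common alternative when events are relatively rare, as with 48 recurrences here.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_test_split(n: int, test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices 0..n-1 and hold out `test_fraction` of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = recurrence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


train_idx, test_idx = train_test_split(258)  # 194 training rows, 64 test rows
print(len(train_idx), len(test_idx))  # prints 194 64
# Hypothetical true labels and model predictions for six test patients.
print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```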

RESULTS

Forty-eight (18.6%) NLM patients experienced recurrence during the follow-up period. Ten features were selected in this study to build the ML model. BMI was the most important feature in the RF model, and intraoperative discharge was the most important in the XGBoost model. The results of tenfold cross-validation suggest that both the RF and XGBoost models have good predictive performance, with the XGBoost model performing better than the RF model in our study. The trends of the SHAP values of all features in our models are consistent with the features' clinical presentation. Including all ten features is necessary to build a practical prediction model for recurrence.
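Tenfold cross-validation partitions the patients into ten folds and uses each fold once as the test set while training on the other nine. A minimal index generator is sketched below; the shuffling seed and the unstratified folds are assumptions, and the model-fitting step is left out.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder over the first folds
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(258, k=10))
print([len(test) for _, test in folds])  # prints [26, 26, 26, 26, 26, 26, 26, 26, 25, 25]
```

In each fold a model would be fitted on `train` and scored on `test`, and the ten scores averaged to estimate performance.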

CONCLUSIONS

The results of tenfold cross-validation and the SHAP values suggest that the models have predictive ability. The SHAP value trends provide auxiliary validation of the models and increase their clinical significance.

Nopour R. Screening ovarian cancer by using risk factors: machine learning assists. Biomed Eng Online 2024;23:18. [PMID: 38347611 PMCID: PMC10863117 DOI: 10.1186/s12938-024-01219-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/06/2024] [Indexed: 02/15/2024]

Lin L, Ding L, Fu Z, Zhang L. Machine learning-based models for prediction of the risk of stroke in coronary artery disease patients receiving coronary revascularization. PLoS One 2024;19:e0296402. [PMID: 38330052 PMCID: PMC10852291 DOI: 10.1371/journal.pone.0296402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 12/12/2023] [Indexed: 02/10/2024]

Abstract

BACKGROUND

To construct several prediction models for the risk of stroke in coronary artery disease (CAD) patients receiving coronary revascularization based on machine learning methods.

METHODS

In total, 5757 CAD patients receiving coronary revascularization who were admitted to the ICU and recorded in the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were included in this cohort study. The data were randomly split 7:3 into a training set (n = 4029) and a testing set (n = 1728). Pearson correlation analysis and a least absolute shrinkage and selection operator (LASSO) regression model were applied for feature screening. Variables with Pearson correlation coefficient<9 were included, and the regression coefficients were set to 0. Features more closely related to the outcome were selected through 10-fold cross-validation, and features with non-zero coefficients were retained and included in the final model. The predictive value of the models was evaluated by sensitivity, specificity, area under the curve (AUC), accuracy, and 95% confidence interval (CI).
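LASSO screening keeps the features whose coefficients remain non-zero under an L1 penalty. The sketch below fits LASSO by cyclic coordinate descent with soft-thresholding on a tiny invented data set. It assumes centred features and no intercept, and it fixes the penalty `lam` by hand rather than tuning it with 10-fold cross-validation; it illustrates the technique, not the authors' pipeline.

```python
from typing import List, Sequence


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho towards zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             lam: float, n_iter: int = 100) -> List[float]:
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 one coefficient at a time (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z else 0.0
    return beta


# Invented, centred data: the outcome is driven mainly by feature 0.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]  # features kept for the final model
print(beta, selected)
```

With `lam=0.5` the weak second feature is shrunk to exactly zero and only feature 0 is retained; with `lam=0.0` the fit reduces to ordinary least squares.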

RESULTS

The CatBoost model presented the best predictive performance, with an AUC of 0.831 (95%CI: 0.811-0.851) in the training set and 0.760 (95%CI: 0.722-0.798) in the testing set. The AUC of the logistic regression model was 0.789 (95%CI: 0.764-0.814) in the training set and 0.731 (95%CI: 0.686-0.776) in the testing set. The DeLong test revealed that the predictive value of the CatBoost model was significantly higher than that of the logistic regression model (P<0.05). The Charlson Comorbidity Index (CCI) was the most important variable associated with the risk of stroke in CAD patients receiving coronary revascularization.

CONCLUSION

The CatBoost model was the optimal model for predicting the risk of stroke in CAD patients receiving coronary revascularization and might provide a tool to quickly identify CAD patients at high risk of postoperative stroke.

Babaei Rikan S, Sorayaie Azar A, Naemi A, Bagherzadeh Mohasefi J, Pirnejad H, Wiil UK. Survival prediction of glioblastoma patients using modern deep learning and machine learning techniques. Sci Rep 2024;14:2371. [PMID: 38287149 PMCID: PMC10824760 DOI: 10.1038/s41598-024-53006-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 01/25/2024] [Indexed: 01/31/2024]

Collins GS, Whittle R, Bullock GS, Logullo P, Dhiman P, de Beyer JA, Riley RD, Schlussel MM. Open science practices need substantial improvement in prognostic model studies in oncology using machine learning. J Clin Epidemiol 2024;165:111199. [PMID: 37898461 DOI: 10.1016/j.jclinepi.2023.10.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/02/2023] [Revised: 10/06/2023] [Accepted: 10/20/2023] [Indexed: 10/30/2023]

Abstract

OBJECTIVE

To describe the frequency of open science practices in a contemporary sample of studies developing prognostic models using machine learning methods in the field of oncology.

STUDY DESIGN AND SETTING

We conducted a systematic review, searching the MEDLINE database between December 1, 2022, and December 31, 2022, for studies developing a multivariable prognostic model using machine learning methods (as defined by the authors) in oncology. Two authors independently screened records and extracted open science practices.

RESULTS

We identified 46 publications describing the development of a multivariable prognostic model. The adoption of open science principles was poor. Only one study reported availability of a study protocol, and only one study was registered. Funding statements and conflicts of interest statements were common. Thirty-five studies (76%) provided data sharing statements, with 21 (46%) indicating data were available on request to the authors and seven declaring data sharing was not applicable. Two studies (4%) shared data. Only 12 studies (26%) provided code sharing statements, including 2 (4%) that indicated the code was available on request to the authors. Only 11 studies (24%) provided sufficient information to allow their model to be used in practice. The use of reporting guidelines was rare: eight studies (18%) mentioned using a reporting guideline, with 4 (10%) using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis statement, 1 (2%) using Minimum Information About Clinical Artificial Intelligence Modeling and Consolidated Standards Of Reporting Trials-Artificial Intelligence, 1 (2%) using Strengthening The Reporting Of Observational Studies In Epidemiology, 1 (2%) using Standards for Reporting Diagnostic Accuracy Studies, and 1 (2%) using Transparent Reporting of Evaluations with Nonrandomized Designs.

CONCLUSION

The adoption of open science principles in oncology studies developing prognostic models using machine learning methods is poor. Guidance and an increased awareness of benefits and best practices of open science are needed for prediction research in oncology.

Yang Z, Zhou D, Huang J. Identifying Explainable Machine Learning Models and a Novel SFRP2⁺ Fibroblast Signature as Predictors for Precision Medicine in Ovarian Cancer. Int J Mol Sci 2023;24:16942. [PMID: 38069266 PMCID: PMC10706905 DOI: 10.3390/ijms242316942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/18/2023]

Abstract

Ovarian cancer (OC) is a type of malignant tumor with a consistently high mortality rate. The diagnosis of early-stage OC and identification of functional subsets in the tumor microenvironment are essential to the development of patient management strategies. However, the development of robust models remains unsatisfactory. We aimed to utilize artificial intelligence and single-cell analysis to address this issue. Two independent datasets were screened from the Gene Expression Omnibus (GEO) database and processed to obtain overlapping differentially expressed genes (DEGs) in stage II-IV vs. stage I diseases. Three explainable machine learning algorithms were integrated to construct models that could determine the tumor stage and extract important characteristic genes as diagnostic biomarkers. Correlations between cancer-associated fibroblast (CAF) infiltration and characteristic gene expression were analyzed using TIMER2.0 and their relationship with survival rates was comprehensively explored via the Kaplan-Meier plotter (KM-plotter) online database. The specific expression of characteristic genes in fibroblast subsets was investigated through single-cell analysis. A novel fibroblast subset signature was explored to predict immune checkpoint inhibitor (ICI) response and oncogene mutation through Tumor Immune Dysfunction and Exclusion (TIDE) and artificial neural network algorithms, respectively. We found that Support Vector Machine-Shapley Additive Explanations (SVM-SHAP), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) successfully diagnosed early-stage OC (stage I). The area under the receiver operating characteristic curves (AUCs) of these models exceeded 0.990. Their overlapping characteristic gene, secreted frizzled-related protein 2 (SFRP2), was a risk factor that affected the overall survival of OC patients with stage II-IV disease (log-rank test: p < 0.01) and was specifically expressed in a fibroblast subset. 
Finally, the SFRP2+ fibroblast signature served as a novel predictor in evaluating ICI response and exploring pan-cancer tumor protein P53 (TP53) mutation (AUC = 0.853, 95% confidence interval [CI]: 0.829-0.877). In conclusion, the models based on SVM-SHAP, XGBoost, and RF enabled the early detection of OC for clinical decision making, and the SFRP2+ fibroblast signature used in diagnostic models can inform OC treatment selection and offer pan-cancer TP53 mutation detection.
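Kaplan-Meier curves, such as those produced by the KM-plotter tool, estimate survival as a running product of conditional survival probabilities at each event time. A minimal product-limit estimator is sketched below on invented follow-up times; the high/low expression grouping is hypothetical, and the log-rank test used to compare such curves is not implemented here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time (events: 1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied follow-up times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve


# Hypothetical months of follow-up for patients with high vs low marker expression.
high = kaplan_meier([2, 4, 4, 7, 9], [1, 1, 0, 1, 0])
low = kaplan_meier([5, 8, 12, 15, 20], [0, 1, 0, 1, 0])
print(high)
print(low)  # prints [(8, 0.75), (15, 0.375)]
```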

Zuo D, Yang L, Jin Y, Qi H, Liu Y, Ren L. Machine learning-based models for the prediction of breast cancer recurrence risk. BMC Med Inform Decis Mak 2023;23:276. [PMID: 38031071 PMCID: PMC10688055 DOI: 10.1186/s12911-023-02377-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 11/17/2023] [Indexed: 12/01/2023]

Abstract

Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for improving prognosis. This study aimed to compare different machine learning algorithms to select the best model for predicting breast cancer recurrence. The prediction model was developed using eleven different machine learning (ML) algorithms, including logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM), to predict breast cancer recurrence. The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML model was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. Compared with the other 10 algorithms, the AdaBoost algorithm had the best performance in predicting breast cancer recurrence and was adopted to establish the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset for predicting breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based breast cancer recurrence prediction model more interpretable for clinicians. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer.
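AdaBoost, the algorithm selected in this study, builds an ensemble of weak learners and re-weights the training samples after each round so that later learners concentrate on previously misclassified cases. The sketch below is a minimal discrete AdaBoost with one-feature threshold "stumps" on an invented two-feature data set with labels coded as -1/+1. The study's actual base learner, number of rounds, and features (such as CA125, CEA, Fbg, and tumor diameter) are not reproduced, and a library implementation such as scikit-learn's `AdaBoostClassifier` would normally be used.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict `polarity` if x[feature] <= threshold, else -polarity."""
    j, thr, polarity = stump
    return polarity if x[j] <= thr else -polarity


def fit_adaboost(X: Sequence[Sequence[float]], y: Sequence[int],
                 n_rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps; y must contain -1/+1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        # pick the stump with the lowest weighted training error
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                for polarity in (1, -1):
                    stump = (j, thr, polarity)
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(stump, xi) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # up-weight misclassified samples, down-weight correct ones, then renormalise
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_adaboost(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Invented two-feature data: +1 = recurrence, -1 = no recurrence.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0], [5.0, 7.0], [6.0, 1.0]]
y = [-1, -1, -1, 1, 1, 1]
model = fit_adaboost(X, y, n_rounds=5)
print([predict_adaboost(model, x) for x in X])  # prints [-1, -1, -1, 1, 1, 1]
```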

Affiliation(s)

Duo Zuo: Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China; National Clinical Research Center for Cancer, Tianjin, 300060, China; Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China; Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China; Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
Lexin Yang: Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China; National Clinical Research Center for Cancer, Tianjin, 300060, China; Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China; Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China; Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
Yu Jin: Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China; Tongji University Cancer Center, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
Huan Qi: China Mobile Group Tianjin Company Limited, Tianjin, 300308, China
Yahui Liu: Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China; National Clinical Research Center for Cancer, Tianjin, 300060, China; Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China; Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China; Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
Li Ren: Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China; National Clinical Research Center for Cancer, Tianjin, 300060, China; Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China; Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China; Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
