Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Du M, Haag DG, Lynch JW, Mittinty MN. Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database. Cancers (Basel) 2020;12:E2802. [PMID: 33003533 DOI: 10.3390/cancers12102802] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/17/2020] [Revised: 09/23/2020] [Accepted: 09/27/2020] [Indexed: 12/24/2022] Open



Cited by Other Article(s)

Li X, Wang Z, Zhao W, Shi R, Zhu Y, Pan H, Wang D. Machine learning algorithm for predict the in-hospital mortality in critically ill patients with congestive heart failure combined with chronic kidney disease. Ren Fail 2024;46:2315298. [PMID: 38357763 PMCID: PMC10877653 DOI: 10.1080/0886022x.2024.2315298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024] Open

Affiliation(s)

Xunliang Li: Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
Zhijuan Wang: Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
Wenman Zhao: Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
Rui Shi: Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
Yuyu Zhu: Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
Haifeng Pan: Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Hefei, China; Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, China; Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, China
Deguang Wang: Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China; Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Hefei, China


Kolasseri AE, B V. Comparative study of machine learning and statistical survival models for enhancing cervical cancer prognosis and risk factor assessment using SEER data. Sci Rep 2024;14:22203. [PMID: 39333298 PMCID: PMC11437206 DOI: 10.1038/s41598-024-72790-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Accepted: 09/10/2024] [Indexed: 09/29/2024] Open

Peng C, Peng L, Yang F, Yu H, Chen Q, Guo Y, Xu S, Jin Z. The prediction of the survival in patients with severe trauma during prehospital care: Analyses based on NTDB database. Eur J Trauma Emerg Surg 2024;50:1599-1609. [PMID: 38483558 DOI: 10.1007/s00068-024-02484-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/17/2023] [Accepted: 02/19/2024] [Indexed: 10/08/2024]

Gan T, Guan H, Li P, Huang X, Li Y, Zhang R, Li T. Risk prediction models for cardiovascular events in hemodialysis patients: A systematic review. Semin Dial 2024;37:101-109. [PMID: 37743062 DOI: 10.1111/sdi.13181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 06/25/2023] [Accepted: 09/10/2023] [Indexed: 09/26/2023]

Park SW, Yeo NY, Kang S, Ha T, Kim TH, Lee D, Kim D, Choi S, Kim M, Lee D, Kim D, Kim WJ, Lee SJ, Heo YJ, Moon DH, Han SS, Kim Y, Choi HS, Oh DK, Lee SY, Park M, Lim CM, Heo J. Early Prediction of Mortality for Septic Patients Visiting Emergency Room Based on Explainable Machine Learning: A Real-World Multicenter Study. J Korean Med Sci 2024;39:e53. [PMID: 38317451 PMCID: PMC10843974 DOI: 10.3346/jkms.2024.39.e53] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/15/2023] [Accepted: 12/05/2023] [Indexed: 02/07/2024] Open

Abstract

BACKGROUND

Worldwide, sepsis is the leading cause of death in hospitals. If mortality rates in patients with sepsis can be predicted early, medical resources can be allocated efficiently. We constructed machine learning (ML) models to predict the mortality of patients with sepsis in a hospital emergency department.

METHODS

This study prospectively collected nationwide data from an ongoing multicenter cohort of patients with sepsis identified in the emergency department. Patients were enrolled from 19 hospitals between September 2019 and December 2020. Using data from 3,657 survivors and 1,455 deaths, six ML models (logistic regression, support vector machine, random forest, extreme gradient boosting [XGBoost], light gradient boosting machine, and categorical boosting [CatBoost]) were constructed using fivefold cross-validation to predict mortality. With these models, 44 clinical variables measured on the day of admission were compared with six sequential organ failure assessment (SOFA) components (PaO2/FIO2 [PF], platelets [PLT], bilirubin, cardiovascular, Glasgow Coma Scale score, and creatinine). The confidence interval (CI) was obtained by repeating the measurement 10,000 times on random samples of the test dataset. All results were explained and interpreted using SHapley Additive exPlanations (SHAP).

RESULTS

Among the 5,112 participants, the CatBoost model achieved the highest area under the curve (AUC), 0.800 (95% CI, 0.756-0.840), using clinical variables. Using the SOFA components for the same patients, XGBoost achieved the highest AUC, 0.678 (95% CI, 0.626-0.730). As interpreted by SHAP, albumin, lactate, blood urea nitrogen, and international normalized ratio significantly affected the results. Additionally, PF and PLT among the SOFA components significantly influenced the prediction results.

CONCLUSION

Newly established ML-based models achieved good prediction of mortality in patients with sepsis. Using several clinical variables acquired at the baseline can provide more accurate results for early predictions than using SOFA components. Additionally, the impact of each variable was identified.
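
The resampling-based confidence interval described in the methods above can be illustrated with a percentile bootstrap of the AUC. This is a minimal sketch, not the authors' code: the function names, the toy labels and scores, and the choice of sampling with replacement are assumptions, since the abstract only states that the test dataset was randomly sampled 10,000 times.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resampling with replacement is an assumption)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
point_estimate = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
```

With a realistic test set the interval narrows as the number of patients grows; the number of resamples mainly affects how stable the interval endpoints are.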


Affiliation(s)

Sang Won Park: Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon, Korea; Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon, Korea
Na Young Yeo: Department of Medical Bigdata Convergence, Kangwon National University, Chuncheon, Korea
Seonguk Kang: Department of Convergence Security, Kangwon National University, Chuncheon, Korea
Taejun Ha: Department of Biomedical Research Institute, Kangwon National University Hospital, Chuncheon, Korea
Tae-Hoon Kim: University-Industry Cooperation Foundation, Kangwon National University, Chuncheon, Korea
DooHee Lee: Department of Research and Development, ZIOVISION Co. Ltd., Chuncheon, Korea
Dowon Kim: Department of Research and Development, ZIOVISION Co. Ltd., Chuncheon, Korea
Seheon Choi: Department of Research and Development, ZIOVISION Co. Ltd., Chuncheon, Korea
Minkyu Kim: Department of Research and Development, ZIOVISION Co. Ltd., Chuncheon, Korea
DongHoon Lee: Department of Research and Development, ZIOVISION Co. Ltd., Chuncheon, Korea
DoHyeon Kim: Department of Research and Development, ZIOVISION Co. Ltd., Chuncheon, Korea
Woo Jin Kim: Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon, Korea; Department of Internal Medicine, Kangwon National University Hospital, Chuncheon, Korea; Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon, Korea
Seung-Joon Lee: Department of Internal Medicine, Kangwon National University Hospital, Chuncheon, Korea; Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon, Korea
Yeon-Jeong Heo: Department of Internal Medicine, Kangwon National University Hospital, Chuncheon, Korea; Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon, Korea
Da Hye Moon: Department of Internal Medicine, Kangwon National University Hospital, Chuncheon, Korea; Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon, Korea
Seon-Sook Han: Department of Internal Medicine, Kangwon National University Hospital, Chuncheon, Korea; Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon, Korea
Yoon Kim: University-Industry Cooperation Foundation, Kangwon National University, Chuncheon, Korea; Department of Computer Science and Engineering, Kangwon National University, Chuncheon, Korea
Hyun-Soo Choi: University-Industry Cooperation Foundation, Kangwon National University, Chuncheon, Korea; Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul, Korea
Dong Kyu Oh: Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
Su Yeon Lee: Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
MiHyeon Park: Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
Chae-Man Lim: Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
Jeongwon Heo: Department of Internal Medicine, Kangwon National University Hospital, Chuncheon, Korea; Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon, Korea


Li S, Yi H, Leng Q, Wu Y, Mao Y. New perspectives on cancer clinical research in the era of big data and machine learning. Surg Oncol 2024;52:102009. [PMID: 38215544 DOI: 10.1016/j.suronc.2023.102009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 10/16/2023] [Indexed: 01/14/2024]

Chen C, Zhang W, Yan G, Tang C. Identifying metabolic dysfunction-associated steatotic liver disease in patients with hypertension and pre-hypertension: An interpretable machine learning approach. Digit Health 2024;10:20552076241233135. [PMID: 38389508 PMCID: PMC10883118 DOI: 10.1177/20552076241233135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 01/30/2024] [Indexed: 02/24/2024] Open

Abstract

Objective

Metabolic dysfunction-associated steatotic liver disease (MASLD) is one of the most prevalent liver diseases and is associated with pre-hypertension and hypertension. Our research aims to develop interpretable machine learning (ML) models to accurately identify MASLD in hypertensive and pre-hypertensive populations.

Methods

Data for 4722 hypertensive and pre-hypertensive patients were drawn from the NAGALA study. Six ML models, including the decision tree, K-nearest neighbor, gradient boosting, naive Bayes, support vector machine, and random forest (RF) models, were used in this study. The optimal model was selected according to model performance evaluated by K-fold cross-validation (k = 5), the area under the receiver operating characteristic curve (AUC), average precision (AP), accuracy, sensitivity, specificity, and F1. Shapley additive explanation (SHAP) values were employed for both global and local interpretation of the model results.

Results

The prevalence of MASLD in hypertensive and pre-hypertensive patients was 44.3% (362 cases) and 28.3% (1107 cases), respectively. The RF model outperformed the other five models with an AUC of 0.889, AP of 0.800, accuracy of 0.819, sensitivity of 0.816, specificity of 0.821, and F1 of 0.729. According to the SHAP analysis, the top five important features were alanine aminotransferase, body mass index, waist circumference, high-density lipoprotein cholesterol, and total cholesterol. Further analysis of the feature selection in the RF model revealed that incorporating all features leads to optimal model performance.

Conclusions

ML algorithms, especially the RF algorithm, improve the accuracy of MASLD identification, and the global and local interpretation of the RF model results enables us to intuitively understand how various features affect the chances of MASLD in patients with hypertension and pre-hypertension.
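
The threshold-based metrics named in this abstract (accuracy, sensitivity, specificity, and F1) all derive from the same four confusion-matrix counts. The sketch below shows that relationship on hypothetical labels; the function name and the toy data are assumptions and none of the study's numbers are reproduced.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision and F1 from binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on diseased patients
    specificity = ratio(tn, tn + fp)  # recall on non-diseased patients
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical labels: 3 true positives, 1 false negative, 3 true negatives, 1 false positive.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(toy_true, toy_pred)
```

Because F1 combines sensitivity with precision rather than specificity, it can sit below both sensitivity and specificity when the positive class is the minority.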


Peng ZH, Tian JH, Chen BH, Zhou HB, Bi H, He MX, Li MR, Zheng XY, Wang YW, Chong T, Li ZL. Development of machine learning prognostic models for overall survival of prostate cancer patients with lymph node-positive. Sci Rep 2023;13:18424. [PMID: 37891423 PMCID: PMC10611782 DOI: 10.1038/s41598-023-45804-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 10/24/2023] [Indexed: 10/29/2023] Open

Abstract

Prostate cancer (PCa) patients with lymph node involvement (LNI) constitute a single-risk group with varied prognoses. Existing studies on this group have focused solely on those who underwent radical prostatectomy (RP), using statistical models to predict prognosis. This study aimed to develop an easily accessible individual survival prediction tool based on multiple machine learning (ML) algorithms to predict survival probability for PCa patients with LNI. A total of 3280 PCa patients with LNI were identified from the Surveillance, Epidemiology, and End Results (SEER) database, covering the years 2000-2019. The primary endpoint was overall survival (OS). Gradient Boosting Survival Analysis (GBSA), Random Survival Forest (RSF), and Extra Survival Trees (EST) were used to develop prognosis models, which were compared to Cox regression. Discrimination was evaluated using the time-dependent areas under the receiver operating characteristic curve (time-dependent AUC) and the concordance index (c-index). Calibration was assessed using the time-dependent Brier score (time-dependent BS) and the integrated Brier score (IBS). Moreover, the beeswarm summary plot in SHAP (SHapley Additive exPlanations) was used to display the contribution of variables to the results. The 3280 patients were randomly split into a training cohort (n = 2624) and a validation cohort (n = 656). Nine variables, including age at diagnosis, race, marital status, clinical T stage, prostate-specific antigen (PSA) level at diagnosis, Gleason Score (GS), number of positive lymph nodes, RP, and radiotherapy (RT), were used to develop models. The mean time-dependent AUC for GBSA, RSF, and EST was 0.782 (95% confidence interval [CI] 0.779-0.783), 0.779 (95% CI 0.776-0.780), and 0.781 (95% CI 0.778-0.782), respectively, all higher than the 0.770 (95% CI 0.769-0.773) of the Cox regression model. Additionally, all models demonstrated similar calibration, with low IBS. A web-based prediction tool was developed using the best-performing GBSA, which is accessible at https://pengzihexjtu-pca-n1.streamlit.app/ . ML algorithms showed better performance compared with Cox regression and we developed a web-based tool, which may help to guide patient treatment and follow-up.
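
The concordance index used above to measure discrimination can be sketched in a few lines. The version below follows the usual Harrell-style definition for right-censored data, where a pair is comparable only if the patient with the shorter follow-up had the event. It is an illustration under stated assumptions: the function name and toy data are hypothetical, pairs with tied times are simply skipped, and it is not the implementation used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style c-index for right-censored data.

    times: follow-up time per patient
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences such as those between the ML models and Cox regression tend to be small in absolute terms.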


Yang X, Qiu H, Wang L, Wang X. Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study. J Med Internet Res 2023;25:e44417. [PMID: 37883174 PMCID: PMC10636616 DOI: 10.2196/44417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 03/22/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023] Open

Abstract

BACKGROUND

Machine learning (ML) methods have shown great potential in predicting colorectal cancer (CRC) survival. However, the ML models introduced thus far have mainly focused on binary outcomes and have not considered the time-to-event nature of this type of modeling.

OBJECTIVE

This study aims to evaluate the performance of ML approaches for modeling time-to-event survival data and develop transparent models for predicting CRC-specific survival.

METHODS

The data set used in this retrospective cohort study contains information on patients who were newly diagnosed with CRC between December 28, 2012, and December 27, 2019, at West China Hospital, Sichuan University. We assessed the performance of 6 representative ML models, including random survival forest (RSF), gradient boosting machine (GBM), DeepSurv, DeepHit, neural net-extended time-dependent Cox (or Cox-Time), and neural multitask logistic regression (N-MTLR) in predicting CRC-specific survival. Multiple imputation by chained equations method was applied to handle missing values in variables. Multivariable analysis and clinical experience were used to select significant features associated with CRC survival. Model performance was evaluated in stratified 5-fold cross-validation repeated 5 times by using the time-dependent concordance index, integrated Brier score, calibration curves, and decision curves. The SHapley Additive exPlanations method was applied to calculate feature importance.

RESULTS

A total of 2157 patients with CRC were included in this study. Among the 6 time-to-event ML models, the DeepHit model exhibited the best discriminative ability (time-dependent concordance index 0.789, 95% CI 0.779-0.799) and the RSF model produced better-calibrated survival estimates (integrated Brier score 0.096, 95% CI 0.094-0.099), but these differences were not statistically significant. Additionally, the RSF, GBM, DeepSurv, Cox-Time, and N-MTLR models have comparable predictive accuracy to the Cox Proportional Hazards model in terms of discrimination and calibration. The calibration curves showed that all the ML models exhibited good 5-year survival calibration. The decision curves for CRC-specific survival at 5 years showed that all the ML models, especially RSF, had higher net benefits than default strategies of treating all or no patients at a range of clinically reasonable risk thresholds. The SHapley Additive exPlanations method revealed that R0 resection, tumor-node-metastasis staging, and the number of positive lymph nodes were important factors for 5-year CRC-specific survival.

CONCLUSIONS

This study showed the potential of applying time-to-event ML predictive algorithms to help predict CRC-specific survival. The RSF, GBM, Cox-Time, and N-MTLR algorithms could provide nonparametric alternatives to the Cox Proportional Hazards model in estimating the survival probability of patients with CRC. The transparent time-to-event ML models help clinicians to more accurately predict the survival rate for these patients and improve patient outcomes by enabling personalized treatment plans that are informed by explainable ML models.
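
The decision-curve comparison against "treat all" and "treat none" mentioned in the results rests on the net benefit at a given risk threshold, commonly defined as TP/n - (FP/n) * pt/(1 - pt). The sketch below applies that standard definition to binary outcomes; the function names and toy data are assumptions, and the censoring adjustments needed for real 5-year survival data are deliberately left out.

```python
def net_benefit(y_true, risks, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risks) if r >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - weight * fp / n


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    weight = threshold / (1.0 - threshold)
    return prevalence - weight * (1.0 - prevalence)


# Hypothetical outcomes (1 = event) and predicted risks; "treat none" always has net benefit 0.
toy_outcomes = [1, 1, 0, 0, 1, 0, 0, 0]
toy_risks = [0.9, 0.6, 0.55, 0.2, 0.3, 0.1, 0.4, 0.05]
nb_model = net_benefit(toy_outcomes, toy_risks, 0.5)
nb_all = net_benefit_treat_all(toy_outcomes, 0.5)
```

Plotting these quantities over a range of thresholds gives the decision curve; a model is useful at a threshold when its net benefit exceeds both default strategies there.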


Xia K, Chen D, Jin S, Yi X, Luo L. Prediction of lung papillary adenocarcinoma-specific survival using ensemble machine learning models. Sci Rep 2023;13:14827. [PMID: 37684259 PMCID: PMC10491759 DOI: 10.1038/s41598-023-40779-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 08/16/2023] [Indexed: 09/10/2023] Open

Abstract

Accurate prognostic prediction is crucial for treatment decision-making in lung papillary adenocarcinoma (LPADC). The aim of this study was to predict cancer-specific survival in LPADC using ensemble machine learning and classical Cox regression models. Moreover, models were evaluated to provide recommendations based on quantitative data for personalized treatment of LPADC. Data of patients diagnosed with LPADC (2004-2018) were extracted from the Surveillance, Epidemiology, and End Results database. The set of samples was randomly divided into the training and validation sets at a ratio of 7:3. Three ensemble models were selected, namely gradient boosting survival (GBS), random survival forest (RSF), and extra survival trees (EST). In addition, Cox proportional hazards (CoxPH) regression was used to construct the prognostic models. Harrell's concordance index (C-index), integrated Brier score (IBS), and area under the time-dependent receiver operating characteristic curve (time-dependent AUC) were used to evaluate the performance of the predictive models. A user-friendly web access panel was provided to easily evaluate the model for the prediction of survival and treatment recommendations. A total of 3615 patients were randomly divided into the training and validation cohorts (n = 2530 and 1085, respectively). The EST, RSF, GBS, and CoxPH models showed good discriminative ability and calibration in both the training and validation cohorts (mean of time-dependent AUC: > 0.84 and > 0.82; C-index: > 0.79 and > 0.77; IBS: < 0.16 and < 0.17, respectively). The RSF and GBS models were more consistent than the CoxPH model in predicting long-term survival. We implemented the developed models as web applications for deployment into clinical practice (accessible through https://shinyshine-820-lpaprediction-model-z3ubbu.streamlit.app/ ). All four prognostic models showed good discriminative ability and calibration. The RSF and GBS models exhibited the highest effectiveness among all models in predicting the long-term cancer-specific survival of patients with LPADC. This approach may facilitate the development of personalized treatment plans and prediction of prognosis for LPADC.
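
The 7:3 random division reported in this abstract can be reproduced arithmetically: 70% of 3615 is 2530.5, and rounding down gives the stated 2530 training and 1085 validation patients. The sketch below is one way to perform such a split; the function name, the seed, and the round-down rule are assumptions, because the abstract does not describe the exact procedure.

```python
import random


def split_indices(n, train_parts=7, total_parts=10, seed=0):
    """Shuffle patient indices and split them train_parts : (total_parts - train_parts)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = n * train_parts // total_parts  # integer arithmetic, rounds down
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(3615)
print(len(train_idx), len(valid_idx))  # 2530 1085
```

Keeping the validation indices untouched during model fitting is what makes the validation-cohort metrics an honest estimate of out-of-sample performance.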


Hao Y, Liang D, Zhang S, Wu S, Li D, Wang Y, Shi M, He Y. Machine learning for predicting the survival in osteosarcoma patients: Analysis based on American and Hebei Province cohort. BIOMOLECULES & BIOMEDICINE 2023;23:883-893. [PMID: 36967662 PMCID: PMC10494842 DOI: 10.17305/bb.2023.8804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 03/23/2023] [Accepted: 03/23/2023] [Indexed: 06/18/2023]

Xiu Y, Jiang C, Zhang S, Yu X, Qiao K, Huang Y. Prediction of nonsentinel lymph node metastasis in breast cancer patients based on machine learning. World J Surg Oncol 2023;21:244. [PMID: 37563717 PMCID: PMC10416453 DOI: 10.1186/s12957-023-03109-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Accepted: 07/12/2023] [Indexed: 08/12/2023] Open

Pan X, Feng T, Liu C, Savjani RR, Chin RK, Sharon Qi X. A survival prediction model via interpretable machine learning for patients with oropharyngeal cancer following radiotherapy. J Cancer Res Clin Oncol 2023;149:6813-6825. [PMID: 36807760 DOI: 10.1007/s00432-023-04644-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Accepted: 02/08/2023] [Indexed: 02/21/2023]

Abstract

PURPOSE

To explore interpretable machine learning (ML) methods, with the hope of adding prognostic value, for predicting survival in patients with Oropharyngeal-Cancer (OPC).

METHODS

A cohort of 427 OPC patients (Training 341, Test 86) from the TCIA database was analyzed. Radiomic features of the gross-tumor-volume (GTV), extracted from planning CT using Pyradiomics, and patient characteristics such as HPV p16 status were considered as potential predictors. A multi-level dimension reduction algorithm consisting of Least-Absolute-Shrinkage-and-Selection-Operator (Lasso) and Sequential-Floating-Backward-Selection (SFBS) was proposed to effectively remove redundant/irrelevant features. The interpretable model was constructed by quantifying the contribution of each feature to the Extreme-Gradient-Boosting (XGBoost) decision with the Shapley-Additive-exPlanations (SHAP) algorithm.

RESULTS

The Lasso-SFBS algorithm proposed in this study finally selected 14 features, and our prediction model achieved an area-under-ROC-curve (AUC) of 0.85 on the test dataset based on this feature set. The ranking of the contribution values calculated by SHAP shows that the top predictors most correlated with survival were ECOG performance status, wavelet-LLH_firstorder_Mean, chemotherapy, wavelet-LHL_glcm_InverseVariance, and tumor size. Patients who had chemotherapy, positive HPV p16 status, and lower ECOG performance status tended to have higher SHAP scores and longer survival; those who were older at diagnosis or had a history of heavy drinking and smoking (pack-years) tended to have lower SHAP scores and shorter survival.

CONCLUSION

We demonstrated predictive values of combined patient characteristics and imaging features for the overall survival of OPC patients. The multi-level dimension reduction algorithm can reliably identify the most plausible predictors that are mostly associated with overall survival. The interpretable patient-specific survival prediction model, capturing correlations of each predictor and clinical outcome, was developed to facilitate clinical decision-making for personalized treatment.
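
SHAP attributes a prediction to individual features using Shapley values, which can be computed exactly for a handful of features by averaging each feature's marginal contribution over all subsets of the others. The sketch below does this for a hypothetical three-feature model, replacing absent features with a baseline value; the model, instance, and baseline are invented for illustration, and practical SHAP tooling for tree ensembles such as XGBoost relies on faster specialized algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values; value_fn maps a frozenset of present feature indices to a payoff."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(x):
    # Hypothetical model with one interaction term between features 0 and 2.
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]


instance = (1.0, 1.0, 1.0)  # hypothetical patient
baseline = (0.0, 0.0, 0.0)  # hypothetical reference values for "absent" features


def coalition_value(present):
    mixed = [instance[j] if j in present else baseline[j] for j in range(len(instance))]
    return toy_model(mixed)


phi = shapley_values(coalition_value, 3)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features involved.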


Yi X, Xu W, Tang G, Zhang L, Wang K, Luo H, Zhou X. Individual risk and prognostic value prediction by machine learning for distant metastasis in pulmonary sarcomatoid carcinoma: a large cohort study based on the SEER database and the Chinese population. Front Oncol 2023;13:1105224. [PMID: 37434968 PMCID: PMC10332636 DOI: 10.3389/fonc.2023.1105224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 06/06/2023] [Indexed: 07/13/2023] Open

Liu Y, Wu Z, Feng Y, Gao J, Wang B, Lian C, Diao B. Integration analysis of single-cell and spatial transcriptomics reveal the cellular heterogeneity landscape in glioblastoma and establish a polygenic risk model. Front Oncol 2023;13:1109037. [PMID: 37397378 PMCID: PMC10308022 DOI: 10.3389/fonc.2023.1109037] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/27/2022] [Accepted: 05/31/2023] [Indexed: 07/04/2023] Open

Abstract

Background

Glioblastoma (GBM) is the most common malignant brain tumor in adults and is highly lethal. Tumor heterogeneity is the leading cause of treatment failure. However, the relationship between cellular heterogeneity, tumor microenvironment, and GBM progression is still elusive.

Methods

Integrated analysis of single-cell RNA sequencing (scRNA-seq) and spatial transcriptome sequencing (stRNA-seq) of GBM was conducted to analyze the spatial tumor microenvironment. We investigated the subpopulation heterogeneity of malignant cells through gene set enrichment analyses, cell communications analyses, and pseudotime analyses. Significantly changed genes of the pseudotime analysis were screened to create a tumor progress-related gene risk score (TPRGRS) using Cox regression algorithms in the bulk RNA-sequencing (bulkRNA-seq) dataset. We combined the TPRGRS and clinical characteristics to predict the prognosis of patients with GBM. Furthermore, functional analysis was applied to uncover the underlying mechanisms of the TPRGRS.

Results

GBM cells were accurately mapped to their spatial locations, revealing their spatial colocalization. The malignant cells were divided into five clusters with transcriptional and functional heterogeneity, including unclassified malignant cells and astrocyte-like, mesenchymal-like, oligodendrocyte-progenitor-like, and neural-progenitor-like malignant cells. Cell-cell communications analysis in scRNA-seq and stRNA-seq identified ligand-receptor pairs of the CXCL, EGF, FGF, and MIF signaling pathways as bridges, implying that the tumor microenvironment may cause malignant cells' transcriptomic adaptability and disease progression. Pseudotime analysis showed the differentiation trajectory of GBM cells from proneural to mesenchymal transition and identified genes or pathways that affect cell differentiation. TPRGRS successfully divided patients with GBM in three datasets into high- and low-risk groups, and it proved to be a prognostic factor independent of routine clinicopathological characteristics. Functional analysis revealed that the TPRGRS was associated with growth factor binding, cytokine activity, signaling receptor activator activity functions, and oncogenic pathways. Further analysis revealed the association of the TPRGRS with gene mutations and immunity in GBM. Finally, the external datasets and qRT-PCR verified high expressions of the TPRGRS mRNAs in GBM cells.

Conclusion

Our study provides novel insights into heterogeneity in GBM based on scRNA-seq and stRNA-seq data. Moreover, our study proposed a malignant cell transition-based TPRGRS through integrated analysis of bulkRNA-seq and scRNA-seq data, combined with the routine clinicopathological evaluation of tumors, which may provide more personalized drug regimens for GBM patients.
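
Gene-based risk scores derived from Cox regression are commonly computed as a weighted sum of expression values, with the fitted Cox coefficients as weights, and patients are then split into high- and low-risk groups at a cutoff such as the median. The sketch below illustrates that general pattern only: the gene names, coefficients, expression values, and median cutoff are all hypothetical, and the abstract does not state the TPRGRS formula or the cutoff actually used.

```python
from statistics import median

# Hypothetical Cox coefficients per gene (positive = higher expression, higher risk).
HYPOTHETICAL_COEFS = {"GENE_A": 0.5, "GENE_B": -0.2}


def risk_score(expression, coefs):
    """Linear predictor: sum of coefficient times expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefs.items())


def stratify_by_median(scores):
    """Label each patient 'high' if the score is above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression profiles for four patients.
toy_expression = {
    "p1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "p2": {"GENE_A": 1.0, "GENE_B": 3.0},
    "p3": {"GENE_A": 4.0, "GENE_B": 0.0},
    "p4": {"GENE_A": 0.0, "GENE_B": 1.0},
}
toy_scores = {pid: risk_score(expr, HYPOTHETICAL_COEFS) for pid, expr in toy_expression.items()}
risk_groups = stratify_by_median(toy_scores)
```

The resulting groups would typically be compared with Kaplan-Meier curves and a log-rank test, and the score entered into a multivariable Cox model alongside clinical characteristics to check whether it is an independent prognostic factor.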


Chen W, Zhou B, Jeon CY, Xie F, Lin YC, Butler RK, Zhou Y, Luong TQ, Lustigova E, Pisegna JR, Wu BU. Machine learning versus regression for prediction of sporadic pancreatic cancer. Pancreatology 2023;23:396-402. [PMID: 37130760 PMCID: PMC10406388 DOI: 10.1016/j.pan.2023.04.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/30/2022] [Revised: 04/10/2023] [Accepted: 04/23/2023] [Indexed: 05/04/2023]

Li X, Wu R, Zhao W, Shi R, Zhu Y, Wang Z, Pan H, Wang D. Machine learning algorithm to predict mortality in critically ill patients with sepsis-associated acute kidney injury. Sci Rep 2023;13:5223. [PMID: 36997585 PMCID: PMC10063657 DOI: 10.1038/s41598-023-32160-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/30/2022] [Accepted: 03/23/2023] [Indexed: 04/01/2023]

Affiliation(s)

Xunliang Li Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China.
Ruijuan Wu Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China.
Wenman Zhao Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China.
Rui Shi Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China.
Yuyu Zhu Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China.
Zhijuan Wang Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China.
Haifeng Pan Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, People's Republic of China. Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei, People's Republic of China.
Deguang Wang Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China. Institute of Kidney Disease, Inflammation and Immunity Mediated Diseases, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China.

Sun H, Wu S, Li S, Jiang X. Which model is better in predicting the survival of laryngeal squamous cell carcinoma?: Comparison of the random survival forest based on machine learning algorithms to Cox regression: analyses based on SEER database. Medicine (Baltimore) 2023;102:e33144. [PMID: 36897699 PMCID: PMC9997795 DOI: 10.1097/md.0000000000033144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 02/10/2023] [Indexed: 03/11/2023]

Ruan Z, Quan Q, Wang Q, Jiang J, Peng R. New Staging System and Prognostic Model for Malignant Phyllodes Tumor Patients without Distant Metastasis: A Development and Validation Study. J Clin Med 2023;12:jcm12051889. [PMID: 36902676 PMCID: PMC10003404 DOI: 10.3390/jcm12051889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 02/12/2023] [Accepted: 02/16/2023] [Indexed: 03/08/2023]

Park SB, Kim KU, Park YW, Hwang JH, Lim CH. Application of 18F-fluorodeoxyglucose PET/CT radiomic features and machine learning to predict early recurrence of non-small cell lung cancer after curative-intent therapy. Nucl Med Commun 2023;44:161-168. [PMID: 36458424 DOI: 10.1097/mnm.0000000000001646] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/03/2022]

Tran TT, Lee J, Gunathilake M, Kim J, Kim SY, Cho H, Kim J. A comparison of machine learning models and Cox proportional hazards models regarding their ability to predict the risk of gastrointestinal cancer based on metabolic syndrome and its components. Front Oncol 2023;13:1049787. [PMID: 36937438 PMCID: PMC10018751 DOI: 10.3389/fonc.2023.1049787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Accepted: 01/20/2023] [Indexed: 03/06/2023]

Sim R, Chong CW, Loganadan NK, Adam NL, Hussein Z, Lee SWH. Comparison of a chronic kidney disease predictive model for type 2 diabetes mellitus in Malaysia using Cox regression versus machine learning approach. Clin Kidney J 2022;16:549-559. [PMID: 36865020 PMCID: PMC9972828 DOI: 10.1093/ckj/sfac252] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/06/2022] [Indexed: 12/12/2022]

Machine Learning Algorithms for Prediction of Survival by Stress Echocardiography in Chronic Coronary Syndromes. J Pers Med 2022;12:jpm12091523. [PMID: 36143307 PMCID: PMC9504503 DOI: 10.3390/jpm12091523] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/06/2022] [Revised: 09/13/2022] [Accepted: 09/13/2022] [Indexed: 11/28/2022]

Peng J, Lu Y, Chen L, Qiu K, Chen F, Liu J, Xu W, Zhang W, Zhao Y, Yu Z, Ren J. The prognostic value of machine learning techniques versus cox regression model for head and neck cancer. Methods 2022;205:123-132. [PMID: 35798257 DOI: 10.1016/j.ymeth.2022.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Revised: 05/18/2022] [Accepted: 07/01/2022] [Indexed: 10/17/2022]

Suresh K, Severn C, Ghosh D. Survival prediction models: an introduction to discrete-time modeling. BMC Med Res Methodol 2022;22:207. [PMID: 35883032 PMCID: PMC9316420 DOI: 10.1186/s12874-022-01679-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/01/2022] [Accepted: 07/08/2022] [Indexed: 12/05/2022]

Abstract

Background

Prediction models for time-to-event outcomes are commonly used in biomedical research to obtain subject-specific probabilities that aid in making important clinical care decisions. There are several regression and machine learning methods for building these models that have been designed or modified to account for the censoring that occurs in time-to-event data. Discrete-time survival models, which have often been overlooked in the literature, provide an alternative approach for predictive modeling in the presence of censoring with limited loss in predictive accuracy. These models can take advantage of the range of nonparametric machine learning classification algorithms and their available software to predict survival outcomes.

Methods

Discrete-time survival models are applied to a person-period data set to predict the hazard of experiencing the failure event in pre-specified time intervals. This framework allows for any binary classification method to be applied to predict these conditional survival probabilities. Using time-dependent performance metrics that account for censoring, we compare the predictions from parametric and machine learning classification approaches applied within the discrete time-to-event framework to those from continuous-time survival prediction models. We outline the process for training and validating discrete-time prediction models, and demonstrate its application using the open-source R statistical programming environment.
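
The person-period construction described above can be sketched in a few lines. The following Python example is only an illustration under stated assumptions: the function names `to_person_period` and `survival_from_hazards`, the `(id, time, event)` tuple layout, and the convention that a subject censored inside an interval still counts as at risk for that interval are choices made here, not details taken from the paper (whose own code is in R).

```python
from typing import Dict, List, Sequence, Tuple


def to_person_period(
    subjects: Sequence[Tuple[str, float, int]],
    cutpoints: Sequence[float],
) -> List[Dict[str, object]]:
    """Expand (id, time, event) records into one row per subject per interval.

    Interval k is (cutpoints[k-1], cutpoints[k]], with the first interval
    starting at 0. The binary label y is 1 only in the interval where the
    event occurs. Assumption: a subject censored inside an interval is
    still counted as at risk for that interval.
    """
    rows: List[Dict[str, object]] = []
    for sid, time, event in subjects:
        start = 0.0
        for k, end in enumerate(cutpoints):
            if time <= start:
                break  # no longer under observation
            failed_here = bool(event) and time <= end
            rows.append({"id": sid, "interval": k, "y": int(failed_here)})
            if time <= end:
                break  # event or censoring happened in this interval
            start = end
    return rows


def survival_from_hazards(hazards: Sequence[float]) -> List[float]:
    """Turn interval hazards h_k into survival S_k = prod_{j<=k} (1 - h_j)."""
    surv: List[float] = []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h
        surv.append(s)
    return surv


# Toy data (hypothetical): follow-up time and event indicator (1 = event).
toy = [("a", 2.5, 1), ("b", 1.0, 0), ("c", 4.0, 0)]
person_period = to_person_period(toy, cutpoints=[1.0, 2.0, 3.0])
```

Any binary classifier could then be fit to the `y` column, with the interval index and covariates as features. Its predicted probabilities play the role of the interval hazards passed to `survival_from_hazards`.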

Results

Using publicly available data sets, we show that some discrete-time prediction models achieve better prediction performance than the continuous-time Cox proportional hazards model. Random survival forests, a machine learning algorithm adapted to survival data, also had improved performance compared to the Cox model, but was sometimes outperformed by the discrete-time approaches. In comparing the binary classification methods in the discrete time-to-event framework, the relative performance of the different methods varied depending on the data set.

Conclusions

We present a guide for developing survival prediction models using discrete-time methods and assessing their predictive performance with the aim of encouraging their use in medical research settings. These methods can be applied to data sets that have continuous time-to-event outcomes and multiple clinical predictors. They can also be extended to accommodate new binary classification algorithms as they become available. We provide R code for fitting discrete-time survival prediction models in a GitHub repository.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-022-01679-6.

Yue S, Li S, Huang X, Liu J, Hou X, Zhao Y, Niu D, Wang Y, Tan W, Wu J. Machine learning for the prediction of acute kidney injury in patients with sepsis. J Transl Med 2022;20:215. [PMID: 35562803 PMCID: PMC9101823 DOI: 10.1186/s12967-022-03364-0] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 11/23/2021] [Accepted: 03/26/2022] [Indexed: 12/15/2022]

Abstract

BACKGROUND

Acute kidney injury (AKI) is the most common and serious complication of sepsis, accompanied by high mortality and disease burden. The early prediction of AKI is critical for timely intervention and ultimately improves prognosis. This study aims to establish and validate predictive models based on novel machine learning (ML) algorithms for AKI in critically ill patients with sepsis.

METHODS

Data of patients with sepsis were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Feature selection was performed using a Boruta algorithm. ML algorithms such as logistic regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), decision tree, random forest, Extreme Gradient Boosting (XGBoost), and artificial neural network (ANN) were applied for model construction by utilizing tenfold cross-validation. The performances of these models were assessed in terms of discrimination, calibration, and clinical application. Moreover, the discrimination of ML-based models was compared with those of Sequential Organ Failure Assessment (SOFA) and the customized Simplified Acute Physiology Score (SAPS) II model.

RESULTS

A total of 3176 critically ill patients with sepsis were included for analysis, of which 2397 cases (75.5%) developed AKI during hospitalization. A total of 36 variables were selected for model construction. The LR, KNN, SVM, decision tree, random forest, ANN, XGBoost, SOFA and SAPS II score models achieved areas under the receiver operating characteristic curve of 0.7365, 0.6637, 0.7353, 0.7492, 0.7787, 0.7547, 0.821, 0.6457 and 0.7015, respectively. The XGBoost model had the best predictive performance in terms of discrimination, calibration, and clinical application among all models.
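
For readers less familiar with the discrimination metric quoted above, the short Python sketch below shows one common way to compute an area under the ROC curve (the pairwise, Mann-Whitney formulation) and re-checks the arithmetic of the reported AKI proportion. The function name `roc_auc` and the four-patient toy data are illustrative assumptions. The AUC values in `reported_auc` are copied from the abstract and are not recomputed here.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions for four patients (1 = developed AKI).
toy_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])

# Proportion of AKI cases reported in the abstract: 2397 of 3176.
aki_percent = round(100 * 2397 / 3176, 1)

# AUC values as reported in the abstract, keyed by model name.
reported_auc: Dict[str, float] = {
    "LR": 0.7365, "KNN": 0.6637, "SVM": 0.7353, "decision tree": 0.7492,
    "random forest": 0.7787, "ANN": 0.7547, "XGBoost": 0.821,
    "SOFA": 0.6457, "SAPS II": 0.7015,
}
best_model = max(reported_auc, key=reported_auc.get)
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration. A rank-based implementation would be preferable for large cohorts.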

CONCLUSION

The ML models can be reliable tools for predicting AKI in septic patients. The XGBoost model has the best predictive performance, which can be used to assist clinicians in identifying high-risk patients and implementing early interventions to reduce mortality.

Affiliation(s)

Suru Yue Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China. Collaborative Innovation Engineering Technology Research Center of Clinical Medical Big Data Cloud Service in Medical Consortium of West Guangdong Province, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Shasha Li Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China. Collaborative Innovation Engineering Technology Research Center of Clinical Medical Big Data Cloud Service in Medical Consortium of West Guangdong Province, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Xueying Huang Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China. Collaborative Innovation Engineering Technology Research Center of Clinical Medical Big Data Cloud Service in Medical Consortium of West Guangdong Province, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Jie Liu Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China. Collaborative Innovation Engineering Technology Research Center of Clinical Medical Big Data Cloud Service in Medical Consortium of West Guangdong Province, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Xuefei Hou Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China. Collaborative Innovation Engineering Technology Research Center of Clinical Medical Big Data Cloud Service in Medical Consortium of West Guangdong Province, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Yumei Zhao Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Dongdong Niu Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Yufeng Wang Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China. Collaborative Innovation Engineering Technology Research Center of Clinical Medical Big Data Cloud Service in Medical Consortium of West Guangdong Province, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Wenkai Tan Department of Gastroenterology, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.
Jiayuan Wu Clinical Research Service Center, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China. Collaborative Innovation Engineering Technology Research Center of Clinical Medical Big Data Cloud Service in Medical Consortium of West Guangdong Province, The Affiliated Hospital of Guangdong Medical University, Zhanjiang, 524001, Guangdong Province, China.

Using Explainable Machine Learning to Explore the Impact of Synoptic Reporting on Prostate Cancer. ALGORITHMS 2022. [DOI: 10.3390/a15020049] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023]

Yang CH, Chen YS, Moi SH, Chen JB, Wang L, Chuang LY. Machine learning approaches for the mortality risk assessment of patients undergoing hemodialysis. Ther Adv Chronic Dis 2022;13:20406223221119617. [PMID: 36062293 PMCID: PMC9434675 DOI: 10.1177/20406223221119617] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/06/2021] [Accepted: 07/27/2022] [Indexed: 11/15/2022]

Abstract

Introduction:

Mortality is a major primary endpoint for long-term hemodialysis (HD) patients. The clinical status of HD patients generally relies on longitudinal clinical observations such as monthly laboratory examinations and physical examinations.

Methods:

A total of 829 HD patients who met the inclusion criteria were analyzed. All patients were tracked from January 2009 to December 2013. This study applied fully adjusted Cox proportional hazards (CoxPH), stepwise-CoxPH, random survival forest (RSF)-CoxPH, and whale optimization algorithm (WOA)-CoxPH models to assess all-cause mortality risk in HD patients. The performance of the proposed CoxPH model selections was evaluated using the concordance index.
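
The concordance index used to compare these models can be illustrated with a small Python sketch of Harrell's C for right-censored data. This is a simplified version written for this listing, not the authors' code: the name `concordance_index` is hypothetical, pairs with tied follow-up times are simply skipped, and tied risk scores count as half-concordant.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    and a strictly shorter follow-up time than subject j. The pair is
    concordant when the subject who failed earlier has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up times, event flags, model risk scores.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
c_partial = concordance_index(toy_times, toy_events, [0.7, 0.9, 0.4, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a higher concordance index indicates better discrimination between patients who die earlier and those who survive longer.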

Results:

The WOA-CoxPH model obtained the highest concordance index compared with the RSF-CoxPH and typical selection CoxPH models. The eight significant parameters obtained from the WOA-CoxPH model, namely age, diabetes mellitus (DM), hemoglobin (Hb), albumin, creatinine (Cr), potassium (K), Kt/V, and cardiothoracic ratio, also showed significant survival differences between low- and high-risk characteristics in single-factor analysis. By integrating the risk characteristics of each single factor, patients who had seven or more risk characteristics among the eight selected parameters were classified as the high-risk subgroup, and the remaining patients were classified as the low-risk subgroup. The integrated low- and high-risk subgroups showed a greater discrepancy than each single risk factor selected by the WOA-CoxPH model.
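
The dichotomization rule described above (seven or more risk characteristics out of eight) amounts to a simple count. The Python sketch below shows that step only. The per-parameter cutoffs that decide whether a single factor counts as a risk characteristic are not given in the abstract, so the boolean flags are taken as inputs here, and the function name `risk_subgroup` is a hypothetical choice.

```python
from typing import Dict

# The eight parameters named in the abstract.
PARAMETERS = (
    "age", "DM", "Hb", "albumin", "Cr", "K", "Kt/V", "cardiothoracic ratio",
)


def risk_subgroup(flags: Dict[str, bool], threshold: int = 7) -> str:
    """Return "high" when at least `threshold` of the eight flags are set.

    flags maps each parameter name to True if the patient shows the
    high-risk characteristic for that parameter (cutoffs are assumed to be
    applied upstream, since the abstract does not state them).
    """
    missing = [p for p in PARAMETERS if p not in flags]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    count = sum(1 for p in PARAMETERS if flags[p])
    return "high" if count >= threshold else "low"


# Hypothetical patients: one with seven risk characteristics, one with six.
seven = {p: True for p in PARAMETERS}
seven["K"] = False
six = dict(seven)
six["Hb"] = False
groups = (risk_subgroup(seven), risk_subgroup(six))
```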

Conclusion:

The study findings revealed that the WOA-CoxPH model could provide better risk assessment performance than the RSF-CoxPH and typical selection CoxPH models in HD patients. In summary, patients who had seven or more risk characteristics among the eight selected parameters were at potentially increased risk of all-cause mortality in the HD population.

A comparative study of forest methods for time-to-event data: variable selection and predictive performance. BMC Med Res Methodol 2021;21:193. [PMID: 34563138 PMCID: PMC8465777 DOI: 10.1186/s12874-021-01386-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/19/2021] [Accepted: 09/02/2021] [Indexed: 11/17/2022]

Abstract

Background

As a popular method in the machine learning field, the forests approach is an attractive alternative to the Cox model. Random survival forests (RSF) is the most popular survival forests method, but it has drawbacks, such as a selection bias towards covariates with many possible split points. Conditional inference forests (CIF) are known to reduce this selection bias via a two-step split procedure that implements hypothesis tests and separates variable selection from splitting, but the computation is time-consuming. The recently proposed random forests with maximally selected rank statistics (MSR-RF) seems to be a great improvement on RSF and CIF.

Methods

In this paper we used a simulation study and a real data application to compare the prediction performance and variable selection performance of three survival forests methods: RSF, CIF and MSR-RF. To evaluate variable selection, we combined all simulations to calculate how frequently the correct variables ranked at the top of the variable importance measures, where a higher frequency means better selection ability. We used the Integrated Brier Score (IBS) and the c-index to measure the prediction accuracy of all three methods. The smaller the IBS value, the better the prediction.
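
To make the Integrated Brier Score more concrete, the Python sketch below computes a deliberately simplified version that assumes every event time is observed, that is, no censoring. Published IBS implementations normally add inverse-probability-of-censoring weights, which are omitted here. The names `brier_score` and `integrated_brier_score` and the two-subject toy data are assumptions made for illustration.

```python
from typing import Sequence


def brier_score(
    t: float,
    event_times: Sequence[float],
    surv_probs: Sequence[float],
) -> float:
    """Mean squared gap between observed status at time t and predicted S(t).

    Observed status is 1 if the subject is still event-free at t, else 0.
    Simplification: assumes no censoring, so no weighting is applied.
    """
    total = 0.0
    for event_time, s in zip(event_times, surv_probs):
        alive = 1.0 if event_time > t else 0.0
        total += (alive - s) ** 2
    return total / len(event_times)


def integrated_brier_score(
    grid: Sequence[float],
    event_times: Sequence[float],
    surv_matrix: Sequence[Sequence[float]],
) -> float:
    """Trapezoidal average of the Brier score over the time grid.

    surv_matrix[i][k] is the predicted survival probability of subject i
    at time grid[k].
    """
    scores = [
        brier_score(t, event_times, [row[k] for row in surv_matrix])
        for k, t in enumerate(grid)
    ]
    area = sum(
        0.5 * (scores[k] + scores[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])


# Hypothetical toy data: two subjects with events at t = 1 and t = 3.
toy_event_times = [1.0, 3.0]
toy_grid = [0.5, 2.0]
ibs_perfect = integrated_brier_score(toy_grid, toy_event_times, [[1.0, 0.0], [1.0, 1.0]])
ibs_coinflip = integrated_brier_score(toy_grid, toy_event_times, [[0.5, 0.5], [0.5, 0.5]])
```

In this toy case a perfect prediction scores 0 and an uninformative prediction of 0.5 everywhere scores 0.25, which is consistent with the statement that a smaller IBS indicates better prediction.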

Results

Simulations show that the three forests methods differ slightly in prediction performance. MSR-RF and RSF might perform better than CIF when there are only continuous or binary variables in the datasets.

For variable selection performance:

When there are multiple categorical variables in the datasets, the selection frequency of RSF seems to be the lowest in most cases. MSR-RF and CIF have higher selection rates, and CIF performs especially well with the interaction term.

The degree of correlation among the variables has little effect on the selection frequency, which indicates that the three forest methods can handle correlated data.

When there are only continuous variables in the datasets, MSR-RF performs better. When there are only binary variables in the datasets, RSF and MSR-RF have more advantages than CIF.

When the variable dimension increases, MSR-RF and RSF seem to be more robust than CIF.

Conclusions

All three methods show advantages in prediction performance and variable selection performance under different situations. The recently proposed MSR-RF methodology has practical value and is well worth popularizing. It is important to identify the appropriate method in real use according to the research aim and the nature of the covariates.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-021-01386-8.

Quist J, Taylor L, Staaf J, Grigoriadis A. Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification. Cancers (Basel) 2021;13:991. [PMID: 33673506 PMCID: PMC7956671 DOI: 10.3390/cancers13050991] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/28/2020] [Revised: 02/16/2021] [Accepted: 02/20/2021] [Indexed: 11/16/2022]