1
Alinia S, Asghari-Jafarabadi M, Mahmoudi L, Norouzi S, Safari M, Roshanaei G. Survival prediction and prognostic factors in colorectal cancer after curative surgery: insights from cox regression and neural networks. Sci Rep 2023; 13:15675. [PMID: 37735621 PMCID: PMC10514146 DOI: 10.1038/s41598-023-42926-0] [Received: 04/14/2023] [Accepted: 09/16/2023] [Indexed: 09/23/2023]
Abstract
Medical research frequently relies on Cox regression to analyze the survival distribution of cancer patients. Nonetheless, in specific scenarios, neural networks hold the potential to serve as a robust alternative. In this study, we aim to scrutinize the effectiveness of Cox regression and neural network models in assessing the survival outcomes of patients who have undergone treatment for colorectal cancer. We conducted a retrospective study on 284 colorectal cancer patients who underwent surgery at Imam Khomeini clinic in Hamadan between 2001 and 2017. The data were used to train both Cox regression and neural network models, and their predictive accuracy was compared using diagnostic measures such as sensitivity, specificity, positive predictive value, accuracy, negative predictive value, and area under the receiver operating characteristic curve (AUC). The analyses were performed using Stata 17 and R 4.0.4. The study revealed that the best neural network model had a sensitivity of 74.5% (95% CI 61.0-85.0), specificity of 83.3% (65.3-94.4), positive predictive value of 89.1% (76.4-96.4), negative predictive value of 64.1% (47.2-78.8), AUC of 0.79 (0.70-0.88), and accuracy of 0.776 for death prediction. For recurrence, the best neural network model had a sensitivity of 88.1% (74.4-96.0%), specificity of 83.7% (69.3-93.2%), positive predictive value of 84.1% (69.9-93.4%), negative predictive value of 87.8% (73.8-95.9%), AUC of 0.86 (0.78-0.93), and accuracy of 0.859. The Cox model had comparable results, with a sensitivity of 73.6% (64.8-81.2) and 85.5% (78.3-91.0), specificity of 89.6% (83.8-93.8) and 98.0% (94.4-99.6), positive predictive value of 84.0% (75.6-90.4) and 97.4% (92.6-99.5), negative predictive value of 82.0% (75.6-90.4) and 88.8% (0.83-93.1), AUC of 0.82 (0.77-0.86) and 0.92 (0.89-0.95), and accuracy of 0.88 and 0.92 for death and recurrence prediction, respectively. 
In conclusion, the study found that both Cox regression and neural network models are effective in predicting early recurrence and death in patients with colorectal cancer after curative surgery. The neural network model showed slightly better sensitivity and negative predictive value for death, while the Cox model had better specificity and positive predictive value for recurrence. Overall, both models demonstrated high accuracy and AUC, indicating their usefulness in predicting these outcomes.
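All of the diagnostic measures reported above derive from a 2x2 table of predicted versus observed outcomes. The Python sketch below shows the standard formulas, plus the AUC computed as the Mann-Whitney probability that a randomly chosen event case receives a higher risk score than a randomly chosen non-event case. It assumes predictions have already been dichotomized and censoring handled beforehand; the abstract does not say how either was done. The `Confusion` class, the function names, and the counts and scores are hypothetical illustrations, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted event, event observed
    fp: int  # predicted event, no event observed
    fn: int  # predicted no event, event observed
    tn: int  # predicted no event, no event observed


def diagnostic_measures(c: Confusion) -> dict:
    """Standard 2x2 measures, returned as proportions (multiply by 100 for %)."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
    }


def auc_mann_whitney(scores_events, scores_nonevents) -> float:
    """AUC = P(score of a random event case > score of a random non-event case); ties count 1/2."""
    wins = 0.0
    for p in scores_events:
        for n in scores_nonevents:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_events) * len(scores_nonevents))


# Hypothetical counts and risk scores, for illustration only.
print(diagnostic_measures(Confusion(tp=30, fp=5, fn=10, tn=55)))
print(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]))
```

With these hypothetical counts, sensitivity is 30/40 = 0.75 and accuracy is 85/100 = 0.85. Because PPV and NPV depend on the event prevalence in the sample, they transfer less readily between cohorts than sensitivity and specificity.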
Affiliation(s)
- Shayeste Alinia
- Department of Statistics and Epidemiology, School of Medicine, Zanjan University of Medical Sciences, Mahdavi Blvd, Zanjan, 4513956111, Iran
- Mohammad Asghari-Jafarabadi
- Faculty of Health, Road Traffic Injury Research Center, Tabriz University of Medical Sciences, Golgasht St. Attar E Neshabouri St., Tabriz, 5166614711, Iran.
- Cabrini Research, Cabrini Health, Malvern, VIC, 3144, Australia.
- Faculty of Medicine, Nursing and Health Sciences, School of Public Health and Preventative Medicine, Monash University, Melbourne, VIC, 3004, Australia.
- Department of Psychiatry, Faculty of Medicine, Nursing and Health Sciences, School of Clinical Sciences, Monash University, Clayton, VIC, 3168, Australia.
- Leila Mahmoudi
- Department of Statistics and Epidemiology, School of Medicine, Zanjan University of Medical Sciences, Mahdavi Blvd, Zanjan, 4513956111, Iran.
- Solmaz Norouzi
- Department of Statistics and Epidemiology, School of Medicine, Zanjan University of Medical Sciences, Mahdavi Blvd, Zanjan, 4513956111, Iran
- Maliheh Safari
- Department of Biostatistics, School of Medicine, Arak University of Medical Sciences, Arak, Iran
- Ghodratollah Roshanaei
- Department of Biostatistics, Modeling of Non-Communicable Diseases Research Center, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
2
Hao Y, Liang D, Zhang S, Wu S, Li D, Wang Y, Shi M, He Y. Machine learning for predicting the survival in osteosarcoma patients: Analysis based on American and Hebei Province cohort. Biomol Biomed 2023; 23:883-893. [PMID: 36967662 PMCID: PMC10494842 DOI: 10.17305/bb.2023.8804] [Received: 01/18/2023] [Revised: 03/23/2023] [Accepted: 03/23/2023] [Indexed: 06/18/2023]
Abstract
Osteosarcoma, a rare malignant tumor, has a poor prognosis. This study aimed to find the best prognostic model for osteosarcoma. A total of 2912 patients from the SEER database and 225 patients from Hebei Province were included. Patients from the SEER database (2008-2015) were included in the development dataset. Patients from the SEER database (2004-2007) and the Hebei Province cohort were included in the external test datasets. The Cox model and three tree-based machine learning algorithms (survival tree [ST], random survival forest [RSF] and gradient boosting machine [GBM]) were used to develop the prognostic models by 10-fold cross-validation with 200 iterations. Additionally, the performance of models in the multivariable group was compared with that of models in the TNM group. The 3-year and 5-year cancer-specific survival (CSS) were 72.71% and 65.92% in the development dataset, respectively. The predictive ability in the multivariable group was superior to that in the TNM group. The calibration curves and consistency in the multivariable group were superior to those in the TNM group. The Cox and RSF models performed better than the ST and GBM models. A nomogram was constructed to predict the 3-year and 5-year CSS of osteosarcoma patients. The RSF model can be used as a nonparametric alternative to the Cox model. The nomogram constructed from the Cox model can provide a reference for clinicians making specific therapeutic decisions in both America and China.
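Survival proportions such as the 3-year and 5-year CSS quoted above are commonly estimated with the Kaplan-Meier product-limit estimator. The abstract does not state which estimator was used, so the Python sketch below is an assumption, as is the coding in which deaths from osteosarcoma count as events and deaths from other causes are treated as censored. The function names and toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier steps as (time, S(time)) pairs; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0   # events observed at time t
        leaving = 0  # everyone (events + censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Subjects censored at t are still counted at risk at t (usual convention).
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= leaving
    return steps


def survival_at(steps, t):
    """Right-continuous step-function lookup: S(t) is the last step at or before t."""
    s = 1.0
    for time, value in steps:
        if time > t:
            break
        s = value
    return s


# Hypothetical follow-up in years (1 = cancer-specific death, 0 = censored).
times = [1, 2, 2, 3, 4, 5, 6, 7]
events = [1, 0, 1, 1, 0, 1, 0, 0]
steps = kaplan_meier(times, events)
print(survival_at(steps, 3), survival_at(steps, 5))
```

With this toy data, S(3) is 0.6 and S(5) is 0.4 (up to floating-point rounding). Counting censored subjects as still at risk at their censoring time is the usual tie-handling convention; other conventions shift the estimate slightly.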
Affiliation(s)
- Yahui Hao
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
- Di Liang
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
- Shuo Zhang
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
- Siqi Wu
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
- Daojuan Li
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
- Yingying Wang
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
- Miaomiao Shi
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
- Yutong He
- Cancer Institute, The Fourth Hospital of Hebei Medical University/The Tumor Hospital of Hebei Province, Shijiazhuang, China
3
Kantidakis G, Putter H, Litière S, Fiocco M. Statistical models versus machine learning for competing risks: development and validation of prognostic models. BMC Med Res Methodol 2023; 23:51. [PMID: 36829145 PMCID: PMC9951458 DOI: 10.1186/s12874-023-01866-z] [Received: 09/15/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023]
Abstract
BACKGROUND In health research, several chronic diseases are susceptible to competing risks (CRs). Initially, statistical models (SM) were developed to estimate the cumulative incidence of an event in the presence of CRs. With the recent growing interest in applying machine learning (ML) to clinical prediction, these techniques have also been extended to model CRs, but the literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low-dimensional setting). METHODS A dataset with 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate model predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. ML models include an original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel specifications in terms of architecture (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years. RESULTS Based on the original eSTS data, 100 bootstrapped training datasets are drawn. The performance of the final models is assessed on validation data (left-out samples) using the Brier score and the area under the curve (AUC) with CRs. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models reach performance comparable to the SM at 2, 5, and 10 years in terms of both Brier score and AUC (95% confidence intervals overlapped). However, the SM are frequently better calibrated. 
CONCLUSIONS Overall, ML techniques are less practical, as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas regression methods can perform well without the additional workload of model training. As such, for non-complex real-life survival data, ML techniques should only be applied as a complement to SM, as exploratory tools for assessing model performance. More attention to model calibration is urgently needed.
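In a competing-risks setting, the cumulative incidence of the event of interest should not be taken as one minus a Kaplan-Meier curve that censors the competing event, because that overestimates it. The Python sketch below shows the Aalen-Johansen estimator, the standard non-parametric estimate of the cumulative incidence function. It is not one of the models compared in the study (cause-specific Cox, Fine-Gray, PLANNCR, RSFCR); the cause coding (0 = censored, 1 = progression, 2 = death), the function names, and the toy data are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest=1):
    """Aalen-Johansen CIF steps for one cause; causes: 0 = censored, k > 0 = failure from cause k."""
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv_before = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_any = 0    # failures from any cause at t
        d_cause = 0  # failures from the cause of interest at t
        leaving = 0  # everyone (failures + censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            if cause != 0:
                d_any += 1
                if cause == cause_of_interest:
                    d_cause += 1
            leaving += 1
            i += 1
        if d_cause > 0:
            # P(event-free just before t) times the cause-specific hazard at t.
            cif += surv_before * d_cause / n_at_risk
            steps.append((t, cif))
        if d_any > 0:
            surv_before *= 1.0 - d_any / n_at_risk
        n_at_risk -= leaving
    return steps


def cif_at(steps, t):
    """Step-function lookup: cumulative incidence at time t."""
    value = 0.0
    for time, c in steps:
        if time > t:
            break
        value = c
    return value


# Hypothetical data: years to progression (1), death (2) or censoring (0).
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 0, 1, 2, 0]
progression = cumulative_incidence(times, causes, cause_of_interest=1)
death = cumulative_incidence(times, causes, cause_of_interest=2)
print(cif_at(progression, 5), cif_at(death, 5))
```

A useful sanity check on this estimator is that the cumulative incidences summed over all causes equal one minus the overall (any-cause) Kaplan-Meier survival; in the toy data both curves reach 7/18 and their sum is 7/9.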
Affiliation(s)
- Georgios Kantidakis
- Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands.
- Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands.
- Department of Statistics, European Organisation for Research and Treatment of Cancer (EORTC) Headquarters, Ave E. Mounier 83/11, 1200, Brussels, Belgium.
- Hein Putter
- Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Saskia Litière
- Department of Statistics, European Organisation for Research and Treatment of Cancer (EORTC) Headquarters, Ave E. Mounier 83/11, 1200, Brussels, Belgium
- Marta Fiocco
- Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands.
- Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands.
- Trial and Data Center, Princess Máxima Center for pediatric oncology (PMC), Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
4
'Small Data' for big insights in ecology. Trends Ecol Evol 2023:S0169-5347(23)00019-8. [PMID: 36797167 DOI: 10.1016/j.tree.2023.01.015] [Received: 11/28/2022] [Revised: 01/18/2023] [Accepted: 01/25/2023] [Indexed: 02/17/2023]
Abstract
Big Data science has significantly furthered our understanding of complex systems by harnessing large volumes of data, generated at high velocity and in great variety. However, there is a risk that Big Data collection is prioritised to the detriment of 'Small Data' (data with few observations). This poses a particular risk to ecology, where Small Data abounds. Machine learning experts are increasingly looking to Small Data to drive the next generation of innovation, leading to the development of methods for Small Data such as transfer learning, knowledge graphs, and synthetic data. Meanwhile, meta-analysis and causal reasoning approaches are evolving to provide new insights from Small Data. These advances should add value to high-quality Small Data, catalysing future insights for ecology.
5
Neural Networks for Survival Prediction in Medicine Using Prognostic Factors: A Review and Critical Appraisal. Comput Math Methods Med 2022; 2022:1176060. [PMID: 36238497 PMCID: PMC9553343 DOI: 10.1155/2022/1176060] [Received: 10/25/2021] [Revised: 08/26/2022] [Accepted: 09/13/2022] [Indexed: 11/17/2022]
Abstract
Survival analysis deals with the expected duration of time until one or more events of interest occur. Time to the event of interest may be unobserved, a phenomenon commonly known as right censoring, which renders the analysis of these data challenging. Over the years, machine learning algorithms have been developed and adapted to right-censored data. Neural networks have been repeatedly employed to build clinical prediction models in healthcare, with a focus on cancer and cardiology. We present the first-ever attempt at a large-scale review of survival neural networks (SNNs) with prognostic factors for clinical prediction in medicine. This work provides a comprehensive understanding of the literature (24 studies from 1990 to August 2021, global search in PubMed). Relevant manuscripts are classified as methodological/technical (novel methodology or new theoretical model; 13 studies) or applications (11 studies). We investigate how researchers have used neural networks to fit survival data for prediction. There are two methodological trends: either time is added as part of the input features and a single output node is specified, or multiple output nodes are defined, one for each time interval. A critical appraisal of model aspects that should be designed and reported more carefully is performed. We identify key characteristics of prediction models (i.e., number of patients/predictors, evaluation measures, calibration), and compare the predictive performance of ANNs with that of the Cox proportional hazards model. The median sample size is 920 patients, and the median number of predictors is 7. Major findings include poor reporting (e.g., regarding missing data, hyperparameters) as well as inaccurate model development/validation. Calibration is neglected in more than half of the studies. Cox models are not developed to their full potential, and claims for the performance of SNNs are exaggerated. The review sheds light on the current state of the art of SNNs with prognostic factors in medicine. 
Recommendations are made for the reporting of clinical prediction models. Limitations are discussed, and future directions are proposed for researchers who seek to develop existing methodology.
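The two methodological trends described above differ mainly in how right-censored data are encoded for training. The Python sketch below splits follow-up into intervals and (a) expands each patient into one row per interval at risk, with the interval index as an extra input and a single binary target, in the spirit of partial logistic networks such as the PLANNCR models in entry 3, and (b) builds a per-patient target vector with one entry per interval plus a mask of intervals at risk, for a multi-output network. The cut points, function names, and data are hypothetical, and so is the convention that a censored patient still contributes the interval in which censoring occurs; implementations differ on that point.

```python
import bisect


def last_interval(time, cuts):
    """Index k of the interval (cuts[k], cuts[k+1]] containing `time`, capped at the last interval."""
    k = max(bisect.bisect_left(cuts, time) - 1, 0)
    return min(k, len(cuts) - 2)


def person_period(records, cuts):
    """Trend 1: one row per patient-interval at risk; interval index is an extra input, one binary target."""
    rows = []
    for features, time, event in records:
        last = last_interval(time, cuts)
        # Events after the last cut point fall outside the horizon and are treated as censored.
        event_in_horizon = event == 1 and time <= cuts[-1]
        for k in range(last + 1):
            target = 1 if (event_in_horizon and k == last) else 0
            rows.append((list(features) + [k], target))
    return rows


def multi_output_targets(time, event, cuts):
    """Trend 2: one target per interval plus a mask marking intervals where the patient was at risk."""
    n_intervals = len(cuts) - 1
    last = last_interval(time, cuts)
    targets = [0] * n_intervals
    mask = [1 if k <= last else 0 for k in range(n_intervals)]
    if event == 1 and time <= cuts[-1]:
        targets[last] = 1
    return targets, mask


# Hypothetical data: (features, follow-up time in years, event indicator).
cuts = [0, 1, 2, 5]  # intervals (0,1], (1,2], (2,5]
records = [([0.5], 1.5, 1), ([1.2], 4.0, 0), ([0.3], 7.0, 1)]
for row in person_period(records, cuts):
    print(row)
print(multi_output_targets(1.5, 1, cuts))
```

In the first encoding a single logistic output estimates the discrete hazard for the interval given as input; in the second, the mask tells the loss function to ignore intervals after a patient left the risk set, which is how censoring enters training.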