1
Chen KA, Goffredo P, Hu D, Joisa CU, Guillem JG, Gomez SM, Kapadia MR. Estimating Risk of Locoregional Failure and Overall Survival in Anal Cancer Following Chemoradiation: A Machine Learning Approach. J Gastrointest Surg 2023; 27:1925-1935. [PMID: 37407899 PMCID: PMC10528925 DOI: 10.1007/s11605-023-05755-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 06/03/2023] [Indexed: 07/07/2023]
Abstract
BACKGROUND: Optimal treatment of anal squamous cell carcinoma (ASCC) is definitive chemoradiation. Patients with persistent or recurrent disease require abdominoperineal resection (APR). Current models for predicting need for APR and overall survival are limited by low accuracy or small datasets. This study sought to use machine learning (ML) to develop more accurate models for locoregional failure and overall survival for ASCC.
METHODS: This study used the National Cancer Database from 2004-2018, divided into training, validation, and test sets. We included patients with stage I-III ASCC who underwent chemoradiation. Our primary outcomes were need for APR and 3-year overall survival. Random forest (RF), gradient boosting (XGB), and neural network (NN) ML-based models were developed and compared with logistic regression (LR). Accuracy was assessed using area under the receiver operating characteristic curve (AUROC).
RESULTS: APR was required in 5.3% (1,015/18,978) of patients. XGB performed best with AUROC of 0.813, compared with 0.691 for LR. Tumor size, lymphovascular invasion, and tumor grade showed the strongest influence on model predictions. Mortality was 23.6% (7,988/33,834). AUROC for XGB and LR were similar at 0.766 and 0.748, respectively. For this model, age, radiation dose, sex, and insurance status were the most influential variables.
CONCLUSIONS: We developed and internally validated machine learning-based models for predicting outcomes in ASCC and showed higher accuracy versus LR for locoregional failure, but not overall survival. After external validation, these models may assist clinicians with identifying patients with ASCC at high risk of treatment failure.
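The abstract compares models by AUROC. As a rough illustration of what that metric measures (not the authors' code), the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are assumptions made for this example.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = event, e.g. a hypothetical "needed APR").
    scores: iterable of model risk scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data, purely illustrative.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.813 and 0.691 should be read.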
Affiliation(s)
- Kevin A Chen
- Division of Gastrointestinal Surgery, Department of Surgery, University of North Carolina at Chapel Hill, 100 Manning Drive, 4038 Burnett Womack Building, CB #7050, Chapel Hill, NC, 27599, USA
- Paolo Goffredo
- Division of Colon & Rectal Surgery, Department of Surgery, University of Minnesota, 420 Delaware St SE, Minneapolis, MN, 55455, USA
- David Hu
- Department of Biostatistics, UNC Gillings School of Global Public Health, University of North Carolina at Chapel Hill, 3101 McGavran-Greenberg Hall, CB #7420, Chapel Hill, NC, 27599-7420, USA
- Chinmaya U Joisa
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, 10202C Mary Ellen Jones Building, Chapel Hill, NC, 27599, USA
- Jose G Guillem
- Division of Gastrointestinal Surgery, Department of Surgery, University of North Carolina at Chapel Hill, 100 Manning Drive, 4038 Burnett Womack Building, CB #7050, Chapel Hill, NC, 27599, USA
- Shawn M Gomez
- Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill, 10202C Mary Ellen Jones Building, Chapel Hill, NC, 27599, USA
- Muneera R Kapadia
- Division of Gastrointestinal Surgery, Department of Surgery, University of North Carolina at Chapel Hill, 100 Manning Drive, 4038 Burnett Womack Building, CB #7050, Chapel Hill, NC, 27599, USA
2
Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN. PLoS One 2021; 16:e0250458. [PMID: 33905431 PMCID: PMC8078779 DOI: 10.1371/journal.pone.0250458] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/18/2020] [Accepted: 04/07/2021] [Indexed: 11/19/2022] Open
Abstract
Accurate prediction of cancer stage is important because it enables more appropriate treatment for patients with cancer. Many measures and methods have been proposed for more accurate prediction of cancer stage, but recently machine learning, especially deep learning-based methods, has been receiving increasing attention, mostly owing to good prediction accuracy in many applications. Machine learning methods can be applied to high-throughput DNA mutation or RNA expression data to predict cancer stage. However, because the number of genes or markers generally exceeds 10,000, a considerable number of data samples is required to guarantee high prediction accuracy. To solve this problem of a small number of clinical samples, we used Generative Adversarial Networks (GANs) to augment the samples. Because GANs are not effective with whole genes, we first selected significant genes using DNA mutation data and random forest feature ranking. Next, RNA expression data for the selected genes were expanded using GANs. We compared the classification accuracies obtained with the original dataset and with expanded datasets generated by the proposed and existing methods, using random forest, Deep Neural Networks (DNNs), and 1-Dimensional Convolutional Neural Networks (1DCNN). When using the 1DCNN, the F1 score of GAN5 (a 5-fold increase in data) improved by 39% relative to the original data. Moreover, the results using only 30% of the data were better than those using all of the data. Our attempt is the first to use a GAN for augmentation using numeric data for both DNA and RNA. The augmented datasets obtained using the proposed method demonstrated significantly increased classification accuracy in most cases. By using a GAN and 1DCNN in the prediction of cancer stage, we confirmed that good results can be obtained even with a small number of samples, and it is expected that a great deal of the cost and time required to obtain clinical samples will be reduced. The proposed sample augmentation method could also be applied for other purposes, such as prognostic prediction or cancer classification.
3
Prediction of Colon Cancer Stages and Survival Period with Machine Learning Approach. Cancers (Basel) 2019; 11:cancers11122007. [PMID: 31842486 PMCID: PMC6966646 DOI: 10.3390/cancers11122007] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 10/14/2019] [Revised: 12/01/2019] [Accepted: 12/09/2019] [Indexed: 12/11/2022] Open
Abstract
This study used machine learning (ML) to predict the tumor stage of colon cancer under the TNM (tumor, node, and metastasis) staging system from the most influential histopathology parameters, and to predict five-year disease-free survival (DFS). From the colorectal cancer (CRC) registry of Chang Gung Memorial Hospital, Linkou, Taiwan, 4021 patients were selected for the analysis. Various ML algorithms were applied to predict the tumor stage of colon cancer, with the Tumor Aggression Score (TAS) considered as a prognostic factor. The performance of the algorithms was evaluated using five-fold cross-validation. Accuracy was determined both for standard TNM staging and for TNM staging with the Tumor Aggression Score. The Random Forest model achieved an F-measure of 0.89 when the Tumor Aggression Score was included as an attribute alongside the standard attributes normally used for TNM stage prediction. The Random Forest algorithm also outperformed all other algorithms in predicting five-year DFS, with an accuracy of approximately 84% and an area under the curve (AUC) of 0.82 ± 0.10.
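The five-fold cross-validation mentioned in the abstract can be sketched as follows. This is a minimal illustration of the splitting step only, assuming a simple shuffled, unstratified split; the function name `k_fold_indices` and the fixed seed are choices made for this example, and the study's actual procedure may differ.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each sample index appears in exactly one test fold; the remaining
    k - 1 folds form the training set for that round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k near-equal folds

    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 4021 is the cohort size reported in the abstract; no patient data is used here.
splits = list(k_fold_indices(4021, k=5))
```

A model would be fitted on each `train` list and scored on the matching `test` list, and the five scores averaged, which is how a figure such as an AUC with a spread across folds can arise.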
4
Barda AJ, Ruiz VM, Gigliotti T, Tsui FR. An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance. JAMIA Open 2019; 2:197-204. [PMID: 30944914 PMCID: PMC6435008 DOI: 10.1093/jamiaopen/ooy063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/08/2018] [Revised: 11/22/2018] [Accepted: 12/20/2018] [Indexed: 11/13/2022] Open
Abstract
Objectives: We aimed to gain a better understanding of how standardization of laboratory data can impact predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to Logical Observation Identifiers Names and Codes (LOINC) would produce predictive models that significantly outperform those learned utilizing local laboratory codes.
Materials and Methods: We predicted 30-day hospital readmission for a set of heart failure-specific visits to 13 hospitals from 2008 to 2012. Laboratory test results were extracted and then manually cleaned and mapped to LOINC. We extracted features to summarize laboratory data for each patient and used a training dataset (2008–2011) to learn models using a variety of feature selection techniques and classifiers. We evaluated our hypothesis by comparing model performance on an independent test dataset (2012).
Results: Models that utilized LOINC performed significantly better than models that utilized local laboratory test codes, regardless of the feature selection technique and classifier approach used.
Discussion and Conclusion: We quantitatively demonstrated the positive impact of standardizing multi-site laboratory data to LOINC prior to use in predictive models. We used our findings to argue for the need for detailed reporting of data standardization procedures in predictive modeling, especially in studies leveraging multi-site datasets extracted from electronic health records.
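The temporal hold-out design described in the abstract (training on 2008–2011 visits, testing on 2012 visits) can be sketched as below. The record layout, the `year` field, and the function name `temporal_split` are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def temporal_split(records, test_year=2012):
    """Split visit records into training (before test_year) and test (test_year).

    records: list of dicts, each with a hypothetical integer "year" field.
    Visits after test_year are ignored, so no later data leaks into either set.
    """
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Toy visits, purely illustrative: two per year from 2008 to 2012.
toy_visits = [
    {"visit_id": i, "year": 2008 + (i % 5), "readmitted": i % 2}
    for i in range(10)
]
train_set, test_set = temporal_split(toy_visits)
```

Splitting by calendar year rather than at random keeps the test set strictly later than the training data, which mirrors how such a model would be applied to future visits.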
Affiliation(s)
- Amie J Barda
- Tsui Laboratory, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Victor M Ruiz
- Tsui Laboratory, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Tony Gigliotti
- Information Services Division, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Fuchiang Rich Tsui
- Tsui Laboratory, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; School of Computing and Information, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA