1
|
Akbilgic O, Karabayir I, Gunturkun H, Pierre JF, Rashe AC, Thomas A. Machine learning analysis on American Gut Project microbiome data to identify subjects with cancer both with and without chemotherapy exposure. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.15_suppl.e14069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
e14069 Background: There is growing interest in the links between cancer and the gut microbiome. However, the effect of chemotherapy upon the gut microbiome remains unknown. We studied whether machine learning can: 1) accurately classify subjects with cancer vs healthy controls and 2) whether this classification model is affected by chemotherapy exposure status. Methods: We used the American Gut Project data to build a extreme gradient boosting (XGBoost) model to distinguish between subjects with cancer vs healthy controls using data on simple demographics and published microbiome. We then further explore the selected features for cancer subjects based on chemotherapy exposure. Results: The cohort included 7,685 subjects consisting of 561 subjects with cancer, 52.5% female, 87.3% White, and average age of 44.7 (SD 17.7). The binary outcome variable represents cancer status. Among 561 subjects with cancer, 94 of them were treated with chemotherapy agents before sampling of microbiomes. As predictors, there were four demographic variables (sex, race, age, BMI) and 1,812 operational taxonomic units (OTUs) each found in at least 2 subjects via RNA sequencing. We randomly split data into 80% training and 20% hidden test. We then built an XGBoost model with 5-fold cross-validation using only training data yielding an AUC (with 95% CI) of 0.79 (0.77, 0.80) and obtained the almost the same AUC on the hidden test data. Based on feature importance analysis, we identified 12 most important features (Age, BMI and 12 OTUs; 4C0d-2, Brachyspirae, Methanosphaera, Geodermatophilaceae, Bifidobacteriaceae, Slackia, Staphylococcus, Acidaminoccus, Devosia, Proteus) and rebuilt a model using only these features and obtained AUC of 0.80 (0.77, 0.83) on the hidden test data. The average predicted probabilities for controls, cancer patients who were exposed to chemotherapy, and cancer patients who were not were 0.071 (0.070,0.073), 0.125 (0.110, 0.140), 0.156 (0.148, 0.164), respectively. There was no statistically significant difference on levels of these 12 OTUs between cancer subjects treated with and without chemotherapy. Conclusions: Machine learning achieved a moderately high accuracy identifying patients’ cancer status based on microbiome. Despite the literature on microbiome and chemotherapy interaction, the levels of 12 OTUs used in our model were not significantly different for cancer patients with or without chemotherapy exposure. Testing this model on other large population databases is needed for broader validation.
Collapse
Affiliation(s)
| | | | | | | | | | - Alexandra Thomas
- Wake Forest Baptist Medical Center Comprehensive Cancer Center, Winston Salem, NC
| |
Collapse
|