Tang Y, Li S, Zhu L, Yao L, Li J, Sun X, Liu Y, Zhang Y, Fu X. Improve clinical feature-based bladder cancer survival prediction models through integration with gene expression profiles and machine learning techniques.
Heliyon 2024;10:e38242. [PMID: 39524931 PMCID: PMC11546448 DOI: 10.1016/j.heliyon.2024.e38242]
[Received: 05/20/2024] [Revised: 08/13/2024] [Accepted: 09/20/2024] [Indexed: 11/16/2024]
Abstract
Background
Bladder cancer (BCa), one of the most common cancers worldwide, is characterized by high rates of recurrence, progression, and mortality. Machine learning algorithms offer promising advancements in enhancing predictive models. This study aims to develop robust machine learning models for predicting BCa survival using clinical and gene expression data.
Methods
Clinical data from BCa patients were obtained from the Surveillance, Epidemiology, and End Results database. Cox proportional hazards regression models were used to assess the association between clinical variables and overall survival. Machine learning algorithms, including logistic regression, random forest, XGBoost, decision tree, and LightGBM, were employed to predict survival at 1, 3, and 5 years. For the genomic analyses, the TAGO database was used, which combines data from The Cancer Genome Atlas with four Gene Expression Omnibus datasets that have both genomic and clinical data available. Gene expression data were transformed into gene set data, and the performance of models based on clinical data, on gene set data, and on their combination was compared. Furthermore, the impact of model-derived scores on overall survival was evaluated.
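As an illustration of how survival at fixed horizons can be framed as a classification task, the sketch below derives binary labels at 1, 3, and 5 years from follow-up time and vital status. The abstract does not describe how censored patients were handled, so the rule used here (dropping patients whose follow-up ends before the horizon without a recorded death) is an assumption, and the function name and patient records are hypothetical.

```python
from typing import Optional


def survival_label(months: float, died: bool, horizon_years: int) -> Optional[int]:
    """Label a patient for a fixed-horizon survival classifier.

    Returns 1 if death occurred within the horizon, 0 if the patient was
    followed to the horizon or beyond without dying before it, and None if
    the patient was censored earlier (status at the horizon is unknown).
    Assumption: censored-early patients are excluded from that horizon.
    """
    horizon_months = 12 * horizon_years
    if died and months <= horizon_months:
        return 1
    if months >= horizon_months:
        return 0
    return None  # alive at last contact, but follow-up too short


# Hypothetical records: (follow-up in months, died during follow-up)
patients = [(8, True), (20, False), (40, True), (70, False)]

for years in (1, 3, 5):
    labels = [survival_label(m, d, years) for m, d in patients]
    print(f"{years}-year labels: {labels}")
```

Each horizon yields its own label vector, so a separate classifier can be trained per horizon on the patients whose label is not None.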
Results
Among 138,741 BCa patients with available clinical data, key independent predictors of survival included age, race, marital status, surgery, chemotherapy, radiation, and TNM stages. Clinical data machine learning (CML) models used these clinical predictors to achieve AUC values of 0.860, 0.821, and 0.804 in the testing sets for predicting survival at 1, 3, and 5 years, respectively. In the TAGO database, which has 863 patients with clinical and genomic data, the integrated clinical and gene expression machine learning model (IML) outperformed the CML and gene expression machine learning (GML) models in survival prediction. Patients with higher IML and GML model scores exhibited poorer survival outcomes.
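For readers unfamiliar with the AUC values reported above, the following minimal sketch computes the area under the ROC curve from its rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up toy values, not data from the study, and the function is an illustrative implementation rather than the authors' evaluation code.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the event (e.g. death within the horizon), 0 otherwise.
    scores: model-predicted risk, higher meaning more likely to be a 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 2 events, 3 non-events
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(auc(toy_labels, toy_scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise form is quadratic in sample size, so for cohorts as large as the one described here a sort-based implementation would be the practical choice.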
Conclusions
This study successfully identifies key clinical and genomic predictors, a significant step forward in BCa research. The development of predictive models for BCa survival underscores the potential of integrated data approaches in improving BCa management and treatment strategies.