1
Ciaramella A, Di Nardo E, Terracciano D, Conte L, Febbraio F, Cimmino A. A new biomarker panel of ultraconserved long non-coding RNAs for bladder cancer prognosis by a machine learning based methodology. BMC Bioinformatics 2023; 23:569. [PMID: 36879192 PMCID: PMC9987036 DOI: 10.1186/s12859-023-05167-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/08/2023] [Accepted: 01/25/2023] [Indexed: 03/08/2023]
Abstract
BACKGROUND Recent studies have indicated that a special class of long non-coding RNAs (lncRNAs), namely Transcribed Ultraconserved Regions (T-UCRs), is transcribed from specific DNA regions that are 100% conserved in the human, mouse, and rat genomes. This is notable, as lncRNAs are usually poorly conserved. Despite their peculiarities, T-UCRs remain understudied in many diseases, including cancer, even though dysregulation of T-UCRs is known to be associated with cancer as well as with human neurological, cardiovascular, and developmental pathologies. We have recently reported the T-UCR uc.8+ as a potential prognostic biomarker in bladder cancer. RESULTS The aim of this work is to develop a methodology, based on machine learning techniques, for the selection of a predictive signature panel for bladder cancer onset. To this end, we analyzed the expression profiles of T-UCRs from surgically removed normal and bladder cancer tissues using a custom expression microarray. Bladder tissue samples from 24 bladder cancer patients (12 Low Grade and 12 High Grade), with complete clinical data, and 17 control samples from normal bladder epithelium were analyzed. After the selection of preferentially expressed and statistically significant T-UCRs, we adopted an ensemble of statistical and machine learning based approaches (i.e., logistic regression, Random Forest, XGBoost and LASSO) for ranking the most important diagnostic molecules. We identified a signature panel of 13 selected T-UCRs with altered expression profiles in cancer, able to efficiently discriminate between normal and bladder cancer patient samples. Also, using this signature panel, we classified bladder cancer patients into four groups, each characterized by a different survival extent. As expected, the group including only Low Grade bladder cancer patients had greater overall survival than the groups consisting mostly of High Grade bladder cancer patients. However, a specific signature of deregulated T-UCRs identifies sub-types of bladder cancer patients with different prognosis regardless of the bladder cancer Grade. CONCLUSIONS Here we present the results for the classification of bladder cancer (Low and High Grade) patient samples and normal bladder epithelium controls by using a machine learning application. The T-UCR panel can be used for training an eXplainable Artificial Intelligence model and developing a robust decision support system for early bladder cancer diagnosis, given urinary T-UCR data from new patients. The use of this system instead of the current methodology will result in a non-invasive approach, reducing uncomfortable procedures (such as cystoscopy) for the patients. Overall, these results raise the possibility of new automatic systems, which could help the RNA-based prognosis and/or the cancer therapy in bladder cancer patients, and demonstrate the successful application of Artificial Intelligence to the definition of an independent prognostic biomarker panel.
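The abstract above ranks candidate molecules with an ensemble of methods. One simple way to combine several rankings is to average each feature's rank across methods, sketched below. This is only an illustration, not the authors' procedure: the aggregation rule (mean rank), the function names, the feature names, and the importance scores are all hypothetical.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Map each feature to its rank (1 = most important) by descending score."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def aggregate_rankings(scores_by_method):
    """Average each feature's rank across methods; a lower mean rank is better."""
    per_method = [ranks_from_scores(s) for s in scores_by_method.values()]
    features = per_method[0].keys()
    mean_rank = {f: mean(r[f] for r in per_method) for f in features}
    # Sort by mean rank, breaking ties alphabetically for a stable result.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical importance scores, invented purely for illustration.
toy_scores = {
    "logistic_regression": {"feat_a": 0.9, "feat_b": 0.4, "feat_c": 0.1},
    "random_forest": {"feat_a": 0.5, "feat_b": 0.6, "feat_c": 0.2},
    "lasso": {"feat_a": 0.7, "feat_b": 0.3, "feat_c": 0.0},
}
panel = aggregate_rankings(toy_scores)
```

In practice the per-method scores would come from fitted models (for example, absolute coefficients or tree-based importances), and a panel would be formed by keeping the top-ranked features.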
Affiliation(s)
- Angelo Ciaramella
  - Department of Science and Technology, University of Naples "Parthenope", Centro Direzionale, Isola C4, 80143, Naples, Italy
- Emanuel Di Nardo
  - Department of Science and Technology, University of Naples "Parthenope", Centro Direzionale, Isola C4, 80143, Naples, Italy
  - Department of Computer Science, University of Milan, Via Celoria, 18, 20133, Milan, Italy
- Daniela Terracciano
  - Department of Translational Medical Science, University of Naples "Federico II", Via Pansini 5, 80131, Naples, Italy
- Lia Conte
  - Department of Experimental Urology, Radboud University Medical Center, Geert Grooteplein-Zuid 10, 6525 GA, Nijmegen, The Netherlands
- Ferdinando Febbraio
  - Institute of Biochemistry and Cell Biology, CNR, Via Pietro Castellino 111, 80131, Naples, Italy
- Amelia Cimmino
  - Institute of Genetics and Biophysics, CNR, Via Pietro Castellino 111, 80131, Naples, Italy
2
Doan LMT, Angione C, Occhipinti A. Machine Learning Methods for Survival Analysis with Clinical and Transcriptomics Data of Breast Cancer. Methods Mol Biol 2023; 2553:325-393. [PMID: 36227551 DOI: 10.1007/978-1-0716-2617-7_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Breast cancer is one of the most common cancers in women worldwide and causes an enormous number of deaths annually. However, early diagnosis of breast cancer can improve survival outcomes, enabling simpler and more cost-effective treatments. The recent increase in data availability provides unprecedented opportunities to apply data-driven and machine learning methods to identify early-detection prognostic factors capable of predicting the expected survival and potential sensitivity to treatment of patients, with the final aim of enhancing clinical outcomes. This tutorial presents a protocol for applying machine learning models in survival analysis for both clinical and transcriptomic data. We show that integrating clinical and mRNA expression data is essential to explain the multiple biological processes driving cancer progression. Our results reveal that machine-learning-based models such as random survival forests, gradient boosted survival models, and survival support vector machines can outperform traditional statistical methods, i.e., the Cox proportional hazards model. The highest C-index among the machine learning models was recorded when using the survival support vector machine, with a value of 0.688, whereas the C-index recorded using the Cox model was 0.677. Shapley Additive Explanation (SHAP) values were also applied to identify the feature importance of the models and their impact on the prediction outcomes.
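The C-index quoted above measures how often a model orders pairs of patients correctly with respect to their survival times. The following is a minimal sketch of Harrell's concordance index for right-censored data, written in plain Python for illustration. It is not the tutorial's code: the function name and the toy data are hypothetical, and pairs with tied observed times are simply treated as not comparable, which is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the subject was censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, invented purely for illustration.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why differences such as 0.688 versus 0.677 are read as modest improvements in ranking ability.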
Affiliation(s)
- Le Minh Thao Doan
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
- Claudio Angione
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
  - Centre for Digital Innovation, Teesside University, Middlesbrough, UK
  - Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
  - National Horizons Centre, Teesside University, Darlington, UK
- Annalisa Occhipinti
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
  - Centre for Digital Innovation, Teesside University, Middlesbrough, UK
  - National Horizons Centre, Teesside University, Darlington, UK
3
Ke C, Bandyopadhyay D, Acunzo M, Winn R. Gene Screening in High-Throughput Right-Censored Lung Cancer Data. ONCO 2022; 2:305-318. [PMID: 37066112 PMCID: PMC10100230 DOI: 10.3390/onco2040017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Background Advances in sequencing technologies have allowed collection of massive genome-wide information that substantially advances lung cancer diagnosis and prognosis. Identifying influential markers for clinical endpoints of interest has been an indispensable and critical component of the statistical analysis pipeline. However, classical variable selection methods are not feasible or reliable for high-throughput genetic data. Our objective is to propose a model-free gene screening procedure for high-throughput right-censored data, and to develop a predictive gene signature for lung squamous cell carcinoma (LUSC) with the proposed procedure. Methods A gene screening procedure was developed based on a recently proposed independence measure. The Cancer Genome Atlas (TCGA) data on LUSC was then studied. The screening procedure was conducted to narrow down the set of influential genes to 378 candidates. A penalized Cox model was then fitted to the reduced set, which further identified a 6-gene signature for LUSC prognosis. The 6-gene signature was validated on datasets from the Gene Expression Omnibus. Results Both model-fitting and validation results reveal that our method selected influential genes that lead to biologically sensible findings as well as better predictive performance, compared to existing alternatives. According to our multivariable Cox regression analysis, the 6-gene signature was indeed a significant prognostic factor (p-value < 0.001) while controlling for clinical covariates. Conclusions Gene screening as a fast dimension reduction technique plays an important role in analyzing high-throughput data. The main contribution of this paper is to introduce a fundamental yet pragmatic model-free gene screening approach that aids statistical analysis of right-censored cancer data, and provide a lateral comparison with other available methods in the context of LUSC.
Affiliation(s)
- Chenlu Ke
  - Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, VA 23284, USA
- Dipankar Bandyopadhyay
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23284, USA
  - Correspondence: Tel.: +1-804-827-2058
- Mario Acunzo
  - Department of Internal Medicine, Virginia Commonwealth University, Richmond, VA 23284, USA
- Robert Winn
  - Massey Cancer Center, Virginia Commonwealth University, Richmond, VA 23284, USA
4
Fan TD, Bei DK, Li SW. Nomogram Models Based on the Gene Expression in Prediction of Breast Cancer Bone Metastasis. Journal of Healthcare Engineering 2022; 2022:8431946. [PMID: 36046013 PMCID: PMC9424032 DOI: 10.1155/2022/8431946] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2022] [Revised: 06/01/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022]
Abstract
Objective The aim of this study is to design a weighted co-expression network and build gene expression signature-based nomogram (GESBN) models for predicting the likelihood of bone metastasis in breast cancer (BC) patients. Methods Dataset GSE124647 was used as a training set, while GSE16446, GSE45255, and GSE14020 were taken as validation sets. In the training cohort, the limma package in R was used to obtain differentially expressed genes (DEGs) between BC patients without and with bone metastasis, which were used for functional enrichment analysis. After weighted co-expression network analysis (WGCNA), univariate Cox regression and Kaplan-Meier plotter analyses were performed to screen potential prognosis-related genes. Then, GESBN models were constructed and evaluated. The prognostic value of the GESBN models was investigated in the GSE124647 dataset and validated in the GSE16446 and GSE45255 datasets. Further, the expression levels of genes in the models were explored in the training set and validated in GSE14020. Finally, the expression and prognostic value of hub genes in BC were explored. Results A total of 1858 DEGs were obtained. The WGCNA result showed that the blue module was most significantly related to bone metastasis and prognosis. After survival analyses, GJA1, SLC24A3, ITGBL1, and SLC44A1 were used to construct a GESBN model for overall survival (OS), while GJA1, IGFBP6, MDFI, TGFBI, ANXA2, and SLC24A3 were used to build a GESBN model for progression-free survival (PFS). Kaplan-Meier plotter and receiver operating characteristic analyses indicated reliable prediction ability of the models. Cox regression analysis further revealed that GESBN models were independent prognostic predictors for OS and PFS in BC patients. In addition, GJA1, IGFBP6, ITGBL1, SLC44A1, and TGFBI expression levels were significantly different between the two groups in GSE124647 and GSE14020. The hub genes had a significant impact on patient prognosis. Conclusion Both the four-gene signature and the six-gene signature could accurately predict patient prognosis, which may provide novel treatment insights for BC bone metastasis.
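A gene-signature model of this kind typically rests on a linear predictor: each gene's expression is weighted by a regression coefficient and the weighted values are summed into a risk score, which a nomogram then presents graphically. The sketch below illustrates that idea only. It is not the GESBN model from the study: the gene names, coefficients, expression values, and the median cutoff are all hypothetical assumptions.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(patients, coefficients):
    """Score every patient and split into high/low risk at the median score."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, cutoff, groups


# Hypothetical coefficients and expression values, invented for illustration.
toy_coefficients = {"gene_1": 0.5, "gene_2": -0.25}
toy_patients = {
    "p1": {"gene_1": 2.0, "gene_2": 4.0},
    "p2": {"gene_1": 4.0, "gene_2": 0.0},
    "p3": {"gene_1": 2.0, "gene_2": 0.0},
}
scores, cutoff, groups = stratify(toy_patients, toy_coefficients)
```

In a real analysis the coefficients would be estimated from a fitted Cox model on the training cohort, and the resulting groups would be compared with survival curves on the validation cohorts.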
Affiliation(s)
- Teng-di Fan
  - Department of Orthopedics, Ningbo Medical Center Lihuili Hospital, Ningbo 315040, Zhejiang, China
- Di-kai Bei
  - Department of Orthopedics, Ningbo Medical Center Lihuili Hospital, Ningbo 315040, Zhejiang, China
- Song-wei Li
  - Department of Orthopedics, Ningbo Medical Center Lihuili Hospital, Ningbo 315040, Zhejiang, China
5
Vijayakumar S, Magazzù G, Moon P, Occhipinti A, Angione C. A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling. Methods Mol Biol 2022; 2399:87-122. [PMID: 35604554 DOI: 10.1007/978-1-0716-1831-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative systems biology led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM .
Affiliation(s)
- Supreeta Vijayakumar
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Giuseppe Magazzù
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Pradip Moon
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
- Annalisa Occhipinti
  - Computational Systems Biology and Data Analytics Research Group, Middlesbrough, UK
  - Centre for Digital Innovation, Teesside University, Middlesbrough, UK
- Claudio Angione
  - Computational Systems Biology and Data Analytics Research Group, Teesside University, Middlesbrough, UK
  - Centre for Digital Innovation, Teesside University, Middlesbrough, UK
  - Healthcare Innovation Centre, Teesside University, Middlesbrough, UK
6
Zhao C, Lou Y, Wang Y, Wang D, Tang L, Gao X, Zhang K, Xu W, Liu T, Xiao J. A gene expression signature-based nomogram model in prediction of breast cancer bone metastases. Cancer Med 2018; 8:200-208. [PMID: 30575323 PMCID: PMC6346244 DOI: 10.1002/cam4.1932] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 10/12/2018] [Revised: 11/24/2018] [Accepted: 11/27/2018] [Indexed: 12/14/2022]
Abstract
Breast cancer is prone to form bone metastases, and subsequent skeletal‐related events (SREs) dramatically decrease patients’ quality of life and survival. Prediction and early management of bone lesions are valuable; however, adequate prognostic models are lacking. In the current study, we reviewed a total of 572 breast cancer patients in three microarray data sets, including 191 with bone metastases and 381 metastases‐free. Gene set enrichment analysis (GSEA) indicated less aggressive and low‐grade features in patients with bone metastases compared with metastases‐free ones, while luminal subtypes were more prone to form bone metastases. Five bone metastases‐related genes (KRT23, REEP1, SPIB, ALDH3B2, and GLDC) were identified and used to construct a gene expression signature‐based nomogram (GESBN) model. The model performed well in both training and testing sets for evaluation of breast cancer bone metastases (BCBM). Clinically, the model may help in the prediction of early bone metastases and the prevention and management of SREs, and may even help to prolong survival for patients with BCBM. The five‐gene GESBN model also has implications for molecular diagnostic markers and therapeutic targets. Furthermore, our study provided an approach for analyzing tumor organ‐specific metastases. To the best of our knowledge, this is the first published model focused on tumor organ‐specific metastases.
Affiliation(s)
- Chenglong Zhao
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Yan Lou
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Yao Wang
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Dongsheng Wang
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Liang Tang
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Xin Gao
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Kun Zhang
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Wei Xu
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Tielong Liu
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China
- Jianru Xiao
  - Spine Tumor Center, Department of Orthopedic Oncology, Changzheng Hospital, Second Military Medical University, Shanghai, China