1
Buosi S, Timilsina M, Torrente M, Provencio M, Fey D, Nováček V. Boosting predictive models and augmenting patient data with relevant genomic and pathway information. Comput Biol Med 2024; 174:108398. [PMID: 38608322 DOI: 10.1016/j.compbiomed.2024.108398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/07/2024] [Accepted: 04/01/2024] [Indexed: 04/14/2024]
Abstract
The recurrence of low-stage lung cancer poses a challenge due to its unpredictable nature and diverse patient responses to treatments. Personalized care and patient outcomes heavily rely on early relapse identification, yet current predictive models, despite their potential, lack comprehensive genetic data. This inadequacy motivates our research focus: integrating specific genetic information, such as pathway scores, into clinical data. Our aim is to refine machine learning models for more precise relapse prediction in early-stage non-small cell lung cancer. To address the scarcity of genetic data, we employ imputation techniques, leveraging publicly available datasets such as The Cancer Genome Atlas (TCGA), integrating pathway scores into our patient cohort from the Cancer Long Survivor Artificial Intelligence Follow-up (CLARIFY) project. Through the integration of imputed pathway scores from the TCGA dataset with clinical data, our approach achieves notable strides in predicting relapse among a held-out test set of 200 patients. By training machine learning models on enriched knowledge graph data, inclusive of triples derived from pathway score imputation, we achieve a promising precision of 82% and specificity of 91%. These outcomes highlight the potential of our models as supplementary tools within tumour, node, and metastasis (TNM) classification systems, offering improved prognostic capabilities for lung cancer patients. In summary, our research underscores the significance of refining machine learning models for relapse prediction in early-stage non-small cell lung cancer. Our approach, centered on imputing pathway scores and integrating them with clinical data, not only enhances predictive performance but also demonstrates the promising role of machine learning in anticipating relapse and ultimately elevating patient outcomes.
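For readers unfamiliar with the two metrics reported in this abstract, the following is a minimal sketch of how precision and specificity are computed from confusion-matrix counts. The counts are hypothetical toy values chosen for illustration only; they are not taken from the paper, and the function names are assumptions.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted relapses that were actual relapses."""
    return tp / (tp + fp)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-relapsing patients correctly predicted as non-relapsing."""
    return tn / (tn + fp)


# Hypothetical toy counts (NOT the study's confusion matrix).
tp, fp, tn, fn = 8, 2, 18, 2

print(f"precision   = {precision(tp, fp):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Both metrics penalize false positives, so reporting them together says how trustworthy a predicted relapse is, but says nothing about missed relapses (false negatives), which recall would capture.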
Affiliation(s)
- Samuele Buosi
- Data Science Institute, University of Galway, University Road, H91 TK33, Co. Galway, Ireland
- Mohan Timilsina
- Data Science Institute, University of Galway, University Road, H91 TK33, Co. Galway, Ireland
- Maria Torrente
- Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, C. Joaquín Rodrigo, 1, Majadahonda, Madrid, 28222, Spain
- Mariano Provencio
- Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, C. Joaquín Rodrigo, 1, Majadahonda, Madrid, 28222, Spain
- Dirk Fey
- Systems Biology Ireland, University College Dublin, Co. Dublin, Ireland
- Vít Nováček
- Data Science Institute, University of Galway, University Road, H91 TK33, Co. Galway, Ireland; Faculty of Informatics, Masaryk University, Botanická 68a, 60200, Czech Republic; Masaryk Memorial Cancer Institute, Žlutý kopec 7, 65653, Czech Republic
2
Pačínková A, Popovici V. Using empirical biological knowledge to infer regulatory networks from multi-omics data. BMC Bioinformatics 2022; 23:351. [PMID: 35996085 PMCID: PMC9396869 DOI: 10.1186/s12859-022-04891-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2022] [Accepted: 08/08/2022] [Indexed: 12/13/2022] Open
Abstract
Background: Integration of multi-omics data can provide a more complex view of the biological system consisting of different interconnected molecular components, the crucial aspect for developing novel personalised therapeutic strategies for complex diseases. Various tools have been developed to integrate multi-omics data. However, an efficient multi-omics framework for regulatory network inference at the genome level that incorporates prior knowledge is still to emerge.
Results: We present IntOMICS, an efficient integrative framework based on Bayesian networks. IntOMICS systematically analyses gene expression, DNA methylation, copy number variation and biological prior knowledge to infer regulatory networks. IntOMICS complements the missing biological prior knowledge by so-called empirical biological knowledge, estimated from the available experimental data. Regulatory networks derived from IntOMICS provide deeper insights into the complex flow of genetic information on top of the increasing accuracy trend compared to a published algorithm designed exclusively for gene expression data. The ability to capture relevant crosstalks between multi-omics modalities is verified using known associations in microsatellite stable/instable colon cancer samples. Additionally, IntOMICS performance is compared with two algorithms for multi-omics regulatory network inference that can also incorporate prior knowledge in the inference framework. IntOMICS is also applied to detect potential predictive biomarkers in microsatellite stable stage III colon cancer samples.
Conclusions: We provide IntOMICS, a framework for multi-omics data integration using a novel approach to biological knowledge discovery. IntOMICS is a powerful resource for exploratory systems biology and can provide valuable insights into the complex mechanisms of biological processes that have a vital role in personalised medicine.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04891-9.
Affiliation(s)
- Anna Pačínková
- RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Faculty of Informatics, Masaryk University, Botanicka 68a, Brno, Czech Republic
- Vlad Popovici
- RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
3
Joshi P, Basso B, Wang H, Hong SH, Giardina C, Shin DG. rPAC: Route based pathway analysis for cohorts of gene expression data sets. Methods 2022; 198:76-87. [PMID: 34628030 PMCID: PMC8792230 DOI: 10.1016/j.ymeth.2021.10.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/30/2021] [Revised: 09/09/2021] [Accepted: 10/04/2021] [Indexed: 02/03/2023] Open
Abstract
Pathway analysis is a popular method aiming to derive biological interpretation from high-throughput gene expression studies. However, existing methods focus mostly on identifying which pathway or pathways could have been perturbed, given differential gene expression patterns. In this paper, we present a novel pathway analysis framework, namely rPAC, which decomposes each signaling pathway route into two parts, the upstream portion of a transcription factor (TF) block and the downstream portion from the TF block, and generates a pathway route perturbation analysis scheme examining disturbance scores assigned to both parts together. This rPAC scoring is further applied to a cohort of gene expression data sets, which produces two summary metrics, "Proportion of Significance" (PS) and "Average Route Score" (ARS), as quantitative measures discerning perturbed pathway routes within and/or between cohorts. To demonstrate rPAC's scoring competency, we first used a large amount of simulated data and compared the method's performance against that of conventional methods in terms of power curve. Next, we performed a case study involving three epithelial cancer data sets from The Cancer Genome Atlas (TCGA). The rPAC method revealed specific pathway routes as potential cancer type signatures. A deeper pathway analysis of sub-groups (i.e., age groups in COAD or cancer sub-types in BRCA) resulted in pathway routes that are known to be associated with the sub-groups. In addition, multiple previously uncharacterized pathway routes were identified, potentially suggesting that rPAC is better at deciphering the etiology of a disease than conventional methods, particularly in isolating routes and sections of perturbed pathways at a finer granularity.
Affiliation(s)
- Pujan Joshi
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Brent Basso
- Molecular and Cell Biology Department, University of Connecticut, Storrs, CT, USA
- Honglin Wang
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Seung-Hyun Hong
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
- Charles Giardina
- Molecular and Cell Biology Department, University of Connecticut, Storrs, CT, USA
- Dong-Guk Shin
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, USA
4
Comprehensive Analysis of Mutation-Based and Expressed Genes-Based Pathways in Head and Neck Squamous Cell Carcinoma. Processes (Basel) 2021. [DOI: 10.3390/pr9050792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022] Open
Abstract
Over- or under-expression of mRNA results from genetic alterations. Comprehensive pathway analyses based on mRNA expression are as important as single gene level mutations. This study aimed to compare the mutation- and mRNA expression-based signaling pathways in head and neck squamous cell carcinoma (HNSCC) and to match these with potential drug or druggable pathways. Altogether, 93 recurrent/metastatic HNSCC patients were enrolled. We performed targeted gene sequencing using Illumina HiSeq-2500 for NGS, and nanostring nCounter® for mRNA expression; mRNA expression was classified into over- or under-expression groups based on the expression level. We investigated mutational and nanostring data using the CBSJukebox® system, which is a big-data driven platform to analyze druggable pathways, genes, and protein-protein interactions. We calculated a Treatment Benefit Prediction Score (TBPS) to identify suitable drugs. By mapping the high score interaction genes to identify druggable pathways, we found highly related signaling pathways with mutations. Based on the mRNA expression and interaction gene scoring model, several pathways were found to be associated with over- and under-expression. Mutation-based pathways were associated with mRNA under-expressed genes-based pathways. These results suggest that HNSCCs are mainly caused by loss-of-function mutations. TBPS found several matching drugs such as immune checkpoint inhibitors, EGFR inhibitors, and FGFR inhibitors.
5
Zhao Y, Shin DG. Deep Pathway Analysis V2.0: A Pathway Analysis Framework Incorporating Multi-Dimensional Omics Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:373-385. [PMID: 31603796 DOI: 10.1109/tcbb.2019.2945959] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Pathway analysis is essential in cancer research, particularly when scientists attempt to derive interpretation from genome-wide high-throughput experimental data. If pathway information is organized into a network topology, its use in interpreting omics data can become very powerful. In this paper, we propose a topology-based pathway analysis method, called DPA V2.0, which can combine multiple heterogeneous omics data types in its analysis. In this method, each pathway route is encoded as a Bayesian network which is initialized with a sequence of conditional probabilities specifically designed to encode directionality of regulatory relationships defined in the pathway. Unlike other topology-based pathway tools, DPA is capable of identifying pathway routes as representatives of perturbed regulatory signals. We demonstrate the effectiveness of our model by applying it to two well-established TCGA data sets, namely, breast cancer study (BRCA) and ovarian cancer study (OV). The analysis combines mRNA-seq, mutation, copy number variation, and phosphorylation data publicly available for both TCGA data sets. We performed survival analysis and patient subtype analysis and the analysis outcomes revealed the anticipated strengths of our model. We hope that the availability of our model encourages wet lab scientists to generate extra data sets to reap the benefits of using multiple data types in pathway analysis. The majority of pathways distinguished can be confirmed by biological literature. Moreover, the proportion of correctly identified pathways is 10 percent higher than in previous work where only mRNA-seq and mutation data are incorporated for breast cancer patients. Consequently, such an in-depth pathway analysis incorporating more diverse data can improve the accuracy of perturbed pathway detection.
6
Zhao Y, Piekos S, Hoang TH, Shin DG. A framework using topological pathways for deeper analysis of transcriptome data. BMC Genomics 2020; 21:834. [PMID: 32138666 PMCID: PMC7057456 DOI: 10.1186/s12864-019-6155-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2019] [Accepted: 09/30/2019] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Pathway analysis is one of the later stage data analysis steps essential in interpreting high-throughput gene expression data. We propose a set of algorithms that, given gene expression data, can recognize which portions of sub-pathways are actively utilized in the biological system being studied. The degree of activation is measured by the conditional probability of the input expression data based on the Bayesian Network model constructed from the topological pathway.
RESULTS: We demonstrate the effectiveness of our pathway analysis method by conducting two case studies. The first one applies our method to a well-studied temporal microarray data set for the cell cycle using the KEGG Cell Cycle pathway. Our method closely reproduces the biological claims associated with the data sets, but unlike the original work ours can show how pathway routes interact with each other, above and beyond merely identifying which pathway routes are involved in the process. The second study applies the method to the p53 mutation microarray data to perform a comparative study.
CONCLUSIONS: We show that our method achieves comparable performance against all other pathway analysis systems included in this study in identifying p53 altered pathways. Our method could pave a new way of carrying out next generation pathway analysis.
Affiliation(s)
- Yue Zhao
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, 06269 USA
- Stephanie Piekos
- Department of Pharmaceutical Sciences, University of Connecticut, 69 North Eagleville Road, Unit 3092, Storrs, USA
- Tham H. Hoang
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, 06269 USA
- Dong-Guk Shin
- Computer Science and Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, 06269 USA
7
Lyudovyk O, Shen Y, Tatonetti NP, Hsiao SJ, Mansukhani MM, Weng C. Pathway analysis of genomic pathology tests for prognostic cancer subtyping. J Biomed Inform 2019; 98:103286. [PMID: 31499184 PMCID: PMC7136846 DOI: 10.1016/j.jbi.2019.103286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2019] [Revised: 09/01/2019] [Accepted: 09/05/2019] [Indexed: 10/26/2022]
Abstract
Genomic test results collected during the provision of medical care and stored in Electronic Health Record (EHR) systems represent an opportunity for clinical research into disease heterogeneity and clinical outcomes. In this paper, we evaluate the use of genomic test reports ordered for cancer patients in order to derive cancer subtypes and to identify biological pathways predictive of poor survival outcomes. A novel method is proposed to calculate patient similarity based on affected biological pathways rather than gene mutations. We demonstrate that this approach identifies subtypes of prognostic value and biological pathways linked to survival, with implications for precision treatment selection and a better understanding of the underlying disease. We also share lessons learned regarding the opportunities and challenges of secondary use of observational genomic data to conduct such research.
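The idea of comparing patients by affected pathways rather than by individual gene mutations can be sketched as follows. This is an illustrative sketch only: the abstract does not state which similarity measure the authors use, so Jaccard similarity is an assumption here, and the gene-to-pathway table is a small hypothetical mapping rather than a real annotation resource.

```python
# Hypothetical toy mapping from genes to pathways (illustration only).
GENE_TO_PATHWAYS = {
    "KRAS": {"MAPK signaling", "PI3K-Akt signaling"},
    "BRAF": {"MAPK signaling"},
    "PIK3CA": {"PI3K-Akt signaling"},
    "TP53": {"p53 signaling"},
}


def affected_pathways(mutated_genes):
    """Union of pathways containing at least one mutated gene."""
    pathways = set()
    for gene in mutated_genes:
        pathways |= GENE_TO_PATHWAYS.get(gene, set())
    return pathways


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def gene_similarity(genes_a, genes_b):
    return jaccard(set(genes_a), set(genes_b))


def pathway_similarity(genes_a, genes_b):
    return jaccard(affected_pathways(genes_a), affected_pathways(genes_b))


patient_a = ["KRAS"]
patient_b = ["BRAF", "PIK3CA"]
print(gene_similarity(patient_a, patient_b))     # no shared mutated genes
print(pathway_similarity(patient_a, patient_b))  # same affected pathways
```

In this toy case the two patients share no mutated gene yet hit the same pathways, which is the kind of relationship a gene-level comparison would miss.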
Affiliation(s)
- Olga Lyudovyk
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Yufeng Shen
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Susan J Hsiao
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
- Mahesh M Mansukhani
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
8
Thibodeau A, Shin DG. TriPOINT: a software tool to prioritize important genes in pathways and their non-coding regulators. Bioinformatics 2019; 35:2686-2689. [PMID: 30566622 PMCID: PMC6662310 DOI: 10.1093/bioinformatics/bty998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2018] [Revised: 11/05/2018] [Accepted: 12/17/2018] [Indexed: 11/14/2022] Open
Abstract
Summary: Current approaches for pathway analyses focus on representing gene expression levels on graph representations of pathways and conducting pathway enrichment among differentially expressed genes. However, gene expression levels by themselves do not reflect the overall picture, as non-coding factors play an important role in regulating gene expression. To incorporate these non-coding factors into pathway analyses and to systematically prioritize genes in a pathway, we introduce a new software tool: Triangulation of Perturbation Origins and Identification of Non-Coding Targets (TriPOINT). TriPOINT is a pathway analysis tool, implemented in Java, that identifies the significance of a gene under a condition (e.g. a disease phenotype) by studying graph representations of pathways, analyzing upstream and downstream gene interactions and integrating non-coding regions that may be regulating gene expression levels.
Availability and implementation: The TriPOINT open source software is freely available at https://github.uconn.edu/ajt06004/TriPOINT under the GPL v3.0 license.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Asa Thibodeau
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA
- Dong-Guk Shin
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA
9
Iterative integrated imputation for missing data and pathway models with applications to breast cancer subtypes. Communications for Statistical Applications and Methods 2019. [DOI: 10.29220/csam.2019.26.4.411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022]
10
BioTarget: A Computational Framework Identifying Cancer Type Specific Transcriptional Targets of Immune Response Pathways. Sci Rep 2019; 9:9029. [PMID: 31227749 PMCID: PMC6588588 DOI: 10.1038/s41598-019-45304-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/06/2018] [Accepted: 06/03/2019] [Indexed: 01/04/2023] Open
Abstract
Transcriptome data can provide information on signaling pathways active in cancers, but new computational tools are needed to more accurately quantify pathway activity and identify tissue-specific pathway features. We developed a computational method called "BioTarget" that incorporates ChIP-seq data into cellular pathway analysis. This tool relates the expression of transcription factor (TF) target genes (based on ChIP-seq data) with the status of upstream signaling components for an accurate quantification of pathway activity. This analysis also reveals TF targets expressed in specific contexts/tissues. We applied BioTarget to assess the activity of TBX21 and GATA3 pathways in cancers. TBX21 and GATA3 are TF regulators that control the differentiation of T cells into Th1 and Th2 helper cells that mediate cell-based and humoral immune responses, respectively. Since tumor immune responses can impact cancer progression, the significance of our pathway scores should be revealed by effective patient stratification. We found that low Th1/Th2 activity ratios were associated with a significantly poorer survival of stomach and breast cancer patients, whereas an unbalanced Th1/Th2 response was correlated with poorer survival of colon cancer patients. Lung adenocarcinoma and lung squamous cell carcinoma patients had the lowest survival rates when both Th1 and Th2 responses were high. Our method also identified context-specific target genes for TBX21 and GATA3. Applying the BioTarget tool to BCL6, a TF associated with germinal center lymphocytes, we observed that patients with an active BCL6 pathway had significantly improved survival for breast, colon, and stomach cancer. Our findings support the effectiveness of the BioTarget tool for transcriptome analysis and point to interesting associations between some immune-response pathways and cancer progression.
11
Han YC, Lin CM, Chen TT. RNA-Seq analysis of differentially expressed genes relevant to innate and adaptive immunity in cecropin P1 transgenic rainbow trout (Oncorhynchus mykiss). BMC Genomics 2018; 19:760. [PMID: 30340506 PMCID: PMC6195682 DOI: 10.1186/s12864-018-5141-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/06/2018] [Accepted: 10/05/2018] [Indexed: 01/01/2023] Open
Abstract
Background: In the past years, our laboratory successfully generated transgenic rainbow trout bearing the cecropin P1 transgene. These fish exhibited resistance to infection by Aeromonas salmonicida, Infectious Hematopoietic Necrosis Virus (IHNV) and Ceratomyxa shasta (a parasitic pathogen). Previously, treating rainbow trout macrophage cells (RTS-11) with cecropin B, pleurocidin and CF17, respectively, resulted in elevated expression of two pro-inflammatory genes, cyclooxygenase-2 (cox-2) and interleukin-1β (il-1β). In addition, a profiling of global gene expression by 44k salmonid microarray analysis was conducted, and the results showed that immune relevant processes have been perturbed in cecropin P1 transgenic rainbow trout. Therefore, we hypothesized that cecropin P1 may not only eliminate pathogens directly, but also modulate the host immune systems, leading to increased resistance against pathogen infections. To confirm this hypothesis, we performed de novo mRNA deep sequencing (RNA-Seq) to analyze the transcriptomic expression profiles in three immune competent tissues of cecropin P1 transgenic rainbow trout.
Results: De novo sequencing of mRNA of the rainbow trout spleen, liver and kidney tissues was conducted using a second-generation Illumina system, followed by Trinity assembly. Tissue specific unigenes were obtained, and annotated according to the Gene Ontology (GO) and the Nucleotide Basic Local Alignment Search Tool (BLAST). Over 2000 differentially expressed genes (DEGs) were determined by normalized ratio of Reads Per Kilobase of transcript per million mapped reads (RPKM) between the transgenic and non-transgenic fish in a tissue specific manner, and there were 82 DEGs in common among the three tissues. In addition, enrichment analysis according to Gene Ontology Biological Process (GO:BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) based pathway analysis associated with innate/adaptive immunity of fish were also performed to illustrate the altered immune-related functions in each tissue.
Conclusions: According to the RNA-Seq data, the correlations between alteration of gene expression profiles and the functional perturbations of the host immune processes were revealed. In comparison with the results of cDNA microarray analysis conducted by Lo et al., the overall results supported our hypothesis that the gene product of the cecropin P1 transgene may not only directly eliminate pathogens, but also modulate the host immune system. Results of this study present valuable genetic information for Oncorhynchus mykiss, and will benefit future studies on the immunology of this fish species.
Electronic supplementary material: The online version of this article (10.1186/s12864-018-5141-8) contains supplementary material, which is available to authorized users.
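The RPKM normalization named in the Results can be sketched as below, using the standard definition (reads mapped to a transcript, scaled per kilobase of transcript length and per million mapped reads). The read counts, transcript length, and the use of a simple ratio between groups are hypothetical illustration values and assumptions; the abstract does not give the authors' exact thresholds or pipeline.

```python
def rpkm(reads: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return reads * 1e9 / (transcript_length_bp * total_mapped_reads)


def expression_ratio(rpkm_transgenic: float, rpkm_control: float) -> float:
    """Ratio of normalized expression, transgenic over non-transgenic."""
    return rpkm_transgenic / rpkm_control


# Hypothetical toy values for one transcript in one tissue.
control = rpkm(reads=500, transcript_length_bp=2000, total_mapped_reads=10_000_000)
transgenic = rpkm(reads=1000, transcript_length_bp=2000, total_mapped_reads=10_000_000)

print(control)                                # normalized expression, control
print(expression_ratio(transgenic, control))  # fold difference between groups
```

Normalizing by both transcript length and sequencing depth is what makes counts comparable across transcripts and across libraries before a ratio between groups is taken.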
Affiliation(s)
- Yueh-Chiang Han
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269, USA
- Chun-Mean Lin
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269, USA
- Thomas T Chen
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269, USA
12
Integrative analysis of omics data. Methods 2017; 124:1-2. [DOI: 10.1016/j.ymeth.2017.07.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open