1
Subramanian A, Zakeri P, Mousa M, Alnaqbi H, Alshamsi FY, Bettoni L, Damiani E, Alsafar H, Saeys Y, Carmeliet P. Angiogenesis goes computational - The future way forward to discover new angiogenic targets? Comput Struct Biotechnol J 2022; 20:5235-5255. [PMID: 36187917 PMCID: PMC9508490 DOI: 10.1016/j.csbj.2022.09.019] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2022] [Revised: 09/09/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022]
Abstract
Multi-omics technologies are being increasingly utilized in angiogenesis research. Yet, computational methods have not been widely used for angiogenic target discovery and prioritization in this field, partly because (wet-lab) vascular biologists are insufficiently familiar with computational biology tools and the opportunities they may offer. With this review, written for vascular biologists who lack expertise in computational methods, we aspire to break boundaries between both fields and to illustrate the potential of these tools for future angiogenic target discovery. We provide a comprehensive survey of currently available computational approaches that may be useful in prioritizing candidate genes, predicting associated mechanisms, and identifying their specificity to endothelial cell subtypes. We specifically highlight tools that use flexible, machine learning frameworks for large-scale data integration and gene prioritization. For each purpose-oriented category of tools, we describe underlying conceptual principles, highlight interesting applications and discuss limitations. Finally, we will discuss challenges and recommend some guidelines which can help to optimize the process of accurate target discovery.
Affiliation(s)
- Abhishek Subramanian
  - Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Pooya Zakeri
  - Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
  - Centre for Brain and Disease Research, Flanders Institute for Biotechnology (VIB), Leuven, Belgium
  - Department of Neurosciences and Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Mira Mousa
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Halima Alnaqbi
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Fatima Yousif Alshamsi
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
  - Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Leo Bettoni
  - Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
- Ernesto Damiani
  - Robotics and Intelligent Systems Institute, Khalifa University, Abu Dhabi, United Arab Emirates
- Habiba Alsafar
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
  - Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
- Yvan Saeys
  - Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Peter Carmeliet
  - Laboratory of Angiogenesis & Vascular Metabolism, Center for Cancer Biology, VIB, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Metabolism, Department of Oncology, KU Leuven, Leuven, Belgium
  - Laboratory of Angiogenesis & Vascular Heterogeneity, Department of Biomedicine, Aarhus University, Aarhus, Denmark
  - Center for Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
2
Ge SX, Son EW, Yao R. iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 2018; 19:534. [PMID: 30567491 PMCID: PMC6299935 DOI: 10.1186/s12859-018-2486-6] [Citation(s) in RCA: 942] [Impact Index Per Article: 134.6] [Received: 05/17/2018] [Accepted: 11/12/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND RNA-seq is widely used for transcriptomic profiling, but the bioinformatics analysis of resultant data can be time-consuming and challenging, especially for biologists. We aim to streamline the bioinformatic analyses of gene-level data by developing a user-friendly, interactive web application for exploratory data analysis, differential expression, and pathway analysis. RESULTS iDEP (integrated Differential Expression and Pathway analysis) seamlessly connects 63 R/Bioconductor packages, 2 web services, and comprehensive annotation and pathway databases for 220 plant and animal species. The workflow can be reproduced by downloading customized R code and related pathway files. As an example, we analyzed an RNA-Seq dataset of lung fibroblasts with Hoxa1 knockdown and revealed the possible roles of SP1 and E2F1 and their target genes, including microRNAs, in blocking G1/S transition. In another example, our analysis shows that in mouse B cells without functional p53, ionizing radiation activates the MYC pathway and its downstream genes involved in cell proliferation, ribosome biogenesis, and non-coding RNA metabolism. In wildtype B cells, radiation induces p53-mediated apoptosis and DNA repair while suppressing the target genes of MYC and E2F1, and leads to growth and cell cycle arrest. iDEP helps unveil the multifaceted functions of p53 and the possible involvement of several microRNAs such as miR-92a, miR-504, and miR-30a. In both examples, we validated known molecular pathways and generated novel, testable hypotheses. CONCLUSIONS Combining comprehensive analytic functionalities with massive annotation databases, iDEP ( http://ge-lab.org/idep/ ) enables biologists to easily translate transcriptomic and proteomic data into actionable insights.
Affiliation(s)
- Steven Xijin Ge
  - Department of Mathematics and Statistics, South Dakota State University, Box 2225, Brookings, SD 57007 USA
- Eun Wo Son
  - Department of Mathematics and Statistics, South Dakota State University, Box 2225, Brookings, SD 57007 USA
- Runan Yao
  - Department of Mathematics and Statistics, South Dakota State University, Box 2225, Brookings, SD 57007 USA
3
Jambusaria A, Klomp J, Hong Z, Rafii S, Dai Y, Malik AB, Rehman J. A computational approach to identify cellular heterogeneity and tissue-specific gene regulatory networks. BMC Bioinformatics 2018; 19:217. [PMID: 29940845 PMCID: PMC6019795 DOI: 10.1186/s12859-018-2190-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/13/2017] [Accepted: 05/04/2018] [Indexed: 01/26/2023]
Abstract
Background The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Results Using three pathway analysis techniques, gene set enrichment analysis (GSEA), parametric analysis of gene set enrichment (PGSEA), alongside our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness that was comparable to what was seen using existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. Conclusions HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes. 
Affiliation(s)
- Ankit Jambusaria
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
  - Department of Bioengineering, The University of Illinois at Chicago, Chicago, IL, USA
- Jeff Klomp
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
- Zhigang Hong
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
- Shahin Rafii
  - Division of Regenerative Medicine, Department of Medicine, Ansary Stem Cell Institute, Weill Cornell Medicine, New York, NY, USA
- Yang Dai
  - Department of Bioengineering, The University of Illinois at Chicago, Chicago, IL, USA
- Asrar B Malik
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
- Jalees Rehman
  - Department of Pharmacology, The University of Illinois College of Medicine, 835 S. Wolcott Ave. Rm. E403, Chicago, IL, 60612, USA
  - Department of Bioengineering, The University of Illinois at Chicago, Chicago, IL, USA
  - Division of Cardiology, Department of Medicine, The University of Illinois College of Medicine, Chicago, IL, USA
4
Schmid F, Schmid M, Müssel C, Sträng JE, Buske C, Bullinger L, Kraus JM, Kestler HA. GiANT: gene set uncertainty in enrichment analysis. Bioinformatics 2016; 32:1891-4. [DOI: 10.1093/bioinformatics/btw030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/14/2015] [Accepted: 01/12/2016] [Indexed: 11/14/2022]
5
Yu X, Zeng T, Li G. Integrative enrichment analysis: a new computational method to detect dysregulated pathways in heterogeneous samples. BMC Genomics 2015; 16:918. [PMID: 26556243 PMCID: PMC4641376 DOI: 10.1186/s12864-015-2188-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 08/08/2015] [Accepted: 11/02/2015] [Indexed: 12/27/2022]
Abstract
BACKGROUND Pathway enrichment analysis is a useful tool to study biology and biomedicine, due to its functional screening on well-defined biological procedures rather than separate molecules. The measurement of malfunctions of pathways with a phenotype change, e.g., from normal to diseased, is the key issue when applying enrichment analysis on a pathway. The differentially expressed genes (DEGs) are widely focused in conventional analysis, which is based on the great purity of samples. However, the disease samples are usually heterogeneous, so that, the genes with great differential expression variance (DEVGs) are becoming attractive and important to indicate the specific state of a biological system. In the context of differential expression variance, it is still a challenge to measure the enrichment or status of a pathway. To address this issue, we proposed Integrative Enrichment Analysis (IEA) based on a novel enrichment measurement. RESULTS The main competitive ability of IEA is to identify dysregulated pathways containing DEGs and DEVGs simultaneously, which are usually under-scored by other methods. Next, IEA provides two additional assistant approaches to investigate such dysregulated pathways. One is to infer the association among identified dysregulated pathways and expected target pathways by estimating pathway crosstalks. The other one is to recognize subtype-factors as dysregulated pathways associated to particular clinical indices according to the DEVGs' relative expressions rather than conventional raw expressions. Based on a previously established evaluation scheme, we found that, in particular cohorts (i.e., a group of real gene expression datasets from human patients), a few target disease pathways can be significantly high-ranked by IEA, which is more effective than other state-of-the-art methods. 
Furthermore, we present a proof-of-concept study on diabetes showing that: IEA, rather than conventional ORA or GSEA, can capture underestimated dysregulated pathways rich in DEVGs and DEGs; these newly identified pathways can be significantly linked to previously known disease pathways through estimated crosstalk; and many candidate subtype-factors recognized by IEA are also significantly related to the risk of subtypes of genotype-phenotype associations. CONCLUSIONS Overall, IEA provides a new tool for enrichment analysis in the complicated context of clinical application (i.e., heterogeneity of disease), as a necessary complementary and cooperative approach alongside conventional ones.
Affiliation(s)
- Xiangtian Yu
  - School of Mathematics, Shandong University, Jinan, 250100, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Cell Building Level 3, YueYang Road 320, Shanghai, 200031, China
- Guojun Li
  - School of Mathematics, Shandong University, Jinan, 250100, China
6
Reboiro-Jato M, Díaz F, Glez-Peña D, Fdez-Riverola F. A novel ensemble of classifiers that use biological relevant gene sets for microarray classification. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.01.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/30/2023]
7
Baier H, Schultz J. ISAAC - InterSpecies Analysing Application using Containers. BMC Bioinformatics 2014; 15:18. [PMID: 24428905 PMCID: PMC3897929 DOI: 10.1186/1471-2105-15-18] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/26/2013] [Accepted: 01/10/2014] [Indexed: 01/28/2023]
Abstract
BACKGROUND Information about genes, transcripts and proteins is spread over a wide variety of databases. Different tools have been developed using these databases to identify biological signals in gene lists from large-scale analyses. Mostly, they search for enrichments of specific features. However, these tools do not allow an explorative walk through different views, nor do they allow gene lists to be changed as new lines of inquiry emerge. RESULTS To fill this niche, we have developed ISAAC, the InterSpecies Analysing Application using Containers. The central idea of this web-based tool is to enable the analysis of sets of genes, transcripts and proteins under different biological viewpoints and to interactively modify these sets at any point of the analysis. Detailed history and snapshot information allows tracing each action. Furthermore, one can easily switch back to previous states and perform new analyses. Currently, sets can be viewed in the context of genomes, protein functions, protein interactions, pathways, regulation, diseases and drugs. Additionally, users can switch between species with an automatic, orthology-based translation of existing gene sets. As today's research is usually performed in larger teams and consortia, ISAAC provides group-based functionalities. Here, sets as well as results of analyses can be exchanged between members of groups. CONCLUSIONS ISAAC fills the gap between primary databases and tools for the analysis of large gene lists. With its highly modular, JavaEE-based design, the implementation of new modules is straightforward. Furthermore, ISAAC comes with an extensive web-based administration interface including tools for the integration of third-party data. Thus, a local installation is easily feasible. In summary, ISAAC is tailor-made for highly explorative interactive analyses of gene, transcript and protein sets in a collaborative environment.
Affiliation(s)
- Herbert Baier
  - Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, Würzburg 97074, Germany
- Jörg Schultz
  - Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, Würzburg 97074, Germany
8
Delaleu N, Nguyen CQ, Tekle KM, Jonsson R, Peck AB. Transcriptional landscapes of emerging autoimmunity: transient aberrations in the targeted tissue's extracellular milieu precede immune responses in Sjögren's syndrome. Arthritis Res Ther 2013; 15:R174. [PMID: 24286337 PMCID: PMC3978466 DOI: 10.1186/ar4362] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 12/14/2012] [Accepted: 10/11/2013] [Indexed: 12/12/2022]
Abstract
Introduction Our understanding of autoimmunity is skewed considerably towards the late stages of overt disease and chronic inflammation. Defining the targeted organ’s role during emergence of autoimmune diseases is, however, critical in order to define their etiology, early and covert disease phases and delineate their molecular basis. Methods Using Sjögren’s syndrome (SS) as an exemplary rheumatic autoimmune disease and temporal global gene-expression profiling, we systematically mapped the transcriptional landscapes and chronological interrelationships between biological themes involving the salivary glands’ extracellular milieu. The time period studied spans from pre- to subclinical and ultimately to onset of overt disease in a well-defined model of spontaneous SS, the C57BL/6.NOD-Aec1Aec2 strain. In order to answer this aim of great generality, we developed a novel bioinformatics-based approach, which integrates comprehensive data analysis and visualization within interactive networks. The latter are computed by projecting the datasets as a whole on a priori-defined consensus-based knowledge. Results Applying these methodologies revealed extensive susceptibility loci-dependent aberrations in salivary gland homeostasis and integrity preceding onset of overt disease by a considerable amount of time. These alterations coincided with innate immune responses depending predominantly on genes located outside of the SS-predisposing loci Aec1 and Aec2. Following a period of transcriptional stability, networks mapping the onset of overt SS displayed, in addition to natural killer, T- and B-cell-specific gene patterns, significant reversals of focal adhesion, cell-cell junctions and neurotransmitter receptor-associated alterations that had prior characterized progression from pre- to subclinical disease. 
Conclusions This data-driven methodology advances unbiased assessment of global datasets and allowed comprehensive interpretation of complex alterations in biological states. Its application delineated a major involvement of the targeted organ during the emergence of experimental SS.
9
Glez-Peña D, Lourenço A, López-Fernández H, Reboiro-Jato M, Fdez-Riverola F. Web scraping technologies in an API world. Brief Bioinform 2013; 15:788-97. [PMID: 23632294 DOI: 10.1093/bib/bbt026] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Abstract
Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover for all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not offer programmatic interfaces yet. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as means to cope with gene set enrichment analysis.
10
Ellis J, Goodswen S, Kennedy PJ, Bush S. The core mouse response to infection by neospora caninum defined by gene set enrichment analyses. Bioinform Biol Insights 2012; 6:187-202. [PMID: 23012496 PMCID: PMC3448498 DOI: 10.4137/bbi.s9954] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
In this study, the BALB/c and Qs mouse responses to infection by the parasite Neospora caninum were investigated in order to identify host response mechanisms. Investigation was done using gene set (enrichment) analyses of microarray data. GSEA, MANOVA, Romer, subGSE and SAM-GS were used to study the contrasts Neospora strain type, Mouse type (BALB/c and Qs) and time post infection (6 hours post infection and 10 days post infection). The analyses show that the major signal in the core mouse response to infection is from time post infection and can be defined by gene ontology terms Protein Kinase Activity, Cell Proliferation and Transcription Initiation. Several terms linked to signaling, morphogenesis, response and fat metabolism were also identified. At 10 days post infection, genes associated with fatty acid metabolism were identified as up regulated in expression. The value of gene set (enrichment) analyses in the analysis of microarray data is discussed.
Affiliation(s)
- John Ellis
  - School of Medical and Molecular Biosciences and the I3 Institute, University of Technology, Sydney, Broadway, Australia
11
Natarajan L, Pu M, Messer K. Exact statistical tests for the intersection of independent lists of genes. Ann Appl Stat 2012; 6:521-541. [PMID: 23335952 DOI: 10.1214/11-aoas510] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
Abstract
Public data repositories have enabled researchers to compare results across multiple genomic studies in order to replicate findings. A common approach is to first rank genes according to a hypothesis of interest within each study. Then, lists of the top-ranked genes within each study are compared across studies. Genes recaptured as highly ranked (usually above some threshold) in multiple studies are considered to be significant. However, this comparison strategy often remains informal, in that Type I error and false discovery rate are usually uncontrolled. In this paper, we formalize an inferential strategy for this kind of list-intersection discovery test. We show how to compute a p-value associated with a 'recaptured' set of genes, using a closed-form Poisson approximation to the distribution of the size of the recaptured set. The distribution of the test statistic depends on the rank threshold and the number of studies within which a gene must be recaptured. We use a Poisson approximation to investigate operating characteristics of the test. We give practical guidance on how to design a bioinformatic list-intersection study with prespecified control of Type I error (at the set level) and false discovery rate (at the gene level). We show how choice of test parameters will affect the expected proportion of significant genes identified. We present a strategy for identifying optimal choice of parameters, depending on the particular alternative hypothesis which might hold. We illustrate our methods using prostate cancer gene-expression datasets from the curated Oncomine database.
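The list-intersection test described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that, under the null hypothesis, each study ranks genes independently and uniformly at random, so a gene lands in a study's top list with probability `rank_threshold / n_genes`. All function and parameter names are hypothetical.

```python
import math


def recapture_probability(n_genes, n_studies, rank_threshold, min_studies):
    """Null probability that one gene is top-ranked in at least `min_studies` studies.

    Assumption: rankings are independent and uniform across studies, so the
    number of studies that recapture a gene is Binomial(n_studies, q).
    """
    q = rank_threshold / n_genes
    return sum(
        math.comb(n_studies, k) * q**k * (1 - q) ** (n_studies - k)
        for k in range(min_studies, n_studies + 1)
    )


def expected_recaptured(n_genes, n_studies, rank_threshold, min_studies):
    """Expected size of the recaptured set under the null (the Poisson mean)."""
    return n_genes * recapture_probability(
        n_genes, n_studies, rank_threshold, min_studies
    )


def poisson_upper_tail(observed, lam):
    """P(X >= observed) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(observed))
    return max(0.0, 1.0 - cdf)


def list_intersection_pvalue(observed, n_genes, n_studies, rank_threshold, min_studies):
    """Approximate set-level p-value for observing `observed` recaptured genes."""
    lam = expected_recaptured(n_genes, n_studies, rank_threshold, min_studies)
    return poisson_upper_tail(observed, lam)


if __name__ == "__main__":
    # Hypothetical design: 10,000 genes, 3 studies, top-100 lists,
    # a gene counts as recaptured if it is in at least 2 of the 3 lists.
    print(list_intersection_pvalue(10, 10000, 3, 100, 2))
```

Raising `rank_threshold` or lowering `min_studies` increases the null mean, which is the trade-off between the rank threshold and the required number of studies that the abstract refers to.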
Affiliation(s)
- Loki Natarajan
  - Division of Biostatistics and Bioinformatics, UCSD School of Medicine, Moores UCSD Cancer Center #0901, University of California, La Jolla, CA 92093
12
Sonachalam M, Shen J, Huang H, Wu X. Systems biology approach to identify gene network signatures for colorectal cancer. Front Genet 2012; 3:80. [PMID: 22629282 PMCID: PMC3354560 DOI: 10.3389/fgene.2012.00080] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 01/01/2012] [Accepted: 04/25/2012] [Indexed: 11/17/2022]
Abstract
In this work, we integrated prior knowledge from gene signatures and protein interactions with gene set enrichment analysis (GSEA) and gene/protein network modeling to identify gene network signatures from gene expression microarray data. We demonstrated how to apply this approach to discover gene network signatures for colorectal cancer (CRC) from microarray datasets. First, we used GSEA to analyze the microarray data by testing differential genes for enrichment in different CRC-related gene sets from two publicly available up-to-date gene set databases – Molecular Signatures Database (MSigDB) and Gene Signatures Database (GeneSigDB). Second, we compared the enriched gene sets by enrichment score, false-discovery rate, and nominal p-value. Third, we constructed an integrated protein–protein interaction (PPI) network by connecting these enriched genes through high-quality interactions from a human annotated and predicted protein interaction database, with a confidence score labeled for each interaction. Finally, we mapped differential gene expressions onto the constructed network to build a comprehensive network model containing visualized transcriptome and proteome data. The results show that although MSigDB has more CRC-relevant gene sets than GeneSigDB, the integrated PPI network connecting the enriched genes from both MSigDB and GeneSigDB can provide a more complete view for discovering gene network signatures. We also found several important sub-network signatures for CRC, such as TP53 sub-network, PCNA sub-network, and IL8 sub-network, corresponding to apoptosis, DNA repair, and immune response, respectively.
13
GeneSetDB: A comprehensive meta-database, statistical and visualisation framework for gene set analysis. FEBS Open Bio 2012; 2:76-82. [PMID: 23650583 PMCID: PMC3642118 DOI: 10.1016/j.fob.2012.04.003] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Received: 01/16/2012] [Revised: 04/04/2012] [Accepted: 04/12/2012] [Indexed: 12/12/2022]
Abstract
Most “omics” experiments require comprehensive interpretation of the biological meaning of gene lists. To address this requirement, a number of gene set analysis (GSA) tools have been developed. Although the biological value of GSA is strictly limited by the breadth of the gene sets used, very few methods exist for simultaneously analysing multiple publicly available gene set databases. Therefore, we constructed GeneSetDB (http://genesetdb.auckland.ac.nz/haeremai.html), a comprehensive meta-database, which integrates 26 public databases containing diverse biological information with a particular focus on human disease and pharmacology. GeneSetDB enables users to search for gene sets containing a gene identifier or keyword, generate their own gene sets, or statistically test for enrichment of an uploaded gene list across all gene sets, and visualise gene set enrichment and overlap using a clustered heat map.
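Testing an uploaded gene list for enrichment across many gene sets is commonly done with a one-sided hypergeometric (over-representation) test. The abstract does not state which statistic GeneSetDB uses, so the sketch below is only an assumed, generic version of such a test; the function names and the toy gene identifiers are hypothetical.

```python
import math


def hypergeom_enrichment_pvalue(universe_size, set_size, list_size, overlap):
    """P(X >= overlap) when drawing `list_size` genes from a universe of
    `universe_size` genes, `set_size` of which belong to the gene set."""
    total = math.comb(universe_size, list_size)
    tail = sum(
        math.comb(set_size, k) * math.comb(universe_size - set_size, list_size - k)
        for k in range(overlap, min(set_size, list_size) + 1)
    )
    return tail / total


def enrich(gene_list, gene_sets, universe):
    """Score every gene set against the query list; smallest p-value first.

    Genes outside the universe are ignored. No multiple-testing correction
    is applied in this sketch.
    """
    universe = set(universe)
    query = set(gene_list) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(query & members)
        p = hypergeom_enrichment_pvalue(
            len(universe), len(members), len(query), overlap
        )
        results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])


if __name__ == "__main__":
    universe = [f"g{i}" for i in range(10)]
    gene_sets = {"set_A": universe[:5], "set_B": universe[5:]}
    for name, overlap, p in enrich(universe[:5], gene_sets, universe):
        print(name, overlap, p)
```

In practice the resulting p-values would be adjusted for the number of gene sets tested, for example with a false discovery rate procedure.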
14
Garcia-Reyero N, Habib T, Pirooznia M, Gust KA, Gong P, Warner C, Wilbanks M, Perkins E. Conserved toxic responses across divergent phylogenetic lineages: a meta-analysis of the neurotoxic effects of RDX among multiple species using toxicogenomics. ECOTOXICOLOGY (LONDON, ENGLAND) 2011; 20:580-594. [PMID: 21516383 DOI: 10.1007/s10646-011-0623-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Accepted: 02/14/2011] [Indexed: 05/28/2023]
Abstract
At military training sites, a variety of pollutants such as hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX), may contaminate the area originating from used munitions. Studies investigating the mechanism of toxicity of RDX have shown that it affects the central nervous system causing seizures in humans and animals. Environmental pollutants such as RDX have the potential to affect many different species, therefore it is important to establish how phylogenetically distant species may respond to these types of emerging pollutants. In this paper, we have used a transcriptional network approach to compare and contrast the neurotoxic effects of RDX among five phylogenetically disparate species: rat (Sprague-Dawley), Northern bobwhite quail (Colinus virginianus), fathead minnow (Pimephales promelas), earthworm (Eisenia fetida), and coral (Acropora formosa). Pathway enrichment analysis indicated a conservation of RDX impacts on pathways related to neuronal function in rat, Northern bobwhite quail, fathead minnows and earthworm, but not in coral. As evolutionary distance increased common responses decreased with impacts on energy and metabolism dominating effects in coral. A neurotransmission related transcriptional network based on whole rat brain responses to RDX exposure was used to identify functionally related modules of genes, components of which were conserved across species depending upon evolutionary distance. Overall, the meta-analysis using genomic data of the effects of RDX on several species suggested a common and conserved mode of action of the chemical throughout phylogenetically remote organisms.
15
Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One 2010; 5:e13984. [PMID: 21085593 PMCID: PMC2981572 DOI: 10.1371/journal.pone.0013984] [Citation(s) in RCA: 1646] [Impact Index Per Article: 109.7] [Received: 02/02/2010] [Accepted: 10/20/2010] [Indexed: 12/13/2022]
Abstract
Background: Gene-set enrichment analysis is a useful technique to help functionally characterize large gene lists, such as the results of gene expression experiments. This technique finds functionally coherent gene-sets, such as pathways, that are statistically over-represented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by many current enrichment analysis software packages work against this ideal. Principal Findings: To overcome gene-set redundancy and help in the interpretation of large gene lists, we developed “Enrichment Map”, a network-based visualization method for gene-set enrichment results. Gene-sets are organized in a network, where each set is a node and edges represent gene overlap between sets. Automated network layout groups related gene-sets into network clusters, enabling the user to quickly identify the major enriched functional themes and more easily interpret the enrichment results. Conclusions: Enrichment Map is a significant advance in the interpretation of enrichment analysis. Any research project that generates a list of genes can take advantage of this visualization framework. Enrichment Map is implemented as a freely available and user-friendly plug-in for the Cytoscape network visualization software (http://baderlab.org/Software/EnrichmentMap/).
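The network construction described in this abstract (each gene-set is a node, and edges represent gene overlap between sets) can be illustrated with a minimal Python sketch. This is not the Enrichment Map plug-in's code: the overlap coefficient as the similarity measure, the 0.5 cutoff, the function names, and the toy gene-sets are all assumptions chosen for illustration.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def build_enrichment_network(gene_sets, threshold=0.5):
    """Return (set_a, set_b, score) edges for gene-set pairs whose overlap
    meets the threshold. The threshold value is an illustrative assumption."""
    edges = []
    # Sort by name so the edge order is deterministic.
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        score = overlap_coefficient(genes_a, genes_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical toy gene-sets, not taken from the paper.
gene_sets = {
    "angiogenesis": {"VEGFA", "KDR", "FLT1", "ANGPT2"},
    "vegf_signaling": {"VEGFA", "KDR", "FLT1", "PLCG1"},
    "dna_repair": {"BRCA1", "RAD51", "ATM"},
}

edges = build_enrichment_network(gene_sets)
for name_a, name_b, score in edges:
    print(f"{name_a} -- {name_b} (overlap {score:.2f})")
```

In this toy input, the two redundant VEGF-related sets are linked while the unrelated set stays isolated, which is the kind of grouping a network layout could then cluster visually.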
Affiliation(s)
- Daniele Merico
- Department of Molecular Genetics, Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- * E-mail: (GDB); (DM)
- Ruth Isserlin
- Department of Molecular Genetics, Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Oliver Stueker
- Department of Molecular Genetics, Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Andrew Emili
- Department of Molecular Genetics, Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Gary D. Bader
- Department of Molecular Genetics, Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- * E-mail: (GDB); (DM)
16
Zajac M, Gomez G, Benitez J, Martínez-Delgado B. Molecular signature of response and potential pathways related to resistance to the HSP90 inhibitor, 17AAG, in breast cancer. BMC Med Genomics 2010; 3:44. [PMID: 20920318 PMCID: PMC2959047 DOI: 10.1186/1755-8794-3-44] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 05/11/2010] [Accepted: 10/04/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: HSP90 may be a favorable target for investigational therapy in breast cancer. In fact, the HSP90 inhibitor 17AAG has currently entered phase II clinical trials as an anticancer agent in breast and other tumors. Since HSP90 inhibition leads to global depletion of oncogenic proteins involved in multiple pathways, we applied global analysis using gene array technology to study new genes and pathways involved in the drug response in breast cancer. METHODS: Gene expression profiling using Whole Human Genome Agilent array technology was applied to a total of six sensitive and two resistant breast cancer cell lines, both before treatment and after treatment with 17AAG for 24 and 48 hours. RESULTS: We identified a common molecular signature of response to 17AAG composed of 35 genes, which include novel pharmacodynamic markers of this drug. In addition, different patterns of HSP90 client transcriptional changes after 17AAG treatment were identified in the sensitive cell lines, which could be useful to evaluate drug effectiveness. Finally, we found differentially expressed pathways associated with resistance to 17AAG. We observed significant activation of NF-κB and MAPK pathways in resistant cells upon treatment, indicating that these pathways could potentially be targeted to overcome resistance. CONCLUSIONS: Our study shows that global mRNA expression analysis is a useful strategy to examine the molecular effects of drugs; it allowed us to discover new biomarkers of 17AAG activity and provided more insight into the complex mechanism of 17AAG resistance.
Affiliation(s)
- Magdalena Zajac
- Human Genetics Group, Spanish National Cancer Centre, Madrid, Spain
17
Davis AP, King BL, Mockus S, Murphy CG, Saraceni-Richards C, Rosenstein M, Wiegers T, Mattingly CJ. The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Res 2010; 39:D1067-72. [PMID: 20864448 PMCID: PMC3013756 DOI: 10.1093/nar/gkq813] [Citation(s) in RCA: 183] [Impact Index Per Article: 12.2] [Indexed: 12/16/2022] Open
Abstract
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the interaction of environmental chemicals with gene products, and their effects on human health. Biocurators at CTD manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the literature. These core data are then integrated to construct chemical–gene–disease networks and to predict many novel relationships using different types of associated data. Since 2009, we have dramatically increased the content of CTD to 1.4 million chemical–gene–disease data points and added many features, statistical analyses and analytical tools, including GeneComps and ChemComps (to find comparable genes and chemicals that share toxicogenomic profiles), enriched Gene Ontology terms associated with chemicals, statistically ranked chemical–disease inferences, Venn diagram tools to discover overlapping and unique attributes of any set of chemicals, genes or diseases, and enhanced gene pathway data content, among other features. Together, this wealth of expanded chemical–gene–disease data continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases. CTD is freely available at http://ctd.mdibl.org.
Affiliation(s)
- Allan Peter Davis
- Department of Bioinformatics, The Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA
18
Martinez P, Thanasoula M, Carlos AR, Gómez-López G, Tejera AM, Schoeftner S, Dominguez O, Pisano DG, Tarsounas M, Blasco MA. Mammalian Rap1 controls telomere function and gene expression through binding to telomeric and extratelomeric sites. Nat Cell Biol 2010; 12:768-80. [PMID: 20622869 PMCID: PMC3792482 DOI: 10.1038/ncb2081] [Citation(s) in RCA: 193] [Impact Index Per Article: 12.9] [Received: 04/16/2010] [Accepted: 06/01/2010] [Indexed: 12/12/2022]
Abstract
Rap1 is a component of the shelterin complex at mammalian telomeres, but its in vivo role in telomere biology has remained largely unknown to date. Here we show that Rap1 deficiency is dispensable for telomere capping but leads to increased telomere recombination and fragility. We generated cells and mice deleted for Rap1; mice with Rap1 deletion in stratified epithelia were viable but had shorter telomeres and developed skin hyperpigmentation in adulthood. By performing chromatin immunoprecipitation coupled with ultrahigh-throughput sequencing, we found that Rap1 binds to both telomeres and to extratelomeric sites through the (TTAGGG)₂ consensus motif. Extratelomeric Rap1-binding sites were enriched at subtelomeric regions, in agreement with preferential deregulation of subtelomeric genes in Rap1-deficient cells. More than 70% of extratelomeric Rap1-binding sites were in the vicinity of genes, and 31% of the genes deregulated in Rap1-null cells contained Rap1-binding sites, suggesting a role for Rap1 in transcriptional control. These findings place a telomere protein at the interface between telomere function and transcriptional regulation.
Affiliation(s)
- Paula Martinez
- Telomeres and Telomerase Group, Molecular Oncology Program, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro 3, Madrid, E-28029, Spain
19
Abstract
In recent years, there has been an explosion in the range of software available for annotation enrichment analysis. Three classes of enrichment algorithms and their associated software implementations are introduced here. Their limitations and caveats are discussed, and direction for tool selection is given.
Affiliation(s)
- Hannah Tipney
- Center for Computational Pharmacology, University of Colorado Denver, Aurora, CO 80045, USA