1
Temporal Transcriptomics of Gut Escherichia coli in Caenorhabditis elegans Models of Aging. Microbiol Spectr 2021; 9:e0049821. [PMID: 34523995 PMCID: PMC8557943 DOI: 10.1128/spectrum.00498-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Host-bacterial interactions over the course of aging are understudied due to complexities of the human microbiome and challenges of collecting samples that span a lifetime. To investigate the role of host-microbial interactions in aging, we performed transcriptomics using wild-type Caenorhabditis elegans (N2) and three long-lived mutants (daf-2, eat-2, and asm-3) fed Escherichia coli OP50 and sampled at days 5, 7.5, and 10 of adulthood. We found that host age is a better predictor of the E. coli expression profiles than host genotype. Specifically, host age was associated with clustering (permutational multivariate analysis of variance [PERMANOVA], P = 0.001) and variation (Adonis, P = 0.001, R² = 11.5%) among E. coli expression profiles, whereas host genotype was not (PERMANOVA, P > 0.05; Adonis, P > 0.05, R² = 5.9%). Differential analysis of the E. coli transcriptome yielded 22 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and 100 KEGG genes enriched when samples were grouped by time point [LDA, linear discriminant analysis; log(LDA), ≥2; P ≤ 0.05], including several involved in biofilm formation. Coexpression analysis of host and bacterial genes yielded six modules of C. elegans genes that were coexpressed with one bacterial regulator gene over time. The three most significant bacterial regulators included genes relating to biofilm formation, lipopolysaccharide production, and thiamine biosynthesis. Age was significantly associated with clustering and variation among transcriptomic samples, supporting the idea that microbes are active and plastic within C. elegans throughout life. Coexpression analysis further revealed interactions between E. coli and C. elegans that occurred over time, building on a growing literature on host-microbial interactions.
IMPORTANCE Previous research has reported effects of the microbiome on health span and life span of Caenorhabditis elegans, including interactions with evolutionarily conserved pathways in humans. We build on this literature by reporting the gene expression of Escherichia coli OP50 in wild-type (N2) and three long-lived mutants of C. elegans. The manuscript represents the first study, to our knowledge, to perform temporal host-microbial transcriptomics in the model organism C. elegans. Understanding changes to the microbial transcriptome over time is an important step toward elucidating host-microbial interactions and their potential relationship to aging. We found that age was significantly associated with clustering and variation among transcriptomic samples, supporting the idea that microbes are active and plastic within C. elegans throughout life. Coexpression analysis further revealed interactions between E. coli and C. elegans that occurred over time, which contributes to our growing knowledge about host-microbial interactions.
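The clustering and variation results above rest on PERMANOVA, which partitions squared distances between expression profiles into among-group and within-group sums of squares. The following minimal sketch (standard library only, not the study's code) assumes a precomputed distance matrix and returns the pseudo-F statistic, the R² that Adonis reports (among-group sum of squares divided by total sum of squares), and a label-permutation p-value. The function names and toy data are hypothetical.

```python
import itertools
import random


def pseudo_f(dist, groups):
    """Pseudo-F and R^2 (among-group SS / total SS) for a square distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total SS: squared distances over all pairs i < j, divided by N
    ss_total = sum(dist[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    # Within-group SS: the same quantity inside each group, divided by group size
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        pairs = itertools.combinations(members, 2)
        ss_within += sum(dist[i][j] ** 2 for i, j in pairs) / len(members)
    ss_among = ss_total - ss_within
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, ss_among / ss_total


def permanova(dist, groups, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles whose pseudo-F >= the observed one."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)  # +1 counts the observed labelling


# Toy example (hypothetical data): 1-D "profiles" sampled at two host ages
x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ages = ["day5"] * 3 + ["day10"] * 3
dist = [[abs(a - b) for b in x] for a in x]
f, r2, p = permanova(dist, ages)
```

Here the groups are almost perfectly separated (R² close to 1), yet with six samples split 3/3 only 2 of the 20 possible label assignments reproduce the observed partition, so the p-value stays near 0.1. With 999 permutations the smallest attainable value is 1/1000 = 0.001.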
2
Saint-André V. Computational biology approaches for mapping transcriptional regulatory networks. Comput Struct Biotechnol J 2021; 19:4884-4895. [PMID: 34522292 PMCID: PMC8426465 DOI: 10.1016/j.csbj.2021.08.028] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/24/2021] [Revised: 08/16/2021] [Accepted: 08/16/2021] [Indexed: 12/13/2022]
Abstract
Transcriptional Regulatory Networks (TRNs) are mainly responsible for the cell-type- or cell-state-specific expression of gene sets from the same DNA sequence. However, so far there are no precise maps of TRNs available for each cell-type or cell-state, and no ideal tool to map those networks clearly and in full from biological samples. In this review, major approaches and tools to map TRNs from high-throughput data are presented, grouped by the type of methods or data used to infer them, and their advantages and limitations are discussed. After summarizing the main principles defining the topology and structure–function relationships in TRNs, an overview of the extensive work done to map TRNs from bulk transcriptomic data will be presented by type of methodological approach. The most recent models of TRNs using other types of molecular data or integrating different data types, including single-cell RNA-sequencing and chromatin information, will then be discussed, before briefly concluding with improvements expected to come in the field.
Affiliation(s)
- Violaine Saint-André
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, Paris, France
3
Gupta C, Ramegowda V, Basu S, Pereira A. Using Network-Based Machine Learning to Predict Transcription Factors Involved in Drought Resistance. Front Genet 2021; 12:652189. [PMID: 34249082 PMCID: PMC8264776 DOI: 10.3389/fgene.2021.652189] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/11/2021] [Accepted: 05/13/2021] [Indexed: 12/13/2022]
Abstract
Gene regulatory networks underpin stress response pathways in plants. However, parsing these networks to prioritize key genes underlying a particular trait is challenging. Here, we have built the Gene Regulation and Association Network (GRAiN) of rice (Oryza sativa). GRAiN is an interactive query-based web-platform that allows users to study functional relationships between transcription factors (TFs) and genetic modules underlying abiotic-stress responses. We built GRAiN by applying a combination of different network inference algorithms to publicly available gene expression data. We propose a supervised machine learning framework that complements GRAiN in prioritizing genes that regulate stress signal transduction and modulate gene expression under drought conditions. Our framework converts intricate network connectivity patterns of 2160 TFs into a single drought score. We observed that TFs with the highest drought scores define the functional, structural, and evolutionary characteristics of drought resistance in rice. Our approach accurately predicted the function of OsbHLH148 TF, which we validated using in vitro protein-DNA binding assays and mRNA sequencing of loss-of-function mutants grown under control and drought stress conditions. Our network and the complementary machine learning strategy lend themselves to predicting key regulatory genes underlying other agricultural traits and will assist in the genetic engineering of desirable rice varieties.
Affiliation(s)
- Chirag Gupta
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Venkategowda Ramegowda
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Supratim Basu
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Andy Pereira
- Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
4
Li Y, Ma L, Wu D, Chen G. Advances in bulk and single-cell multi-omics approaches for systems biology and precision medicine. Brief Bioinform 2021; 22:6189773. [PMID: 33778867 DOI: 10.1093/bib/bbab024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/11/2020] [Revised: 12/31/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022]
Abstract
Multi-omics allows the systematic understanding of the information flow across different omics layers, while single omics can mainly reflect one aspect of the biological system. The advancement of bulk and single-cell sequencing technologies and related computational methods for multi-omics has largely facilitated the development of systems biology and precision medicine. Single-cell approaches have the advantage of dissecting cellular dynamics and heterogeneity, whereas traditional bulk technologies are limited to individual/population-level investigation. In this review, we first summarize the technologies for producing bulk and single-cell multi-omics data. Then, we survey the computational approaches for integrative analysis of bulk and single-cell multimodal data, respectively. Moreover, the databases and data storage for multi-omics, as well as the tools for visualizing multimodal data, are summarized. We also outline the integration between bulk and single-cell data, and discuss the applications of multi-omics in precision medicine. Finally, we present the challenges and perspectives for multi-omics development.
Affiliation(s)
- Lu Ma
- China Normal University, China
5
Gupta C, Ramegowda V, Basu S, Pereira A. Using Network-Based Machine Learning to Predict Transcription Factors Involved in Drought Resistance. Front Genet 2021. [PMID: 34249082 DOI: 10.1101/2020.04.29.068379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/13/2023]
6
Erola P, Björkegren JLM, Michoel T. Model-based clustering of multi-tissue gene expression data. Bioinformatics 2020; 36:1807-1813. [PMID: 31688915 PMCID: PMC7162352 DOI: 10.1093/bioinformatics/btz805] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/17/2018] [Revised: 09/05/2019] [Accepted: 10/31/2019] [Indexed: 02/06/2023]
Abstract
MOTIVATION Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or to merge tissues in a single, monolithic dataset, ignoring individual characteristics of tissues. RESULTS We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals. AVAILABILITY AND IMPLEMENTATION Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pau Erola
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK
- Johan L M Björkegren
- Department of Genetics and Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre (ICMC), Karolinska Institutet, Huddinge 141 57, Sweden
- Tom Michoel
- Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen N-5020, Norway
7
Kimotho RN, Baillo EH, Zhang Z. Transcription factors involved in abiotic stress responses in Maize (Zea mays L.) and their roles in enhanced productivity in the post genomics era. PeerJ 2019; 7:e7211. [PMID: 31328030 PMCID: PMC6622165 DOI: 10.7717/peerj.7211] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 02/22/2019] [Accepted: 05/26/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND Maize (Zea mays L.) is a principal cereal crop cultivated worldwide for human food, animal feed, and more recently as a source of biofuel. However, as a direct consequence of water insufficiency and climate change, frequent occurrences of both biotic and abiotic stresses have been reported in various regions around the world, and recently, this has become a constant threat to increasing global maize yields. Plants respond to abiotic stresses by utilizing the activities of transcription factors (TFs), which are families of genes coding for specific TF proteins. TF target genes form a regulon that is involved in the repression/activation of genes associated with abiotic stress responses. Therefore, it is of utmost importance to have a systematic study on each TF family, the downstream target genes they regulate, and the specific TF genes involved in multiple abiotic stress responses in maize and other staple crops. METHOD In this review, the main TF families, the specific TF genes and their regulons that are involved in abiotic stress regulation will be briefly discussed. Great emphasis will be given to maize abiotic stress improvement throughout this review, although other examples from different plants like rice, Arabidopsis, wheat, and barley will be used. RESULTS We have described in detail the main TF families in maize that take part in abiotic stress responses together with their regulons. Furthermore, we have also briefly described the utilization of high-efficiency technologies in the study and characterization of TFs involved in the abiotic stress regulatory networks in plants with an emphasis on increasing maize production. Examples of these technologies include next-generation sequencing, microarray analysis, machine learning, and RNA-Seq.
CONCLUSION In conclusion, it is expected that all the information provided in this review will in time contribute to the use of TF genes in the research, breeding, and development of new abiotic stress tolerant maize cultivars.
Affiliation(s)
- Roy Njoroge Kimotho
- Key Laboratory of Agricultural Water Resources, Hebei Laboratory of Agricultural Water Saving, Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, Hebei, China
- University of Chinese Academy of Sciences, Beijing, China
- Elamin Hafiz Baillo
- Key Laboratory of Agricultural Water Resources, Hebei Laboratory of Agricultural Water Saving, Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, Hebei, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhengbin Zhang
- Key Laboratory of Agricultural Water Resources, Hebei Laboratory of Agricultural Water Saving, Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang, Hebei, China
- University of Chinese Academy of Sciences, Beijing, China
- Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, China
8
Siahpirani AF, Chasman D, Roy S. Integrative Approaches for Inference of Genome-Scale Gene Regulatory Networks. Methods Mol Biol 2019; 1883:161-194. [PMID: 30547400 DOI: 10.1007/978-1-4939-8882-2_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
Abstract
Transcriptional regulatory networks specify the regulatory proteins of target genes that control the context-specific expression levels of genes. With our ability to profile the different types of molecular components of cells under different conditions, we are now uniquely positioned to infer regulatory networks in diverse biological contexts such as different cell types, tissues, and time points. In this chapter, we cover two main classes of computational methods to integrate different types of information to infer genome-scale transcriptional regulatory networks. The first class of methods focuses on integrative methods for specifically inferring connections between transcription factors and target genes by combining gene expression data with regulatory edge-specific knowledge. The second class of methods integrates upstream signaling networks with transcriptional regulatory networks by combining gene expression data with protein-protein interaction networks and proteomic datasets. We conclude with a section on practical applications of a network inference algorithm to infer a genome-scale regulatory network.
Affiliation(s)
- Alireza Fotuhi Siahpirani
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
9
Huynh-Thu VA, Geurts P. Unsupervised Gene Network Inference with Decision Trees and Random Forests. Methods Mol Biol 2019; 1883:195-215. [PMID: 30547401 DOI: 10.1007/978-1-4939-8882-2_8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022]
Abstract
In this chapter, we introduce the reader to a popular family of machine learning algorithms, called decision trees. We then review several approaches based on decision trees that have been developed for the inference of gene regulatory networks (GRNs). Decision trees indeed have several properties that make them well suited to this problem: they can detect multivariate interacting effects between variables, are non-parametric, scale well, and have very few parameters. In particular, we describe in detail the GENIE3 algorithm, a state-of-the-art method for GRN inference.
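GENIE3 turns network inference into one regression problem per target gene: the target's expression is predicted from the candidate regulators with a tree ensemble, and the total variance reduction achieved by splits on each regulator becomes the weight of the regulator-to-target edge. The sketch below is a simplified, standard-library-only illustration of that scheme, not the reference GENIE3 code. It assumes a random forest with bootstrap samples, K = √(number of inputs) candidate regulators per split, unit-variance normalisation of each gene, and shallow trees (max depth 3, chosen here only to keep it short). The helper names `grow` and `genie3` and the toy data are hypothetical.

```python
import random
from statistics import fmean, pstdev


def standardize(values):
    """Zero mean, unit variance (constant genes are only centred)."""
    m, sd = fmean(values), pstdev(values)
    return [(v - m) / (sd or 1.0) for v in values]


def sse(vals):
    """Sum of squared deviations from the mean (node variance times node size)."""
    m = fmean(vals)
    return sum((v - m) ** 2 for v in vals)


def grow(X, y, idx, k, rng, imp, depth=0, max_depth=3, min_leaf=2):
    """Grow one regression tree on sample indices `idx`, adding the SSE decrease
    of every split to imp[feature]. Only importances are kept, not predictions."""
    if depth >= max_depth or len(idx) < 2 * min_leaf:
        return
    parent = sse([y[i] for i in idx])
    best = None
    for f in rng.sample(range(len(imp)), min(k, len(imp))):  # K random candidate regulators
        cuts = sorted({X[i][f] for i in idx})
        for lo, hi in zip(cuts, cuts[1:]):
            thr = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= thr]
            right = [i for i in idx if X[i][f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse([y[i] for i in left]) - sse([y[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, f, left, right)
    if best is None:
        return
    gain, f, left, right = best
    imp[f] += gain
    grow(X, y, left, k, rng, imp, depth + 1, max_depth, min_leaf)
    grow(X, y, right, k, rng, imp, depth + 1, max_depth, min_leaf)


def genie3(expr, regulators, n_trees=30, seed=0):
    """expr maps gene -> expression values over the same samples.
    Returns {(regulator, target): importance} for every regulator != target."""
    rng = random.Random(seed)
    z = {g: standardize(v) for g, v in expr.items()}
    n = len(next(iter(z.values())))
    scores = {}
    for target, y in z.items():
        inputs = [r for r in regulators if r != target]
        X = [[z[r][s] for r in inputs] for s in range(n)]
        k = max(1, int(len(inputs) ** 0.5))  # K = sqrt(number of inputs)
        imp = [0.0] * len(inputs)
        for _ in range(n_trees):
            boot = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
            grow(X, y, boot, k, rng, imp)
        for j, r in enumerate(inputs):
            scores[(r, target)] = imp[j] / n_trees
    return scores


# Toy data (hypothetical): geneA is driven by tf1, tf2 is unrelated noise
data_rng = random.Random(1)
tf1 = [data_rng.gauss(0, 1) for _ in range(30)]
tf2 = [data_rng.gauss(0, 1) for _ in range(30)]
gene_a = [2 * v + data_rng.gauss(0, 0.1) for v in tf1]
scores = genie3({"tf1": tf1, "tf2": tf2, "geneA": gene_a}, regulators=["tf1", "tf2"])
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting all edges by score gives the ranked list from which a network is usually thresholded. Production implementations build far more trees (hundreds to thousands) and grow them fully.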
Affiliation(s)
- Vân Anh Huynh-Thu
- Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
- Pierre Geurts
- Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
10
Erola P, Bonnet E, Michoel T. Learning Differential Module Networks Across Multiple Experimental Conditions. Methods Mol Biol 2019; 1883:303-321. [PMID: 30547406 DOI: 10.1007/978-1-4939-8882-2_13] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Module network inference is a statistical method to reconstruct gene regulatory networks, which uses probabilistic graphical models to learn modules of coregulated genes and their upstream regulatory programs from genome-wide gene expression and other omics data. Here, we review the basic theory of module network inference, present protocols for common gene regulatory network reconstruction scenarios based on the Lemon-Tree software, and show, using human gene expression data, how the software can also be applied to learn differential module networks across multiple experimental conditions.
Affiliation(s)
- Pau Erola
- Division of Genetics and Genomics, Roslin Institute, University of Edinburgh, Midlothian, Scotland, UK
- Eric Bonnet
- Centre National de Recherche en Génomique Humaine, Institut de Biologie François Jacob, Direction de la Recherche Fondamentale, CEA, Evry, France
- Tom Michoel
- Division of Genetics and Genomics, The Roslin Institute, University of Edinburgh, Midlothian, Scotland, UK
- Current address: Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
11
12
Lu Y, Zhou X, Nardini C. Dissection of the module network implementation "LemonTree": enhancements towards applications in metagenomics and translation in autoimmune maladies. Mol Biosyst 2018; 13:2083-2091. [PMID: 28809429 DOI: 10.1039/c7mb00248c] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/11/2023]
Abstract
Under the current deluge of omics, module networks distinctively emerge as methods capable of not only identifying inherently coherent groups (modules), thus reducing dimensionality, but also hypothesizing cause-effect relationships between modules and their regulators. Module networks were first designed in the transcriptomic era and further exploited in the multi-omic context to assess (for example) miRNA regulation of gene expression. Despite a number of available implementations, expansion of module networks to other omics is constrained by a limited characterization of the solutions' (modules plus regulators) accuracy and stability, an immediate need for the better characterization of molecular biology complexity in silico. We hence carefully assessed, for LemonTree (a popular, open-source module network implementation), how the software's performance (sensitivity, specificity, false discovery rate, solutions' stability) depends on the input parameters and on the data quality (sample size, expression noise), using synthetic and real data. In the process, we uncovered and fixed an issue in the code for the regulator assignment procedure. We concluded this evaluation with a table of recommended parameter settings. Finally, we applied these recommended settings to gut-intestinal metagenomic data from rheumatoid arthritis patients, to characterize the evolution of the gut-intestinal microbiome under different pharmaceutical regimens (methotrexate and prednisone) and we inferred innovative clinical recommendations with therapeutic potential, based on the computed module network.
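The performance measures named above can be computed whenever a ground-truth network is known, as with synthetic data. The sketch below is an illustration only, not the evaluation code used for LemonTree: it assumes inferred and true networks are represented as sets of (regulator, module) pairs drawn from a known universe of candidate edges, and it uses the Jaccard index between two runs' edge sets as a stand-in for "solution stability". All names and data are hypothetical.

```python
from itertools import product


def edge_metrics(predicted, truth, universe):
    """Confusion-matrix scores for an inferred edge set against a known one.
    All arguments are sets of (regulator, module) pairs; predicted and truth
    are assumed to be subsets of universe (all candidate edges)."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe) - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true edges recovered
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # non-edges correctly rejected
        "fdr": fp / (tp + fp) if tp + fp else 0.0,          # predictions that are wrong
    }


def jaccard(run_a, run_b):
    """Stability proxy: overlap of the edge sets inferred by two runs."""
    union = run_a | run_b
    return len(run_a & run_b) / len(union) if union else 1.0


# Hypothetical example: 3 candidate regulators x 2 modules
universe = set(product(["R1", "R2", "R3"], ["M1", "M2"]))
truth = {("R1", "M1"), ("R2", "M2")}
run1 = {("R1", "M1"), ("R3", "M2")}
run2 = {("R1", "M1"), ("R2", "M2")}
scores = edge_metrics(run1, truth, universe)  # sensitivity 0.5, specificity 0.75, fdr 0.5
stability = jaccard(run1, run2)               # 1/3
```

Repeating such runs across parameter settings and noise levels is one way to build a parameter-recommendation table of the kind the abstract describes.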
Affiliation(s)
- Youtao Lu
- CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China
13
Pirayre A, Couprie C, Duval L, Pesquet JC. BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:850-860. [PMID: 28368827 DOI: 10.1109/tcbb.2017.2688355] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells. Accurately building the related graphs remains challenging due to the large number of possible solutions from available data. Nonetheless, enforcing a priori constraints on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference using cluster information. It works as a post-processing tool for inference methods (e.g., CLR, GENIE3). In BRANE Clust, the clustering is based on the inversion of a system of linear equations involving a graph-Laplacian matrix promoting a modular structure. Our approach is validated on DREAM4 and DREAM5 datasets with objective measures, showing significant comparative improvements. We provide additional insights on the discovery of novel regulatory or co-expressed links in the inferred Escherichia coli network evaluated using the STRING database. The comparative pertinence of clustering is discussed computationally (SIMoNe, WGCNA, X-means) and biologically (RegulonDB). BRANE Clust software is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-clust.html.
14
Tabe-Bordbar S, Emad A, Zhao SD, Sinha S. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models. Sci Rep 2018; 8:6620. [PMID: 29700343 PMCID: PMC5920056 DOI: 10.1038/s41598-018-24937-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 09/22/2017] [Accepted: 04/09/2018] [Indexed: 11/26/2022]
Abstract
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn’t hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model’s generalizability compared to CCV. Next, we defined the ‘distinctness’ of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
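To make the contrast between random CV (RCV) and clustering-based CV (CCV) concrete, the sketch below builds both kinds of folds. Random folds deal shuffled sample indices round-robin; clustering-based folds cluster the samples' expression profiles and hold out one whole cluster at a time, so test samples are less similar to the training samples. The use of k-means (Lloyd's algorithm with a deterministic farthest-point initialisation) is an assumption made for illustration; the paper's exact clustering procedure and its "distinctness" measure are not reproduced here, and all names and data are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def random_cv_folds(n, k, seed=0):
    """RCV: shuffle sample indices and deal them into k test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = []
    for f in range(k):
        test = sorted(idx[f::k])
        train = [i for i in range(n) if i not in test]
        folds.append((train, test))
    return folds


def clustering_cv_folds(profiles, k):
    """CCV-style folds: each test fold is one whole cluster of similar samples."""
    labels = kmeans(profiles, k)
    folds = []
    for c in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == c]
        train = [i for i, lab in enumerate(labels) if lab != c]
        folds.append((train, test))
    return folds


# Hypothetical expression profiles (2 genes) from two experimental conditions
profiles = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
rcv = random_cv_folds(len(profiles), 2)
ccv = clustering_cv_folds(profiles, 2)
```

On this toy data each clustering-based test fold contains samples from only one condition, whereas random 4/4 folds mix both conditions in all but 2 of the 70 possible splits. Evaluating a model on the first kind of fold asks it to generalize beyond the observed conditions, which is the setting in which the abstract reports RCV to be over-optimistic.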
Affiliation(s)
- Shayan Tabe-Bordbar
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Amin Emad
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, United States of America
15
Chasman D, Roy S. Inference of cell type specific regulatory networks on mammalian lineages. Curr Opin Syst Biol 2017; 2:130-139. [PMID: 29082337 DOI: 10.1016/j.coisb.2017.04.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/28/2022]
Abstract
Transcriptional regulatory networks are at the core of establishing cell type specific gene expression programs. In mammalian systems, such regulatory networks are determined by multiple levels of regulation, including by transcription factors, chromatin environment, and three-dimensional organization of the genome. Recent efforts to measure diverse regulatory genomic datasets across multiple cell types and tissues offer unprecedented opportunities to examine the context-specificity and dynamics of regulatory networks at a greater resolution and scale than before. In parallel, numerous computational approaches to analyze these data have emerged that serve as important tools for understanding mammalian cell type specific regulation. In this article, we review recent computational approaches to predict the expression and sequence-based regulators of a gene's expression level and examine long-range gene regulation. We highlight promising approaches, insights gained, and open challenges that need to be overcome to build a comprehensive picture of cell type specific transcriptional regulatory networks.
Affiliation(s)
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792
16
Banf M, Rhee SY. Computational inference of gene regulatory networks: Approaches, limitations and opportunities. Biochim Biophys Acta Gene Regul Mech 2016; 1860:41-52. [PMID: 27641093 DOI: 10.1016/j.bbagrm.2016.09.003] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 04/13/2016] [Revised: 09/08/2016] [Accepted: 09/08/2016] [Indexed: 10/21/2022]
Abstract
Gene regulatory networks lie at the core of cell function control. In E. coli and S. cerevisiae, the study of gene regulatory networks has led to the discovery of regulatory mechanisms responsible for the control of cell growth, differentiation and responses to environmental stimuli. In plants, computational rendering of gene regulatory networks is gaining momentum, thanks to the recent availability of high-quality genomes and transcriptomes and development of computational network inference approaches. Here, we review current techniques, challenges and trends in gene regulatory network inference and highlight challenges and opportunities for plant science. We provide plant-specific application examples to guide researchers in selecting methodologies that suit their particular research questions. Given the interdisciplinary nature of gene regulatory network inference, we tried to cater to both biologists and computer scientists to help them engage in a dialogue about concepts and caveats in network inference. Specifically, we discuss problems and opportunities in heterogeneous data integration for eukaryotic organisms and common caveats to be considered during network model evaluation. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Affiliation(s)
- Michael Banf
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford, CA 94305, United States.
- Seung Y Rhee
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama Street, Stanford, CA 94305, United States.
17
Arhondakis S, Bita CE, Perrakis A, Manioudaki ME, Krokida A, Kaloudas D, Kalaitzis P. In silico Transcriptional Regulatory Networks Involved in Tomato Fruit Ripening. Front Plant Sci 2016; 7:1234. [PMID: 27625653 PMCID: PMC5003879 DOI: 10.3389/fpls.2016.01234] [Received: 02/27/2016] [Accepted: 08/03/2016]
Abstract
Tomato fruit ripening is a complex developmental programme partly mediated by transcriptional regulatory networks. Several transcription factors (TFs) belonging to gene families such as MADS-box and ERF have been shown to play significant roles in ripening through interconnections within an intricate network. The accumulation of large expression-profile datasets covering different stages of tomato fruit ripening, together with bioinformatics tools for their analysis, provides an opportunity to identify TFs that might regulate gene clusters with similar co-expression patterns. Using the LeMoNe algorithm to analyze our microarray datasets representing four stages of fruit ripening (breaker, turning, pink, and red ripe), we identified two TFs, a SlWRKY22-like and a SlER24 transcriptional activator, as regulators of co-expression modules. The WRKY22-like module comprised a subgroup of six calcium-sensing transcripts whose expression patterns resembled that of the TF, as validated by real-time PCR. A promoter motif search identified a cis-acting element, the W-box, which is recognized by WRKY TFs and was present in the promoter regions of all six calcium-sensing genes. Moreover, publicly available microarray datasets of similar ripening stages were also analyzed with LeMoNe, yielding TFs such as SlERF.E1, SlERF.C1, SlERF.B2, SlERF.A2, SlWRKY24, SlWRKY37, and MADS-box/TM29 that might also play an important role in regulating ripening. These results suggest that the SlWRKY22-like TF might be involved in the coordinated regulation of the six calcium-sensing genes. In conclusion, the LeMoNe tool might help identify putative TFs as regulators of tomato fruit ripening for further physiological analysis.
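The W-box promoter search described above can be illustrated with a minimal Python sketch. It assumes the commonly cited W-box core consensus TTGAC(C/T) and scans both strands of each promoter with a regular expression; the gene names and sequences are hypothetical, and this is not the motif-search tool used in the study.

```python
import re

# Assumed W-box core: TTGAC followed by C or T. The reverse complement of
# TTGAC[CT] is [AG]GTCAA, which lets us scan the minus strand on the same string.
# Zero-width lookaheads allow overlapping hits to be reported.
W_BOX_PLUS = re.compile(r"(?=(TTGAC[CT]))")
W_BOX_MINUS = re.compile(r"(?=([AG]GTCAA))")


def find_w_boxes(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every W-box core in a promoter sequence."""
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in W_BOX_PLUS.finditer(seq)]
    hits += [(m.start(), "-") for m in W_BOX_MINUS.finditer(seq)]
    return sorted(hits)


# Hypothetical promoter fragments, for illustration only.
promoters = {
    "gene_a": "ATTTGACCGTA",
    "gene_b": "CCAGTCAAGG",
    "gene_c": "ACGTACGTAC",
}
with_wbox = {gene for gene, seq in promoters.items() if find_w_boxes(seq)}
print(sorted(with_wbox))  # ['gene_a', 'gene_b']
```

A motif shared by every promoter in a co-expression module, as reported for the six calcium-sensing genes, would show up here as a non-empty hit list for each member.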
18
Liu Q, Song R, Li J. Inference of gene interaction networks using conserved subsequential patterns from multiple time course gene expression datasets. BMC Genomics 2015; 16 Suppl 12:S4. [PMID: 26681650 PMCID: PMC4682423 DOI: 10.1186/1471-2164-16-s12-s4]
Abstract
Motivation Deciphering gene interaction networks (GINs) from time-course gene expression (TCGx) data is highly valuable for understanding gene behaviors (e.g., activation, inhibition, time-lagged causality) at the system level. Existing methods usually infer GINs from a single dataset using a global or local proximity measure. Because the noise in a single dataset can hardly be resolved from that dataset alone, the results are sometimes unreliable. In addition, these proximity measures cannot handle the co-existence of positive, negative, and time-lagged gene interactions in vivo. Methods and results We propose to infer reliable GINs from multiple TCGx datasets using a novel conserved subsequential pattern of gene expression. A subsequential pattern is a maximal subset of genes sharing positive, negative, or time-lagged correlations with one expression template on their own subsets of time points. Based on these patterns, a GIN can be built from each dataset. Assuming that reliable gene interactions will be detected repeatedly, we use the gene pairs conserved across the individual GINs of multiple TCGx datasets to construct a reliable GIN for a species. We apply our method to six TCGx datasets related to the yeast cell cycle and validate the reliable GINs using protein interaction networks, biopathways, and transcription factor-gene regulations. We also compare the reliable GINs with GINs reconstructed from single datasets using a global proximity measure, the Pearson correlation coefficient. Our reliable GINs achieve much better prediction performance, especially much higher precision. Functional enrichment analysis also suggests that the gene sets in a reliable GIN are more functionally significant. Our method is especially useful for deciphering GINs from multiple TCGx datasets of less-studied organisms for which little knowledge is available beyond gene expression data.
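The idea of keeping only interactions that recur across datasets can be sketched as follows. As a simplifying assumption, each per-dataset network here is built with the Pearson correlation coefficient (the single-dataset baseline the abstract mentions), not with the authors' subsequential patterns. The threshold, `min_support`, and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile to avoid dividing by zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_network(expr, threshold=0.8):
    """Undirected edges (gene pairs) whose |r| reaches the threshold in one dataset."""
    return {
        frozenset((g1, g2))
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    }


def conserved_edges(datasets, threshold=0.8, min_support=2):
    """Keep edges that recur in at least `min_support` per-dataset networks."""
    counts = Counter()
    for expr in datasets:
        counts.update(correlation_network(expr, threshold))
    return {edge for edge, n in counts.items() if n >= min_support}


# Hypothetical time courses (gene -> expression at each time point).
d1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
d2 = {"A": [5, 3, 2, 1], "B": [10, 6, 4, 2], "C": [1, 2, 3, 4]}
print(conserved_edges([d1, d2]))
```

Here A-C and B-C pass the threshold only in the second dataset, so only the A-B edge survives as conserved. Because absolute correlation is used, negative interactions count too, although time-lagged ones would need a different measure.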
19
Chen D, Zhang Z, Meng Y. Systematic Tracking of Disrupted Modules Identifies Altered Pathways Associated with Congenital Heart Defects in Down Syndrome. Med Sci Monit 2015; 21:3334-42. [PMID: 26524729 PMCID: PMC4635630 DOI: 10.12659/msm.896001]
Abstract
BACKGROUND This work aimed to identify altered pathways in congenital heart defects (CHD) in Down syndrome (DS) by systematically tracking the dysregulated modules of reweighted protein-protein interaction (PPI) networks. MATERIAL AND METHODS We systematically identified and compared modules across normal and disease conditions by integrating PPI and gene-expression data. Normal and disease PPI networks were inferred and reweighted based on the Pearson correlation coefficient (PCC). Modules in each PPI network were then explored with a clique-merging algorithm, and altered modules were identified via maximum weight bipartite matching and ranked in non-increasing order. Finally, pathway enrichment analysis of the genes in altered modules was carried out with the Database for Annotation, Visualization, and Integrated Discovery (DAVID) to study the biological pathways involved in CHD in DS. RESULTS Comparing modules in the normal and disease PPI networks identified 348 altered modules. Pathway enrichment analysis of the disrupted-module genes showed that the 4 most significantly altered pathways were ECM-receptor interaction, purine metabolism, focal adhesion, and dilated cardiomyopathy. CONCLUSIONS We identified 4 altered pathways, which we predicted would be good indicators of CHD in DS.
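Maximum weight bipartite matching of modules can be sketched as below. The similarity between a normal and a disease module is assumed here to be the Jaccard index of their gene sets, not the PCC-reweighted scores used in the study. The brute-force search is only practical for a handful of modules, and all module contents are hypothetical.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard index of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_modules(normal, disease):
    """Maximum-weight one-to-one matching of normal to disease modules.

    Brute force over all assignments; assumes len(normal) <= len(disease).
    """
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(disease)), len(normal)):
        pairs = list(zip(range(len(normal)), perm))
        score = sum(jaccard(normal[i], disease[j]) for i, j in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_pairs


def rank_altered(normal, disease):
    """Matched (normal, disease, alteration) triples, most altered first.

    Alteration is taken as 1 - Jaccard similarity (an illustrative choice).
    """
    triples = [(i, j, 1 - jaccard(normal[i], disease[j]))
               for i, j in match_modules(normal, disease)]
    return sorted(triples, key=lambda t: t[2], reverse=True)


# Hypothetical modules from the normal and disease networks.
normal = [{"A", "B", "C"}, {"D", "E"}]
disease = [{"D", "E", "F"}, {"A", "B", "C"}]
for i, j, alteration in rank_altered(normal, disease):
    print(f"normal module {i} -> disease module {j}: alteration {alteration:.2f}")
```

For realistic numbers of modules, an assignment solver such as the Hungarian algorithm would replace the brute-force loop. The ranking step is unchanged.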
Affiliation(s)
- Denghong Chen
- Department of Obstetrics, Jining No. 1 People's Hospital, Jining, Shandong, China (mainland)
- Zhenhua Zhang
- Department of Children's Health Prevention, Jining No. 1 People's Hospital, Jining, Shandong, China (mainland)
- Yuxiu Meng
- Department of Neonatology, Jining No. 1 People's Hospital, Jining, Shandong, China (mainland)
20
Li Y, Pearl SA, Jackson SA. Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci 2015; 20:664-675. [PMID: 26440435 DOI: 10.1016/j.tplants.2015.06.013] [Received: 04/06/2015] [Revised: 06/28/2015] [Accepted: 06/30/2015]
Abstract
Even though vast amounts of genome-wide gene expression data have become available in plants, it remains a challenge to effectively mine this information for the discovery of genes and gene networks, for instance those that control agronomically important traits. These networks reflect potential interactions among genes and, therefore, can lead to a systematic understanding of the molecular mechanisms underlying targeted biological processes. We discuss methods to analyze gene networks using gene expression data, specifically focusing on four common statistical approaches used to reconstruct networks: correlation, feature selection in supervised learning, probabilistic graphical model, and meta-prediction. In addition, we discuss the effective use of these methods for acquiring an in-depth understanding of biological systems in plants.
Affiliation(s)
- Yupeng Li
- Center for Applied Genetic Technologies, University of Georgia, 111 Riverbend Road, Athens, GA 30602; Institute of Plant Breeding, Genetics and Genomics, University of Georgia, 111 Riverbend Road, Athens, GA 30602; Department of Statistics, University of Georgia, 101 Cedar Street, Athens, GA 30602
- Stephanie A Pearl
- Center for Applied Genetic Technologies, University of Georgia, 111 Riverbend Road, Athens, GA 30602
- Scott A Jackson
- Center for Applied Genetic Technologies, University of Georgia, 111 Riverbend Road, Athens, GA 30602; Institute of Plant Breeding, Genetics and Genomics, University of Georgia, 111 Riverbend Road, Athens, GA 30602.
21
Bai Y, Dougherty L, Cheng L, Zhong GY, Xu K. Uncovering co-expression gene network modules regulating fruit acidity in diverse apples. BMC Genomics 2015; 16:612. [PMID: 26276125 PMCID: PMC4537561 DOI: 10.1186/s12864-015-1816-6] [Received: 03/21/2015] [Accepted: 08/05/2015]
Abstract
Background Acidity is a major contributor to fruit quality. Several organic acids are present in apple fruit, but malic acid is predominant and determines fruit acidity. The trait is largely controlled by the Malic acid (Ma) locus, within which Ma1, which putatively encodes a vacuolar aluminum-activated malate transporter 1 (ALMT1)-like protein, is a strong candidate gene. We hypothesize that fruit acidity is governed by a gene network in which Ma1 is a key member. The goal of this study is to identify this gene network and the potential mechanisms through which it operates. Results Guided by Ma1, we used RNA-seq to analyze the transcriptomes of mature fruit of contrasting acidity from six apple accessions of genotype Ma_ (MaMa or Mama) and four of genotype mama, and identified 1301 fruit acidity associated genes, of which 18 were the most significant acidity genes (MSAGs). Network inference using weighted gene co-expression network analysis (WGCNA) revealed five co-expression gene network modules significantly (P < 0.001) correlated with malate. Of these, the Ma1-containing module (Turquoise) of 336 genes showed the highest correlation (0.79). We also identified 12 intramodular hub genes from each of the five modules and 18 enriched gene ontology (GO) terms and MapMan sub-bins, including two GO terms (GO:0015979 and GO:0009765) and two MapMan sub-bins (1.3.4 and 1.1.1.1) related to photosynthesis in module Turquoise. Using Lemon-Tree algorithms, we identified 12 regulator genes with probabilistic scores of 35.5–81.0, including MDP0000525602 (an LRR receptor kinase), MDP0000319170 (an IQD2-like CaM binding protein) and MDP0000190273 (an EIN3-like transcription factor), which are of greater interest for being one of the 18 MSAGs or one of the 12 intramodular hub genes in Turquoise, and/or a regulator of the cluster containing Ma1.
Conclusions The most relevant finding of this study is the identification of the MSAGs, intramodular hub genes, enriched photosynthesis related processes, and regulator genes in a WGCNA module Turquoise that not only encompasses Ma1 but also shows the highest modular correlation with acidity. Overall, this study provides important insight into the Ma1-mediated gene network controlling acidity in mature apple fruit of diverse genetic background. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1816-6) contains supplementary material, which is available to authorized users.
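In WGCNA, an unsigned adjacency is obtained by raising the absolute Pearson correlation to a soft-thresholding power, a_ij = |cor(x_i, x_j)|^β, and intramodular hub genes are those with the highest within-module connectivity. The sketch below assumes β = 6, a value often used for unsigned networks. It uses hypothetical expression profiles and is not the WGCNA R package itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency: a_ij = |cor(x_i, x_j)| ** beta for i != j."""
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes for h in genes if g != h
    }


def intramodular_hubs(expr, module, beta=6, top=1):
    """Genes in `module` ranked by within-module connectivity k_in = sum_j a_ij."""
    sub = {g: expr[g] for g in module}
    adj = soft_threshold_adjacency(sub, beta)
    k_in = {g: sum(a for (u, _), a in adj.items() if u == g) for g in sub}
    return sorted(k_in, key=k_in.get, reverse=True)[:top]


# Hypothetical module: H correlates at r = 0.8 with both A and B.
expr = {
    "H": [1, 2, 3, 4],
    "A": [1, 2, 4, 3],
    "B": [2, 1, 3, 4],
    "C": [4, 1, 3, 2],
}
print(intramodular_hubs(expr, ["H", "A", "B", "C"]))  # ['H']
```

Raising correlations to β suppresses weak links (0.4^6 ≈ 0.004) far more than strong ones (0.8^6 ≈ 0.26), which is why hub status is driven by a gene's strongest partners.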
Affiliation(s)
- Yang Bai
- Horticulture Section, School of Integrative Plant Science, Cornell University, New York State Agricultural Experiment Station, Geneva, NY, 14456, USA.
- Laura Dougherty
- Horticulture Section, School of Integrative Plant Science, Cornell University, New York State Agricultural Experiment Station, Geneva, NY, 14456, USA.
- Lailiang Cheng
- Horticulture Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, 14853, USA.
- Gan-Yuan Zhong
- USDA-ARS, Plant Genetic Resources and Grape Genetics Research Units, Geneva, NY, 14456, USA.
- Kenong Xu
- Horticulture Section, School of Integrative Plant Science, Cornell University, New York State Agricultural Experiment Station, Geneva, NY, 14456, USA.
22
An integrated approach to reconstructing genome-scale transcriptional regulatory networks. PLoS Comput Biol 2015; 11:e1004103. [PMID: 25723545 PMCID: PMC4344238 DOI: 10.1371/journal.pcbi.1004103] [Received: 04/09/2014] [Accepted: 12/23/2014]
Abstract
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, with a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches do, potentially making the two highly complementary. We leveraged this new workflow and these observations to build a large-scale TRN model for the α-proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall were also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis.
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions. The ever growing amount of genomic data enables the assembly of large-scale network models that can provide important new insights into living systems. However, assembly and validation of such large-scale models can be challenging, since we often lack sufficient information to make accurate predictions. This work describes a new approach for constructing large-scale transcriptional regulatory networks of individual cells. We show that the reconstructed network captures a significantly larger fraction of cellular regulatory processes than networks generated by other existing approaches. We predict this approach, with appropriate refinements, will allow reconstruction of large-scale transcriptional network models for a variety of other organisms. As we work towards modeling the function of cells or complex ecosystems, individually reconstructed network models of signaling, information transfer and metabolism, can be integrated to provide high information predictions and insights not otherwise obtainable.
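The benchmark comparison by area under the precision-recall curve can be made concrete with a small sketch. It assumes predictions are (TF, target) tuples ranked by confidence and estimates the area with average precision. This is one common estimator and may differ from the one used in the study; the edges and gold standard are hypothetical.

```python
def precision_recall_curve(ranked_edges, gold):
    """(precision, recall) after each prediction in a confidence-ranked edge list."""
    tp, points = 0, []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            tp += 1
        points.append((tp / k, tp / len(gold)))
    return points


def average_precision(ranked_edges, gold):
    """Step-wise AUPR estimate: sum of precision at each true hit, divided by |gold|.

    Gold-standard edges that are never predicted contribute zero.
    """
    hits, total = 0, 0.0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            total += hits / k
    return total / len(gold)


# Hypothetical ranked predictions and gold-standard interactions.
ranked = [("tf1", "g1"), ("tf2", "g5"), ("tf1", "g2"), ("tf3", "g9")]
gold = {("tf1", "g1"), ("tf1", "g2"), ("tf2", "g3")}
print(precision_recall_curve(ranked, gold))
print(round(average_precision(ranked, gold), 3))  # 0.556
```

A method that places true interactions nearer the top of its list earns a larger area, which is the sense in which one inference approach "outperforms" another on this metric.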
23
Bonnet E, Calzone L, Michoel T. Integrative multi-omics module network inference with Lemon-Tree. PLoS Comput Biol 2015; 11:e1003983. [PMID: 25679508 PMCID: PMC4332478 DOI: 10.1371/journal.pcbi.1003983] [Received: 07/29/2014] [Accepted: 10/14/2014]
Abstract
Module network inference is an established statistical method to reconstruct co-expression modules and their upstream regulatory programs from integrated multi-omics datasets measuring the activity levels of various cellular components across different individuals, experimental conditions or time points of a dynamic process. We have developed Lemon-Tree, an open-source, platform-independent, modular, extensible software package implementing state-of-the-art ensemble methods for module network inference. We benchmarked Lemon-Tree using large-scale tumor datasets and showed that Lemon-Tree algorithms compare favorably with state-of-the-art module network inference software. We also analyzed a large dataset of somatic copy-number alterations and gene expression levels measured in glioblastoma samples from The Cancer Genome Atlas and found that Lemon-Tree correctly identifies known glioblastoma oncogenes and tumor suppressors as master regulators in the inferred module network. Novel candidate driver genes predicted by Lemon-Tree were validated using tumor pathway and survival analyses. Lemon-Tree is available from http://lemon-tree.googlecode.com under the GNU General Public License version 2.0.
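Ensemble module network methods combine many clustering solutions into consensus modules. The sketch below shows one simplified version of that idea. It assumes each run is a gene-to-cluster mapping, computes how often each gene pair co-clusters, and joins pairs above a frequency threshold into connected components. Lemon-Tree's own consensus and regulator-assignment steps are more involved; the function names, threshold, and runs here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coclustering_frequency(runs):
    """Fraction of runs in which each gene pair shares a cluster.

    Each run is a dict mapping gene -> cluster label over the same genes.
    """
    counts = defaultdict(int)
    for labels in runs:
        for g, h in combinations(sorted(labels), 2):
            if labels[g] == labels[h]:
                counts[(g, h)] += 1
    return {pair: n / len(runs) for pair, n in counts.items()}


def consensus_modules(runs, min_freq=0.5):
    """Connected components of the graph of pairs co-clustered in >= min_freq of runs."""
    genes = sorted(set().union(*(labels.keys() for labels in runs)))
    parent = {g: g for g in genes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g, h), freq in coclustering_frequency(runs).items():
        if freq >= min_freq:
            parent[find(g)] = find(h)
    modules = defaultdict(set)
    for g in genes:
        modules[find(g)].add(g)
    return sorted(modules.values(), key=sorted)


# Three hypothetical clustering runs over four genes.
runs = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 2, "d": 2},
]
print(consensus_modules(runs))
```

Gene c joins a and b in only one run out of three, so the consensus keeps it with d, smoothing over the instability of any single clustering.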
Affiliation(s)
- Eric Bonnet
- Institut Curie, Paris, France
- INSERM U900, Paris, France
- Mines ParisTech, Fontainebleau, France
- Laurence Calzone
- Institut Curie, Paris, France
- INSERM U900, Paris, France
- Mines ParisTech, Fontainebleau, France
- Tom Michoel
- Division of Genetics & Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush, Midlothian, United Kingdom
24
Vermeirssen V, De Clercq I, Van Parys T, Van Breusegem F, Van de Peer Y. Arabidopsis ensemble reverse-engineered gene regulatory network discloses interconnected transcription factors in oxidative stress. Plant Cell 2014; 26:4656-79. [PMID: 25549671 PMCID: PMC4311199 DOI: 10.1105/tpc.114.131417] [Received: 08/23/2014] [Revised: 11/27/2014] [Accepted: 12/10/2014]
Abstract
The abiotic stress response in plants is complex and tightly controlled by gene regulation. We present an abiotic stress gene regulatory network of 200,014 interactions for 11,938 target genes, obtained by integrating four complementary reverse-engineering solutions through average rank aggregation on an Arabidopsis thaliana microarray expression compendium. This ensemble performed the most robustly in benchmarking and greatly expands the set of currently reported interactions. The network recovered 1182 known regulatory interactions, and the cis-regulatory motifs and coherent functions of target genes were consistent with the predicted transcription factors. We provide a valuable resource of 572 abiotic stress modules of coregulated genes with functional and regulatory information, from which we deduced functional relationships for 1966 uncharacterized genes and many regulators. Using gain- and loss-of-function mutants of seven transcription factors grown under control and salt stress conditions, we experimentally validated 141 out of 271 predictions (52% precision) for 102 selected genes and mapped 148 additional transcription factor-gene regulatory interactions (49% recall). We identified an intricate core oxidative stress regulatory network in which the NAC13, NAC053, ERF6, WRKY6, and NAC032 transcription factors interconnect and function in detoxification. Our work shows that ensemble reverse-engineering can generate robust biological hypotheses of gene regulation in a multicellular eukaryote that can be tested by medium-throughput experimental validation.
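Average rank aggregation, as described above, ranks each method's predicted interactions separately and then orders interactions by their mean rank. In this sketch, two choices are assumptions: interactions missing from a method get that method's worst rank plus one, and tied scores are ordered arbitrarily. The method scores are hypothetical. The last lines simply check the reported precision, 141/271 ≈ 0.52.

```python
def average_rank_aggregation(score_lists):
    """Order interactions by their mean rank across several inference methods.

    Each element of score_lists maps (TF, target) -> score, higher = more confident.
    Missing interactions receive that method's worst rank + 1 (an assumption).
    """
    edges = set().union(*score_lists)
    per_method = []
    for scores in score_lists:
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks = {edge: i + 1 for i, edge in enumerate(ordered)}
        per_method.append((ranks, len(ordered) + 1))
    mean_rank = {
        edge: sum(ranks.get(edge, worst) for ranks, worst in per_method) / len(score_lists)
        for edge in edges
    }
    # Tie-break on the edge itself so the output order is deterministic.
    return sorted(edges, key=lambda e: (mean_rank[e], e))


# Hypothetical scores from three methods on different scales.
m1 = {("tf1", "g1"): 0.9, ("tf1", "g2"): 0.5, ("tf2", "g3"): 0.1}
m2 = {("tf1", "g2"): 0.8, ("tf1", "g1"): 0.7}
m3 = {("tf1", "g1"): 3.0, ("tf2", "g3"): 2.0, ("tf1", "g2"): 1.0}
print(average_rank_aggregation([m1, m2, m3]))

validated_precision = 141 / 271  # validated predictions / tested predictions
print(round(validated_precision, 2))  # 0.52
```

Working with ranks rather than raw scores lets methods whose scores live on incompatible scales (correlations, mutual information, regression weights) be combined directly.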
Affiliation(s)
- Vanessa Vermeirssen
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Inge De Clercq
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Thomas Van Parys
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Frank Van Breusegem
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium; Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
25
An integrated cell purification and genomics strategy reveals multiple regulators of pancreas development. PLoS Genet 2014; 10:e1004645. [PMID: 25330008 PMCID: PMC4199491 DOI: 10.1371/journal.pgen.1004645] [Received: 02/06/2014] [Accepted: 08/02/2014]
Abstract
The regulatory logic underlying global transcriptional programs controlling development of visceral organs like the pancreas remains undiscovered. Here, we profiled gene expression in 12 purified populations of fetal and adult pancreatic epithelial cells representing crucial progenitor cell subsets, and their endocrine or exocrine progeny. Using probabilistic models to decode the general programs organizing gene expression, we identified co-expressed gene sets in cell subsets that revealed patterns and processes governing progenitor cell development, lineage specification, and endocrine cell maturation. Purification of Neurog3 mutant cells and module network analysis linked established regulators such as Neurog3 to unrecognized gene targets and roles in pancreas development. Iterative module network analysis nominated and prioritized transcriptional regulators, including diabetes risk genes. Functional validation of a subset of candidate regulators with corresponding mutant mice revealed that the transcription factors Etv1, Prdm16, Runx1t1 and Bcl11a are essential for pancreas development. Our integrated approach provides a unique framework for identifying regulatory genes and functional gene sets underlying pancreas development and associated diseases such as diabetes mellitus. Discovery of specific pancreas developmental regulators has accelerated in recent years. In contrast, the global regulatory programs controlling pancreas development are poorly understood compared to other organs or tissues like heart or blood. Decoding this regulatory logic may accelerate development of replacement organs from renewable sources like stem cells, but this goal requires identification of regulators and assessment of their functions on a global scale. To address this important challenge for pancreas biology, we combined purification of normal and mutant cells with genome-scale methods to generate and analyze expression profiles from developing pancreas cells. 
Our work revealed regulatory gene sets governing development of pancreas progenitor cells and their progeny. Our integrative approach nominated multiple pancreas developmental regulators, including suspected risk genes for human diabetes, which we validated by phenotyping mutant mice on a scale not previously reported. Selection of these candidate regulators was unbiased; thus it is remarkable that all were essential for pancreatic islet development. Thus, our studies provide a new heuristic resource for identifying genetic functions underlying pancreas development and diseases like diabetes mellitus.
26
Kogelman LJA, Cirera S, Zhernakova DV, Fredholm M, Franke L, Kadarmideen HN. Identification of co-expression gene networks, regulatory genes and pathways for obesity based on adipose tissue RNA Sequencing in a porcine model. BMC Med Genomics 2014; 7:57. [PMID: 25270054 PMCID: PMC4183073 DOI: 10.1186/1755-8794-7-57] [Received: 06/25/2014] [Accepted: 09/24/2014]
Abstract
BACKGROUND Obesity is a complex metabolic condition strongly associated with various diseases, such as type 2 diabetes, and has major public health and economic implications. Obesity results from environmental and genetic factors and their interactions, including genome-wide genetic interactions. Identifying co-expressed and regulatory genes in RNA extracted from relevant tissues of lean and obese individuals provides an entry point for identifying genes and pathways important to the development of obesity. The pig, an omnivorous animal, is an excellent model for human obesity, offering the possibility to study organ-level transcriptomic regulation of obesity in depth, which is infeasible in humans. Our aim was to reveal adipose tissue co-expression networks, pathways and transcriptional regulation of obesity using RNA Sequencing based systems biology approaches in a porcine model. METHODS We selected 36 animals for RNA Sequencing from a previously created F2 pig population, representing three extreme groups based on their predicted genetic risks for obesity. We applied Weighted Gene Co-expression Network Analysis (WGCNA) to detect clusters of highly co-expressed genes (modules). Additionally, regulator genes were detected using Lemon-Tree algorithms. RESULTS WGCNA revealed five modules that were strongly correlated with at least one obesity-related phenotype (correlations ranging from -0.54 to 0.72, P < 0.001). Functional annotation identified pathways illuminating the association between obesity and other diseases, such as osteoporosis (osteoclast differentiation, P = 1.4E-7), and immune-related complications (e.g., Natural killer cell mediated cytotoxicity, P = 3.8E-5; B cell receptor signaling pathway, P = 7.2E-5).
Using confidence scores, Lemon-Tree identified three potential regulator genes for the WGCNA module associated with osteoclast differentiation: CCR1, MSR1 and SI1 (probability scores of 95.30, 62.28, and 34.58, respectively). Moreover, detection of differentially connected genes identified various genes previously associated with obesity in humans and rodents, e.g., CSF1R and MARC2. CONCLUSIONS To our knowledge, this is the first study to apply systems biology approaches using porcine adipose tissue RNA-Sequencing data in a genetically characterized porcine model for obesity. We revealed complex networks, pathways, candidate and regulatory genes related to obesity, confirming the complexity of obesity and its association with immune-related disorders and osteoporosis.
Affiliation(s)
- Haja N Kadarmideen
- Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870, Frederiksberg, Denmark.
27
iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014; 10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Received: 02/13/2014] [Accepted: 05/27/2014]
Abstract
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org. Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. 
Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.
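The ranking-and-recovery idea can be illustrated with a small sketch. It assumes that each motif yields a genome-wide ranking of genes, that a motif is scored by the area under the recovery curve (AUC) of the query genes within the top fraction of that ranking, and that AUCs are then normalized across the motif collection as z-scores. The cutoff value, the function names, and the toy data are illustrative assumptions, not iRegulon's code or defaults.

```python
import statistics


def recovery_auc(ranking, gene_set, cutoff_fraction=0.03):
    """Normalized area under the recovery curve for one motif's gene ranking.

    ranking: gene IDs ordered from best to worst motif score (genome-wide).
    gene_set: the co-expressed query genes.
    cutoff_fraction: only this top fraction of the ranking is scored (assumed).
    """
    cutoff = max(1, int(round(len(ranking) * cutoff_fraction)))
    recovered = 0
    area = 0
    for gene in ranking[:cutoff]:
        if gene in gene_set:
            recovered += 1
        # The recovery curve is a step function; add its height at this rank.
        area += recovered
    # Divide by cutoff * |gene_set| so the score lies in [0, 1].
    return area / (cutoff * len(gene_set))


def normalized_enrichment_scores(rankings, gene_set, cutoff_fraction=0.03):
    """Turn per-motif AUCs into z-scores across the whole motif collection."""
    aucs = {motif: recovery_auc(ranking, gene_set, cutoff_fraction)
            for motif, ranking in rankings.items()}
    values = list(aucs.values())
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return {motif: (auc - mean) / sd if sd > 0 else 0.0
            for motif, auc in aucs.items()}


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(100)]
    query = {"g0", "g1", "g2"}
    rankings = {
        "motif_A": genes,                    # query genes ranked at the top
        "motif_B": genes[::-1],              # query genes ranked at the bottom
        "motif_C": genes[50:] + genes[:50],  # query genes ranked in the middle
    }
    print(normalized_enrichment_scores(rankings, query, cutoff_fraction=0.1))
```

Only the top of each ranking is scored, so a motif is rewarded for placing the query genes early rather than merely somewhere in the genome-wide list.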
28
Kogelman LJA, Pant SD, Fredholm M, Kadarmideen HN. Systems genetics of obesity in an F2 pig model by genome-wide association, genetic network, and pathway analyses. Front Genet 2014; 5:214. [PMID: 25071839 PMCID: PMC4087325 DOI: 10.3389/fgene.2014.00214] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 05/20/2014] [Accepted: 06/20/2014] [Indexed: 11/29/2022]
Abstract
Obesity is a complex condition with worldwide, exponentially rising prevalence rates, linked with severe diseases such as Type 2 Diabetes. Economic and welfare consequences have led to a raised interest in a better understanding of its biological and genetic background. To date, whole genome investigations focusing on single genetic variants have achieved limited success, and the importance of including genetic interactions is becoming evident. Here, the aim was to perform an integrative genomic analysis in an F2 pig resource population that was constructed to maximize genetic variation in obesity-related phenotypes and was genotyped using the 60K SNP chip. Firstly, Genome Wide Association (GWA) analysis was performed on the Obesity Index to locate candidate genomic regions, which were further validated using combined Linkage Disequilibrium Linkage Analysis and investigated by evaluation of haplotype blocks. We built Weighted Interaction SNP Hub (WISH) and differentially wired (DW) networks using genotypic correlations amongst obesity-associated SNPs resulting from the GWA analysis. GWA results and SNP modules detected by WISH and DW analyses were further investigated by functional enrichment analyses. The functional annotation of SNPs revealed several genes associated with obesity, e.g., NPC2 and OR4D10. Moreover, gene enrichment analyses identified several significantly associated pathways, over and above the GWA study results, that may influence obesity and obesity related diseases, e.g., metabolic processes. WISH networks based on genotypic correlations allowed further identification of various gene ontology terms and pathways related to obesity and related traits, which were not identified by the GWA study. 
In conclusion, this is the first study to develop a (genetic) obesity index and employ systems genetics in a porcine model to provide important insights into the complex genetic architecture associated with obesity and many biological pathways that underlie it.
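WISH-style networks are built from correlations between SNP genotypes rather than between expression profiles. The sketch below assumes genotypes coded as allele counts (0, 1, 2) per animal and uses a WGCNA-style soft threshold, |r|^β, to turn pairwise correlations into edge weights; the power β = 6, the function names, and the toy genotypes are assumptions for illustration, not the authors' pipeline.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def weighted_snp_network(genotypes, beta=6):
    """Soft-thresholded adjacency |r|**beta for every pair of SNPs.

    genotypes: dict mapping SNP ID -> allele counts (0, 1, 2), one per animal.
    beta: soft-thresholding power (assumed; chosen per data set in practice).
    """
    return {
        (a, b): abs(pearson(genotypes[a], genotypes[b])) ** beta
        for a, b in itertools.combinations(genotypes, 2)
    }


def connectivity(network, snp):
    """Sum of edge weights touching one SNP; hub SNPs score highest."""
    return sum(weight for pair, weight in network.items() if snp in pair)


if __name__ == "__main__":
    toy = {
        "s1": [0, 1, 2, 0, 1, 2],
        "s2": [0, 1, 2, 0, 1, 2],
        "s3": [2, 1, 0, 2, 1, 0],
        "s4": [0, 0, 1, 1, 2, 2],
    }
    net = weighted_snp_network(toy)
    print({snp: round(connectivity(net, snp), 3) for snp in toy})
```

Raising |r| to a power keeps strong genotypic correlations near their original weight while shrinking weak ones toward zero, which is what lets hub SNPs stand out.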
Affiliation(s)
- Lisette J A Kogelman
- Animal Genetics, Bioinformatics and Breeding Section, Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen Copenhagen, Denmark
- Sameer D Pant
- Animal Genetics, Bioinformatics and Breeding Section, Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen Copenhagen, Denmark
- Merete Fredholm
- Animal Genetics, Bioinformatics and Breeding Section, Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen Copenhagen, Denmark
- Haja N Kadarmideen
- Animal Genetics, Bioinformatics and Breeding Section, Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen Copenhagen, Denmark
29
Corrado G, Tebaldi T, Bertamini G, Costa F, Quattrone A, Viero G, Passerini A. PTRcombiner: mining combinatorial regulation of gene expression from post-transcriptional interaction maps. BMC Genomics 2014; 15:304. [PMID: 24758252 PMCID: PMC4234518 DOI: 10.1186/1471-2164-15-304] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/21/2014] [Accepted: 04/02/2014] [Indexed: 02/07/2023]
Abstract
Background Progress in mapping RNA-protein and RNA-RNA interactions at the transcriptome-wide level paves the way to deciphering possible combinatorial patterns embedded in post-transcriptional regulation of gene expression. Results Here we propose an innovative computational tool to extract clusters of mRNA trans-acting co-regulators (RNA binding proteins and non-coding RNAs) from pairwise interaction annotations. In addition, the tool can analyze the binding site similarity of co-regulators belonging to the same cluster, given their positional binding information. The tool has been tested on experimental collections of human and yeast interactions, identifying modules that coordinate functionally related messages. Conclusions This tool is an original attempt to uncover combinatorial patterns using all the post-transcriptional interaction data available so far. PTRcombiner is available at http://disi.unitn.it/~passerini/software/PTRcombiner/.
Affiliation(s)
- Gabriella Viero
- Department of Information Engineering and Computer Science (DISI), University of Trento, 38123 Trento, Italy.
30
Brignull LM, Czimmerer Z, Saidi H, Daniel B, Villela I, Bartlett NW, Johnston SL, Meira LB, Nagy L, Nohturfft A. Reprogramming of lysosomal gene expression by interleukin-4 and Stat6. BMC Genomics 2013; 14:853. [PMID: 24314139 PMCID: PMC3880092 DOI: 10.1186/1471-2164-14-853] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/24/2013] [Accepted: 11/26/2013] [Indexed: 01/05/2023]
Abstract
Background Lysosomes play important roles in multiple aspects of physiology, but the problem of how the transcription of lysosomal genes is coordinated remains incompletely understood. The goal of this study was to illuminate the physiological contexts in which lysosomal genes are coordinately regulated and to identify transcription factors involved in this control. Results As transcription factors and their target genes are often co-regulated, we performed meta-analyses of array-based expression data to identify regulators whose mRNA profiles are highly correlated with those of a core set of lysosomal genes. Among the ~50 transcription factors that rank highest by this measure, 65% are involved in differentiation or development, and 22% have been implicated in interferon signaling. The most strongly correlated candidate was Stat6, a factor commonly activated by interleukin-4 (IL-4) or IL-13. Publicly available chromatin immunoprecipitation (ChIP) data from alternatively activated mouse macrophages show that lysosomal genes are overrepresented among Stat6-bound targets. Quantification of RNA from wild-type and Stat6-deficient cells indicates that Stat6 promotes the expression of over 100 lysosomal genes, including hydrolases, subunits of the vacuolar H+ ATPase and trafficking factors. While IL-4 inhibits and activates different sets of lysosomal genes, Stat6 mediates only the activating effects of IL-4, by promoting increased expression and by neutralizing undefined inhibitory signals induced by IL-4. Conclusions The current data establish Stat6 as a broadly acting regulator of lysosomal gene expression in mouse macrophages. Other regulators whose expression correlates with lysosomal genes suggest that lysosome function is frequently re-programmed during differentiation, development and interferon signaling.
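The core step of the meta-analysis, ranking candidate regulators by how well their mRNA profiles track a core set of lysosomal genes, can be sketched on a single expression matrix. This is a simplified illustration assuming one data set and the mean Pearson correlation as the ranking measure; the study combined many array data sets, and the function names and toy profiles here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_regulators(tf_profiles, core_profiles):
    """Rank TFs by mean Pearson correlation with a core gene set, highest first.

    Both arguments map a gene name to expression values measured across the
    same ordered list of samples (arrays).
    """
    scores = {}
    for tf, tf_expr in tf_profiles.items():
        rs = [pearson(tf_expr, expr) for expr in core_profiles.values()]
        scores[tf] = sum(rs) / len(rs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy profiles over five arrays.
    core = {"Ctsb": [1, 2, 3, 4, 5], "Atp6v0d2": [2, 3, 5, 6, 7]}
    tfs = {"Stat6": [1, 2, 3, 4, 5], "TF_X": [5, 4, 3, 2, 1], "TF_Y": [1, 3, 2, 3, 1]}
    print(rank_regulators(tfs, core))
```

A high rank only flags co-regulation; as the abstract shows, ChIP data and knockout cells were still needed to establish Stat6 as a direct regulator.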
Affiliation(s)
- Axel Nohturfft
- Division of Biomedical Sciences, Molecular and Metabolic Signaling Centre, St George's University of London, Cranmer Terrace, London SW17 0RE, UK.
31
Abstract
Background Cell survival and development are orchestrated by complex interlocking programs of gene activation and repression. Understanding how this gene regulatory network (GRN) functions in normal states, and how it is altered in cancer subtypes, offers fundamental insight into oncogenesis and disease progression, and holds great promise for guiding clinical decisions. Inferring a GRN from empirical microarray gene expression data is a challenging task in cancer systems biology. In recent years, module-based approaches for GRN inference have been proposed to address this challenge. Despite the demonstrated success of module-based approaches in uncovering biologically meaningful regulatory interactions, their application remains limited to a single condition, without supporting the comparison of multiple disease subtypes/conditions. Also, their use remains unnecessarily restricted to computational biologists, as accurate inference of modules and their regulators requires integration of diverse tools and heterogeneous data sources, which in turn requires scripting skills, data infrastructure and powerful computational facilities. New analytical frameworks are required to make module-based GRN inference approaches more generally useful to the research community. Results We present the RMaNI (Regulatory Module Network Inference) framework, which supports cancer subtype-specific or condition-specific GRN inference and differential network analysis. It combines transcriptomic and genomic data sources, integrates heterogeneous knowledge resources and a set of complementary bioinformatic methods for automated inference of modules and their condition-specific regulators, and facilitates downstream network analyses and data visualization. To demonstrate its utility, we applied RMaNI to a hepatocellular microarray dataset containing a normal condition and three disease conditions. 
We demonstrate how RMaNI can be employed to understand the genetic architecture underlying three disease conditions. RMaNI is freely available at http://inspect.braembl.org.au/bi/inspect/rmani Conclusion RMaNI makes available a workflow with a comprehensive set of tools that would otherwise be challenging for non-expert users to install and apply. The framework presented in this paper is flexible and can be easily extended to analyse any dataset with multiple disease conditions.
32
Roy S, Lagree S, Hou Z, Thomson JA, Stewart R, Gasch AP. Integrated module and gene-specific regulatory inference implicates upstream signaling networks. PLoS Comput Biol 2013; 9:e1003252. [PMID: 24146602 PMCID: PMC3798279 DOI: 10.1371/journal.pcbi.1003252] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 05/29/2013] [Accepted: 08/17/2013] [Indexed: 11/19/2022]
Abstract
Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well as or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors, thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development.
Affiliation(s)
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Wisconsin Institute for Discovery, Madison, Wisconsin, United States of America
- Stephen Lagree
- Department of Computer Science, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Zhonggang Hou
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- James A. Thomson
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Molecular, Cellular, and Developmental Biology, University of California Santa Barbara, Santa Barbara, California, United States of America
- Ron Stewart
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Audrey P. Gasch
- Department of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
33
Zhu M, Dahmen JL, Stacey G, Cheng J. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data. BMC Bioinformatics 2013; 14:278. [PMID: 24053776 PMCID: PMC3854569 DOI: 10.1186/1471-2105-14-278] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 12/20/2012] [Accepted: 09/03/2013] [Indexed: 12/29/2022]
Abstract
BACKGROUND High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. Computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. RESULTS We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common to all three stages, and 24, 49 and 70 modules specific to the first, second and third stages, respectively, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. Eight of the 10 common regulatory modules were validated by at least two kinds of validation, such as independent DNA binding motif analysis, gene function enrichment tests, and previous experimental data in the literature. CONCLUSIONS We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.
Affiliation(s)
- Mingzhu Zhu
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Current address: Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Jeremy L Dahmen
- C.S. Bond Life Science Center, University of Missouri, Columbia, MO, USA
- Divisions of Plant Science and Biochemistry, Columbia, MO, USA
- Gary Stacey
- C.S. Bond Life Science Center, University of Missouri, Columbia, MO, USA
- Divisions of Plant Science and Biochemistry, Columbia, MO, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Informatics Institute, University of Missouri, Columbia, MO, USA
- C.S. Bond Life Science Center, University of Missouri, Columbia, MO, USA
34
Abstract
High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
35
Qi J, Michoel T. Context-specific transcriptional regulatory network inference from global gene expression maps using double two-way t-tests. Bioinformatics 2013; 28:2325-32. [PMID: 22962443 DOI: 10.1093/bioinformatics/bts434] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION Transcriptional regulatory network inference methods have been studied for years. Most of them rely on complex mathematical and algorithmic concepts, making them hard to adapt, re-implement or integrate with other methods. To address this problem, we introduce a novel method based on a minimal statistical model for observing transcriptional regulatory interactions in noisy expression data; it is conceptually simple, easy to implement and integrate in any statistical software environment, and performs as well as existing methods. RESULTS We developed a method to infer regulatory interactions based on a model where transcription factors (TFs) and their targets are both differentially expressed in a gene-specific, critical sample contrast, as measured by repeated two-way t-tests. Benchmarking on standard Escherichia coli and yeast reference datasets showed that this method performs as well as the best existing methods. Analysis of the predicted interactions suggested that it is best suited to inferring context-specific TF-target interactions that co-express only locally. We confirmed this hypothesis on a dataset of >1000 normal human tissue samples, where we found that our method predicts highly tissue-specific and functionally relevant interactions, whereas a global co-expression method only associates general TFs with non-specific biological processes. AVAILABILITY A software tool called TwixTrix is available from http://twixtrix.googlecode.com. SUPPLEMENTARY INFORMATION Supplementary Material is available from http://www.roslin.ed.ac.uk/tom-michoel/supplementary-data. CONTACT tom.michoel@roslin.ed.ac.uk.
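The model's central idea, that a TF-target pair is supported when both genes are differentially expressed across the same, target-specific sample contrast, can be sketched with ordinary Welch t-statistics. This is a simplified reading of that idea, not the TwixTrix implementation: here the "critical contrast" is assumed to be the split of samples, ordered by target expression, that maximizes the target's t-statistic, and each candidate TF is then scored by its own t-statistic across that split. All names and toy values are hypothetical.

```python
import math
import statistics


def welch_t(low, high):
    """Welch t-statistic for mean(high) - mean(low); groups need >= 2 values."""
    diff = statistics.mean(high) - statistics.mean(low)
    se = math.sqrt(statistics.variance(low) / len(low)
                   + statistics.variance(high) / len(high))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def critical_contrast(target, min_group=2):
    """Find the low/high split of samples (sorted by target expression)
    that maximizes the target's own t-statistic.

    Returns (t, (low_indices, high_indices)).
    """
    order = sorted(range(len(target)), key=lambda i: target[i])
    best_t, best_split = -math.inf, None
    for k in range(min_group, len(order) - min_group + 1):
        low, high = order[:k], order[k:]
        t = welch_t([target[i] for i in low], [target[i] for i in high])
        if t > best_t:
            best_t, best_split = t, (low, high)
    return best_t, best_split


def score_tf(tf, split):
    """t-statistic of a candidate TF across the target's critical contrast."""
    low, high = split
    return welch_t([tf[i] for i in low], [tf[i] for i in high])


if __name__ == "__main__":
    target = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
    _, split = critical_contrast(target)
    print(score_tf([0.5, 0.6, 0.4, 0.55, 3.0, 3.1, 2.9, 3.2], split))
    print(score_tf([1, 3, 2, 1, 2, 3, 1, 2], split))
```

Because the contrast is chosen per target rather than over all samples, a TF can score well even if it co-varies with the target only in a subset of conditions, which matches the abstract's point about locally co-expressed, context-specific interactions.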
Affiliation(s)
- Jianlong Qi
- School of Life Sciences-LifeNet, Freiburg Institute for Advanced Studies, University of Freiburg, Albertstrasse 19, D-79104 Freiburg im Breisgau, Germany
36
Dahlin A, Tantisira KG. Integrative systems biology approaches in asthma pharmacogenomics. Pharmacogenomics 2013; 13:1387-404. [PMID: 22966888 DOI: 10.2217/pgs.12.126] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 02/01/2023]
Abstract
In order to improve therapeutic outcomes, there is a tremendous need to identify patients who are likely to respond to a given asthma treatment. Pharmacogenomic studies have explained a portion of the variability in drug response and provided an increasing list of candidate genes and SNPs. However, as phenotypic variation arises from a network of complex interactions among genetic and environmental factors, rather than individual genes or SNPs, a multidisciplinary, systems-level approach is required in order to understand the inter-relationships among these factors. Systems biology, which seeks to capture interactions between genetic factors and other variables, offers a promising approach to improved therapeutic outcomes in asthma. This article will review and update progress in the pharmacogenomics of asthma and then discuss the application of systems biology approaches to asthma pharmacogenomics.
Affiliation(s)
- Amber Dahlin
- Channing Laboratory, Brigham & Women's Hospital & Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA
37
Roy S, Wapinski I, Pfiffner J, French C, Socha A, Konieczka J, Habib N, Kellis M, Thompson D, Regev A. Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules. Genome Res 2013; 23:1039-50. [PMID: 23640720 PMCID: PMC3668358 DOI: 10.1101/gr.146233.112] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 01/01/2023]
Abstract
Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates expression data from multiple species with species and gene phylogenies to infer modules of coexpressed genes in extant species and their evolutionary histories. We also develop new, generally applicable measures of conservation and divergence in gene regulatory modules to assess the impact of changes in gene content and expression on module evolution. We used Arboretum to study the evolution of the transcriptional response to heat shock in eight species of Ascomycota fungi and to reconstruct modules of the ancestral environmental stress response (ESR). We found substantial conservation in the stress response across species and in the reconstructed components of the ancestral ESR modules. The greatest divergence was in the most induced stress, primarily through module expansion. The divergence of the heat stress response exceeds that observed in the response to glucose depletion in the same species. Arboretum and its associated analyses provide a comprehensive framework to systematically study regulatory evolution of condition-specific responses.
Affiliation(s)
- Sushmita Roy
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
38
Faria JP, Overbeek R, Xia F, Rocha M, Rocha I, Henry CS. Genome-scale bacterial transcriptional regulatory networks: reconstruction and integrated analysis with metabolic models. Brief Bioinform 2013; 15:592-611. [DOI: 10.1093/bib/bbs071] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 01/09/2023]
39
Baitaluk M, Kozhenkov S, Ponomarenko J. An integrative approach to inferring gene regulatory module networks. PLoS One 2012; 7:e52836. [PMID: 23285197 PMCID: PMC3527610 DOI: 10.1371/journal.pone.0052836] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/06/2012] [Accepted: 11/22/2012] [Indexed: 12/31/2022]
Abstract
Background Gene regulatory networks (GRNs) provide insight into the mechanisms of differential gene expression at a system level. However, the methods for inference, functional analysis and visualization of gene regulatory modules and GRNs require the user to collect heterogeneous data from many sources using numerous bioinformatics tools. This makes the analysis expensive and time-consuming. Results In this work, the BiologicalNetworks application–the data integration and network based research environment–was extended with tools for inference and analysis of gene regulatory modules and networks. The backend database of the application integrates public data on gene expression, pathways, transcription factor binding sites, gene and protein sequences, and functional annotations. Thus, all data essential for the gene regulation analysis can be mined publicly. In addition, the user’s data can either be integrated in the database and become public, or kept private within the application. The capabilities to analyze multiple gene expression experiments are also provided. Conclusion The generated modular networks, regulatory modules and binding sites can be visualized and further analyzed within this same application. The developed tools were applied to the mouse model of asthma and the OCT4 regulatory network in embryonic stem cells. Developed methods and data are available through the Java application from BiologicalNetworks program at http://www.biologicalnetworks.org.
Affiliation(s)
- Michael Baitaluk
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- Sergey Kozhenkov
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- Julia Ponomarenko
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
40
Pirim H, Ekşioğlu B, Perkins A, Yüceer Ç. Clustering of High Throughput Gene Expression Data. Computers & Operations Research 2012; 39:3046-3061. [PMID: 23144527 PMCID: PMC3491664 DOI: 10.1016/j.cor.2012.03.008] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 05/06/2023]
Abstract
High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Although clustering can be used in many areas of biological data analysis, this paper reviews current clustering algorithms designed specifically for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics, clustering gene expression data, to the operations research community.
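As a concrete instance of the kind of algorithm such reviews cover, here is a minimal agglomerative (hierarchical) clustering of gene expression profiles. It assumes the common choices of distance 1 minus Pearson r and average linkage, and that the number of clusters k is fixed in advance; these choices, the function names, and the toy profiles are illustrative assumptions rather than anything prescribed by the review.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def average_linkage_clusters(profiles, k):
    """Merge clusters bottom-up until k remain.

    profiles: dict mapping gene ID -> expression values across samples.
    Distance between genes is 1 - Pearson r; distance between clusters is the
    mean of all gene-to-gene distances (average linkage).
    """
    def dist(a, b):
        return 1.0 - pearson(profiles[a], profiles[b])

    clusters = [[gene] for gene in profiles]
    while len(clusters) > k:
        best = None
        # Find the closest pair of clusters under average linkage.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = statistics.mean([dist(a, b) for a in clusters[i] for b in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5],
           "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2.5]}
    print(average_linkage_clusters(toy, 2))
```

Correlation-based distance groups genes by the shape of their profiles rather than their absolute expression level, which is why g1 and g2 cluster together despite a twofold difference in magnitude.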
Affiliation(s)
- Harun Pirim
- Department of Industrial and Systems Engineering, Mississippi State University, P.O. Box 9542, Mississippi State, MS 39762
- Corresponding author. Tel.:+1-662-325-4226;
- Burak Ekşioğlu
- Department of Industrial and Systems Engineering, Mississippi State University, P.O. Box 9542, Mississippi State, MS 39762
- Andy Perkins
- Department of Computer Science and Engineering, Mississippi State University
- Çetin Yüceer
- Department of Forestry, Mississippi State University
41
Zhu M, Deng X, Joshi T, Xu D, Stacey G, Cheng J. Reconstructing differentially co-expressed gene modules and regulatory networks of soybean cells. BMC Genomics 2012; 13:437. [PMID: 22938179 PMCID: PMC3563468 DOI: 10.1186/1471-2164-13-437] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 03/15/2012] [Accepted: 08/22/2012] [Indexed: 11/23/2022]
Abstract
BACKGROUND Current experimental evidence indicates that functionally related genes show coordinated expression in order to perform their cellular functions. In this way, the cell transcriptional machinery can respond optimally to internal or external stimuli. This provides a research opportunity to identify and study co-expressed gene modules whose transcription is controlled by shared gene regulatory networks. RESULTS We developed and integrated a set of computational methods of differential gene expression analysis, gene clustering, gene network inference, gene function prediction, and DNA motif identification to automatically identify differentially co-expressed gene modules, reconstruct their regulatory networks, and validate their correctness. We tested the methods using microarray data derived from soybean cells grown under various stress conditions. Our methods were able to identify 42 coherent gene modules within which average gene expression correlation coefficients are greater than 0.8 and reconstruct their putative regulatory networks. A total of 32 modules and their regulatory networks were further validated by the coherence of predicted gene functions and the consistency of putative transcription factor binding motifs. Approximately half of the 32 modules were partially supported by the literature, which demonstrates that the bioinformatic methods used can help elucidate the molecular responses of soybean cells upon various environmental stresses. CONCLUSIONS The bioinformatics methods and genome-wide data sources for gene expression, clustering, regulation, and function analysis were integrated seamlessly into one modular protocol to systematically analyze and infer modules and networks from only the differentially expressed genes in soybean cells grown under stress conditions. 
Our approach appears to effectively reduce the complexity of the problem, and is sufficiently robust and accurate to generate a rather complete and detailed view of putative soybean gene transcription logic potentially underlying the responses to the various environmental challenges. The same automated method can also be applied to reconstruct differentially co-expressed gene modules and their regulatory networks from gene expression data of any other transcriptome.
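The coherence criterion stated in this abstract (average within-module expression correlation greater than 0.8) can be expressed as a simple filter. A minimal sketch, assuming modules are given as lists of gene IDs and that "average correlation" means the mean Pearson r over all gene pairs in a module; the paper may compute it differently, and the names and toy data are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(genes, profiles):
    """Mean Pearson r over all unordered gene pairs in a module."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def coherent_modules(modules, profiles, threshold=0.8):
    """Keep modules with at least two genes whose mean pairwise r exceeds threshold."""
    return {
        name: genes
        for name, genes in modules.items()
        if len(genes) >= 2 and mean_pairwise_correlation(genes, profiles) > threshold
    }


if __name__ == "__main__":
    profiles = {"g1": [1, 2, 3, 4, 5], "g2": [1.1, 2.1, 2.9, 4.2, 5.0], "g3": [5, 1, 4, 2, 3]}
    modules = {"M1": ["g1", "g2"], "M2": ["g1", "g3"], "M3": ["g1"]}
    print(coherent_modules(modules, profiles))
```

Single-gene modules are excluded up front, since a mean over zero pairs is undefined.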
Affiliation(s)
- Mingzhu Zhu
- Department of Computer Science, University of Missouri, Columbia, MO 65211, U.S.A
- Xin Deng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, U.S.A
- Trupti Joshi
- Department of Computer Science, University of Missouri, Columbia, MO 65211, U.S.A
- Informatics Institute, University of Missouri, Columbia, MO 65211, U.S.A
- C.S. Bond Life Science Center, University of Missouri, Columbia, MO 65211, U.S.A
- Dong Xu
- Department of Computer Science, University of Missouri, Columbia, MO 65211, U.S.A
- Informatics Institute, University of Missouri, Columbia, MO 65211, U.S.A
- C.S. Bond Life Science Center, University of Missouri, Columbia, MO 65211, U.S.A
- Gary Stacey
- C.S. Bond Life Science Center, University of Missouri, Columbia, MO 65211, U.S.A
- Divisions of Plant Sciences and Biochemistry, University of Missouri, Columbia, MO 65211, U.S.A
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, U.S.A
- Informatics Institute, University of Missouri, Columbia, MO 65211, U.S.A
- C.S. Bond Life Science Center, University of Missouri, Columbia, MO 65211, U.S.A
42
miRNA-mRNA correlation-network modules in human prostate cancer and the differences between primary and metastatic tumor subtypes. PLoS One 2012; 7:e40130. [PMID: 22768240 PMCID: PMC3387006 DOI: 10.1371/journal.pone.0040130] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 03/19/2012] [Accepted: 06/01/2012] [Indexed: 11/19/2022]
Abstract
Recent studies have shown the contribution of miRNAs to cancer pathogenesis. Prostate cancer is the most commonly diagnosed cancer in men. Unlike other major types of cancer, no single gene has been identified as being mutated in the majority of prostate tumors. This implies that the expression profiles of genes, including the non-coding miRNAs, may vary substantially across individual cases of this cancer. This within-class variability makes it possible to reconstruct or infer disease-specific miRNA-mRNA correlation and regulatory modular networks using high-dimensional microarray data of prostate tumor samples. Furthermore, since miRNAs and tumor suppressor genes are usually tissue specific, miRNA-mRNA modules could potentially differ between primary prostate cancer (PPC) and metastatic prostate cancer (MPC). We herein performed an in silico analysis to explore the miRNA-mRNA correlation network modules in the two tumor subtypes. Our analysis identified 5 miRNA-mRNA module pairs (MPs) for PPC and MPC, respectively. Each MP includes one positive-connection (correlation) module and one negative-connection (correlation) module. The number of miRNAs in each module ranges from 2 to 8, and the number of mRNAs (genes) from 6 to 622. The modules discovered for PPC are more informative than those for MPC in terms of the implicated biological insights. In particular, one negative-connection module in PPC fits well with the widely recognized theory of miRNA-mediated post-transcriptional regulation. That is, the 3′UTR sequences of the involved mRNAs (∼620) are enriched for the target site motifs of the 7 modular miRNAs, hsa-miR-106b, -191, -19b, -92a, -92b, -93, and -141. About 330 GO terms and KEGG pathways, including the TGF-beta signaling pathway, which maintains tissue homeostasis and plays a crucial role in suppressing the proliferation of cancer cells, are over-represented (adjusted p < 0.05) in the modular gene list.
These computationally identified modules provide remarkable biological evidence for the interference of miRNAs in the development of prostate cancers and warrant additional follow-up in independent laboratory studies.
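A negative-connection module groups miRNAs and mRNAs whose expression is anti-correlated across tumor samples. The minimal sketch below computes Pearson correlations between every miRNA and every mRNA and keeps the strongly negative pairs; the cutoff of -0.7, the function names, and the toy identifiers are illustrative assumptions rather than values from the study, and the module-detection and 3′UTR motif-enrichment steps are omitted.

```python
import numpy as np


def cross_correlation(mirna: np.ndarray, mrna: np.ndarray) -> np.ndarray:
    """Pearson r between each miRNA row and each mRNA row (samples are columns)."""
    def zscore(x: np.ndarray) -> np.ndarray:
        x = np.asarray(x, dtype=float)
        return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

    zm, zg = zscore(mirna), zscore(mrna)
    # With population standard deviations, r equals the mean product of z-scores.
    return zm @ zg.T / zm.shape[1]


def negative_pairs(mirna, mrna, mirna_names, mrna_names, cutoff=-0.7):
    """List (miRNA, mRNA, r) for pairs whose correlation is at or below cutoff."""
    r = cross_correlation(mirna, mrna)
    return [(mirna_names[i], mrna_names[j], float(r[i, j]))
            for i in range(r.shape[0])
            for j in range(r.shape[1])
            if r[i, j] <= cutoff]


# Toy data (hypothetical): one miRNA and two mRNAs measured in four samples.
mirna = np.array([[1.0, 2.0, 3.0, 4.0]])
mrna = np.array([[4.0, 3.0, 2.0, 1.0],    # falls as the miRNA rises
                 [1.0, 2.0, 3.0, 4.0]])   # rises with the miRNA
print(negative_pairs(mirna, mrna, ["miR-x"], ["gene_a", "gene_b"]))
```

Only `gene_a` is reported, since it is the only mRNA whose profile moves opposite to the miRNA.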
43
An integrative approach to infer regulation programs in a transcription regulatory module network. J Biomed Biotechnol 2012; 2012:245968. [PMID: 22577292 PMCID: PMC3336162 DOI: 10.1155/2012/245968] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/27/2011] [Accepted: 02/12/2012] [Indexed: 12/29/2022]
Abstract
The module network method, a special type of Bayesian network algorithm, has been proposed to infer transcription regulatory networks from gene expression data. In this method, a module represents a set of genes that have similar expression profiles and are regulated by the same transcription factors. Learning a module network consists of two steps: first clustering genes into modules, and then inferring the regulation program (transcription factors) of each module. Many algorithms have been designed to infer the regulation program of a given gene module, and these algorithms show very different biases in detecting regulatory relationships. In this work, we explore the possibility of integrating results from different algorithms. The integration methods we select are union, intersection, and weighted rank aggregation. Experiments in a yeast dataset show that the union and weighted rank aggregation methods produce more accurate predictions than those given by individual algorithms, whereas the intersection method does not yield any improvement in prediction accuracy. In addition, somewhat surprisingly, the union method, which has a lower computational cost than rank aggregation, achieves results comparable to those of rank aggregation.
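The three integration strategies can be illustrated on ranked regulator lists, one list per inference algorithm. In this sketch, union and intersection are plain set operations, and weighted rank aggregation is approximated by a weighted Borda count (each algorithm awards points by rank, scaled by a per-algorithm weight). The Borda scheme, the weights, and the regulator names are assumptions for illustration; the paper's own rank aggregation procedure may differ.

```python
from collections import defaultdict


def union(rankings):
    """Every regulator predicted by at least one algorithm."""
    merged = set()
    for ranking in rankings:
        merged.update(ranking)
    return merged


def intersection(rankings):
    """Regulators predicted by every algorithm."""
    sets = [set(r) for r in rankings]
    return set.intersection(*sets) if sets else set()


def weighted_borda(rankings, weights):
    """Aggregate ranked lists with a weighted Borda count.

    In a list of length n, the top regulator earns n points and the last earns 1;
    points are multiplied by that algorithm's weight and summed across lists.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, regulator in enumerate(ranking):
            scores[regulator] += weight * (n - position)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda reg: (-scores[reg], reg))


# Hypothetical outputs of three inference algorithms for one module.
rankings = [["TF_A", "TF_B", "TF_C"], ["TF_B", "TF_A"], ["TF_B", "TF_D"]]
print(sorted(union(rankings)))      # ['TF_A', 'TF_B', 'TF_C', 'TF_D']
print(intersection(rankings))       # {'TF_B'}
print(weighted_borda(rankings, weights=[1.0, 0.5, 1.0]))
# ['TF_B', 'TF_A', 'TF_C', 'TF_D']
```

The example also shows why intersection can be restrictive: a single algorithm that misses a regulator removes it from the combined prediction.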
44
Kuo TY, Hsi E, Yang IP, Tsai PC, Wang JY, Juo SHH. Computational analysis of mRNA expression profiles identifies microRNA-29a/c as predictor of colorectal cancer early recurrence. PLoS One 2012; 7:e31587. [PMID: 22348113 PMCID: PMC3278467 DOI: 10.1371/journal.pone.0031587] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 07/16/2011] [Accepted: 01/09/2012] [Indexed: 12/19/2022]
Abstract
Colorectal cancer (CRC) is one of the leading malignant cancers, with a rapid increase in incidence and mortality. Recurrence of CRC after curative resection is sometimes unavoidable and often takes place within the first year after surgery. MicroRNAs may serve as biomarkers to predict early recurrence of CRC, but identifying them from over 1,400 known human microRNAs is challenging and costly. An alternative approach is to analyze existing expression data of messenger RNAs (mRNAs), because the expression levels of microRNAs and their target mRNAs are generally inversely correlated. In this study, we extracted six mRNA expression datasets of CRC from four studies (GSE12032, GSE17538, GSE4526 and GSE17181) in the Gene Expression Omnibus (GEO). We inferred microRNA expression profiles and performed computational analysis to identify microRNAs associated with CRC recurrence using the IMRE method, based on the MicroCosm database, which includes 568,071 microRNA-target connections between 711 microRNAs and 20,884 gene targets. Two microRNAs, miR-29a and miR-29c, were identified, and further meta-analysis of the six mRNA expression datasets showed that these two microRNAs were highly significant based on Fisher's p-value combination (p = 9.14 × 10⁻⁹ for miR-29a and p = 1.14 × 10⁻⁶ for miR-29c). Furthermore, these two microRNAs were experimentally tested in 78 human CRC samples to validate their effect on early recurrence. Our empirical results showed that the two microRNAs were significantly down-regulated (p = 0.007 for miR-29a and p = 0.007 for miR-29c) in the early-recurrence patients. This study shows the feasibility of using mRNA profiles to identify candidate microRNAs. We also show that miR-29a/c could be potential biomarkers for CRC early recurrence.
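Fisher's method, used here to combine evidence across the six datasets, sums the log p-values of k independent tests into a statistic X² = -2 Σ ln pᵢ, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. Because the degrees of freedom are even, the upper-tail probability has a closed form, so this minimal sketch needs only the standard library. The function name is hypothetical, and the example p-values are made up rather than taken from the study.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). The statistic -2 * sum(ln p_i) follows a
    chi-square distribution with 2k degrees of freedom under the null, and for
    even degrees of freedom the survival function is
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    term, series = 1.0, 1.0          # j = 0 term of the series
    for j in range(1, k):
        term *= half / j             # builds (x/2)**j / j! incrementally
        series += term
    return statistic, math.exp(-half) * series


# Illustrative input: the same gene scoring p = 0.05 in two independent datasets.
stat, p = fisher_combined_pvalue([0.05, 0.05])
print(round(stat, 3), round(p, 4))  # 11.983 0.0175
```

For very small p-values the exponential factor can underflow to 0.0; in practice a library routine such as SciPy's `scipy.stats.combine_pvalues(..., method="fisher")` is the safer choice.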
Affiliation(s)
- Tai-Yue Kuo
- Department of Medical Genetics, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Edward Hsi
- Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- I-Ping Yang
- Department of Medical Genetics, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Nursing, Shu Zen College of Medicine and Management, Kaohsiung, Taiwan
- Pei-Chien Tsai
- Department of Medical Genetics, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Jaw-Yuan Wang
- Department of Medical Genetics, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Department of Surgery, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Suh-Hang Hank Juo
- Department of Medical Genetics, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Cancer Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
45
Bazil JN, Qi F, Beard DA. A parallel algorithm for reverse engineering of biological networks. Integr Biol (Camb) 2011; 3:1215-23. [PMID: 22080176 PMCID: PMC3424073 DOI: 10.1039/c1ib00117e] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023]
Abstract
Dynamic biological systems, such as gene regulatory networks (GRNs) and protein signaling networks, are often represented as systems of ordinary differential equations. Such equations can be used to reverse engineer these biological networks. Identifying these networks is challenging because the cost of the necessary experiments grows at least with the square of the system size. Moreover, the number of possible models, proportional to the number of directed graphs connecting the nodes that represent the system's variables, suffers from combinatorial explosion as the system grows. Therefore, exhaustive searches for systems of nontrivial complexity are not feasible. Here we describe a practical and scalable algorithm for determining candidate network interactions based on decomposing an N-dimensional system into N one-dimensional problems. The algorithm was tested on in silico networks based on known biological GRNs. The computational complexity of network identification is shown to increase as N², while a parallel implementation achieves essentially linear speedup with an increasing number of processing cores. For each in silico network tested, the algorithm successfully predicts a candidate network that reproduces the network dynamics. This approach dramatically reduces the computational demand of reverse engineering GRNs and produces a wealth of exploitable information in the process. Moreover, the candidate network topologies returned by the algorithm can be used to design future experiments aimed at gathering informative data capable of further resolving the true network topology.
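The idea of splitting an N-dimensional identification problem into N independent one-dimensional problems can be sketched for the simplest case, a linear ODE model dx/dt = A x. Each row of A (the inputs to one node) is fit separately by least squares from finite-difference derivative estimates, so the N fits can be dispatched to a worker pool. The linear model, the forward-difference derivatives, the toy network, and the thread pool are simplifying assumptions; the published algorithm is more involved and is not reproduced here.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def simulate_linear(A, x0, dt, steps):
    """Forward-Euler trajectory of dx/dt = A x (used here only to make toy data)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (A @ xs[-1]))
    return np.array(xs)                       # shape: (steps + 1) x N


def fit_node(i, states, derivs):
    """One-dimensional subproblem: fit the inputs to node i by least squares."""
    coef, *_ = np.linalg.lstsq(states, derivs[:, i], rcond=None)
    return coef


def infer_linear_network(trajectories, dt, workers=4):
    """Estimate A row by row; each row is an independent task."""
    states = np.vstack([traj[:-1] for traj in trajectories])
    derivs = np.vstack([(traj[1:] - traj[:-1]) / dt for traj in trajectories])
    n = states.shape[1]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda i: fit_node(i, states, derivs), range(n)))
    return np.vstack(rows)


# Toy three-gene cascade (hypothetical): gene 0 -> gene 1 -> gene 2, with decay.
A_true = np.array([[-1.0, 0.0, 0.0],
                   [0.5, -1.0, 0.0],
                   [0.0, 0.8, -1.0]])
starts = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
trajs = [simulate_linear(A_true, x0, dt=0.01, steps=50) for x0 in starts]
A_hat = infer_linear_network(trajs, dt=0.01)
print(np.round(A_hat, 3))
```

A process pool would be the usual choice for CPU-bound fits at scale; threads keep the sketch short, since the lambda passed to `pool.map` could not be pickled for worker processes.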
Affiliation(s)
- Jason N Bazil
- Medical College of Wisconsin, 8701 Watertown Plank Rd., Milwaukee, USA.
46
Kim M, Shin H, Su Chung T, Joung JG, Kim JH. Extracting regulatory modules from gene expression data by sequential pattern mining. BMC Genomics 2011; 12 Suppl 3:S5. [PMID: 22369275 PMCID: PMC3333188 DOI: 10.1186/1471-2164-12-s3-s5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Background Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. Given a microarray gene-expression matrix, biclustering has been the most common method for extracting RMs. Among biclustering methods, order-preserving biclustering by a sequential pattern mining technique has a native advantage over conventional biclustering approaches, since it preserves the order of genes (or conditions) according to the magnitude of the expression value. However, previous sequential pattern mining-based biclustering methods have several weaknesses: they can easily become computationally intractable on real-size microarray data, and they are sensitive to inherent noise in expression values. Results In this paper, we propose a novel sequential pattern mining algorithm that is scalable to the size of microarray data and robust to noise. When applied to yeast microarray data, the proposed algorithm successfully found long order-preserving patterns that are biologically significant but cannot be found in randomly shuffled data. The resulting patterns are well enriched for known annotations and are consistent with known biological knowledge. Furthermore, RMs as well as inter-module relations were inferred from the biologically significant patterns. Conclusions Our approach for identifying RMs could be valuable for systematically revealing the mechanism of gene regulation at a genome-wide level.
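Order-preserving biclustering treats each condition as a sequence of genes sorted by expression value; a pattern (an ordered tuple of genes) is supported by every condition in which those genes appear in that relative order. The naive level-wise miner below illustrates the idea. It relies on the fact that extending a pattern can never increase its support, so unsupported prefixes are pruned. It is an illustrative assumption with hypothetical names and data, neither scalable nor noise-robust, and is not the algorithm proposed in the paper.

```python
import numpy as np


def condition_ranks(expr, genes):
    """For each condition (column), map gene -> rank in ascending expression order."""
    ranks = []
    for j in range(expr.shape[1]):
        order = np.argsort(expr[:, j], kind="stable")
        ranks.append({genes[g]: r for r, g in enumerate(order)})
    return ranks


def support(pattern, ranks):
    """Number of conditions in which the genes of pattern appear in this order."""
    count = 0
    for rank in ranks:
        positions = [rank[g] for g in pattern]
        if all(a < b for a, b in zip(positions, positions[1:])):
            count += 1
    return count


def mine_patterns(ranks, genes, min_support, max_len):
    """Level-wise search for order-preserving patterns with enough support."""
    found = {}
    frontier = [(g,) for g in genes]
    while frontier:
        next_frontier = []
        for pattern in frontier:
            s = support(pattern, ranks)
            if s < min_support:
                continue  # no extension of this pattern can reach min_support
            found[pattern] = s
            if len(pattern) < max_len:
                next_frontier.extend(pattern + (g,) for g in genes if g not in pattern)
        frontier = next_frontier
    return found


# Toy matrix (hypothetical): rows are genes, columns are conditions.
genes = ["g1", "g2", "g3"]
expr = np.array([[1.0, 1.0, 5.0],
                 [2.0, 2.0, 1.0],
                 [3.0, 3.0, 3.0]])
patterns = mine_patterns(condition_ranks(expr, genes), genes, min_support=2, max_len=3)
print({p: s for p, s in patterns.items() if len(p) == 3})  # {('g1', 'g2', 'g3'): 2}
```

The bicluster implied by the length-3 pattern is the gene set {g1, g2, g3} together with the first two conditions, the ones that support it.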
Affiliation(s)
- Mingoo Kim
- Seoul National University Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
47
Joshi A, Van de Peer Y, Michoel T. Structural and functional organization of RNA regulons in the post-transcriptional regulatory network of yeast. Nucleic Acids Res 2011; 39:9108-17. [PMID: 21840901 PMCID: PMC3241661 DOI: 10.1093/nar/gkr661] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Post-transcriptional control of mRNA transcript processing by RNA binding proteins (RBPs) is an important step in the regulation of gene expression and protein production. The post-transcriptional regulatory network is similar in complexity to the transcriptional regulatory network and is thought to be organized into RNA regulons, coherent sets of functionally related mRNAs combinatorially regulated by common RBPs. We integrated genome-wide transcriptional and translational expression data in yeast with large-scale regulatory networks of transcription factor and RBP binding interactions to analyze the functional organization of post-transcriptional regulation and RNA regulons at a system level. We found that post-transcriptional feedback loops and mixed bifan motifs are overrepresented in the integrated regulatory network and control the coordinated translation of RNA regulons, manifested as clusters of functionally related mRNAs that are strongly coexpressed in the translatome data. These translatome clusters are more functionally coherent than transcriptome clusters and are expressed with higher mRNA and protein levels and less noise. Our results show how the post-transcriptional network is intertwined with the transcriptional network to regulate gene expression in a coordinated way, and that the integration of heterogeneous genome-wide datasets allows structure to be related to function in regulatory networks at a system level.
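The two motifs named here can be counted directly on a directed regulator→target edge list. In this sketch, a bifan is two regulators that both target the same two other nodes, a "mixed" bifan is one whose regulators are a transcription factor and an RBP, and a two-node feedback loop is a pair of nodes that regulate each other. These definitions, the node labels, and the toy network are simplifying assumptions for illustration; testing for over-representation (typically against randomized networks) is omitted.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_bifans(edges, kind=None):
    """Count bifans: regulator pairs (u, v) that share two targets other than u, v.

    If kind (node -> "TF" or "RBP") is given, only mixed TF/RBP pairs are counted.
    """
    targets = defaultdict(set)
    for src, dst in edges:
        targets[src].add(dst)
    total = 0
    for u, v in combinations(sorted(targets), 2):
        if kind is not None and {kind[u], kind[v]} != {"TF", "RBP"}:
            continue
        shared = (targets[u] & targets[v]) - {u, v}
        total += comb(len(shared), 2)  # every pair of shared targets is one bifan
    return total


def count_feedback_loops(edges):
    """Count unordered node pairs with edges in both directions."""
    edge_set = set(edges)
    return sum(1 for a, b in edge_set if a < b and (b, a) in edge_set)


# Hypothetical network: TF1 and RBP1 co-regulate m1 and m2 and regulate each other.
edges = [("TF1", "m1"), ("TF1", "m2"), ("RBP1", "m1"), ("RBP1", "m2"),
         ("TF2", "m1"), ("TF1", "RBP1"), ("RBP1", "TF1")]
kind = {"TF1": "TF", "TF2": "TF", "RBP1": "RBP"}
print(count_bifans(edges), count_bifans(edges, kind), count_feedback_loops(edges))  # 1 1 1
```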
Affiliation(s)
- Anagha Joshi
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building Hills Road, Cambridge CB2 0XY, UK
48
Bauer T, Eils R, König R. RIP: the regulatory interaction predictor--a machine learning-based approach for predicting target genes of transcription factors. Bioinformatics 2011; 27:2239-47. [PMID: 21690103 DOI: 10.1093/bioinformatics/btr366] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION Understanding transcriptional gene regulation is essential for studying cellular systems. Identifying genome-wide targets of transcription factors (TFs) provides the basis for discovering the involvement of TFs and TF cooperativeness in cellular systems and pathogenesis. RESULTS We present the regulatory interaction predictor (RIP), a machine learning approach that inferred 73 923 regulatory interactions (RIs) for 301 human TFs and 11 263 target genes with considerably good quality, and 4516 RIs with very high quality. The inference of RIs is independent of any specific condition. Our approach employs support vector machines (SVMs) trained on a set of experimentally proven RIs from a public repository (TRANSFAC). Features of RIs for the learning process are based on a correlation meta-analysis of 4064 gene expression profiles from 76 studies, in silico predictions of transcription factor binding sites (TFBSs), and combinations of these employing knowledge about co-regulation of genes by a common TF (TF-module). The trained SVMs were applied to infer new RIs for a large set of TFs and genes. In a case study, we employed the inferred RIs to analyze an independent microarray dataset. We identified key TFs regulating the transcriptional response upon interferon alpha stimulation of monocytes, most prominently interferon-stimulated gene factor 3 (ISGF3). Furthermore, predicted TF-modules were highly associated with their functionally related pathways. CONCLUSION Descriptors of gene expression, TFBS predictions, experimentally verified binding information, and their statistical combination enabled the inference of RIs on a genome-wide scale for human genes with considerably good precision, providing a good basis for expression profiling studies. CONTACT r.koenig@dkfz.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tobias Bauer
- Department of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), INF 280, 69120 Heidelberg, Germany
49
Nguyen TT, Foteinou PT, Calvano SE, Lowry SF, Androulakis IP. Computational identification of transcriptional regulators in human endotoxemia. PLoS One 2011; 6:e18889. [PMID: 21637747 PMCID: PMC3103499 DOI: 10.1371/journal.pone.0018889] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 11/26/2010] [Accepted: 03/23/2011] [Indexed: 12/21/2022]
Abstract
One of the great challenges in the post-genomic era is to decipher the underlying principles governing the dynamics of biological responses. As modulating gene expression levels is among the key regulatory responses of an organism to changes in its environment, identifying biologically relevant transcriptional regulators and their putative regulatory interactions with target genes is an essential step towards studying the complex dynamics of transcriptional regulation. We present an analysis that integrates various computational and biological aspects to explore the transcriptional regulation of systemic inflammatory responses through a human endotoxemia model. Given a high-dimensional transcriptional profiling dataset from human blood leukocytes, an elementary set of temporal dynamic responses which capture the essence of a pro-inflammatory phase, a counter-regulatory response and a dysregulation in leukocyte bioenergetics has been extracted. Upon identification of these expression patterns, fourteen inflammation-specific gene batteries that represent groups of hypothetically ‘coregulated’ genes are proposed. Subsequently, statistically significant cis-regulatory modules (CRMs) are identified and decomposed into a list of critical transcription factors (34) that are validated largely on primary literature. Finally, our analysis further allows for the construction of a dynamic representation of the temporal transcriptional regulatory program across the host, deciphering possible combinatorial interactions among factors under which they might be active. Although much remains to be explored, this study has computationally identified key transcription factors and proposed a putative time-dependent transcriptional regulatory program associated with critical transcriptional inflammatory responses. These results provide a solid foundation for future investigations to elucidate the underlying transcriptional regulatory mechanisms under the host inflammatory response. 
Also, the assumption that functionally relevant coexpressed genes are more likely to share a common transcriptional regulatory mechanism appears promising, making the proposed framework essential for unravelling context-specific transcriptional regulatory interactions underlying diverse mammalian biological processes.
Affiliation(s)
- Tung T. Nguyen
- BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
- Panagiota T. Foteinou
- Department of Biomedical Engineering, Rutgers University, Piscataway, New Jersey, United States of America
- Steven E. Calvano
- Department of Surgery, Robert Wood Johnson Medical School, University of Medicine and Dentistry, New Jersey, New Brunswick, New Jersey, United States of America
- Stephen F. Lowry
- Department of Surgery, Robert Wood Johnson Medical School, University of Medicine and Dentistry, New Jersey, New Brunswick, New Jersey, United States of America
- Ioannis P. Androulakis
- Department of Biomedical Engineering, Rutgers University, Piscataway, New Jersey, United States of America
- Department of Surgery, Robert Wood Johnson Medical School, University of Medicine and Dentistry, New Jersey, New Brunswick, New Jersey, United States of America
50
Gu Q, Nagaraj SH, Hudson NJ, Dalrymple BP, Reverter A. Genome-wide patterns of promoter sharing and co-expression in bovine skeletal muscle. BMC Genomics 2011; 12:23. [PMID: 21226902 PMCID: PMC3025955 DOI: 10.1186/1471-2164-12-23] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 09/14/2010] [Accepted: 01/12/2011] [Indexed: 12/25/2022]
Abstract
BACKGROUND Gene regulation by transcription factors (TFs) is species-, tissue- and time-specific. To better understand how the genetic code controls gene expression in bovine muscle, we associated gene expression data from developing Longissimus thoracis et lumborum skeletal muscle with bovine promoter sequence information. RESULTS We created a highly conserved genome-wide promoter landscape comprising 87,408 interactions relating 333 TFs with their 9,242 predicted target genes (TGs). We discovered that the complete set of predicted TGs share an average of 2.75 predicted TF binding sites (TFBSs) and that the average co-expression between a TF and its predicted TGs is higher than the average co-expression between the same TF and all genes. Likewise, pairs of TFs sharing predicted TGs showed a higher co-expression correlation than pairs of TFs not sharing TGs. Finally, we exploited the co-occurrence of predicted TFBSs in the context of muscle-derived functionally coherent modules, including cell cycle, mitochondria, immune system, fat metabolism, muscle/glycolysis, and ribosome. Our findings enabled us to reverse engineer a regulatory network of core processes, and correctly identified the involvement of E2F1, GATA2 and NFKB1 in the regulation of cell cycle, fat, and muscle/glycolysis, respectively. CONCLUSION The pivotal implication of our research is two-fold: (1) there exists a robust genome-wide expression signal between TFs and their predicted TGs in cattle muscle, consistent with the extent of promoter sharing; and (2) this signal can be exploited to recover the cellular mechanisms underpinning transcriptional regulation of muscle structure and development in cattle. Our study represents the first genome-wide report linking tissue-specific co-expression to co-regulation in a non-model vertebrate.
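The central comparison in this abstract, a TF's average co-expression with its predicted target genes versus its average co-expression with all genes, can be sketched with Pearson correlations. The gene names, the toy matrix, and the choice of Pearson correlation are assumptions; the study's own co-expression measure and promoter-based target predictions are not reproduced.

```python
import numpy as np


def tf_coexpression_contrast(expr, genes, tf, predicted_targets):
    """Return (mean r with predicted targets, mean r with all other genes) for one TF.

    expr: genes x samples array whose rows follow the order of genes.
    """
    index = {g: i for i, g in enumerate(genes)}
    r = np.corrcoef(expr)[index[tf]]          # correlations of the TF with every gene
    target_idx = [index[t] for t in predicted_targets if t != tf]
    other_idx = [i for g, i in index.items() if g != tf]
    return float(r[target_idx].mean()), float(r[other_idx].mean())


# Toy profiles over five samples (hypothetical values).
genes = ["TF", "T1", "T2", "N1", "N2"]
expr = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],    # transcription factor
                 [2.0, 4.0, 6.0, 8.0, 10.0],   # predicted target
                 [1.0, 2.0, 3.0, 4.0, 6.0],    # predicted target
                 [5.0, 4.0, 3.0, 2.0, 1.0],    # not predicted
                 [1.0, 3.0, 2.0, 5.0, 4.0]])   # not predicted
to_targets, to_all = tf_coexpression_contrast(expr, genes, "TF", ["T1", "T2"])
print(round(to_targets, 3), round(to_all, 3))  # 0.993 0.447
```

Computing the full correlation matrix is wasteful for genome-scale data; correlating only the TF row against the other rows would scale better.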
Affiliation(s)
- Quan Gu
- Computational and Systems Biology, CSIRO Food Futures Flagship and CSIRO Livestock Industries, 306 Carmody Rd, St. Lucia, Brisbane, Queensland 4067, Australia