1
Wang D, Yan KK, Sisu C, Cheng C, Rozowsky J, Meyerson W, Gerstein MB. Loregic: a method to characterize the cooperative logic of regulatory factors. PLoS Comput Biol 2015; 11:e1004132. [PMID: 25884877 PMCID: PMC4401777 DOI: 10.1371/journal.pcbi.1004132] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 10/24/2014] [Accepted: 01/12/2015] [Indexed: 12/24/2022]
Abstract
The topology of the gene-regulatory network has been extensively analyzed. Now, given the large amount of available functional genomic data, it is possible to go beyond this and systematically study regulatory circuits in terms of logic elements. To this end, we present Loregic, a computational method integrating gene expression and regulatory network data, to characterize the cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target. We attempt to find the gate that best matches each triplet’s observed gene expression pattern across many conditions. We make Loregic available as a general-purpose tool (github.com/gersteinlab/loregic). We validate it with known yeast transcription-factor knockout experiments. Next, using human ENCODE ChIP-Seq and TCGA RNA-Seq data, we demonstrate how Loregic characterizes complex circuits involving both proximally and distally regulating transcription factors (TFs) and also miRNAs. Furthermore, we show that MYC, a well-known oncogenic driving TF, can be modeled as acting independently from other TFs (e.g., using OR gates) but antagonistically with repressing miRNAs. Finally, we inter-relate Loregic’s gate logic with other aspects of regulation, such as indirect binding via protein-protein interactions, feed-forward loop motifs and global regulatory hierarchy.
Gene expression is controlled by various gene regulatory factors. Those factors work cooperatively, forming a complex genome-wide regulatory circuit. Disruption of regulatory cooperativity may lead to abnormal gene expression, as seen in cancer. Traditional experimental methods, however, can only identify small-scale regulatory activity. Thus, to systematically understand the cooperativity between and among different types of regulatory factors, we need efficient and systematic computational methods. Regulatory circuits have been suggested to behave analogously to electronic circuits, in which a wide variety of electronic elements work in coordination to function correctly. Recently, an increasing amount of next-generation sequencing data has provided a great resource for studying regulatory activity. Thus, we developed a general-purpose computational method using logic-circuit models from electronics and applied it to a human leukemia dataset, identifying the genome-wide cooperativity of transcription factors and microRNAs.
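The gate-matching idea in this abstract can be illustrated with a minimal Python sketch. It is not the Loregic implementation: it assumes expression has already been binarized to 0/1 per condition, and it scores each of the 16 truth tables by the plain fraction of conditions it reproduces, which is a simplification chosen for illustration. The names `score_gates` and `best_gate` are hypothetical.

```python
from itertools import product

# The four possible (factor1, factor2) input states, in a fixed order:
# (0, 0), (0, 1), (1, 0), (1, 1).
INPUTS = list(product((0, 1), repeat=2))

# A two-input-one-output gate is a truth table with one output bit per
# input state, so there are 2**4 = 16 gates in total.
GATES = list(product((0, 1), repeat=4))

# Familiar names for a few of the 16 truth tables (illustrative subset).
NAMED = {(0, 0, 0, 1): "AND", (0, 1, 1, 1): "OR", (0, 1, 1, 0): "XOR"}


def score_gates(rf1, rf2, target):
    """Map each truth table to the fraction of conditions it reproduces.

    rf1, rf2 and target are equal-length sequences of 0/1 values
    (binarized expression across conditions; an assumption of this sketch).
    """
    n = len(target)
    scores = {}
    for gate in GATES:
        table = dict(zip(INPUTS, gate))
        hits = sum(table[(a, b)] == t for a, b, t in zip(rf1, rf2, target))
        scores[gate] = hits / n
    return scores


def best_gate(rf1, rf2, target):
    """Return the best-scoring truth table and its score."""
    scores = score_gates(rf1, rf2, target)
    gate = max(scores, key=scores.get)
    return gate, scores[gate]
```

For a triplet whose target is on only when both factors are on, `best_gate` returns the truth table `(0, 0, 0, 1)`, which `NAMED` labels as AND. Ties between gates are broken by enumeration order here; a real analysis would need an explicit rule for ambiguous triplets.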
Affiliation(s)
- Daifeng Wang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Koon-Kiu Yan
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Cristina Sisu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Chao Cheng
  - Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Joel Rozowsky
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- William Meyerson
  - School of Medicine, Yale University, New Haven, Connecticut, United States of America
- Mark B. Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
  - Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
2
Wen J, Chen Z, Cai X. A biophysical model for identifying splicing regulatory elements and their interactions. PLoS One 2013; 8:e54885. [PMID: 23382993 PMCID: PMC3559881 DOI: 10.1371/journal.pone.0054885] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 10/18/2012] [Accepted: 12/17/2012] [Indexed: 11/18/2022]
Abstract
Alternative splicing (AS) of precursor mRNA (pre-mRNA) is a crucial step in the expression of most eukaryotic genes. Splicing factors (SFs) play an important role in AS regulation by binding to the cis-regulatory elements on the pre-mRNA. Although many SFs and their binding sites have been identified, their combinatorial regulatory effects remain to be elucidated. In this paper, we derive a biophysical model for AS regulation that integrates combinatorial signals of cis-acting splicing regulatory elements (SREs) and their interactions. We also develop a systematic framework for model inference. Applying the biophysical model to a human RNA-Seq data set, we demonstrate that our model can explain 49.1%–66.5% of the variance in the data, which is comparable to the best result achieved by biophysical models for transcription. In total, we identified 119 SRE pairs between different regions of cassette exons that may regulate exon or intron definition in splicing, and 77 SRE pairs from the same region that may arise from a long motif or two different SREs bound by different SFs. In particular, putative binding sites of polypyrimidine tract-binding protein (PTB) and heterogeneous nuclear ribonucleoprotein (hnRNP) F/H and E/K are identified as interacting SRE pairs and are consistent with the interaction models proposed in previous experimental results. These results show that our biophysical model and inference method provide a means of quantitatively modeling splicing regulation and are a useful tool for identifying SREs and their interactions. The software package for model inference is available under an open source license.
Affiliation(s)
- Ji Wen
  - Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, United States of America
- Zhibin Chen
  - Department of Microbiology and Immunology, University of Miami, Miami, Florida, United States of America
- Xiaodong Cai
  - Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, United States of America
3
Cornish JP, Matthews F, Thomas JR, Erill I. Inference of self-regulated transcriptional networks by comparative genomics. Evol Bioinform Online 2012; 8:449-61. [PMID: 23032607 PMCID: PMC3422134 DOI: 10.4137/ebo.s9205] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]
Abstract
The assumption of basic properties, like self-regulation, in simple transcriptional regulatory networks can be exploited to infer regulatory motifs from the growing amounts of genomic and meta-genomic data. These motifs can in principle be used to elucidate the nature and scope of transcriptional networks through comparative genomics. Here we assess the feasibility of this approach using the SOS regulatory network of Gram-positive bacteria as a test case. Using experimentally validated data, we show that the known regulatory motif can be inferred through the assumption of self-regulation. Furthermore, the inferred motif provides a more robust search pattern for comparative genomics than the experimental motifs defined in reference organisms. We take advantage of this robustness to generate a functional map of the SOS response in Gram-positive bacteria. Our results reveal definite differences in the composition of the LexA regulon between Firmicutes and Actinobacteria, and confirm that regulation of cell-division inhibition is a widespread characteristic of this network among Gram-positive bacteria.
Affiliation(s)
- Joseph P Cornish
  - Department of Biological Sciences, University of Maryland Baltimore County
4
Wu M, Chan C. Learning transcriptional regulation on a genome scale: a theoretical analysis based on gene expression data. Brief Bioinform 2012; 13:150-61. [PMID: 21622543 PMCID: PMC3294238 DOI: 10.1093/bib/bbr029] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 02/10/2011] [Revised: 04/23/2011] [Indexed: 12/17/2022]
Abstract
The recent advent of high-throughput microarray data has enabled the global analysis of the transcriptome, driving the development and application of computational approaches to study transcriptional regulation on the genome scale by reconstructing in silico the regulatory interactions of the gene network. Although there are many in-depth reviews of such 'reverse-engineering' methodologies, most have focused on the practical aspect of data mining, and few on the biological problem and the biological relevance of the methodology. Therefore, in this review, from a biological perspective, we use a set of yeast microarray data as a working example to evaluate the fundamental assumptions implicit in associating transcription factor (TF) and target gene expression levels and in estimating TFs' activity, and we further explore cooperative models. Finally, we confirm that the detailed transcription mechanism is too complex for expression data alone to reveal. Nevertheless, future network reconstruction studies could benefit from the incorporation of context-specific information, the modeling of multiple layers of regulation (e.g. micro-RNA), or the development of approaches for context-dependent analysis, to uncover the mechanisms of gene regulation.
Affiliation(s)
- Ming Wu
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
5
Morita M, Nakamura M, Hamada M, Takahashi S. Combinatorial motif analysis of regulatory gene expression in Mafb deficient macrophages. BMC Syst Biol 2011; 5 Suppl 2:S7. [PMID: 22784578 PMCID: PMC3287487 DOI: 10.1186/1752-0509-5-s2-s7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023]
Abstract
Background: Deficiency of the transcription factor MafB, which is normally expressed in macrophages, can underlie cellular dysfunction associated with a range of autoimmune diseases and arteriosclerosis. MafB has important roles in cell differentiation and regulation of target gene expression; however, the mechanisms of this regulation and the identities of other transcription factors with which MafB interacts remain uncertain. Bioinformatics methods provide a valuable approach for elucidating the nature of these interactions with transcriptional regulatory elements from a large number of DNA sequences. In particular, identification of patterns of co-occurrence of regulatory cis-elements (motifs) offers a robust approach.
Results: Here, the directional relationships among several functional motifs were evaluated using the Log-linear Graphical Model (LGM) after extraction of and search for evolutionarily conserved motifs. This analysis highlighted GATA-1 motifs and 5′ AT-rich half Maf recognition elements (MAREs) in promoter regions of 18 genes that were down-regulated in Mafb deficient macrophages. GATA-1 motifs and MafB motifs could regulate expression of these genes in a negative and a positive manner, respectively. The validity of this conclusion was tested with data from a luciferase assay that used a C1qa promoter construct carrying both the GATA-1 motifs and MAREs. GATA-1 was found to inhibit the activity of the C1qa promoter with the GATA-1 motifs and MafB motifs.
Conclusions: These observations suggest that both the GATA-1 motifs and MafB motifs are important for lineage-specific expression of C1qa. In addition, these findings show that analysis of combinations of evolutionarily conserved motifs can be successfully used to identify patterns of gene regulation.
Affiliation(s)
- Mariko Morita
  - Department of Anatomy and Embryology, Institute of Basic Medical Sciences, Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1, Tennodai, Tsukuba, 305-8575, Ibaraki, Japan.
6
Irie T, Park SJ, Yamashita R, Seki M, Yada T, Sugano S, Nakai K, Suzuki Y. Predicting promoter activities of primary human DNA sequences. Nucleic Acids Res 2011; 39:e75. [PMID: 21486745 PMCID: PMC3113590 DOI: 10.1093/nar/gkr173] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/17/2022]
Abstract
We developed a computer program that can predict the intrinsic promoter activities of primary human DNA sequences. We observed promoter activity using a quantitative luciferase assay and generated a prediction model using multiple linear regression. Our program achieved a correlation coefficient of 0.87 between the predicted and observed promoter activities. We evaluated the prediction accuracy of the program using massive sequencing analysis of transcriptional start sites in vivo. We found that it is still difficult to predict transcript levels in a strictly quantitative manner in vivo; however, it was possible to distinguish the active promoters in a given cell from the silent ones. Using this program, we analyzed the transcriptional landscape of the entire human genome. We demonstrate that many human genomic regions have potential promoter activity, and the expression of some previously uncharacterized putatively non-protein-coding transcripts can be explained by our prediction model. Furthermore, we found that nucleosomes occasionally formed open chromatin structures with RNA polymerase II recruitment where the program predicted significant promoter activities, although no transcripts were observed.
Affiliation(s)
- Takuma Irie
  - Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwashi, Chiba 277-8562, Japan
7
Street NR, Jansson S, Hvidsten TR. A systems biology model of the regulatory network in Populus leaves reveals interacting regulators and conserved regulation. BMC Plant Biol 2011; 11:13. [PMID: 21232107 PMCID: PMC3030533 DOI: 10.1186/1471-2229-11-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 10/08/2010] [Accepted: 01/13/2011] [Indexed: 05/23/2023]
Abstract
Background: Green plant leaves have always fascinated biologists as hosts for photosynthesis and providers of basic energy to many food webs. Today, comprehensive databases of gene expression data enable us to apply increasingly advanced computational methods for reverse-engineering the regulatory network of leaves, and to begin to understand the gene interactions underlying complex emergent properties related to stress response and development. These new systems biology methods are now also being applied to organisms such as Populus, a woody perennial tree, in order to understand the specific characteristics of these species.
Results: We present a systems biology model of the regulatory network of Populus leaves. The network is reverse-engineered from promoter information and expression profiles of leaf-specific genes measured over a large set of conditions related to stress and development. The network model incorporates interactions between regulators, such as synergistic and competitive relationships, by evaluating increasingly complex regulatory mechanisms, and is therefore able to identify new regulators of leaf development not found by traditional genomics methods based on pair-wise expression similarity. The approach is shown to explain available gene function information and to provide robust prediction of expression levels in new data. We also use the predictive capability of the model to identify condition-specific regulation as well as conserved regulation between Populus and Arabidopsis.
Conclusions: We outline a computationally inferred model of the regulatory network of Populus leaves, and show how treating genes as interacting, rather than individual, entities identifies new regulators compared to traditional genomics analysis. Although systems biology models should be used with care considering the complexity of regulatory programs and the limitations of current genomics data, methods describing interactions can provide hypotheses about the underlying cause of emergent properties and are needed if we are to identify target genes other than those constituting the "low hanging fruit" of genomic analysis.
Affiliation(s)
- Nathaniel Robert Street
  - Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, 901 87 Umeå, Sweden
- Stefan Jansson
  - Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, 901 87 Umeå, Sweden
- Torgeir R Hvidsten
  - Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, 901 87 Umeå, Sweden
  - Computational Life Science Cluster (CLiC), Umeå University, 901 87 Umeå, Sweden
8
Pessiot JF, Chiba H, Hyakkoku H, Taniguchi T, Fujibuchi W. PeakRegressor identifies composite sequence motifs responsible for STAT1 binding sites and their potential rSNPs. PLoS One 2010; 5:e11881. [PMID: 20806061 PMCID: PMC2929187 DOI: 10.1371/journal.pone.0011881] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/15/2010] [Accepted: 06/07/2010] [Indexed: 01/20/2023]
Abstract
How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.
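As a rough illustration of L1-penalized regression on log-transformed peak values, the following Python sketch fits sparse weights by coordinate descent with soft-thresholding. It is not PeakRegressor's code: the squared-error objective, the centering of log peak heights, the solver, and the names `lasso_cd` and `fit_log_peaks` are all assumptions made for this example.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent.

    X is a list of rows (one per peak, one column per candidate motif),
    y is a list of responses. No intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w initially zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature cannot be fitted
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def fit_log_peaks(X, peaks, lam):
    """Fit the L1-penalised model to centred log peak heights (assumed transform)."""
    logs = [math.log(v) for v in peaks]
    mean = sum(logs) / len(logs)
    return lasso_cd(X, [v - mean for v in logs], lam)
```

The L1 penalty drives the weights of uninformative motif candidates exactly to zero, which is what makes the fitted model readable as a short list of motifs. In practice a tested library solver would usually be preferable to this hand-rolled loop.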
Affiliation(s)
- Jean-François Pessiot
  - Computational Biology Research Center, Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Hirokazu Chiba
  - Computational Biology Research Center, Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Hiroto Hyakkoku
  - Computational Biology Research Center, Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - Waseda University, Tokyo, Japan
- Wataru Fujibuchi
  - Computational Biology Research Center, Advanced Industrial Science and Technology (AIST), Tokyo, Japan
9
Li X, Panea C, Wiggins CH, Reinke V, Leslie C. Learning "graph-mer" motifs that predict gene expression trajectories in development. PLoS Comput Biol 2010; 6:e1000761. [PMID: 20454681 PMCID: PMC2861633 DOI: 10.1371/journal.pcbi.1000761] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/22/2009] [Accepted: 03/24/2010] [Indexed: 12/19/2022]
Abstract
A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns, represented by graphs of k-mers or "graph-mers", that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated with these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.
Affiliation(s)
- Xuejing Li
  - Department of Physics, Columbia University, New York, New York, United States of America
- Casandra Panea
  - Department of Genetics, Yale University, New Haven, Connecticut, United States of America
- Chris H. Wiggins
  - Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York, United States of America
- Valerie Reinke
  - Department of Genetics, Yale University, New Haven, Connecticut, United States of America
- Christina Leslie
  - Computational Biology Program, Sloan-Kettering Institute, New York, New York, United States of America
10
Xiao Y, Segal MR. Identification of yeast transcriptional regulation networks using multivariate random forests. PLoS Comput Biol 2009; 5:e1000414. [PMID: 19543377 PMCID: PMC2691601 DOI: 10.1371/journal.pcbi.1000414] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 10/28/2008] [Accepted: 05/12/2009] [Indexed: 02/02/2023]
Abstract
The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large-scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique displays excellent performance with regard to identifying known regulatory motifs, including high-order interactions. In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.
Affiliation(s)
- Yuanyuan Xiao
  - Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, California, USA.