1. Toneyan S, Koo PK. Interpreting Cis-Regulatory Interactions from Large-Scale Deep Neural Networks for Genomics. bioRxiv 2024:2023.07.03.547592. [PMID: 37461616 PMCID: PMC10349992 DOI: 10.1101/2023.07.03.547592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023]
Abstract
The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with experimental perturbation assays, which provides insights into the generalization capabilities within the studied loci but offers a limited perspective of what drives their predictions. Moreover, existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we introduce CREME, an in silico perturbation toolkit that interrogates large-scale DNNs to uncover the rules of gene regulation that they learn. Using CREME, we investigate Enformer, a prominent DNN in gene expression prediction, revealing cis-regulatory elements (CREs) that directly enhance or silence target genes. We explore the intricate complexity of higher-order CRE interactions, the effect of CRE distance from transcription start sites on gene expression, as well as the biochemical features of enhancers and silencers learned by Enformer. Moreover, we demonstrate the flexibility of CREME to efficiently uncover a higher-resolution view of functional sequence elements within CREs. This work demonstrates how CREME can be employed to translate the powerful predictions of large-scale DNNs to study open questions in gene regulation.
2. Kaucka M. Cis-regulatory landscapes in the evolution and development of the mammalian skull. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220079. [PMID: 37183897 PMCID: PMC10184250 DOI: 10.1098/rstb.2022.0079] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/16/2023]
Abstract
Extensive morphological variation found in mammals reflects the wide spectrum of their ecological adaptations. The highest morphological diversity is present in the craniofacial region, where geometry is mainly dictated by the bony skull. Mammalian craniofacial development represents complex multistep processes governed by numerous conserved genes that require precise spatio-temporal control. A central question in contemporary evolutionary biology is how a defined set of conserved genes can orchestrate formation of fundamentally different structures, and therefore how morphological variability arises. In principle, differential gene expression patterns during development are the source of morphological variation. With the emergence of multicellular organisms, precise regulation of gene expression in time and space is attributed to cis-regulatory elements. These elements contribute to higher-order chromatin structure and together with trans-acting factors control transcriptional landscapes that underlie intricate morphogenetic processes. Consequently, divergence in cis-regulation is believed to rewire existing gene regulatory networks and form the core of morphological evolution. This review outlines the fundamental principles of the genetic code and genomic regulation interplay during development. Recent work that deepened our comprehension of cis-regulatory element origin, divergence and function is presented here to illustrate the state-of-the-art research that uncovered the principles of morphological novelty. This article is part of the theme issue 'The mammalian skull: development, structure and function'.
Affiliation(s)
- Marketa Kaucka
  - Max Planck Institute for Evolutionary Biology, Plön 24306, Germany
3. Kim YJ, Rhee K, Liu J, Jeammet S, Turner MA, Small SJ, Garcia HG. Predictive modeling reveals that higher-order cooperativity drives transcriptional repression in a synthetic developmental enhancer. eLife 2022; 11:73395. [PMID: 36503705 PMCID: PMC9836395 DOI: 10.7554/elife.73395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Accepted: 12/09/2022] [Indexed: 12/14/2022]
Abstract
A challenge in quantitative biology is to predict output patterns of gene expression from knowledge of input transcription factor patterns and from the arrangement of binding sites for these transcription factors on regulatory DNA. We tested whether widespread thermodynamic models could be used to infer parameters describing simple regulatory architectures that inform parameter-free predictions of more complex enhancers in the context of transcriptional repression by Runt in the early fruit fly embryo. By modulating the number and placement of Runt binding sites within an enhancer, and quantifying the resulting transcriptional activity using live imaging, we discovered that thermodynamic models call for higher-order cooperativity between multiple molecular players. This higher-order cooperativity captures the combinatorial complexity underlying eukaryotic transcriptional regulation and cannot be determined from simpler regulatory architectures, highlighting the challenges in reaching a predictive understanding of transcriptional regulation in eukaryotes and calling for approaches that quantitatively dissect their molecular nature.
Affiliation(s)
- Yang Joon Kim
  - Chan Zuckerberg Biohub, San Francisco, United States
- Kaitlin Rhee
  - Department of Chemical Biology, University of California, Berkeley, Berkeley, United States
- Jonathan Liu
  - Department of Physics, University of California, Berkeley, Berkeley, United States
- Selene Jeammet
  - Department of Biology, Ecole Polytechnique, Paris, France
- Meghan A Turner
  - Biophysics Graduate Group, University of California, Berkeley, Berkeley, United States
- Stephen J Small
  - Department of Biology, New York University, New York, United States
- Hernan G Garcia
  - Chan Zuckerberg Biohub, San Francisco, United States
  - Department of Physics, University of California, Berkeley, Berkeley, United States
  - Biophysics Graduate Group, University of California, Berkeley, Berkeley, United States
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
  - Institute for Quantitative Biosciences-QB3, University of California at Berkeley, Berkeley, United States
4. Gaiewski MJ, Drewell RA, Dresch JM. Fitting thermodynamic-based models: Incorporating parameter sensitivity improves the performance of an evolutionary algorithm. Math Biosci 2021; 342:108716. [PMID: 34687735 DOI: 10.1016/j.mbs.2021.108716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2021] [Revised: 09/10/2021] [Accepted: 09/17/2021] [Indexed: 11/30/2022]
Abstract
A detailed comprehension of transcriptional regulation is critical to understanding the genetic control of development and disease across many different organisms. To more fully investigate the complex molecular interactions controlling the precise expression of genes, many groups have constructed mathematical models to complement their experimental approaches. A critical step in such studies is choosing the most appropriate parameter estimation algorithm to enable detailed analysis of the parameters that contribute to the models. In this study, we develop a novel set of evolutionary algorithms that use a pseudo-random Sobol Set to construct the initial population and incorporate parameter sensitivities into the adaptation of mutation rates, using local, global, and hybrid strategies. Comparison of the performance of these new algorithms to a number of current state-of-the-art global parameter estimation algorithms on a range of continuous test functions, as well as synthetic biological data representing models of gene regulatory systems, reveals improved performance of the new algorithms in terms of runtime, error and reproducibility. In addition, by analyzing the ability of these algorithms to fit datasets of varying quality, we provide the experimentalist with a guide to how the algorithms perform across a range of noisy data. These results demonstrate the improved performance of the new set of parameter estimation algorithms and facilitate meaningful integration of model parameters and predictions in our understanding of the molecular mechanisms of gene regulation.
Affiliation(s)
- Michael J Gaiewski
  - Department of Mathematics and Computer Science, Clark University, Worcester, MA, USA
  - Department of Mathematics, University of Connecticut, Storrs, CT, USA
5. Dibaeinia P, Sinha S. Deciphering enhancer sequence using thermodynamics-based models and convolutional neural networks. Nucleic Acids Res 2021; 49:10309-10327. [PMID: 34508359 PMCID: PMC8501998 DOI: 10.1093/nar/gkab765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Revised: 08/18/2021] [Accepted: 08/25/2021] [Indexed: 11/18/2022]
Abstract
Deciphering the sequence-function relationship encoded in enhancers holds the key to interpreting non-coding variants and understanding mechanisms of transcriptomic variation. Several quantitative models exist for predicting enhancer function and underlying mechanisms; however, there has been no systematic comparison of these models characterizing their relative strengths and shortcomings. Here, we interrogated a rich data set of neuroectodermal enhancers in Drosophila, representing cis- and trans- sources of expression variation, with a suite of biophysical and machine learning models. We performed rigorous comparisons of thermodynamics-based models implementing different mechanisms of activation, repression and cooperativity. Moreover, we developed a convolutional neural network (CNN) model, called CoNSEPT, that learns enhancer ‘grammar’ in an unbiased manner. CoNSEPT is the first general-purpose CNN tool for predicting enhancer function in varying conditions, such as different cell types and experimental conditions, and we show that such complex models can suggest interpretable mechanisms. We found model-based evidence for mechanisms previously established for the studied system, including cooperative activation and short-range repression. The data also favored one hypothesized activation mechanism over another and suggested an intriguing role for a direct, distance-independent repression mechanism. Our modeling shows that while fundamentally different models can yield similar fits to data, they vary in their utility for mechanistic inference. CoNSEPT is freely available at: https://github.com/PayamDiba/CoNSEPT.
Affiliation(s)
- Payam Dibaeinia
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
6. A Mutation in the Drosophila melanogaster eve Stripe 2 Minimal Enhancer Is Buffered by Flanking Sequences. G3 (Bethesda) 2020; 10:4473-4482. [PMID: 33037064 PMCID: PMC7718739 DOI: 10.1534/g3.120.401777] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/18/2023]
Abstract
Enhancers are DNA sequences composed of transcription factor binding sites that drive complex patterns of gene expression in space and time. Until recently, studying enhancers in their genomic context was technically challenging. Therefore, minimal enhancers, the shortest pieces of DNA that can drive an expression pattern that resembles a gene's endogenous pattern, are often used to study features of enhancer function. However, evidence suggests that some enhancers require sequences outside the minimal enhancer to maintain function under environmental perturbations. We hypothesized that these additional sequences also prevent misexpression caused by a transcription factor binding site mutation within a minimal enhancer. Using the Drosophila melanogaster even-skipped stripe 2 enhancer as a case study, we tested the effect of a Giant binding site mutation (gt-2) on the expression patterns driven by minimal and extended enhancer reporter constructs. We found that, in contrast to the misexpression caused by the gt-2 binding site deletion in the minimal enhancer, the same gt-2 binding site deletion in the extended enhancer did not have an effect on expression. The buffering of expression levels, but not expression pattern, is partially explained by an additional Giant binding site outside the minimal enhancer. Deleting the gt-2 binding site in the endogenous locus had no significant effect on stripe 2 expression. Our results indicate that rules derived from mutating enhancer reporter constructs may not represent what occurs in the endogenous context.
7. Zeitlinger J. Seven myths of how transcription factors read the cis-regulatory code. Curr Opin Syst Biol 2020; 23:22-31. [PMID: 33134611 PMCID: PMC7592701 DOI: 10.1016/j.coisb.2020.08.002] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Indexed: 02/07/2023]
Abstract
Genomics data are now being generated in large quantities, at exquisitely high resolution and from single cells. They offer a unique opportunity to develop powerful machine learning algorithms, including neural networks, to uncover the rules of the cis-regulatory code. However, current modeling assumptions are often not based on state-of-the-art knowledge of the cis-regulatory code from transcription, developmental genetics, imaging and structural studies. Here I aim to fill this gap by giving a brief historical overview of the field, describing common misconceptions and providing knowledge that might help to guide computational approaches. I will describe the principles and mechanisms involved in the combinatorial requirement of transcription factor binding motifs for enhancer activity, including the role of chromatin accessibility, repressors and low-affinity motifs in the cis-regulatory code. Deciphering the cis-regulatory code would unlock an enormous amount of regulatory information in the genome and would allow us to locate cis-regulatory genetic variants involved in development and disease.
Affiliation(s)
- Julia Zeitlinger
  - Stowers Institute for Medical Research, Kansas City, MO, USA
  - The University of Kansas Medical Center, Kansas City, KS, USA
8. Bozek M, Gompel N. Developmental Transcriptional Enhancers: A Subtle Interplay between Accessibility and Activity: Considering Quantitative Accessibility Changes between Different Regulatory States of an Enhancer Deconvolutes the Complex Relationship between Accessibility and Activity. Bioessays 2020; 42:e1900188. [PMID: 32142172 DOI: 10.1002/bies.201900188] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/11/2019] [Revised: 01/16/2020] [Indexed: 12/21/2022]
Abstract
Measurements of open chromatin in specific cell types are widely used to infer the spatiotemporal activity of transcriptional enhancers. How reliable are these predictions? In this review, it is argued that the relationship between the accessibility and activity of an enhancer is insufficiently described by simply considering open versus closed chromatin, or active versus inactive enhancers. Instead, recent studies focusing on the quantitative nature of accessibility signal reveal subtle differences between active enhancers and their different inactive counterparts: the closed silenced state and the accessible primed and repressed states. While the open structure as such is not a specific indicator of enhancer activity, active enhancers display a higher degree of accessibility than the primed and repressed states. Molecular mechanisms that may account for these quantitative differences are discussed. A model that relates molecular events at an enhancer to changes in its activity and accessibility in a developing tissue is also proposed.
Affiliation(s)
- Marta Bozek
  - Department Biochemie, Ludwig-Maximilians Universität München, Genzentrum, 81377, München, Germany
- Nicolas Gompel
  - Fakultät für Biologie, Ludwig-Maximilians Universität München, Biozentrum, 82152, Planegg-Martinsried, Germany
9. Repele A, Krueger S, Bhattacharyya T, Tuineau MY. The regulatory control of Cebpa enhancers and silencers in the myeloid and red-blood cell lineages. PLoS One 2019; 14:e0217580. [PMID: 31181110 PMCID: PMC6557489 DOI: 10.1371/journal.pone.0217580] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/18/2019] [Accepted: 05/14/2019] [Indexed: 12/31/2022]
Abstract
Cebpa encodes a transcription factor (TF) that plays an instructive role in the development of multiple myeloid lineages. The expression of Cebpa itself is finely modulated, as Cebpa is expressed at high and intermediate levels in neutrophils and macrophages respectively and downregulated in non-myeloid lineages. The cis-regulatory logic underlying the lineage-specific modulation of Cebpa's expression level is yet to be fully characterized. Previously, we had identified 6 new cis-regulatory modules (CRMs) in a 78kb region surrounding Cebpa. We had also inferred the TFs that regulate each CRM by fitting a sequence-based thermodynamic model to a comprehensive reporter activity dataset. Here, we report the cis-regulatory logic of Cebpa CRMs at the resolution of individual binding sites. We tested the binding sites and functional roles of inferred TFs by designing and constructing mutated CRMs and comparing theoretical predictions of their activity against empirical measurements in a myeloid cell line. The enhancers were confirmed to be activated by combinations of PU.1, C/EBP family TFs, Egr1, and Gfi1 as predicted by the model. We show that silencers repress the activity of the proximal promoter in a dominant manner in G1ME cells, which are derived from the red-blood cell lineage. Dominant repression in G1ME cells can be traced to binding sites for GATA and Myb, a motif shared by all of the silencers. Finally, we demonstrate that GATA and Myb act redundantly to silence the proximal promoter. These results indicate that dominant repression is a novel mechanism for resolving hematopoietic lineages. Furthermore, Cebpa has a fail-safe cis-regulatory architecture, featuring several functionally similar CRMs, each of which contains redundant binding sites for multiple TFs. 
Lastly, by experimentally demonstrating the predictive ability of our sequence-based thermodynamic model, this work highlights the utility of this computational approach for understanding mammalian gene regulation.
Affiliation(s)
- Andrea Repele
  - Department of Biology, University of North Dakota, Grand Forks, ND, United States of America
- Shawn Krueger
  - Department of Biology, University of North Dakota, Grand Forks, ND, United States of America
- Tapas Bhattacharyya
  - Department of Biology, University of North Dakota, Grand Forks, ND, United States of America
- Michelle Y Tuineau
  - Department of Biology, University of North Dakota, Grand Forks, ND, United States of America
10. Samee MAH, Lydiard-Martin T, Biette KM, Vincent BJ, Bragdon MD, Eckenrode KB, Wunderlich Z, Estrada J, Sinha S, DePace AH. Quantitative Measurement and Thermodynamic Modeling of Fused Enhancers Support a Two-Tiered Mechanism for Interpreting Regulatory DNA. Cell Rep 2018; 21:236-245. [PMID: 28978476 DOI: 10.1016/j.celrep.2017.09.033] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/22/2017] [Revised: 07/30/2017] [Accepted: 09/08/2017] [Indexed: 02/07/2023]
Abstract
Computational models of enhancer function generally assume that transcription factors (TFs) exert their regulatory effects independently, modeling an enhancer as a "bag of sites." These models fail on endogenous loci that harbor multiple enhancers, and a "two-tier" model appears better suited: in each enhancer TFs work independently, and the total expression is a weighted sum of their expression readouts. Here, we test these two opposing views on how cis-regulatory information is integrated. We fused two Drosophila blastoderm enhancers, measured their readouts, and applied the above two models to these data. The two-tier mechanism better fits these readouts, suggesting that these fused enhancers comprise multiple independent modules, despite having sequence characteristics typical of single enhancers. We show that short-range TF-TF interactions are not sufficient to designate such modules, suggesting unknown underlying mechanisms. Our results underscore that mechanisms of how modules are defined and how their outputs are combined remain to be elucidated.
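As a rough illustration of the two-tier idea summarized in this abstract, the sketch below treats each enhancer's expression readout as an independent profile and combines the profiles as a weighted sum. The function name, the example profiles, and the weights are hypothetical placeholders chosen for illustration; they are not taken from Samee et al., whose fitted model may differ in detail.

```python
from typing import Sequence


def two_tier_expression(
    readouts: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Combine per-enhancer readouts into total expression as a weighted sum."""
    if len(readouts) != len(weights):
        raise ValueError("need exactly one weight per enhancer readout")
    n_positions = len(readouts[0])
    if any(len(r) != n_positions for r in readouts):
        raise ValueError("all readouts must cover the same positions")
    # Tier 2: at each position, sum the independently computed readouts,
    # each scaled by its enhancer-specific weight.
    return [
        sum(w * r[i] for w, r in zip(weights, readouts))
        for i in range(n_positions)
    ]


# Hypothetical readouts of two enhancers at four positions along the embryo.
enhancer_a = [0.0, 1.0, 0.5, 0.0]
enhancer_b = [0.0, 0.0, 0.5, 1.0]

# Hypothetical weights; in practice these would be fitted to measured data.
total = two_tier_expression([enhancer_a, enhancer_b], weights=[0.5, 2.0])
print(total)
```

In this toy setting the first tier (how each enhancer computes its own readout from bound TFs) is taken as given; only the second tier, the weighted combination, is shown.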
Affiliation(s)
- Md Abul Hassan Samee
  - Gladstone Institutes, University of California San Francisco, San Francisco, CA 94158, USA
- Tara Lydiard-Martin
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Kelly M Biette
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Ben J Vincent
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Meghan D Bragdon
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Kelly B Eckenrode
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Zeba Wunderlich
  - Department of Developmental and Cell Biology, University of California, Irvine, CA 92697, USA
- Javier Estrada
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Saurabh Sinha
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
  - Carl R. Woese Institute of Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Angela H DePace
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
11. Bekiaris PS, Tekath T, Staiger D, Danisman S. Computational exploration of cis-regulatory modules in rhythmic expression data using the "Exploration of Distinctive CREs and CRMs" (EDCC) and "CRM Network Generator" (CNG) programs. PLoS One 2018; 13:e0190421. [PMID: 29298348 PMCID: PMC5752016 DOI: 10.1371/journal.pone.0190421] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/01/2017] [Accepted: 12/14/2017] [Indexed: 11/19/2022]
Abstract
Understanding the effect of cis-regulatory elements (CRE) and clusters of CREs, which are called cis-regulatory modules (CRM), in eukaryotic gene expression is a challenge of computational biology. We developed two programs that allow simple, fast and reliable analysis of candidate CREs and CRMs that may affect specific gene expression and that determine positional features between individual CREs within a CRM. The first program, "Exploration of Distinctive CREs and CRMs" (EDCC), correlates candidate CREs and CRMs with specific gene expression patterns. For pairs of CREs, EDCC also determines positional preferences of the single CREs in relation to each other and to the transcriptional start site. The second program, "CRM Network Generator" (CNG), prioritizes these positional preferences using a neural network and thus allows unbiased rating of the positional preferences that were determined by EDCC. We tested these programs with data from a microarray study of circadian gene expression in Arabidopsis thaliana. Analyzing more than 1.5 million pairwise CRE combinations, we found 22 candidate combinations, of which several contained known clock promoter elements together with elements that had not been identified as relevant to circadian gene expression before. CNG analysis further identified positional preferences of these CRE pairs, hinting at positional information that may be relevant for circadian gene expression. Future wet lab experiments will have to determine which of these combinations confer daytime specific circadian gene expression.
Affiliation(s)
- Tobias Tekath
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Dorothee Staiger
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Selahattin Danisman
  - RNA Biology and Molecular Physiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
12. Barr KA, Martinez C, Moran JR, Kim AR, Ramos AF, Reinitz J. Synthetic enhancer design by in silico compensatory evolution reveals flexibility and constraint in cis-regulation. BMC Syst Biol 2017; 11:116. [PMID: 29187214 PMCID: PMC5708098 DOI: 10.1186/s12918-017-0485-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/05/2017] [Accepted: 11/09/2017] [Indexed: 11/12/2022]
Abstract
BACKGROUND: Models that incorporate specific chemical mechanisms have been successful in describing the activity of Drosophila developmental enhancers as a function of underlying transcription factor binding motifs. Despite this, the minimum set of mechanisms required to reconstruct an enhancer from its constituent parts is not known. Synthetic biology offers the potential to test the sufficiency of known mechanisms to describe the activity of enhancers, as well as to uncover constraints on the number, order, and spacing of motifs. RESULTS: Using a functional model and in silico compensatory evolution, we generated putative synthetic even-skipped stripe 2 enhancers with varying degrees of similarity to the natural enhancer. These elements represent the evolutionary trajectories of the natural stripe 2 enhancer towards two synthetic enhancers designed ab initio. In the first trajectory, spatially regulated expression was maintained, even after more than a third of binding sites were lost. In the second, sequences with high similarity to the natural element did not drive expression, but a highly diverged sequence about half the length of the minimal stripe 2 enhancer drove ten times greater expression. Additionally, homotypic clusters of Zelda or Stat92E motifs, but not Bicoid, drove expression in developing embryos. CONCLUSIONS: Here, we present a functional model of gene regulation to test the degree to which the known transcription factors and their interactions explain the activity of the Drosophila even-skipped stripe 2 enhancer. Initial success in the first trajectory showed that the gene regulation model explains much of the function of the stripe 2 enhancer. Cases where expression deviated from prediction indicate that undescribed factors likely act to modulate expression. We also showed that activation driven by Bicoid and Hunchback is highly sensitive to spatial organization of binding motifs. In contrast, Zelda and Stat92E drive expression from simple homotypic clusters, suggesting that activation driven by these factors is less constrained. Collectively, the 40 sequences generated in this work provide a powerful training set for building future models of gene regulation.
Affiliation(s)
- Kenneth A Barr
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Zoology 111, 1101 E 57th St, Chicago, 60637, Illinois, USA
  - Department of Ecology and Evolution, The University of Chicago, Chicago, 60637, Illinois, USA
- Carlos Martinez
  - Department Biochemistry and Molecular Genetics, Northwestern University, Chicago, 60611, Illinois, USA
- Jennifer R Moran
  - Department Human Genetics, The University of Chicago, Chicago, 60637, Illinois, USA
  - Institute for Genomics & Systems Biology, The University of Chicago, Chicago, 60637, Illinois, USA
- Ah-Ram Kim
  - School of Life Science, Handong Global University, Pohang, 37554, Gyeongbuk, South Korea
- Alexandre F Ramos
  - Departamento de Radiologia - Faculdade de Medicina, Universidade de São Paulo & Instituto do Câncer do Estado de São Paulo, São Paulo, SP CEP, 05403-911, Brazil
  - Escola de Artes, Ciências e Humanidades & Núcleo de Estudos Interdisciplinares em Sistemas Complexos, Universidade de São Paulo, Av. Arlindo Béttio, São Paulo, 1000 CEP 03828-000, SP, Brazil
- John Reinitz
  - Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Zoology 111, 1101 E 57th St, Chicago, 60637, Illinois, USA
  - Department of Ecology and Evolution, The University of Chicago, Chicago, 60637, Illinois, USA
  - Institute for Genomics & Systems Biology, The University of Chicago, Chicago, 60637, Illinois, USA
  - Department Statistics, The University of Chicago, 5747 S. Ellis Avenue Jones 312, Chicago, 60637, IL, USA
13. Koenecke N, Johnston J, He Q, Meier S, Zeitlinger J. Drosophila poised enhancers are generated during tissue patterning with the help of repression. Genome Res 2016; 27:64-74. [PMID: 27979994 PMCID: PMC5204345 DOI: 10.1101/gr.209486.116] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 05/07/2016] [Accepted: 11/08/2016] [Indexed: 12/18/2022]
Abstract
Histone modifications are frequently used as markers for enhancer states, but how to interpret enhancer states in the context of embryonic development is not clear. The poised enhancer signature, involving H3K4me1 and low levels of H3K27ac, has been reported to mark inactive enhancers that are poised for future activation. However, future activation is not always observed, and alternative reasons for the widespread occurrence of this enhancer signature have not been investigated. By analyzing enhancers during dorsal-ventral (DV) axis formation in the Drosophila embryo, we find that the poised enhancer signature is specifically generated during patterning in the tissue where the enhancers are not induced, including at enhancers that are known to be repressed by a transcriptional repressor. These results suggest that, rather than serving exclusively as an intermediate step before future activation, the poised enhancer state may be a mark for spatial regulation during tissue patterning. We discuss the possibility that the poised enhancer state is more generally the result of repression by transcriptional repressors.
Affiliation(s)
- Nina Koenecke
- Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA
| | - Jeff Johnston
- Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA
| | - Qiye He
- Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA
| | - Samuel Meier
- Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA; University of Kansas Medical Center, Department of Pathology, Kansas City, Kansas 66160, USA
|
14
|
Sayal R, Dresch JM, Pushel I, Taylor BR, Arnosti DN. Quantitative perturbation-based analysis of gene expression predicts enhancer activity in early Drosophila embryo. eLife 2016; 5. [PMID: 27152947 PMCID: PMC4859806 DOI: 10.7554/elife.08445] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 04/30/2015] [Accepted: 04/04/2016] [Indexed: 01/02/2023] Open
Abstract
Enhancers constitute one of the major components of regulatory machinery of metazoans. Although several genome-wide studies have focused on finding and locating enhancers in the genomes, the fundamental principles governing their internal architecture and cis-regulatory grammar remain elusive. Here, we describe an extensive, quantitative perturbation analysis targeting the dorsal-ventral patterning gene regulatory network (GRN) controlled by Drosophila NF-κB homolog Dorsal. To understand transcription factor interactions on enhancers, we employed an ensemble of mathematical models, testing effects of cooperativity, repression, and factor potency. Models trained on the dataset correctly predict activity of evolutionarily divergent regulatory regions, providing insights into spatial relationships between repressor and activator binding sites. Importantly, the collective predictions of sets of models were effective at novel enhancer identification and characterization. Our study demonstrates how experimental dataset and modeling can be effectively combined to provide quantitative insights into cis-regulatory information on a genome-wide scale. DOI: http://dx.doi.org/10.7554/eLife.08445.001
DNA contains regions known as genes, which may be “transcribed” to produce the RNA molecules that act as templates for building proteins and regulate cell activity. Proteins called transcription factors can bind to specific sequences of DNA to influence whether nearby genes are transcribed. For example, so-called enhancer regions of DNA contain several binding sites for transcription factors, and this binding activates gene transcription. Little is known about how the transcription factor binding sites are organized in enhancer regions, which makes it difficult to use DNA sequence information alone to predict the regulation of genes. A transcription factor called Dorsal controls the activity of a network of genes that plays a crucial role in the development of fruit fly embryos. Dorsal binds to the enhancer region of a gene called rhomboid, which has been well studied and is known to be a fairly typical example of an enhancer region. To understand the regulatory information encoded in the DNA sequences of enhancers, Sayal, Dresch et al. have now used a technique called perturbation analysis to investigate the interactions that are likely to occur between Dorsal and other transcription factors as they bind to the rhomboid enhancer. This technique involves systematically mutating the enhancer to remove different combinations of transcription factor binding sites and quantitatively investigating the effect this has on gene activity. A large set of mathematical models were then trained using this data and shown to correctly predict the activity of a range of other gene regulatory regions. The collective predictions of the models identified new enhancer regions and revealed details about how different types of transcription factor binding sites are arranged within enhancers. As we enter an era where the DNA sequences of entire human populations are increasingly accessible, we would like to know the functional significance of changes in gene regulatory regions. Sayal, Dresch et al. show that the regulatory properties of specific control proteins are accessible by employing quantitative experiments and mathematical models. Similar studies will be required to learn how mutations found across the genome may alter gene expression, leading to better diagnosis and treatment of disease. DOI: http://dx.doi.org/10.7554/eLife.08445.002
Affiliation(s)
- Rupinder Sayal
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, United States; Department of Biochemistry, DAV University, Jalandhar, India
| | - Jacqueline M Dresch
- Department of Mathematics, Michigan State University, East Lansing, United States; Department of Mathematics and Computer Science, Clark University, Worcester, United States
| | - Irina Pushel
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, United States; Stowers Institute for Medical Research, Kansas City, United States
| | - Benjamin R Taylor
- Department of Computer Science and Engineering, Michigan State University, East Lansing, United States; School of Computer Science, Georgia Institute of Technology, Atlanta, United States
| | - David N Arnosti
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, United States
|
15
|
Bertolino E, Reinitz J, Manu. The analysis of novel distal Cebpa enhancers and silencers using a transcriptional model reveals the complex regulatory logic of hematopoietic lineage specification. Dev Biol 2016; 413:128-44. [PMID: 26945717 DOI: 10.1016/j.ydbio.2016.02.030] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/16/2015] [Revised: 01/13/2016] [Accepted: 02/15/2016] [Indexed: 11/25/2022]
Abstract
C/EBPα plays an instructive role in the macrophage-neutrophil cell-fate decision and its expression is necessary for neutrophil development. How Cebpa itself is regulated in the myeloid lineage is not known. We decoded the cis-regulatory logic of Cebpa, and two other myeloid transcription factors, Egr1 and Egr2, using a combined experimental-computational approach. With a reporter design capable of detecting both distal enhancers and silencers, we analyzed 46 putative cis-regulatory modules (CRMs) in cells representing myeloid progenitors, and derived early macrophages or neutrophils. In addition to novel enhancers, this analysis revealed a surprisingly large number of silencers. We determined the regulatory roles of 15 potential transcriptional regulators by testing 32,768 alternative sequence-based transcriptional models against CRM activity data. This comprehensive analysis allowed us to infer the cis-regulatory logic for most of the CRMs. Silencer-mediated repression of Cebpa was found to be effected mainly by TFs expressed in non-myeloid lineages, highlighting a previously unappreciated contribution of long-distance silencing to hematopoietic lineage resolution. The repression of Cebpa by multiple factors expressed in alternative lineages suggests that hematopoietic genes are organized into densely interconnected repressive networks instead of hierarchies of mutually repressive pairs of pivotal TFs. More generally, our results demonstrate that de novo cis-regulatory dissection is feasible on a large scale with the aid of transcriptional modeling. Current address: Department of Biology, University of North Dakota, 10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, USA.
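The model count in this abstract can be checked with simple arithmetic: 32,768 equals 2^15, which matches the 15 potential regulators if each one contributes a single two-way choice per model. The abstract does not say what that choice is, so the role labels below are hypothetical placeholders, and the sketch only illustrates how such a model space could be enumerated.

```python
from itertools import product

N_REGULATORS = 15  # number of potential regulators stated in the abstract

# Assumption: each regulator takes one of two options in a given model.
# The labels are hypothetical; the abstract does not define them.
ROLES = ("option_a", "option_b")


def count_models(n_regulators: int, n_choices: int = 2) -> int:
    """Size of a model space with an independent n-way choice per regulator."""
    return n_choices ** n_regulators


def enumerate_models(n_regulators: int):
    """Lazily yield every assignment of one option to each regulator."""
    return product(ROLES, repeat=n_regulators)


print(count_models(N_REGULATORS))  # 32768
```

Enumerating lazily with `itertools.product` avoids holding all assignments in memory at once, which matters if each candidate model is scored against activity data one at a time.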
Affiliation(s)
- Eric Bertolino
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, USA.
| | - John Reinitz
- Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, IL 60637, USA; Department of Statistics, The University of Chicago, Chicago, IL 60637, USA; Department of Ecology and Evolution and Institute of Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA
| | - Manu
- Department of Ecology and Evolution and Institute of Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637, USA; Department of Biology, University of North Dakota, 10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, USA.
|
16
|
Brunwasser-Meirom M, Pollak Y, Goldberg S, Levy L, Atar O, Amit R. Using synthetic bacterial enhancers to reveal a looping-based mechanism for quenching-like repression. Nat Commun 2016; 7:10407. [PMID: 26832446 PMCID: PMC4740811 DOI: 10.1038/ncomms10407] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/02/2015] [Accepted: 12/02/2015] [Indexed: 01/20/2023] Open
Abstract
We explore a model for 'quenching-like' repression by studying synthetic bacterial enhancers, each characterized by a different binding site architecture. To do so, we take a three-pronged approach: first, we compute the probability that a protein-bound dsDNA molecule will loop. Second, we use hundreds of synthetic enhancers to test the model's predictions in bacteria. Finally, we verify the mechanism bioinformatically in native genomes. Here we show that excluded volume effects generated by DNA-bound proteins can generate substantial quenching. Moreover, the type and extent of the regulatory effect depend strongly on the relative arrangement of the binding sites. The implications of these results are that enhancers should be insensitive to 10-11 bp insertions or deletions (INDELs) and sensitive to 5-6 bp INDELs. We test this prediction on 61 σ54-regulated qrr genes from the Vibrio genus and confirm the tolerance of these enhancers' sequences to the DNA's helical repeat.
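The contrast between 10-11 bp and 5-6 bp INDELs follows from helical phasing, which a short calculation can make concrete. The sketch below assumes a B-DNA helical repeat of about 10.5 bp per turn (a textbook value, not taken from the abstract) and uses a hypothetical helper, `rotational_offset_deg`, to estimate how far an INDEL rotates a downstream site around the helix axis.

```python
HELICAL_REPEAT_BP = 10.5  # assumed B-DNA helical repeat, in bp per turn


def rotational_offset_deg(indel_bp: int,
                          helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """Angular displacement (0-180 degrees) of a downstream site after an INDEL.

    Only the fractional number of helical turns matters, so the INDEL length
    is reduced modulo the helical repeat and folded onto the half circle.
    """
    angle = (abs(indel_bp) % helical_repeat) / helical_repeat * 360.0
    return min(angle, 360.0 - angle)


for n in (5, 6, 10, 11):
    print(n, round(rotational_offset_deg(n), 1))
```

Under this assumption, 10 and 11 bp INDELs leave a site within roughly 17 degrees of its original face of the helix, whereas 5 and 6 bp INDELs move it to nearly the opposite face, consistent with the sensitivity pattern the abstract predicts.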
Affiliation(s)
- Michal Brunwasser-Meirom
- Department of Biotechnology and Food Engineering, Technion—Israel Institute of Technology, Haifa 32000, Israel
| | - Yaroslav Pollak
- Russell Berrie Nanotechnology Institute, Technion—Israel Institute of Technology, Haifa 32000, Israel
| | - Sarah Goldberg
- Department of Biotechnology and Food Engineering, Technion—Israel Institute of Technology, Haifa 32000, Israel
| | - Lior Levy
- Department of Biotechnology and Food Engineering, Technion—Israel Institute of Technology, Haifa 32000, Israel
| | - Orna Atar
- Department of Biotechnology and Food Engineering, Technion—Israel Institute of Technology, Haifa 32000, Israel
| | - Roee Amit
- Department of Biotechnology and Food Engineering, Technion—Israel Institute of Technology, Haifa 32000, Israel
- Russell Berrie Nanotechnology Institute, Technion—Israel Institute of Technology, Haifa 32000, Israel
|
17
|
Kozlov K, Gursky VV, Kulakovskiy IV, Dymova A, Samsonova M. Analysis of functional importance of binding sites in the Drosophila gap gene network model. BMC Genomics 2015; 16 Suppl 13:S7. [PMID: 26694511 PMCID: PMC4686791 DOI: 10.1186/1471-2164-16-s13-s7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022] Open
Abstract
BACKGROUND: The statistical thermodynamics based approach provides a promising framework for construction of the genotype-phenotype map in many biological systems. Among important aspects of a good model connecting the DNA sequence information with that of a molecular phenotype (gene expression) is the selection of regulatory interactions and relevant transcription factor binding sites. As the model may predict different levels of the functional importance of specific binding sites in different genomic and regulatory contexts, it is essential to formulate and study such models under different modeling assumptions. RESULTS: We elaborate a two-layer model for the Drosophila gap gene network and include in the model a combined set of transcription factor binding sites and concentration dependent regulatory interaction between gap genes hunchback and Kruppel. We show that the new variants of the model are more consistent in terms of gene expression predictions for various genetic constructs in comparison to previous work. We quantify the functional importance of binding sites by calculating their impact on gene expression in the model and calculate how these impacts correlate across all sites under different modeling assumptions. CONCLUSIONS: The assumption about the dual interaction between hb and Kr leads to the most consistent modeling results, but, on the other hand, may obscure existence of indirect interactions between binding sites in regulatory regions of distinct genes. The analysis confirms the previously formulated regulation concept of many weak binding sites working in concert. The model predicts a more or less uniform distribution of functionally important binding sites over the sets of experimentally characterized regulatory modules and other open chromatin domains.
Affiliation(s)
- Konstantin Kozlov
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
| | - Vitaly V Gursky
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
- Ioffe Institute, 26 Polytechnicheskaya, 194021 St.Petersburg, Russia
| | - Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, 32 Vavilova, 119991 Moscow, Russia
| | - Arina Dymova
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
| | - Maria Samsonova
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
|
18
|
Abstract
Transcriptional enhancers direct precise on-off patterns of gene expression during development. To explore the basis for this precision, we conducted a high-throughput analysis of the Otx-a enhancer, which mediates expression in the neural plate of Ciona embryos in response to fibroblast growth factor (FGF) signaling and a localized GATA determinant. We provide evidence that enhancer specificity depends on submaximal recognition motifs having reduced binding affinities ("suboptimization"). Native GATA and ETS (FGF) binding sites contain imperfect matches to consensus motifs. Perfect matches mediate robust but ectopic patterns of gene expression. The native sites are not arranged at optimal intervals, and subtle changes in their spacing alter enhancer activity. Multiple tiers of enhancer suboptimization produce specific, but weak, patterns of expression, and we suggest that clusters of weak enhancers, including certain "superenhancers," circumvent this trade-off in specificity and activity.
Affiliation(s)
- Emma K Farley
- Department of Molecular and Cell Biology, Division of Genetics, Genomics and Development, Center for Integrative Genomics, University of California, Berkeley, CA 94720-3200, USA. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
| | - Katrina M Olson
- Department of Molecular and Cell Biology, Division of Genetics, Genomics and Development, Center for Integrative Genomics, University of California, Berkeley, CA 94720-3200, USA. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Wei Zhang
- Department of Medicine, University of California, San Diego, CA 92093-0688, USA
| | - Alexander J Brandt
- Department of Chemistry, University of California, Berkeley, CA 94720-3200, USA
| | - Daniel S Rokhsar
- Department of Molecular and Cell Biology, Division of Genetics, Genomics and Development, Center for Integrative Genomics, University of California, Berkeley, CA 94720-3200, USA
| | - Michael S Levine
- Department of Molecular and Cell Biology, Division of Genetics, Genomics and Development, Center for Integrative Genomics, University of California, Berkeley, CA 94720-3200, USA. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
|
19
|
Abstract
BACKGROUND: The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts large interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency. RESULTS: We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used a four-fold cross-validation test and fitting to a random dataset to validate the model and prove its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. CONCLUSIONS: The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3) functionally important sites are not exclusively located in cis-regulatory elements, but are rather dispersed throughout the regulatory region. It is of importance that some of the sites with high functional impact in hb, Kr and kni regulatory regions coincide with strong sites annotated and verified in DNase I footprint assays.
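The site regulatory weight described in this abstract can be illustrated with a minimal sketch. The abstract specifies only "a normalized difference" between two residual sums of squares, so both the normalization (dividing by the full-set RSS) and the sign convention (positive when removing the site worsens the fit) are assumptions made here; the function names and toy numbers are hypothetical.

```python
def rss(observed, predicted):
    """Residual sum of squares between observed and model-predicted expression."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def regulatory_weight(rss_all_sites: float, rss_site_excluded: float) -> float:
    """Relative change in RSS when one binding site is removed from the model.

    Assumed form: (RSS_excluded - RSS_all) / RSS_all. The abstract does not
    state the exact normalization, so this is one plausible choice.
    """
    if rss_all_sites <= 0:
        raise ValueError("RSS for the full site set must be positive")
    return (rss_site_excluded - rss_all_sites) / rss_all_sites


# Toy expression profile and two hypothetical sets of model predictions.
observed = [1.0, 2.0, 3.0]
predicted_all_sites = [1.0, 2.0, 2.0]      # model with every annotated site
predicted_site_excluded = [1.0, 1.0, 2.0]  # model with one site removed

weight = regulatory_weight(
    rss(observed, predicted_all_sites),
    rss(observed, predicted_site_excluded),
)
print(weight)  # 1.0
```

In this toy case the fit error doubles when the site is removed, giving a weight of 1.0; a weight near zero would indicate a site whose removal barely changes the model output.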
Affiliation(s)
- Konstantin Kozlov
- St.Petersburg State Polytechnical University, Polytekhnicheskaya 29, 195251 St.Petersburg, Russia
| | - Vitaly Gursky
- Ioffe Physical-Technical Institute, RAS, Polytekhnicheskaya 26, 194021 St.Petersburg, Russia
| | - Ivan Kulakovskiy
- Engelhardt Institute of Molecular Biology, RAS, Vavilov 32, 119991 Moscow, Russia
| | - Maria Samsonova
- St.Petersburg State Polytechnical University, Polytekhnicheskaya 29, 195251 St.Petersburg, Russia
|
20
|
Jing Z, Gangalum RK, Mock DC, Bhat SP. A gene-specific non-enhancer sequence is critical for expression from the promoter of the small heat shock protein gene αB-crystallin. Hum Genomics 2014; 8:5. [PMID: 24589182 PMCID: PMC3975602 DOI: 10.1186/1479-7364-8-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2013] [Accepted: 02/10/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Deciphering of the information content of eukaryotic promoters has remained confined to universal landmarks and conserved sequence elements such as enhancers and transcription factor binding motifs, which are considered sufficient for gene activation and regulation. Gene-specific sequences, interspersed between the canonical transacting factor binding sites or adjoining them within a promoter, are generally taken to be devoid of any regulatory information and have therefore been largely ignored. An unanswered question therefore is, do gene-specific sequences within a eukaryotic promoter have a role in gene activation? Here, we present an exhaustive experimental analysis of a gene-specific sequence adjoining the heat shock element (HSE) in the proximal promoter of the small heat shock protein gene, αB-crystallin (cryab). These sequences are highly conserved between the rodents and the humans. RESULTS Using human retinal pigment epithelial cells in culture as the host, we have identified a 10-bp gene-specific promoter sequence (GPS), which, unlike an enhancer, controls expression from the promoter of this gene, only when in appropriate position and orientation. Notably, the data suggests that GPS in comparison with the HSE works in a context-independent fashion. Additionally, when moved upstream, about a nucleosome length of DNA (-154 bp) from the transcription start site (TSS), the activity of the promoter is markedly inhibited, suggesting its involvement in local promoter access. Importantly, we demonstrate that deletion of the GPS results in complete loss of cryab promoter activity in transgenic mice. CONCLUSIONS These data suggest that gene-specific sequences such as the GPS, identified here, may have critical roles in regulating gene-specific activity from eukaryotic promoters.
Affiliation(s)
- Suraj P Bhat
- Jules Stein Eye Institute, University of California, Los Angeles, CA 90095, USA.
|
21
|
A comparison of midline and tracheal gene regulation during Drosophila development. PLoS One 2014; 9:e85518. [PMID: 24465586 PMCID: PMC3896416 DOI: 10.1371/journal.pone.0085518] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/15/2013] [Accepted: 11/28/2013] [Indexed: 11/19/2022] Open
Abstract
Within the Drosophila embryo, two related bHLH-PAS proteins, Single-minded and Trachealess, control development of the central nervous system midline and the trachea, respectively. These two proteins are bHLH-PAS transcription factors and independently form heterodimers with another bHLH-PAS protein, Tango. During early embryogenesis, expression of Single-minded is restricted to the midline and Trachealess to the trachea and salivary glands, whereas Tango is ubiquitously expressed. Both Single-minded/Tango and Trachealess/Tango heterodimers bind to the same DNA sequence, called the CNS midline element (CME) within cis-regulatory sequences of downstream target genes. While Single-minded/Tango and Trachealess/Tango activate some of the same genes in their respective tissues during embryogenesis, they also activate a number of different genes restricted to only certain tissues. The goal of this research is to understand how these two related heterodimers bind different enhancers to activate different genes, thereby regulating the development of functionally diverse tissues. Existing data indicates that Single-minded and Trachealess may bind to different co-factors restricted to various tissues, causing them to interact with the CME only within certain sequence contexts. This would lead to the activation of different target genes in different cell types. To understand how the context surrounding the CME is recognized by different bHLH-PAS heterodimers and their co-factors, we identified and analyzed novel enhancers that drive midline and/or tracheal expression and compared them to previously characterized enhancers. In addition, we tested expression of synthetic reporter genes containing the CME flanked by different sequences. Taken together, these experiments identify elements overrepresented within midline and tracheal enhancers and suggest that sequences immediately surrounding a CME help dictate whether a gene is expressed in the midline or trachea.
|
22
|
Using evolutionary computations to understand the design and evolution of gene and cell regulatory networks. Methods 2013; 62:39-55. [PMID: 23726941 DOI: 10.1016/j.ymeth.2013.05.013] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 05/02/2012] [Revised: 11/30/2012] [Accepted: 05/21/2013] [Indexed: 12/21/2022] Open
Abstract
This paper surveys modeling approaches for studying the evolution of gene regulatory networks (GRNs). Modeling of the design or 'wiring' of GRNs has become increasingly common in developmental and medical biology, as a means of quantifying gene-gene interactions, the response to perturbations, and the overall dynamic motifs of networks. Drawing from developments in GRN 'design' modeling, a number of groups are now using simulations to study how GRNs evolve, both for comparative genomics and to uncover general principles of evolutionary processes. Such work can generally be termed evolution in silico. Complementary to these biologically-focused approaches, a now well-established field of computer science is Evolutionary Computations (ECs), in which highly efficient optimization techniques are inspired from evolutionary principles. In surveying biological simulation approaches, we discuss the considerations that must be taken with respect to: (a) the precision and completeness of the data (e.g. are the simulations for very close matches to anatomical data, or are they for more general exploration of evolutionary principles); (b) the level of detail to model (we proceed from 'coarse-grained' evolution of simple gene-gene interactions to 'fine-grained' evolution at the DNA sequence level); (c) to what degree is it important to include the genome's cellular context; and (d) the efficiency of computation. With respect to the latter, we argue that developments in computer science EC offer the means to perform more complete simulation searches, and will lead to more comprehensive biological predictions.
|
23
|
Chiu C, Fakhouri W, Liu N, Dayringer E, Dresch J, Arnosti D. A two-scale mathematical model for DNA transcription. Math Biosci 2012; 236:132-40. [PMID: 22343054 DOI: 10.1016/j.mbs.2011.12.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/07/2011] [Revised: 12/16/2011] [Accepted: 12/21/2011] [Indexed: 10/28/2022]
Abstract
Unlike the earlier description of regulation of DNA transcription as a biological switch which simply turns on and off, scientists now understand that DNA transcription is a much more complex process. It can depend on several transcription factors (proteins) and DNA regulatory elements (transcription factor binding sites). The combination of these two groups of different scaled factors determines the transcription outcome. In this paper, we propose a two-scale mathematical model for the DNA transcription processes, which integrates the characteristics of both transcription factors and DNA cis-regulatory elements. The model was tested on a well designed synthetic system during early development stage of Drosophila embryo. The system involves three transcription factors (two activators and one repressor) and a reporter gene. The predicted results using the model were compared with the real experimental data using both graphical methods and statistical methods. Parameter estimation will also be discussed in the paper.
Affiliation(s)
- Chichia Chiu
- Department of Mathematics, Michigan State University, East Lansing, MI 48824-1027, USA.
|
24
|
Kharazmi J, Moshfegh C, Brody T. Identification of cis-Regulatory Elements in the dmyc Gene of Drosophila Melanogaster. GENE REGULATION AND SYSTEMS BIOLOGY 2012; 6:15-42. [PMID: 22267917 PMCID: PMC3256997 DOI: 10.4137/grsb.s8044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/20/2023]
Abstract
Myc is a crucial regulator of growth and proliferation during animal development. Many signals and transcription factors lead to changes in the expression levels of Drosophila myc, yet no clear model exists to explain the complexity of its regulation at the level of transcription. In this study we used Drosophila genetic tools to track the dmyc cis-regulatory elements. Bioinformatics analyses identified conserved sequence blocks in the noncoding regions of the dmyc gene. Investigation of lacZ reporter activity driven by upstream, downstream, and intronic sequences of the dmyc gene in embryonic, larval imaginal discs, larval brain, and adult ovaries, revealed that it is likely to be transcribed from multiple transcription initiation units including a far upstream regulatory region, a TATA box containing proximal complex and a TATA-less downstream promoter element in conjunction with an initiator within the intron 2 region. Our data provide evidence for a modular organization of dmyc regulatory sequences; these modules will most likely be required to generate the tissue-specific patterns of dmyc transcripts. The far upstream region is active in late embryogenesis, while activity of other cis elements is evident during embryogenesis, in specific larval imaginal tissues and during oogenesis. These data provide a framework for further investigation of the transcriptional regulatory mechanisms of dmyc.
Affiliation(s)
- Jasmine Kharazmi
- Biotechnopark Zurich, Molecular Biology Laboratory, University of Zurich-Irchel, Zurich, Switzerland
|
25
|
Technau M, Knispel M, Roth S. Molecular mechanisms of EGF signaling-dependent regulation of pipe, a gene crucial for dorsoventral axis formation in Drosophila. Dev Genes Evol 2011; 222:1-17. [PMID: 22198544 PMCID: PMC3291829 DOI: 10.1007/s00427-011-0384-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/08/2011] [Accepted: 11/29/2011] [Indexed: 01/28/2023]
Abstract
During Drosophila oogenesis the expression of the sulfotransferase Pipe in ventral follicle cells is crucial for dorsoventral axis formation. Pipe modifies proteins that are incorporated in the ventral eggshell and activate Toll signaling which in turn initiates embryonic dorsoventral patterning. Ventral pipe expression is the result of an oocyte-derived EGF signal which down-regulates pipe in dorsal follicle cells. The analysis of mutant follicle cell clones reveals that none of the transcription factors known to act downstream of EGF signaling in Drosophila is required or sufficient for pipe regulation. However, the pipe cis-regulatory region harbors a 31-bp element which is essential for pipe repression, and ovarian extracts contain a protein that binds this element. Thus, EGF signaling does not act by down-regulating an activator of pipe as previously suggested but rather by activating a repressor. Surprisingly, this repressor acts independent of the common co-repressors Groucho or CtBP.
Affiliation(s)
- Martin Technau
- Institute for Developmental Biology, Biocenter, University of Cologne, Zuelpicher Straße 47b, 50674, Cologne, Germany
|
26
|
Transcription factor binding site redundancy in embryonic enhancers of the Drosophila bithorax complex. G3-GENES GENOMES GENETICS 2011; 1:603-6. [PMID: 22384371 PMCID: PMC3276168 DOI: 10.1534/g3.111.001404] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/22/2011] [Accepted: 10/18/2011] [Indexed: 01/22/2023]
Abstract
The molecular control of gene expression in development is mediated through the activity of embryonic enhancer cis-regulatory modules. This activity is determined by the combination of repressor and activator transcription factors that bind at specific DNA sequences in the enhancer. A proposed mechanism to ensure a high fidelity of transcriptional output is functional redundancy between closely spaced binding sites within an enhancer. Here I show that at the bithorax complex in Drosophila there is selective redundancy for both repressor and activator factor binding sites in vivo. The absence of compensatory binding sites is responsible for two rare gain-of-function mutations in the complex.
|
27
|
Nourmohammad A, Lässig M. Formation of regulatory modules by local sequence duplication. PLoS Comput Biol 2011; 7:e1002167. [PMID: 21998564 PMCID: PMC3188502 DOI: 10.1371/journal.pcbi.1002167] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 02/15/2011] [Accepted: 06/30/2011] [Indexed: 11/24/2022] Open
Abstract
Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms.
Since Jacob and Monod stressed the importance of gene regulation in evolution, our understanding of the mechanisms of regulation has substantially advanced. In higher eukaryotes, genes often have complex regulatory input, which is encoded in cis-regulatory sequence with multiple transcription factor binding sites. However, the modes of genome evolution generating regulatory complexity are much less understood. This study reports a surprising finding: in fly regulatory modules, the majority of transcription factor binding sites show evidence of a local sequence duplication in their evolutionary history, which relates their sequence information to that of neighboring binding sites. Our analysis suggests that local sequence duplications are a pervasive production mode of regulatory information. This mode appears to be specific to higher eukaryotes; we have not found evidence of frequent local duplications in the yeast genome. Our results affect genomic sequence analysis, in particular, computational identification of cis-regulatory elements and alignment of regulatory DNA. At the same time, they address fundamental questions on the evolution of regulation: How much of the regulatory “grammar” observed in higher eukaryotes is due to optimization of function, and how much reflects the underlying sequence evolution modes? What is the result and what is the substrate of natural selection?
Affiliation(s)
- Michael Lässig
- Institute for Theoretical Physics, University of Cologne, Köln, Germany
|
28
|
Swanson CI, Schwimmer DB, Barolo S. Rapid evolutionary rewiring of a structurally constrained eye enhancer. Curr Biol 2011; 21:1186-96. [PMID: 21737276 DOI: 10.1016/j.cub.2011.05.056] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 12/30/2010] [Revised: 04/18/2011] [Accepted: 05/27/2011] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Enhancers are genomic cis-regulatory sequences that integrate spatiotemporal signals to control gene expression. Enhancer activity depends on the combination of bound transcription factors as well as, in some cases, the arrangement and spacing of binding sites for these factors. Here, we examine evolutionary changes to the sequence and structure of sparkling, a Notch/EGFR/Runx-regulated enhancer that activates the dPax2 gene in cone cells of the developing Drosophila eye. RESULTS: Despite functional and structural constraints on its sequence, sparkling has undergone major reorganization in its recent evolutionary history. Our data suggest that the relative strengths of the various regulatory inputs into sparkling change rapidly over evolutionary time, such that reduced input from some factors is compensated by increased input from different regulators. These gains and losses are at least partly responsible for the changes in enhancer structure that we observe. Furthermore, stereotypical spatial relationships between certain binding sites ("grammar elements") can be identified in all sparkling orthologs, although the sites themselves are often recently derived. We also find that low binding affinity for the Notch-regulated transcription factor Su(H), a conserved property of sparkling, is required to prevent ectopic responses to Notch in noncone cells. CONCLUSIONS: Rapid DNA sequence turnover does not imply either the absence of critical cis-regulatory information or the absence of structural rules. Our findings demonstrate that even a severely constrained cis-regulatory sequence can be significantly rewired over a short evolutionary timescale.
Affiliation(s)
- Christina I Swanson
- Department of Cell and Developmental Biology, University of Michigan Medical School, Ann Arbor, MI 48109-2200, USA
|
29
|
Kaplan T, Li XY, Sabo PJ, Thomas S, Stamatoyannopoulos JA, Biggin MD, Eisen MB. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet 2011; 7:e1001290. [PMID: 21304941 PMCID: PMC3033374 DOI: 10.1371/journal.pgen.1001290] [Citation(s) in RCA: 139] [Impact Index Per Article: 10.7] [Received: 11/04/2010] [Accepted: 01/01/2011] [Indexed: 01/01/2023] Open
Abstract
Transcription factors that drive complex patterns of gene expression during animal development bind to thousands of genomic regions, with quantitative differences in binding across bound regions mediating their activity. While we now have tools to characterize the DNA affinities of these proteins and to precisely measure their genome-wide distribution in vivo, our understanding of the forces that determine where, when, and to what extent they bind remains primitive. Here we use a thermodynamic model of transcription factor binding to evaluate the contribution of different biophysical forces to the binding of five regulators of early embryonic anterior-posterior patterning in Drosophila melanogaster. Predictions based on DNA sequence and in vitro protein-DNA affinities alone achieve a correlation of ∼0.4 with experimental measurements of in vivo binding. Incorporating cooperativity and competition among the five factors, and accounting for spatial patterning by modeling binding in every nucleus independently, had little effect on prediction accuracy. A major source of error was the prediction of binding events that do not occur in vivo, which we hypothesized reflected reduced accessibility of chromatin. To test this, we incorporated experimental measurements of genome-wide DNA accessibility into our model, effectively restricting predicted binding to regions of open chromatin. This dramatically improved our predictions to a correlation of 0.6-0.9 for various factors across known target genes. Finally, we used our model to quantify the roles of DNA sequence, accessibility, and binding competition and cooperativity. Our results show that, in regions of open chromatin, binding can be predicted almost exclusively by the sequence specificity of individual factors, with a minimal role for protein interactions. 
We suggest that a combination of experimentally determined chromatin accessibility data and simple computational models of transcription factor binding may be used to predict the binding landscape of any animal transcription factor with significant precision.
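The idea of combining sequence affinity with chromatin accessibility can be illustrated with a toy equilibrium calculation. The sketch below is not the model from the paper: the function name `site_occupancy`, the parameter names, and the choice to let accessibility scale the statistical weight of the bound state are all assumptions made for illustration.

```python
def site_occupancy(concentration, affinity, accessibility=1.0):
    """Equilibrium occupancy of a single binding site (illustrative only).

    The bound state gets a statistical weight of
    accessibility * affinity * concentration, and the unbound state a
    weight of 1, so occupancy = weight / (1 + weight).
    Scaling the weight by accessibility is an assumption of this sketch.
    """
    if not 0.0 <= accessibility <= 1.0:
        raise ValueError("accessibility must lie in [0, 1]")
    weight = accessibility * affinity * concentration
    return weight / (1.0 + weight)


# Same factor and site, in open versus mostly closed chromatin.
open_chromatin = site_occupancy(concentration=2.0, affinity=1.5, accessibility=1.0)  # 0.75
closed_chromatin = site_occupancy(concentration=2.0, affinity=1.5, accessibility=0.1)
print(open_chromatin, closed_chromatin)
```

In this toy form, a site with identical sequence affinity is predicted to be bound far less often when accessibility is low, which is the qualitative effect the abstract describes.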
Affiliation(s)
- Tommy Kaplan
- Department of Molecular and Cell Biology, California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Xiao-Yong Li
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- Peter J. Sabo
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Sean Thomas
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Mark D. Biggin
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Michael B. Eisen
- Department of Molecular and Cell Biology, California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
|
30
|
Sayal R, Ryu SM, Arnosti DN. Optimization of reporter gene architecture for quantitative measurements of gene expression in the Drosophila embryo. Fly (Austin) 2011; 5:47-52. [PMID: 21150286 DOI: 10.4161/fly.5.1.14159] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022] Open
Abstract
Quantitative assessment of gene regulation is critical for mathematical modeling of transcriptional systems for systems biology efforts. Enhancers, also termed cis-regulatory modules (CRMs), are the primary mediators of transcriptional regulation in higher eukaryotes; transcription factors binding to CRMs dictate the likelihood and frequency of promoter activation. To provide a suitable platform for in-depth CRM analysis, we adapted a targeted integration vector to compare the action of basal promoters with diverse combinations of TATA, Inr and DPE motifs, as well as a set of 3'-UTRs representative of those used in different reporter vectors. This "Honda" series of reporter gene vectors was activated by a regulatory element binding Dorsal and Twist activators suitable for transcription in the early Drosophila embryo. The diverse promoters functioned in a similar manner with minor quantitative differences, consistent with a lack of enhancer-promoter specificity. Constructs bearing SV40 3'-UTR sequences appeared to produce somewhat higher levels of mRNA. Confocal laser scanning microscopy revealed that the mRNA distribution produced by these constructs was punctate; this pattern appears to be dependent on 5'-UTR sequences, as an optimized vector including an alternate 5'-UTR produced a more even distribution, which may be preferable for quantitative imaging. This set of Honda vectors contains convenient sites for modification of basal promoter, 3'-UTR, and enhancer, and will be useful for analysis of CRMs and quantitative studies of gene expression.
Affiliation(s)
- Rupinder Sayal
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
|
31
|
Fulkerson E, Estes PA. Common motifs shared by conserved enhancers of Drosophila midline glial genes. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2010; 316:61-75. [PMID: 21154525 DOI: 10.1002/jez.b.21382] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/28/2010] [Revised: 09/07/2010] [Accepted: 09/28/2010] [Indexed: 12/12/2022]
Abstract
Coding sequences are usually the most highly conserved sectors of DNA, but genomic regions controlling the expression pattern of certain genes can also be conserved across diverse species. In this study, we identify five enhancers capable of activating transcription in the midline glia of Drosophila melanogaster, each of which contains sequences conserved across at least 11 Drosophila species. In addition, the conserved sequences contain reiterated motifs for binding sites of the known midline transcriptional activators, Single-minded, Tango, Dichaete, and Pointed. To understand the molecular basis for the highly conserved genomic subregions within enhancers of the midline genes, we tested the ability of various motifs to affect midline expression, both individually and in combination, within synthetic reporter constructs. Multiple copies of the binding site for the midline regulators Single-minded and Tango can drive expression in midline cells; however, small changes to the sequences flanking this transcription factor binding site can inactivate expression in midline cells and activate expression in tracheal cells instead. For the midline genes described in this study, the highly conserved sequences appear to juxtapose positive and negative regulatory factors in a configuration that activates genes specifically in the midline glia, while maintaining them inactive in other tissues, including midline neurons and tracheal cells.
Affiliation(s)
- Eric Fulkerson
- Department of Genetics, North Carolina State University, Raleigh, North Carolina 27695, USA
|
32
|
Ribeiro TC, Ventrice G, Machado-Lima A, Andrioli LP. Investigating giant (Gt) repression in the formation of partially overlapping pair-rule stripes. Dev Dyn 2010; 239:2989-99. [DOI: 10.1002/dvdy.22434] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/20/2022] Open
|
33
|
Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. PLoS Comput Biol 2010; 6. [PMID: 20862354 PMCID: PMC2940721 DOI: 10.1371/journal.pcbi.1000935] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Received: 03/31/2010] [Accepted: 08/17/2010] [Indexed: 01/08/2023] Open
Abstract
Quantitative models of cis-regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled, or heuristic approximations of the underlying regulatory mechanisms. We have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence, as a function of transcription factor concentrations and their DNA-binding specificities. It uses statistical thermodynamics theory to model not only protein-DNA interaction, but also the effect of DNA-bound activators and repressors on gene expression. In addition, the model incorporates mechanistic features such as synergistic effect of multiple activators, short range repression, and cooperativity in transcription factor-DNA binding, allowing us to systematically evaluate the significance of these features in the context of available expression data. Using this model on segmentation-related enhancers in Drosophila, we find that transcriptional synergy due to simultaneous action of multiple activators helps explain the data beyond what can be explained by cooperative DNA-binding alone. We find clear support for the phenomenon of short-range repression, where repressors do not directly interact with the basal transcriptional machinery. We also find that the binding sites contributing to an enhancer's function may not be conserved during evolution, and a noticeable fraction of these undergo lineage-specific changes. Our implementation of the model, called GEMSTAT, is the first publicly available program for simultaneously modeling the regulatory activities of a given set of sequences.
The development of complex multicellular organisms requires genes to be expressed at specific stages and in specific tissues. Regulatory DNA sequences, often called cis-regulatory modules, drive the desired gene expression patterns by integrating information about the environment in the form of the activities of transcription factors. The rules by which regulatory sequences read this type of information, however, are unclear. In this work, we developed quantitative models based on physicochemical principles that directly map regulatory sequences to the expression profiles they generate. We evaluated these models on the segmentation network of the model organism Drosophila melanogaster. Our models incorporate mechanistic features that attempt to capture how activating and repressing transcription factors work in the segmentation system. By evaluating the importance of these features, we were able to gain insights on the quantitative regulatory rules. We found that two different mechanisms may contribute to cooperative gene activation and that repressors often have a short range of influence in DNA sequences. Combining the quantitative modeling with comparative sequence analysis, we also found that even functional sequences may be lost during evolution.
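To make the statistical-thermodynamics idea concrete, the following sketch enumerates every bound/unbound configuration of a short row of binding sites and converts Boltzmann-style weights into probabilities. It is a generic textbook construction, not the GEMSTAT implementation; the function name `configuration_probabilities`, the per-site weights, and the single nearest-neighbor cooperativity factor are assumptions chosen for illustration.

```python
from itertools import product


def configuration_probabilities(site_weights, cooperativity=1.0):
    """Probability of each bound/unbound configuration of a row of sites.

    site_weights[i] stands for affinity * concentration at site i
    (an assumed parameterization). Whenever two adjacent sites are both
    bound, the configuration weight is multiplied by `cooperativity`.
    """
    weights = {}
    for config in product((0, 1), repeat=len(site_weights)):
        w = 1.0
        for bound, q in zip(config, site_weights):
            if bound:
                w *= q
        for left, right in zip(config, config[1:]):
            if left and right:
                w *= cooperativity
        weights[config] = w
    partition_function = sum(weights.values())
    return {config: w / partition_function for config, w in weights.items()}


# Two identical sites, without and with cooperative binding.
independent = configuration_probabilities([1.0, 1.0], cooperativity=1.0)
cooperative = configuration_probabilities([1.0, 1.0], cooperativity=2.0)
print(independent[(1, 1)], cooperative[(1, 1)])
```

With equal site weights, raising the cooperativity factor above 1 shifts probability toward the doubly bound configuration, which is the kind of feature whose contribution such models are fitted to evaluate.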
|
34
|
Role of en and novel interactions between msh, ind, and vnd in dorsoventral patterning of the Drosophila brain and ventral nerve cord. Dev Biol 2010; 346:332-45. [PMID: 20673828 DOI: 10.1016/j.ydbio.2010.07.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 04/20/2010] [Revised: 07/14/2010] [Accepted: 07/17/2010] [Indexed: 12/27/2022]
Abstract
Subdivision of the neuroectoderm into discrete gene expression domains is essential for the correct specification of neural stem cells (neuroblasts) during central nervous system development. Here, we extend our knowledge on dorsoventral (DV) patterning of the Drosophila brain and uncover novel genetic interactions that control expression of the evolutionarily conserved homeobox genes ventral nervous system defective (vnd), intermediate neuroblasts defective (ind), and muscle segment homeobox (msh). We show that cross-repression between Ind and Msh stabilizes the border between intermediate and dorsal tritocerebrum and deutocerebrum, and that both transcription factors are competent to inhibit vnd expression. Conversely, Vnd segment-specifically affects ind expression; it represses ind in the tritocerebrum but positively regulates ind in the deutocerebrum by suppressing Msh. These data provide further evidence that in the brain, in contrast to the trunk, the precise boundaries between DV gene expression domains are largely established through mutual inhibition. Moreover, we find that the segment-polarity gene engrailed (en) regulates the expression of vnd, ind, and msh in a segment-specific manner. En represses msh and ind but maintains vnd expression in the deutocerebrum, is required for down-regulation of Msh in the tritocerebrum to allow activation of ind, and is necessary for maintenance of Ind in truncal segments. These results indicate that input from the anteroposterior patterning system is needed for the spatially restricted expression of DV genes in the brain and ventral nerve cord.
|
35
|
Abstract
The expression of most genes is regulated by multiple transcription factors. The interactions between transcription factors produce complex patterns of gene expression that are not always obvious from the arrangement of cis-regulatory elements in a promoter. One critical element of promoters is the TATA box, the docking site for the RNA polymerase holoenzyme. Using a synthetic promoter system coupled to a thermodynamic model of combinatorial regulation, we analyze the effects of different strength TATA boxes on various aspects of combinatorial cis-regulation. The thermodynamic model explains 75% of the variance in gene expression in synthetic promoter libraries with different strength TATA boxes, suggesting that many of the salient aspects of cis-regulation are captured by the model. Our results demonstrate that the effect of changing the TATA box on gene expression is the same for all synthetic promoters regardless of the arrangement of cis-regulatory sites we studied. Our analysis also showed that in our synthetic system the strength of the RNA polymerase-TATA interaction does not alter the combinatorial interactions between transcription factors, or between transcription factors and RNA polymerase. Finally, we show that although stronger TATA boxes increase expression in a predictable fashion, stronger TATA boxes have very little effect on noise in our synthetic promoters, regardless of the arrangement of cis-regulatory sites. Our results support a modular model of promoter function, where cis-regulatory elements can be mixed and matched (programmed) with outcomes on expression that are predictable based on the rules of simple protein-protein and protein-DNA interactions.
|
36
|
Lusk RW, Eisen MB. Evolutionary mirages: selection on binding site composition creates the illusion of conserved grammars in Drosophila enhancers. PLoS Genet 2010; 6:e1000829. [PMID: 20107516 PMCID: PMC2809757 DOI: 10.1371/journal.pgen.1000829] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Received: 08/27/2009] [Accepted: 12/22/2009] [Indexed: 01/05/2023] Open
Abstract
The clustering of transcription factor binding sites in developmental enhancers and the apparent preferential conservation of clustered sites have been widely interpreted as proof that spatially constrained physical interactions between transcription factors are required for regulatory function. However, we show here that selection on the composition of enhancers alone, and not their internal structure, leads to the accumulation of clustered sites with evolutionary dynamics that suggest they are preferentially conserved. We simulated the evolution of idealized enhancers from Drosophila melanogaster constrained to contain only a minimum number of binding sites for one or more factors. Under this constraint, mutations that destroy an existing binding site are tolerated only if a compensating site has emerged elsewhere in the enhancer. Overlapping sites, such as those frequently observed for the activator Bicoid and repressor Krüppel, had significantly longer evolutionary half-lives than isolated sites for the same factors. This leads to a substantially higher density of overlapping sites than expected by chance and the appearance that such sites are preferentially conserved. Because D. melanogaster (like many other species) has a bias for deletions over insertions, sites tended to become closer together over time, leading to an overall clustering of sites in the absence of any selection for clustered sites. Since this effect is strongest for the oldest sites, clustered sites also incorrectly appear to be preferentially conserved. Following speciation, sites tend to be closer together in all descendent species than in their common ancestors, violating the common assumption that shared features of species' genomes reflect their ancestral state. Finally, we show that selection on binding site composition alone recapitulates the observed number of overlapping and closely neighboring sites in real D. melanogaster enhancers. 
Thus, this study calls into question the common practice of inferring "cis-regulatory grammars" from the organization and evolutionary dynamics of developmental enhancers.
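The core constraint in such a simulation, that a mutation is tolerated only while the enhancer keeps a minimum number of binding sites, can be sketched in a few lines. This is a deliberately reduced illustration and not the authors' simulation: it uses exact matches to a single made-up motif, allows only point mutations (no insertions or deletions), and the names `count_sites` and `evolve` are hypothetical.

```python
import random


def count_sites(seq, motif):
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def evolve(seq, motif, min_sites, steps, rng):
    """Apply random point mutations, rejecting any that drop below min_sites.

    Selection acts only on site composition (the count), never on where
    the sites are located within the sequence.
    """
    if count_sites(seq, motif) < min_sites:
        raise ValueError("starting sequence violates the constraint")
    bases = list(seq)
    for _ in range(steps):
        pos = rng.randrange(len(bases))
        old = bases[pos]
        bases[pos] = rng.choice([b for b in "ACGT" if b != old])
        if count_sites("".join(bases), motif) < min_sites:
            bases[pos] = old  # mutation not tolerated: revert it
    return "".join(bases)


motif = "ACGTAC"  # placeholder motif, not a real factor's binding site
start = motif + "A" * 20 + motif + "C" * 20
evolved = evolve(start, motif, min_sites=2, steps=2000, rng=random.Random(0))
print(count_sites(evolved, motif))
```

Because a site may be lost once a compensating copy has arisen elsewhere, the positions of sites can drift over time even though only their number is under selection.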
Affiliation(s)
- Richard W. Lusk
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Michael B. Eisen
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
|
37
|
Fakhouri WD, Ay A, Sayal R, Dresch J, Dayringer E, Arnosti DN. Deciphering a transcriptional regulatory code: modeling short-range repression in the Drosophila embryo. Mol Syst Biol 2010; 6:341. [PMID: 20087339 PMCID: PMC2824527 DOI: 10.1038/msb.2009.97] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.8] [Received: 06/08/2009] [Accepted: 11/30/2009] [Indexed: 12/29/2022] Open
Abstract
A well-defined set of transcriptional regulatory modules was created and analyzed in the Drosophila embryo. Fractional occupancy-based models were developed to explain the interaction of short-range transcriptional repressors with endogenous activators by using quantitative data from these modules. Our fractional occupancy-based modeling uncovered specific quantitative features of short-range repressors: a complex nonlinear quenching relationship, similar quenching efficiencies for different activators, and modest levels of cooperativity. The extension of the study to endogenous enhancers highlighted several features of enhancer architecture design in Drosophila embryos.
Transcriptional regulatory information, represented by patterns of protein-binding sites on DNA, comprises an important portion of genetic coding. Despite the abundance of genomic sequences now available, identifying and characterizing this information remain a major challenge. Minor changes in protein-binding sites can have profound effects on gene expression, and such changes have been shown to underlie important aspects of disease and evolution. Thus, an important aim in contemporary systems biology is to develop a global understanding of the transcriptional regulatory code, allowing prediction of gene output based on DNA sequence information. Recent studies have focused on endogenous transcriptional regulatory sequences (Janssens et al, 2006; Zinzen et al, 2006; Segal et al, 2008); however, distinct enhancers differ in many features, including transcription factor activity, spacing, and cooperativity, making it difficult to learn the effects of individual features and generalize them to other cis-regulatory elements. We have pursued a bottom up approach to understand the mechanistic processing of regulatory elements by the transcriptional machinery, using a well-defined and characterized set of repressors and activators in Drosophila blastoderm embryos. The study focuses on the Giant, Krüppel, Knirps, and Snail proteins, which have been characterized as short-range repressors, able to act locally to interfere with activator function (quenching) (Gray et al, 1994; Arnosti et al, 1996a). Such repressors have central functions in development. The aim of our study was to enable ab initio predictions of enhancer function, given defined quantities of regulatory proteins and the sequence of the enhancer (Figure 1). 
We have generated a large quantitative data set using fluorescent confocal laser scanning microscopy to determine the inputs (Giant, Krüppel, and Knirps protein levels) and outputs (lacZ mRNA levels) of the regulatory elements introduced into Drosophila by transgenesis. We analyzed the effect of altering specific features of a set of related gene modules, designed to uncover critical aspects of repression, including quenching distance, cooperativity, and overall factor potency. We generated specific descriptions for each regulatory element using fractional occupancy-based modeling and identified quantitative values for parameters affecting transcriptional regulation in vivo, and these parameters were used to build and test the model. Through this process, we uncovered earlier unknown features that allow correct predictions of regulation by short-range repressors, including a non-monotonic distance function for quenching, which implicates possible phasing effects, a modest contribution for repressor–repressor cooperativity, and similarity in repression of disparate activators. By applying these parameters to a model of the endogenous rhomboid enhancer, we uncovered novel insights into the architecture of this enhancer (Figure 8). Our study provides essential quantitative elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms. Extension of these predictive models should facilitate the development of more sophisticated computational algorithms for the identification and functional characterization of novel regulatory elements. The development of such quantitative modeling tools will change our understanding of the genome from essentially a parts list to a dynamically regulated system, and will greatly facilitate studies in disease, population genetics, and evolutionary biology. 
Systems biology seeks a genomic-level interpretation of transcriptional regulatory information represented by patterns of protein-binding sites. Obtaining this information without direct experimentation is challenging; minor alterations in binding sites can have profound effects on gene expression, and underlie important aspects of disease and evolution. Quantitative modeling offers an alternative path to develop a global understanding of the transcriptional regulatory code. Recent studies have focused on endogenous regulatory sequences; however, distinct enhancers differ in many features, making it difficult to generalize to other cis-regulatory elements. We applied a systematic approach to simpler elements and present here the first quantitative analysis of short-range transcriptional repressors, which have central functions in metazoan development. Our fractional occupancy-based modeling uncovered unexpected features of these proteins' activity that allow accurate predictions of regulation by the Giant, Knirps, Krüppel, and Snail repressors, including modeling of an endogenous enhancer. This study provides essential elements of a transcriptional regulatory code that will allow extensive analysis of genomic information in Drosophila melanogaster and related organisms.
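As a rough illustration of what a fractional occupancy treatment of quenching can look like, the sketch below lets a bound repressor reduce the contribution of a nearby activator. The functional form, the function names `occupancy` and `quenched_activity`, and the single `efficiency` parameter are assumptions for illustration and are not taken from the paper; in a fuller model the efficiency would depend on repressor-activator distance.

```python
def occupancy(concentration, affinity):
    """Fraction of time a site is bound at equilibrium (single-site form)."""
    weight = concentration * affinity
    return weight / (1.0 + weight)


def quenched_activity(activator_occ, repressor_occ, efficiency):
    """Activator output after quenching by a nearby repressor.

    activity = activator_occ * (1 - efficiency * repressor_occ)
    `efficiency` lies in [0, 1]; 0 means no quenching, 1 means a bound
    repressor silences the activator completely. This form is assumed.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return activator_occ * (1.0 - efficiency * repressor_occ)


activator = occupancy(concentration=1.0, affinity=4.0)  # 0.8
repressor = occupancy(concentration=1.0, affinity=1.0)  # 0.5
print(quenched_activity(activator, repressor, efficiency=0.5))
```

Fitting a parameter such as `efficiency` separately for each spacing is one way a distance dependence of quenching could be estimated from quantitative reporter data.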
Affiliation(s)
- Walid D Fakhouri
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824-1319, USA
|
38
|
He X, Sinha S. Evolution of cis-regulatory sequences in Drosophila. Methods Mol Biol 2010; 674:283-296. [PMID: 20827599 DOI: 10.1007/978-1-60761-854-6_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Cross-species comparison is an emerging paradigm for identifying cis-regulatory sequences and understanding their function and evolution. In this chapter, we review probabilistic models of evolution of transcription factor binding sites, which provide the theoretical basis for a number of new bioinformatics tools for comparative sequence analysis. We illustrate how important functional and evolutionary insights on binding site gain and loss can be acquired through sequence comparison. This includes the observation that binding site turnover follows a molecular clock and that its rate correlates with the strength of binding sites and the presence of other sites in the neighborhood. We also comment on emerging trends that go beyond individual binding sites to a more holistic study of regulatory evolution. We point out common technical challenges, such as reliable sequence alignment and binding site prediction, when doing comparative regulatory sequence analysis and note some potential solutions thereof.
Affiliation(s)
- Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
|
39
|
Papatsenko D. Stripe formation in the early fly embryo: principles, models, and networks. Bioessays 2009; 31:1172-80. [DOI: 10.1002/bies.200900096] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
|
40
|
Groucho corepressor functions as a cofactor for the Knirps short-range transcriptional repressor. Proc Natl Acad Sci U S A 2009; 106:17314-9. [PMID: 19805071 DOI: 10.1073/pnas.0904507106] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022] Open
Abstract
Despite the pervasive roles for repressors in transcriptional control, the range of action of these proteins on cis regulatory elements remains poorly understood. Knirps has essential roles in patterning the Drosophila embryo by means of short-range repression, an activity that is essential for proper regulation of complex transcriptional control elements. Short-range repressors function in a local fashion to interfere with the activity of activators or basal promoters within approximately 100 bp. In contrast, long-range repressors such as Hairy act over distances >1 kb. The functional distinction between these two classes of repressors has been suggested to stem from the differential recruitment of the CtBP corepressor to short-range repressors and Groucho to long-range repressors. Contrary to this differential recruitment model, we report that Groucho is a functional part of the Knirps short-range repression complex. The corepressor interaction is mediated via an eh-1 like motif present in the N terminus and a conserved region present in the central portion of Knirps. We also show that this interaction is important for the CtBP-independent repression activity of Knirps and is required for regulation of even-skipped. Our study uncovers a previously uncharacterized interaction between proteins previously thought to function in distinct repression pathways, and indicates that the Groucho corepressor can be differentially harnessed to execute short- and long-range repression.
|
41
|
Jeziorska DM, Jordan KW, Vance KW. A systems biology approach to understanding cis-regulatory module function. Semin Cell Dev Biol 2009; 20:856-62. [PMID: 19660565 DOI: 10.1016/j.semcdb.2009.07.007] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 07/08/2009] [Accepted: 07/29/2009] [Indexed: 12/27/2022]
Abstract
The genomic instructions used to regulate development are encoded within a set of functional DNA elements called cis-regulatory modules (CRMs). These elements determine the precise patterns of temporal and spatial gene expression. Here we summarize recent progress made towards cataloguing and characterizing the complete repertoire of CRMs. We describe CRMs as genomic information processing devices containing clusters of transcription factor binding sites and we position CRMs as nodes within large gene regulatory networks. We define CRM architecture and describe how these genomic elements process the information they encode to their target genes. Furthermore, we present an overview describing high-throughput techniques to identify CRMs genome wide and experimental methodologies to validate their function on a large scale. This review emphasizes the advantages and power of a systems biology approach which integrates computational and experimental technologies to further our understanding of CRM function.
Affiliation(s)
- Danuta M Jeziorska
- Departments of Systems Biology and Biological Sciences, University of Warwick, Biomedical Research Institute, Gibbet Hill, Coventry CV4 7AL, UK
|
42
|
Kim J, He X, Sinha S. Evolution of regulatory sequences in 12 Drosophila species. PLoS Genet 2009; 5:e1000330. [PMID: 19132088 PMCID: PMC2607023 DOI: 10.1371/journal.pgen.1000330] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 07/19/2008] [Accepted: 12/05/2008] [Indexed: 01/07/2023] Open
Abstract
Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights on the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor–specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection.
The spatial–temporal expression pattern of a gene, which is crucial to its function, is controlled by cis-regulatory DNA sequences. Forming the basic units of regulatory sequences are transcription factor binding sites, often organized into larger modules that determine gene expression in response to combinatorial environmental signals. Understanding the conservation and change of regulatory sequences is critical to our knowledge of the unity as well as diversity of animal development and phenotypes. In this paper, we study the evolution of sequences involved in the regulation of body patterning in the Drosophila embryo. We find that mutations of nucleotides within a binding site are constrained by evolutionary forces to preserve the site's binding affinity to the cognate transcription factor. Functional binding sites are frequently destroyed during evolution and the rate of loss across evolutionary spans is roughly constant. We also find that the evolutionary fate of a site strongly depends on its context; a pair of interacting sites are more likely to survive mutational forces than isolated sites. Together, these findings provide new insights and pose new challenges to our understanding of cis-regulatory sequences and their evolution.
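The statement that site loss proceeds at a roughly constant rate can be made concrete with a short sketch. This is an illustration under a simple assumption (each site is lost independently at a constant rate, so survival decays exponentially with divergence time), not the probabilistic alignment framework the authors describe; the function names and counts are hypothetical.

```python
import math

def survival_probability(rate, t):
    """Probability that a site present at time 0 is still present at time t,
    assuming a constant per-site loss rate (a molecular clock)."""
    return math.exp(-rate * t)

def estimate_loss_rate(n_ancestral, n_conserved, t):
    """Loss rate implied by the fraction of sites conserved after time t."""
    if not 0 < n_conserved <= n_ancestral:
        raise ValueError("need 0 < n_conserved <= n_ancestral")
    if t <= 0:
        raise ValueError("divergence time must be positive")
    return -math.log(n_conserved / n_ancestral) / t
```

Under a clock, rates estimated from species pairs at different divergence times should agree within sampling error; a rate that drifts with t would argue against the constant-rate assumption.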
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
43
|
Polishchuk MS, Heinzel A, Favorov AV, Makeev YV. The binding sites of the proteins regulating transcription in the early development of Drosophila melanogaster: A comparative analysis of ChIP-chip data and theoretically predicted clusters. Biophysics (Nagoya-shi) 2008. [DOI: 10.1134/s0006350908050059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022] Open
|
44
|
Shen L, Liu J, Wang W. GBNet: deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach. BMC Bioinformatics 2008; 9:395. [PMID: 18811979 PMCID: PMC2571992 DOI: 10.1186/1471-2105-9-395] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/08/2008] [Accepted: 09/24/2008] [Indexed: 12/19/2022] Open
Abstract
Background Combinatorial regulation of transcription factors (TFs) is important in determining the complex gene expression patterns particularly in higher organisms. Deciphering regulatory rules between cooperative TFs is a critical step towards understanding the mechanisms of combinatorial regulation. Results We present here a Bayesian network approach called GBNet to search for DNA motifs that may be cooperative in transcriptional regulation and the sequence constraints that these motifs may satisfy. We showed that GBNet outperformed the other available methods in the simulated and the yeast data. We also demonstrated the usefulness of GBNet on learning regulatory rules between YY1, a human TF, and its co-factors. Most of the rules learned by GBNet on YY1 and co-factors were supported by literature. In addition, a spacing constraint between YY1 and E2F was also supported by independent TF binding experiments. Conclusion We thus conclude that GBNet is a useful tool for deciphering the "grammar" of transcriptional regulation.
Affiliation(s)
- Li Shen
- Department of Chemistry and Biochemistry, University of California, San Diego, California, USA.
|
45
|
A Simple Model of the Modular Structure of Transcriptional Regulation in Yeast. J Comput Biol 2008; 15:393-405. [DOI: 10.1089/cmb.2008.0020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022] Open
|
46
|
Maston GA, Evans SK, Green MR. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 2006; 7:29-59. [PMID: 16719718 DOI: 10.1146/annurev.genom.7.080505.115623] [Citation(s) in RCA: 546] [Impact Index Per Article: 34.1] [Indexed: 11/09/2022]
Abstract
The faithful execution of biological processes requires a precise and carefully orchestrated set of steps that depend on the proper spatial and temporal expression of genes. Here we review the various classes of transcriptional regulatory elements (core promoters, proximal promoters, distal enhancers, silencers, insulators/boundary elements, and locus control regions) and the molecular machinery (general transcription factors, activators, and coactivators) that interacts with the regulatory elements to mediate precisely controlled patterns of gene expression. The biological importance of transcriptional regulation is highlighted by examples of how alterations in these transcriptional components can lead to disease. Finally, we discuss the methods currently used to identify transcriptional regulatory elements, and the ability of these methods to be scaled up for the purpose of annotating the entire human genome.
Affiliation(s)
- Glenn A Maston
- Howard Hughes Medical Institute, Programs in Gene Function and Expression and Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
|
47
|
Abstract
Transcriptional repressor proteins play key roles in the control of gene expression in development. For the Drosophila embryo, the following two functional classes of repressors have been described: short-range repressors such as Knirps that locally inhibit the activity of enhancers and long-range repressors such as Hairy that can dominantly inhibit distal elements. Several long-range repressors interact with Groucho, a conserved corepressor that is homologous to mammalian TLE proteins. Groucho interacts with histone deacetylases and histone proteins, suggesting that it may effect repression by means of chromatin modification; however, it is not known how long-range effects are mediated. Using embryo chromatin immunoprecipitation, we have analyzed a Hairy-repressible gene in the embryo during activation and repression. When inactivated, repressors, activators, and coactivators cooccupy the promoter, suggesting that repression is not accomplished by the displacement of activators or coactivators. Strikingly, the Groucho corepressor is found to be recruited to the transcribed region of the gene, contacting a region of several kilobases, concomitant with a loss of histone H3 and H4 acetylation. Groucho has been shown to form higher-order complexes in vitro; thus, our observations suggest that long-range effects may be mediated by a "spreading" mechanism, modifying chromatin over extensive regions to inhibit transcription.
|
48
|
Kohn MH, Shapiro J, Wu CI. Decoupled differentiation of gene expression and coding sequence among Drosophila populations. Genes Genet Syst 2008; 83:265-73. [DOI: 10.1266/ggs.83.265] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/23/2022] Open
Affiliation(s)
- Michael H. Kohn
- Department of Ecology & Evolutionary Biology, Rice University
- Joshua Shapiro
- Lewis-Sigler Institute for Integrative Genomics & Department of Ecology and Evolutionary Biology, Princeton University
- Chung-I Wu
- Department of Ecology and Evolution, University of Chicago
|
49
|
Zartman JJ, Shvartsman SY. Enhancer Organization: Transistor with a Twist or Something in a Different Vein? Curr Biol 2007; 17:R1048-50. [DOI: 10.1016/j.cub.2007.10.036] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
|
50
|
Reddy TE, DeLisi C, Shakhnovich BE. Binding site graphs: a new graph theoretical framework for prediction of transcription factor binding sites. PLoS Comput Biol 2007; 3:e90. [PMID: 17500587 PMCID: PMC1866359 DOI: 10.1371/journal.pcbi.0030090] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 09/06/2006] [Accepted: 04/09/2007] [Indexed: 11/25/2022] Open
Abstract
Computational prediction of nucleotide binding specificity for transcription factors remains a fundamental and largely unsolved problem. Determination of binding positions is a prerequisite for research in gene regulation, a major mechanism controlling phenotypic diversity. Furthermore, an accurate determination of binding specificities from high-throughput data sources is necessary to realize the full potential of systems biology. Unfortunately, recently performed independent evaluation showed that more than half the predictions from most widely used algorithms are false. We introduce a graph-theoretical framework to describe local sequence similarity as the pair-wise distances between nucleotides in promoter sequences, and hypothesize that densely connected subgraphs are indicative of transcription factor binding sites. Using a well-established sampling algorithm coupled with simple clustering and scoring schemes, we identify sets of closely related nucleotides and test those for known TF binding activity. Using an independent benchmark, we find our algorithm predicts yeast binding motifs considerably better than currently available techniques and without manual curation. Importantly, we reduce the number of false positive predictions in yeast to less than 30%. We also develop a framework to evaluate the statistical significance of our motif predictions. We show that our approach is robust to the choice of input promoters, and thus can be used in the context of predicting binding positions from noisy experimental data. We apply our method to identify binding sites using data from genome scale ChIP–chip experiments. Results from these experiments are publicly available at http://cagt10.bu.edu/BSG. The graphical framework developed here may be useful when combining predictions from numerous computational and experimental measures. Finally, we discuss how our algorithm can be used to improve the sensitivity of computational predictions of transcription factor binding specificities.
A historically difficult problem in computational biology is the identification of transcription factor binding sites (TFBS) in the promoters of co-regulated genes. With increasing emphasis on research in transcriptional regulation, this problem is also uniquely relevant to emerging results from recent experiments in high-throughput and systems biology. Despite extensive research in the area, recent evaluations of previously published techniques show much room for improvement. In this paper, we introduce a fundamentally new approach to the identification of TFBS. First, we start by representing nucleotides in promoters as an undirected, weighted graph. Given this representation of a binding site graph (BSG), we employ relatively simple graph clustering techniques to identify functional TFBS. We show that BSG predictions significantly outperform all previously evaluated methods in nearly every performance measure using a standardized assessment benchmark. We also find that this approach is more robust than traditional Gibbs sampling to selection of input promoters, and thus more likely to perform well under noisy experimental conditions. Finally, BSGs are very good at predicting specificity determining nucleotides. Using BSG predictions, we were able to confirm recent experimental results on binding specificity of E-box TFs CBF1 and PHO4 and predict novel specificity determining nucleotides for TYE7.
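The graph representation described here can be sketched in a few lines. This is a simplified stand-in, not the published BSG algorithm: nodes are fixed-length windows rather than individual nucleotides, edge weights are plain match counts, and connected components replace the sampling and clustering steps the abstract mentions. All function names and parameters are hypothetical.

```python
from itertools import combinations

def kmers(seq, k):
    """All (start, window) pairs of length k in seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def build_graph(promoters, k, max_mismatch):
    """Undirected weighted graph over windows from different promoters."""
    nodes = [(p, i, w) for p, seq in enumerate(promoters)
             for i, w in kmers(seq, k)]
    edges = {}
    for u, v in combinations(range(len(nodes)), 2):
        if nodes[u][0] == nodes[v][0]:
            continue  # only link windows that come from different promoters
        d = hamming(nodes[u][2], nodes[v][2])
        if d <= max_mismatch:
            edges[(u, v)] = k - d  # weight = number of matching positions
    return nodes, edges

def connected_components(n_nodes, edges):
    """Groups of linked nodes; isolated nodes are skipped."""
    adj = {i: set() for i in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in range(n_nodes):
        if start in seen or not adj[start]:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Toy promoters sharing an E-box-like core (made-up sequences).
promoters = ["AACACGTGTT", "GGCACGTGCC", "TTCACGTGAA"]
nodes, edges = build_graph(promoters, k=6, max_mismatch=0)
clusters = connected_components(len(nodes), edges)
```

On real promoters a mismatch tolerance above zero and a density-based score would be needed, since connected components alone merge unrelated windows once the graph becomes large.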
Affiliation(s)
- Timothy E Reddy
- Program in Bioinformatics and Systems Biology, Boston University, Boston, Massachusetts, United States of America
- Charles DeLisi
- Program in Bioinformatics and Systems Biology, Boston University, Boston, Massachusetts, United States of America
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
- Boris E Shakhnovich
- Program in Bioinformatics and Systems Biology, Boston University, Boston, Massachusetts, United States of America
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
|