1
|
Abstract
A common path to the formation of complex 3D structures starts with an epithelial sheet that is patterned by inductive cues that control the spatiotemporal activities of transcription factors. These activities are then interpreted by the cis-regulatory regions of the genes involved in cell differentiation and tissue morphogenesis. Although this general strategy has been documented in multiple developmental contexts, the range of experimental models in which each of the steps can be examined in detail and evaluated in its effect on the final structure remains very limited. Studies of the Drosophila eggshell patterning provide unique insights into the multiscale mechanisms that connect gene regulation and 3D epithelial morphogenesis. Here we review the current understanding of this system, emphasizing how the recent identification of cis-regulatory regions of genes within the eggshell patterning network enables mechanistic analysis of its spatiotemporal dynamics and evolutionary diversification. It appears that cis-regulatory changes can account for only some aspects of the morphological diversity of Drosophila eggshells, such as the prominent differences in the number of the respiratory dorsal appendages. Other changes, such as the appearance of the respiratory eggshell ridges, are caused by changes in the spatial distribution of inductive signals. Both types of mechanisms are at play in this rapidly evolving system, which provides an excellent model of developmental patterning and morphogenesis.
Collapse
|
2
|
Gursky VV, Kozlov KN, Kulakovskiy IV, Zubair A, Marjoram P, Lawrie DS, Nuzhdin SV, Samsonova MG. Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network. PLoS One 2017; 12:e0184657. [PMID: 28898266 PMCID: PMC5595321 DOI: 10.1371/journal.pone.0184657] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2017] [Accepted: 08/28/2017] [Indexed: 11/18/2022] Open
Abstract
Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of natural sequenced D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects.
Collapse
Affiliation(s)
- Vitaly V. Gursky
- Theoretical Department, Ioffe Institute, Saint Petersburg, Russia
- Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
- * E-mail:
| | - Konstantin N. Kozlov
- Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
| | - Ivan V. Kulakovskiy
- Engelhardt Institute of Molecular Biology, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, Russia
| | - Asif Zubair
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
| | - Paul Marjoram
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
| | - David S. Lawrie
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
| | - Sergey V. Nuzhdin
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
| | - Maria G. Samsonova
- Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
| |
Collapse
|
3
|
Barr KA, Reinitz J. A sequence level model of an intact locus predicts the location and function of nonadditive enhancers. PLoS One 2017; 12:e0180861. [PMID: 28715438 PMCID: PMC5513433 DOI: 10.1371/journal.pone.0180861] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2016] [Accepted: 06/22/2017] [Indexed: 01/24/2023] Open
Abstract
Metazoan gene expression is controlled through the action of long stretches of noncoding DNA that contain enhancers-shorter sequences responsible for controlling a single aspect of a gene's expression pattern. Models built on thermodynamics have shown how enhancers interpret protein concentration in order to determine specific levels of gene expression, but the emergent regulatory logic of a complete regulatory locus shows qualitative and quantitative differences from isolated enhancers. Such differences may arise from steric competition limiting the quantity of DNA that can simultaneously influence the transcription machinery. We incorporated this competition into a mechanistic model of gene regulation, generated efficient algorithms for this computation, and applied it to the regulation of Drosophila even-skipped (eve). This model finds the location of enhancers and identifies which factors control the boundaries of eve expression. This model predicts a new enhancer that, when assayed in vivo, drives expression in a non-eve pattern. Incorporation of chromatin accessibility eliminates this inconsistency.
Collapse
Affiliation(s)
- Kenneth A. Barr
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
| | - John Reinitz
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
- Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, Illinois, United States of America
- Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
| |
Collapse
|
4
|
Chertkova AA, Schiffman JS, Nuzhdin SV, Kozlov KN, Samsonova MG, Gursky VV. In silico evolution of the Drosophila gap gene regulatory sequence under elevated mutational pressure. BMC Evol Biol 2017; 17:4. [PMID: 28251865 PMCID: PMC5333172 DOI: 10.1186/s12862-016-0866-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cis-regulatory sequences are often composed of many low-affinity transcription factor binding sites (TFBSs). Determining the evolutionary and functional importance of regulatory sequence composition is impeded without a detailed knowledge of the genotype-phenotype map. RESULTS We simulate the evolution of regulatory sequences involved in Drosophila melanogaster embryo segmentation during early development. Natural selection evaluates gene expression dynamics produced by a computational model of the developmental network. We observe a dramatic decrease in the total number of transcription factor binding sites through the course of evolution. Despite a decrease in average sequence binding energies through time, the regulatory sequences tend towards organisations containing increased high affinity transcription factor binding sites. Additionally, the binding energies of separate sequence segments demonstrate ubiquitous mutual correlations through time. Fewer than 10% of initial TFBSs are maintained throughout the entire simulation, deemed 'core' sites. These sites have increased functional importance as assessed under wild-type conditions and their binding energy distributions are highly conserved. Furthermore, TFBSs within close proximity of core sites exhibit increased longevity, reflecting functional regulatory interactions with core sites. CONCLUSION In response to elevated mutational pressure, evolution tends to sample regulatory sequence organisations with fewer, albeit on average, stronger functional transcription factor binding sites. These organisations are also shaped by the regulatory interactions among core binding sites with sites in their local vicinity.
Collapse
Affiliation(s)
- Aleksandra A. Chertkova
- Systems Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya, 29, St. Petersburg, 195251 Russia
| | - Joshua S. Schiffman
- Molecular and Computational Biology, University of Southern California, Los Angeles, 90089 CA USA
| | - Sergey V. Nuzhdin
- Systems Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya, 29, St. Petersburg, 195251 Russia
- Molecular and Computational Biology, University of Southern California, Los Angeles, 90089 CA USA
| | - Konstantin N. Kozlov
- Systems Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya, 29, St. Petersburg, 195251 Russia
| | - Maria G. Samsonova
- Systems Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya, 29, St. Petersburg, 195251 Russia
| | - Vitaly V. Gursky
- Systems Biology and Bioinformatics Laboratory, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya, 29, St. Petersburg, 195251 Russia
- Theoretical Department, Ioffe Institute, Polytechnicheskaya, 26, St. Petersburg, 194021 Russia
| |
Collapse
|
5
|
Peng PC, Sinha S. Quantitative modeling of gene expression using DNA shape features of binding sites. Nucleic Acids Res 2016; 44:e120. [PMID: 27257066 PMCID: PMC5291265 DOI: 10.1093/nar/gkw446] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2015] [Revised: 05/06/2016] [Accepted: 05/09/2016] [Indexed: 12/11/2022] Open
Abstract
Prediction of gene expression levels driven by regulatory sequences is pivotal in genomic biology. A major focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence based on transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models largely rely on the position weight matrix (PWM) model for DNA binding, and the effect of alternative models based on DNA shape remains unexplored. Here, we propose a statistical thermodynamics model of gene expression using DNA shape features of binding sites. We used rigorous methods to evaluate the fits of expression readouts of 37 enhancers regulating spatial gene expression patterns in Drosophila embryo, and show that DNA shape-based models perform arguably better than PWM-based models. We also observed DNA shape captures information complimentary to the PWM, in a way that is useful for expression modeling. Furthermore, we tested if combining shape and PWM-based features provides better predictions than using either binding model alone. Our work demonstrates that the increasingly popular DNA-binding models based on local DNA shape can be useful in sequence-to-expression modeling. It also provides a framework for future studies to predict gene expression better than with PWM models alone.
Collapse
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| |
Collapse
|
6
|
Vincent BJ, Estrada J, DePace AH. The appeasement of Doug: a synthetic approach to enhancer biology. Integr Biol (Camb) 2016; 8:475-84. [DOI: 10.1039/c5ib00321k] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Ben J. Vincent
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
| | - Javier Estrada
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
| | - Angela H. DePace
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
| |
Collapse
|
7
|
Peng PC, Hassan Samee MA, Sinha S. Incorporating chromatin accessibility data into sequence-to-expression modeling. Biophys J 2016; 108:1257-67. [PMID: 25762337 DOI: 10.1016/j.bpj.2014.12.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2014] [Revised: 12/01/2014] [Accepted: 12/11/2014] [Indexed: 01/30/2023] Open
Abstract
Prediction of gene expression levels from regulatory sequences is one of the major challenges of genomic biology today. A particularly promising approach to this problem is that taken by thermodynamics-based models that interpret an enhancer sequence in a given cellular context specified by transcription factor concentration levels and predict precise expression levels driven by that enhancer. Such models have so far not accounted for the effect of chromatin accessibility on interactions between transcription factor and DNA and consequently on gene-expression levels. Here, we extend a thermodynamics-based model of gene expression, called GEMSTAT (Gene Expression Modeling Based on Statistical Thermodynamics), to incorporate chromatin accessibility data and quantify its effect on accuracy of expression prediction. In the new model, called GEMSTAT-A, accessibility at a binding site is assumed to affect the transcription factor's binding strength at the site, whereas all other aspects are identical to the GEMSTAT model. We show that this modification results in significantly better fits in a data set of over 30 enhancers regulating spatial expression patterns in the blastoderm-stage Drosophila embryo. It is important to note that the improved fits result not from an overall elevated accessibility in active enhancers but from the variation of accessibility levels within an enhancer. With whole-genome DNA accessibility measurements becoming increasingly popular, our work demonstrates how such data may be useful for sequence-to-expression models. It also calls for future advances in modeling accessibility levels from sequence and the transregulatory context, so as to predict accurately the effect of cis and trans perturbations on gene expression.
Collapse
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
| | - Md Abul Hassan Samee
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois; Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois.
| |
Collapse
|
8
|
Demidov GM, Samsonova MG, Gursky VV. A stochastic model of the formation of the molecular configuration of an enhancer site. Biophysics (Nagoya-shi) 2016. [DOI: 10.1134/s0006350916010073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
|
9
|
Kozlov K, Gursky VV, Kulakovskiy IV, Dymova A, Samsonova M. Analysis of functional importance of binding sites in the Drosophila gap gene network model. BMC Genomics 2015; 16 Suppl 13:S7. [PMID: 26694511 PMCID: PMC4686791 DOI: 10.1186/1471-2164-16-s13-s7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
Abstract
BACKGROUND The statistical thermodynamics based approach provides a promising framework for construction of the genotype-phenotype map in many biological systems. Among important aspects of a good model connecting the DNA sequence information with that of a molecular phenotype (gene expression) is the selection of regulatory interactions and relevant transcription factor bindings sites. As the model may predict different levels of the functional importance of specific binding sites in different genomic and regulatory contexts, it is essential to formulate and study such models under different modeling assumptions. RESULTS We elaborate a two-layer model for the Drosophila gap gene network and include in the model a combined set of transcription factor binding sites and concentration dependent regulatory interaction between gap genes hunchback and Kruppel. We show that the new variants of the model are more consistent in terms of gene expression predictions for various genetic constructs in comparison to previous work. We quantify the functional importance of binding sites by calculating their impact on gene expression in the model and calculate how these impacts correlate across all sites under different modeling assumptions. CONCLUSIONS The assumption about the dual interaction between hb and Kr leads to the most consistent modeling results, but, on the other hand, may obscure existence of indirect interactions between binding sites in regulatory regions of distinct genes. The analysis confirms the previously formulated regulation concept of many weak binding sites working in concert. The model predicts a more or less uniform distribution of functionally important binding sites over the sets of experimentally characterized regulatory modules and other open chromatin domains.
Collapse
Affiliation(s)
- Konstantin Kozlov
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
| | - Vitaly V Gursky
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
- Ioffe Institute, 26 Polytechnicheskaya, 194021 St.Petersburg, Russia
| | - Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, 32 Vavilova, 119991 Moscow, Russia
| | - Arina Dymova
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
| | - Maria Samsonova
- Peter the Great St. Petersburg Polytechnic University, 29 Polytechnicheskaya, 195251 St.Petersburg, Russia
| |
Collapse
|
10
|
Duque T, Sinha S. What does it take to evolve an enhancer? A simulation-based study of factors influencing the emergence of combinatorial regulation. Genome Biol Evol 2015; 7:1415-31. [PMID: 25956793 PMCID: PMC4494070 DOI: 10.1093/gbe/evv080] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
There is widespread interest today in understanding enhancers, which are regulatory elements typically harboring several transcription factor binding sites and mediating the combinatorial effect of transcription factors on gene expression. The evolution of enhancers poses interesting unanswered questions, for example, the evolutionary time taken for a typical enhancer to emerge or the factors shaping its evolution. Existing approaches to cis-regulatory evolution have often ignored the combinatorial nature and varied biochemical mechanisms of gene regulation encoded in enhancers. We report on our investigation of enhancer evolution through the use of PEBCRES, a framework for evolutionary simulation of enhancers that employs a mechanistic and well-supported sequence-to-expression model to assign fitness to the evolving enhancer genotype. We estimated the time necessary to evolve, from genomic background, enhancers capable of driving complex gene expression patterns similar to those involved in early development in Drosophila. We found the time-to-evolve to range between 0.5 and 10 Myr, and to vary greatly with the target expression pattern, complexity of the real enhancer known to encode that pattern, and the strength of input from specific transcription factors. To our knowledge, this is the first estimate of waiting times for realistic enhancers to evolve. The in silico evolved enhancers had, with a few interesting exceptions, site compositions similar to those seen in real enhancers for the same patterns. Our simulations also revealed that certain features of an enhancer might evolve not due to their biological function but as aids to the evolutionary process itself.
Collapse
Affiliation(s)
- Thyago Duque
- Department of Computer Science, University of Illinois at Urbana-Champaign
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign Institute for Genomic Biology, University of Illinois at Urbana-Champaign
| |
Collapse
|
11
|
Staller MV, Fowlkes CC, Bragdon MDJ, Wunderlich Z, Estrada J, DePace AH. A gene expression atlas of a bicoid-depleted Drosophila embryo reveals early canalization of cell fate. Development 2015; 142:587-96. [PMID: 25605785 PMCID: PMC4302997 DOI: 10.1242/dev.117796] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2014] [Accepted: 12/01/2014] [Indexed: 01/31/2023]
Abstract
In developing embryos, gene regulatory networks drive cells towards discrete terminal fates, a process called canalization. We studied the behavior of the anterior-posterior segmentation network in Drosophila melanogaster embryos by depleting a key maternal input, bicoid (bcd), and measuring gene expression patterns of the network at cellular resolution. This method results in a gene expression atlas containing the levels of mRNA or protein expression of 13 core patterning genes over six time points for every cell of the blastoderm embryo. This is the first cellular resolution dataset of a genetically perturbed Drosophila embryo that captures all cells in 3D. We describe the technical developments required to build this atlas and how the method can be employed and extended by others. We also analyze this novel dataset to characterize the degree and timing of cell fate canalization in the segmentation network. We find that in two layers of this gene regulatory network, following depletion of bcd, individual cells rapidly canalize towards normal cell fates. This result supports the hypothesis that the segmentation network directly canalizes cell fate, rather than an alternative hypothesis whereby cells are initially mis-specified and later eliminated by apoptosis. Our gene expression atlas provides a high resolution picture of a classic perturbation and will enable further computational modeling of canalization and gene regulation in this transcriptional network.
Collapse
Affiliation(s)
- Max V Staller
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Charless C Fowlkes
- Department of Computer Science, University of California Irvine, Irvine, CA 92697, USA
| | - Meghan D J Bragdon
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Zeba Wunderlich
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Javier Estrada
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Angela H DePace
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| |
Collapse
|
12
|
Abstract
BACKGROUND The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts large interest among researches studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer underlying regulatory mechanism and prove its sufficiency. RESULTS We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used four-fold cross validation test and fitting to random dataset to validate the model and proof its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. CONCLUSIONS The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are important for the model output; 3) functional important sites are not exclusively located in cis-regulatory elements, but are rather dispersed through regulatory region. It is of importance that some of the sites with high functional impact in hb, Kr and kni regulatory regions coincide with strong sites annotated and verified in Dnase I footprint assays.
Collapse
Affiliation(s)
- Konstantin Kozlov
- St.Petersburg State Polytechnical University, Polytekhnicheskaya 29, 195251 St.Petersburg, Russia
| | - Vitaly Gursky
- Ioffe Physical-Technical Institute, RAS, Polytekhnicheskaya 26, 194021 St.Petersburg, Russia
| | - Ivan Kulakovskiy
- Engelhardt Institute of Molecular Biology, RAS, Vavilov 32, 119991 Moscow, Russia
| | - Maria Samsonova
- St.Petersburg State Polytechnical University, Polytekhnicheskaya 29, 195251 St.Petersburg, Russia
| |
Collapse
|
13
|
Hybrid incompatibility arises in a sequence-based bioenergetic model of transcription factor binding. Genetics 2014; 198:1155-66. [PMID: 25173845 DOI: 10.1534/genetics.114.168112] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Postzygotic isolation between incipient species results from the accumulation of incompatibilities that arise as a consequence of genetic divergence. When phenotypes are determined by regulatory interactions, hybrid incompatibility can evolve even as a consequence of parallel adaptation in parental populations because interacting genes can produce the same phenotype through incompatible allelic combinations. We explore the evolutionary conditions that promote and constrain hybrid incompatibility in regulatory networks using a bioenergetic model (combining thermodynamics and kinetics) of transcriptional regulation, considering the bioenergetic basis of molecular interactions between transcription factors (TFs) and their binding sites. The bioenergetic parameters consider the free energy of formation of the bond between the TF and its binding site and the availability of TFs in the intracellular environment. Together these determine fractional occupancy of the TF on the promoter site, the degree of subsequent gene expression and in diploids, and the degree of dominance among allelic interactions. This results in a sigmoid genotype-phenotype map and fitness landscape, with the details of the shape determining the degree of bioenergetic evolutionary constraint on hybrid incompatibility. Using individual-based simulations, we subjected two allopatric populations to parallel directional or stabilizing selection. Misregulation of hybrid gene expression occurred under either type of selection, although it evolved faster under directional selection. Under directional selection, the extent of hybrid incompatibility increased with the slope of the genotype-phenotype map near the derived parental expression level. Under stabilizing selection, hybrid incompatibility arose from compensatory mutations and was greater when the bioenergetic properties of the interaction caused the space of nearly neutral genotypes around the stable expression level to be wide. F2's showed higher hybrid incompatibility than F1's to the extent that the bioenergetic properties favored dominant regulatory interactions. The present model is a mechanistically explicit case of the Bateson-Dobzhansky-Muller model, connecting environmental selective pressure to hybrid incompatibility through the molecular mechanism of regulatory divergence. The bioenergetic parameters that determine expression represent measurable properties of transcriptional regulation, providing a predictive framework for empirical studies of how phenotypic evolution results in epistatic incompatibility at the molecular level in hybrids.
Collapse
|
14
|
Samee MAH, Sinha S. Quantitative modeling of a gene's expression from its intergenic sequence. PLoS Comput Biol 2014; 10:e1003467. [PMID: 24604095 PMCID: PMC3945089 DOI: 10.1371/journal.pcbi.1003467] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2012] [Accepted: 12/18/2013] [Indexed: 11/18/2022] Open
Abstract
Modeling a gene's expression from its intergenic locus and trans-regulatory context is a fundamental goal in computational biology. Owing to the distributed nature of cis-regulatory information and the poorly understood mechanisms that integrate such information, gene locus modeling is a more challenging task than modeling individual enhancers. Here we report the first quantitative model of a gene's expression pattern as a function of its locus. We model the expression readout of a locus in two tiers: 1) combinatorial regulation by transcription factors bound to each enhancer is predicted by a thermodynamics-based model and 2) independent contributions from multiple enhancers are linearly combined to fit the gene expression pattern. The model does not require any prior knowledge about enhancers contributing toward a gene's expression. We demonstrate that the model captures the complex multi-domain expression patterns of anterior-posterior patterning genes in the early Drosophila embryo. Altogether, we model the expression patterns of 27 genes; these include several gap genes, pair-rule genes, and anterior, posterior, trunk, and terminal genes. We find that the model-selected enhancers for each gene overlap strongly with its experimentally characterized enhancers. Our findings also suggest the presence of sequence-segments in the locus that would contribute ectopic expression patterns and hence were "shut down" by the model. We applied our model to identify the transcription factors responsible for forming the stripe boundaries of the studied genes. The resulting network of regulatory interactions exhibits a high level of agreement with known regulatory influences on the target genes. Finally, we analyzed whether and why our assumption of enhancer independence was necessary for the genes we studied. We found a deterioration of expression when binding sites in one enhancer were allowed to influence the readout of another enhancer. Thus, interference between enhancer activities was a possible factor necessitating enhancer independence in our model.
Collapse
Affiliation(s)
- Md. Abul Hassan Samee
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- * E-mail: (MAHS); (SS)
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- * E-mail: (MAHS); (SS)
| |
Collapse
|
15
|
Martinez C, Rest JS, Kim AR, Ludwig M, Kreitman M, White K, Reinitz J. Ancestral resurrection of the Drosophila S2E enhancer reveals accessible evolutionary paths through compensatory change. Mol Biol Evol 2014; 31:903-16. [PMID: 24408913 DOI: 10.1093/molbev/msu042] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Upstream regulatory sequences that control gene expression evolve rapidly, yet the expression patterns and functions of most genes are typically conserved. To address this paradox, we have reconstructed computationally and resurrected in vivo the cis-regulatory regions of the ancestral Drosophila eve stripe 2 element and evaluated its evolution using a mathematical model of promoter function. Our feed-forward transcriptional model predicts gene expression patterns directly from enhancer sequence. We used this functional model along with phylogenetics to generate a set of possible ancestral eve stripe 2 sequences for the common ancestors of 1) D. simulans and D. sechellia; 2) D. melanogaster, D. simulans, and D. sechellia; and 3) D. erecta and D. yakuba. These ancestral sequences were synthesized and resurrected in vivo. Using a combination of quantitative and computational analysis, we find clear support for functional compensation between the binding sites for Bicoid, Giant, and Krüppel over the course of 40-60 My of Drosophila evolution. We show that this compensation is driven by a coupling interaction between Bicoid activation and repression at the anterior and posterior border necessary for proper placement of the anterior stripe 2 border. A multiplicity of mechanisms for binding site turnover exemplified by Bicoid, Giant, and Krüppel sites, explains how rapid sequence change may occur while maintaining the function of the cis-regulatory element.
Collapse
Affiliation(s)
- Carlos Martinez
- Institute for Genomics and Systems Biology, University of Chicago
| | | | | | | | | | | | | |
Collapse
|
16
|
Duque T, Samee MAH, Kazemian M, Pham HN, Brodsky MH, Sinha S. Simulations of enhancer evolution provide mechanistic insights into gene regulation. Mol Biol Evol 2013; 31:184-200. [PMID: 24097306 PMCID: PMC3879441 DOI: 10.1093/molbev/mst170] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open
Abstract
There is growing interest in models of regulatory sequence evolution. However, existing models specifically designed for regulatory sequences consider the independent evolution of individual transcription factor (TF)-binding sites, ignoring that the function and evolution of a binding site depends on its context, typically the cis-regulatory module (CRM) in which the site is located. Moreover, existing models do not account for the gene-specific roles of TF-binding sites, primarily because their roles often are not well understood. We introduce two models of regulatory sequence evolution that address some of the shortcomings of existing models and implement simulation frameworks based on them. One model simulates the evolution of an individual binding site in the context of a CRM, while the other evolves an entire CRM. Both models use a state-of-the art sequence-to-expression model to predict the effects of mutations on the regulatory output of the CRM and determine the strength of selection. We use the new framework to simulate the evolution of TF-binding sites in 37 well-studied CRMs belonging to the anterior-posterior patterning system in Drosophila embryos. We show that these simulations provide accurate fits to evolutionary data from 12 Drosophila genomes, which includes statistics of binding site conservation on relatively short evolutionary scales and site loss across larger divergence times. The new framework allows us, for the first time, to test hypotheses regarding the underlying cis-regulatory code by directly comparing the evolutionary implications of the hypothesis with the observed evolutionary dynamics of binding sites. Using this capability, we find that explicitly modeling self-cooperative DNA binding by the TF Caudal (CAD) provides significantly better fits than an otherwise identical evolutionary simulation that lacks this mechanistic aspect. This hypothesis is further supported by a statistical analysis of the distribution of intersite spacing between adjacent CAD sites. Experimental tests confirm direct homodimeric interaction between CAD molecules as well as self-cooperative DNA binding by CAD. We note that computational modeling of the D. melanogaster CRMs alone did not yield significant evidence to support CAD self-cooperativity. We thus demonstrate how specific mechanistic details encoded in CRMs can be revealed by modeling their evolution and fitting such models to multispecies data.
Collapse
Affiliation(s)
- Thyago Duque
- Department of Computer Science, University of Illinois at Urbana-Champaign
| | | | | | | | | | | |
Collapse
|