1
Kin K, Bhogale S, Zhu L, Thomas D, Bertol J, Zheng WJ, Sinha S, Fakhouri WD. Sequence-to-expression approach to identify etiological non-coding DNA variations in P53 and cMYC-driven diseases. Research Square 2023:rs.3.rs-3037310. [PMID: 37503250 PMCID: PMC10371153 DOI: 10.21203/rs.3.rs-3037310/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Background and methods: Disease risk prediction based on DNA sequence and transcriptional profile can improve disease screening, prevention, and potential therapeutic approaches by revealing contributing genetic factors and altered regulatory networks. Despite identifying many disease-associated DNA variants through genome-wide association studies, distinguishing deleterious non-coding DNA variations remains poor for most common diseases. We previously reported that non-coding variations disrupting cis-overlapping motifs (CisOMs) of opposing transcription factors significantly affect enhancer activity. We designed in vitro experiments to uncover the significance of the co-occupancy and competitive binding and inhibition between P53 and cMYC on common target gene expression.
Results: Analyzing publicly available ChIP-seq data for P53 and cMYC in human embryonic stem cells and mouse embryonic cells showed that ~344-366 genomic regions are co-occupied by P53 and cMYC. We identified, on average, two CisOMs per region, suggesting that co-occupancy is evolutionarily conserved in vertebrates. Our data showed that treating U2OS cells with doxorubicin increased P53 protein level while reducing cMYC level. In contrast, no change in protein levels was observed in Raji cells. ChIP-seq analysis illustrated that 16-922 genomic regions were co-occupied by P53 and cMYC before and after treatment, and substitutions of cMYC signals by P53 were detected after doxorubicin treatment in U2OS. Around 187 expressed genes near co-occupied regions were altered at mRNA level according to RNA-seq data. We utilized a computational motif-matching approach to determine that changes in predicted P53 binding affinity by DNA variations in CisOMs of co-occupied elements significantly correlate with alterations in reporter gene expression. We performed a similar analysis using SNPs mapped in CisOMs for P53 and cMYC from ChIP-seq data in U2OS and Raji, and expression of target genes from the GTEx portal.
Conclusions: We found a significant correlation between change in motif-predicted cMYC binding affinity by SNPs in CisOMs and altered gene expression. Our study brings us closer to developing a generally applicable approach to filter etiological non-coding variations associated with P53 and cMYC-dependent diseases.
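As a rough illustration of the motif-matching idea in this abstract, the sketch below scores a reference and a variant allele against a position weight matrix and reports the change in score. The log-odds scoring scheme, the toy count matrix `TOY_COUNTS`, and the function names are all assumptions made for illustration; they are not taken from the study, which does not describe its scoring method here.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position count matrix (rows: motif positions; columns: A, C, G, T).
# The numbers are invented for illustration and describe no real P53 or cMYC motif.
TOY_COUNTS = [
    [2, 30, 4, 4],   # position 1 favours C
    [30, 2, 4, 4],   # position 2 favours A
    [4, 2, 30, 4],   # position 3 favours G
    [4, 4, 2, 30],   # position 4 favours T
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background."""
    matrix = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        matrix.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return matrix


def pwm_score(seq, lom):
    """Sum the per-position log-odds for a sequence as long as the motif."""
    return sum(lom[i][BASES.index(base)] for i, base in enumerate(seq))


def delta_score(ref_seq, alt_seq, lom):
    """Change in motif score caused by a variant (alt minus ref)."""
    return pwm_score(alt_seq, lom) - pwm_score(ref_seq, lom)


lom = log_odds_matrix(TOY_COUNTS)
# A T-to-A substitution at the last position weakens the predicted match.
print(delta_score("CAGT", "CAGA", lom))
```

A negative value indicates a predicted loss of binding affinity; in an analysis of the kind summarized above, such per-variant changes would then be compared with measured expression changes.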
Affiliation(s)
- Katherine Kin
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
- Lisha Zhu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston
- Derrick Thomas
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
- Jessica Bertol
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
- W Jim Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston
- Saurabh Sinha
- The Wallace H. Coulter Department of Biomedical Engineering
- Walid D Fakhouri
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
2
DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022; 54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 09/20/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023]
Abstract
Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. Here, we built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. We validated these rules experimentally and demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
3
Bodzęta A, Berger F, MacGillavry HD. Subsynaptic mobility of presynaptic mGluR types is differentially regulated by intra- and extracellular interactions. Mol Biol Cell 2022; 33:ar66. [PMID: 35511883 PMCID: PMC9635276 DOI: 10.1091/mbc.e21-10-0484] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022]
Abstract
Presynaptic metabotropic glutamate receptors (mGluRs) are essential for the control of synaptic transmission. However, how the subsynaptic dynamics of these receptors is controlled and contributes to synaptic signaling remains poorly understood quantitatively. Particularly, since the affinity of individual mGluR subtypes for glutamate differs considerably, the activation of mGluR subtypes critically depends on their precise subsynaptic distribution. Here, using superresolution microscopy and single-molecule tracking, we unravel novel molecular mechanisms that control the nanoscale distribution and mobility of presynaptic mGluRs in hippocampal neurons. We demonstrate that the high-affinity group II receptor mGluR2 localizes diffusely along the axon, and is highly mobile, while the low-affinity group III receptor mGluR7 is stably anchored at the active zone. We demonstrate that intracellular interactions modulate surface diffusion of mGluR2, while immobilization of mGluR7 at the active zone relies on its extracellular domain. Receptor activation or increases in synaptic activity do not alter the surface mobility of presynaptic mGluRs. Finally, computational modeling of presynaptic mGluR activity revealed that this particular nanoscale arrangement directly impacts their ability to modulate neurotransmitter release. Altogether, this study demonstrates that distinct mechanisms control surface mobility of presynaptic mGluRs to contribute differentially to glutamatergic synaptic transmission.
Affiliation(s)
- Anna Bodzęta
- Division of Cell Biology, Neurobiology and Biophysics, Department of Biology, Faculty of Science, Utrecht University, 3584 CH, The Netherlands
- Florian Berger
- Division of Cell Biology, Neurobiology and Biophysics, Department of Biology, Faculty of Science, Utrecht University, 3584 CH, The Netherlands
- Harold D MacGillavry
- Division of Cell Biology, Neurobiology and Biophysics, Department of Biology, Faculty of Science, Utrecht University, 3584 CH, The Netherlands
4
Dibaeinia P, Sinha S. Deciphering enhancer sequence using thermodynamics-based models and convolutional neural networks. Nucleic Acids Res 2021; 49:10309-10327. [PMID: 34508359 PMCID: PMC8501998 DOI: 10.1093/nar/gkab765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Revised: 08/18/2021] [Accepted: 08/25/2021] [Indexed: 11/18/2022]
Abstract
Deciphering the sequence-function relationship encoded in enhancers holds the key to interpreting non-coding variants and understanding mechanisms of transcriptomic variation. Several quantitative models exist for predicting enhancer function and underlying mechanisms; however, there has been no systematic comparison of these models characterizing their relative strengths and shortcomings. Here, we interrogated a rich data set of neuroectodermal enhancers in Drosophila, representing cis- and trans- sources of expression variation, with a suite of biophysical and machine learning models. We performed rigorous comparisons of thermodynamics-based models implementing different mechanisms of activation, repression and cooperativity. Moreover, we developed a convolutional neural network (CNN) model, called CoNSEPT, that learns enhancer ‘grammar’ in an unbiased manner. CoNSEPT is the first general-purpose CNN tool for predicting enhancer function in varying conditions, such as different cell types and experimental conditions, and we show that such complex models can suggest interpretable mechanisms. We found model-based evidence for mechanisms previously established for the studied system, including cooperative activation and short-range repression. The data also favored one hypothesized activation mechanism over another and suggested an intriguing role for a direct, distance-independent repression mechanism. Our modeling shows that while fundamentally different models can yield similar fits to data, they vary in their utility for mechanistic inference. CoNSEPT is freely available at: https://github.com/PayamDiba/CoNSEPT.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
5
Xiao JY, Hafner A, Boettiger AN. How subtle changes in 3D structure can create large changes in transcription. eLife 2021; 10:e64320. [PMID: 34240703 PMCID: PMC8352591 DOI: 10.7554/elife.64320] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 10/25/2020] [Accepted: 06/25/2021] [Indexed: 12/17/2022]
Abstract
Animal genomes are organized into topologically associated domains (TADs). TADs are thought to contribute to gene regulation by facilitating enhancer-promoter (E-P) contacts within a TAD and preventing these contacts across TAD borders. However, the absolute difference in contact frequency across TAD boundaries is usually less than 2-fold, even though disruptions of TAD borders can change gene expression by 10-fold. Existing models fail to explain this hypersensitive response. Here, we propose a futile cycle model of enhancer-mediated regulation that can exhibit hypersensitivity through bistability and hysteresis. Consistent with recent experiments, this regulation does not exhibit strong correlation between E-P contact and promoter activity, even though regulation occurs through contact. Through mathematical analysis and stochastic simulation, we show that this system can create an illusion of E-P biochemical specificity and explain the importance of weak TAD boundaries. It also offers a mechanism to reconcile apparently contradictory results from recent global TAD disruption with local TAD boundary deletion experiments. Together, these analyses advance our understanding of cis-regulatory contacts in controlling gene expression and suggest new experimental directions.
Affiliation(s)
- Antonina Hafner
- Department of Developmental Biology, Stanford University, Stanford, United States
- Alistair N Boettiger
- Program in Biophysics, Stanford University, Stanford, United States
- Department of Developmental Biology, Stanford University, Stanford, United States
6
Zubair A, Rosen IG, Nuzhdin SV, Marjoram P. Bayesian model selection for the Drosophila gap gene network. BMC Bioinformatics 2019; 20:327. [PMID: 31195954 PMCID: PMC6567646 DOI: 10.1186/s12859-019-2888-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/24/2019] [Accepted: 05/09/2019] [Indexed: 11/10/2022]
Abstract
Background: The gap gene system controls the early cascade of the segmentation pathway in Drosophila melanogaster as well as other insects. Owing to its tractability and key role in embryo patterning, this system has been the focus for both computational modelers and experimentalists. The gap gene expression dynamics can be considered strictly as a one-dimensional process and modeled as a system of reaction-diffusion equations. While substantial progress has been made in modeling this phenomenon, there still remains a deficit of approaches to evaluate competing hypotheses. Most of the model development has happened in isolation and there has been little attempt to compare candidate models.
Results: The Bayesian framework offers a means of doing formal model evaluation. Here, we demonstrate how this framework can be used to compare different models of gene expression. We focus on the Papatsenko-Levine formalism, which exploits a fractional occupancy based approach to incorporate activation of the gap genes by the maternal genes and cross-regulation by the gap genes themselves. The Bayesian approach provides insight into the relationships between system parameters. In the regulatory pathway of segmentation, the parameters for number of binding sites and binding affinity have a negative correlation. The model selection analysis supports a stronger binding affinity for Bicoid compared to other regulatory edges, as shown by a larger posterior mean. The procedure doesn’t show support for activation of Kruppel by Bicoid.
Conclusions: We provide an efficient solver for the general representation of the Papatsenko-Levine model. We also demonstrate the utility of the Bayes factor for evaluating candidate models of spatial patterning. In addition, by using the parallel tempering sampler, the convergence of Markov chains can be remarkably improved and robust estimates of Bayes factors obtained.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2888-0) contains supplementary material, which is available to authorized users.
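The fractional occupancy idea mentioned in this abstract can be sketched in a few lines. The functional form below (independent, identical sites; expression requires at least one activator site bound and all repressor sites free) is an assumed simplification for illustration and may differ from the exact equations of the Papatsenko-Levine model; all function and parameter names are hypothetical.

```python
def unbound_probability(affinity, concentration, n_sites):
    """Probability that none of n independent, identical sites is occupied."""
    return (1.0 / (1.0 + affinity * concentration)) ** n_sites


def expression_probability(act_affinity, act_conc, act_sites,
                           rep_affinity, rep_conc, rep_sites):
    """Assumed form: at least one activator site bound and no repressor site bound."""
    p_activator_bound = 1.0 - unbound_probability(act_affinity, act_conc, act_sites)
    p_repressor_free = unbound_probability(rep_affinity, rep_conc, rep_sites)
    return p_activator_bound * p_repressor_free


# One strong site and two weaker sites give the same activator occupancy here,
# which hints at why site number and affinity can trade off when fitting data.
one_strong = expression_probability(3.0, 1.0, 1, 1.0, 0.0, 1)
two_weak = expression_probability(1.0, 1.0, 2, 1.0, 0.0, 1)
print(one_strong, two_weak)
```

Under this assumed form, different combinations of site number and affinity can produce the same occupancy, which is one way a negative correlation between those two parameters could arise in a posterior distribution.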
Affiliation(s)
- Asif Zubair
- Molecular and Computational Biology, USC, 1050 Childs Way, Los Angeles, CA 90089-2532, US.
- I Gary Rosen
- Department of Mathematics, USC, 3620 S. Vermont Ave., Los Angeles, CA 90089-2532, US
- Sergey V Nuzhdin
- Molecular and Computational Biology, USC, 1050 Childs Way, Los Angeles, CA 90089-2532, US
- Paul Marjoram
- Molecular and Computational Biology, USC, 1050 Childs Way, Los Angeles, CA 90089-2532, US
7
Verd B, Monk NA, Jaeger J. Modularity, criticality, and evolvability of a developmental gene regulatory network. eLife 2019; 8:42832. [PMID: 31169494 PMCID: PMC6645726 DOI: 10.7554/elife.42832] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 01/14/2019] [Accepted: 06/05/2019] [Indexed: 01/16/2023]
Abstract
The existence of discrete phenotypic traits suggests that the complex regulatory processes which produce them are functionally modular. These processes are usually represented by networks. Only modular networks can be partitioned into intelligible subcircuits able to evolve relatively independently. Traditionally, functional modularity is approximated by detection of modularity in network structure. However, the correlation between structure and function is loose. Many regulatory networks exhibit modular behaviour without structural modularity. Here we partition an experimentally tractable regulatory network—the gap gene system of dipteran insects—using an alternative approach. We show that this system, although not structurally modular, is composed of dynamical modules driving different aspects of whole-network behaviour. All these subcircuits share the same regulatory structure, but differ in components and sensitivity to regulatory interactions. Some subcircuits are in a state of criticality, while others are not, which explains the observed differential evolvability of the various expression features in the system.
Affiliation(s)
- Berta Verd
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Konrad Lorenz Institute for Evolution and Cognition Research (KLI), Klosterneuburg, Austria; Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Nicholas AM Monk
- School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom
- Johannes Jaeger
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Konrad Lorenz Institute for Evolution and Cognition Research (KLI), Klosterneuburg, Austria; School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom; Wissenschaftskolleg zu Berlin, Berlin, Germany; Center for Systems Biology Dresden (CSBD), Dresden, Germany; Complexity Science Hub (CSH), Vienna, Austria; Centre de Recherches Interdisciplinaires (CRI), Paris, France
8
Peng PC, Sinha S. Quantitative modeling of gene expression using DNA shape features of binding sites. Nucleic Acids Res 2016; 44:e120. [PMID: 27257066 PMCID: PMC5291265 DOI: 10.1093/nar/gkw446] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/02/2015] [Revised: 05/06/2016] [Accepted: 05/09/2016] [Indexed: 12/11/2022]
Abstract
Prediction of gene expression levels driven by regulatory sequences is pivotal in genomic biology. A major focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence based on transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models largely rely on the position weight matrix (PWM) model for DNA binding, and the effect of alternative models based on DNA shape remains unexplored. Here, we propose a statistical thermodynamics model of gene expression using DNA shape features of binding sites. We used rigorous methods to evaluate the fits of expression readouts of 37 enhancers regulating spatial gene expression patterns in Drosophila embryo, and show that DNA shape-based models perform arguably better than PWM-based models. We also observed that DNA shape captures information complementary to the PWM, in a way that is useful for expression modeling. Furthermore, we tested if combining shape and PWM-based features provides better predictions than using either binding model alone. Our work demonstrates that the increasingly popular DNA-binding models based on local DNA shape can be useful in sequence-to-expression modeling. It also provides a framework for future studies to predict gene expression better than with PWM models alone.
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
9
Peng PC, Hassan Samee MA, Sinha S. Incorporating chromatin accessibility data into sequence-to-expression modeling. Biophys J 2016; 108:1257-67. [PMID: 25762337 DOI: 10.1016/j.bpj.2014.12.037] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/18/2014] [Revised: 12/01/2014] [Accepted: 12/11/2014] [Indexed: 01/30/2023]
Abstract
Prediction of gene expression levels from regulatory sequences is one of the major challenges of genomic biology today. A particularly promising approach to this problem is that taken by thermodynamics-based models that interpret an enhancer sequence in a given cellular context specified by transcription factor concentration levels and predict precise expression levels driven by that enhancer. Such models have so far not accounted for the effect of chromatin accessibility on interactions between transcription factor and DNA and consequently on gene-expression levels. Here, we extend a thermodynamics-based model of gene expression, called GEMSTAT (Gene Expression Modeling Based on Statistical Thermodynamics), to incorporate chromatin accessibility data and quantify its effect on accuracy of expression prediction. In the new model, called GEMSTAT-A, accessibility at a binding site is assumed to affect the transcription factor's binding strength at the site, whereas all other aspects are identical to the GEMSTAT model. We show that this modification results in significantly better fits in a data set of over 30 enhancers regulating spatial expression patterns in the blastoderm-stage Drosophila embryo. It is important to note that the improved fits result not from an overall elevated accessibility in active enhancers but from the variation of accessibility levels within an enhancer. With whole-genome DNA accessibility measurements becoming increasingly popular, our work demonstrates how such data may be useful for sequence-to-expression models. It also calls for future advances in modeling accessibility levels from sequence and the transregulatory context, so as to predict accurately the effect of cis and trans perturbations on gene expression.
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Md Abul Hassan Samee
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois; Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois.
10
Samee MAH, Lim B, Samper N, Lu H, Rushlow CA, Jiménez G, Shvartsman SY, Sinha S. A Systematic Ensemble Approach to Thermodynamic Modeling of Gene Expression from Sequence Data. Cell Syst 2015; 1:396-407. [PMID: 27136354 DOI: 10.1016/j.cels.2015.12.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 08/10/2015] [Revised: 10/19/2015] [Accepted: 12/02/2015] [Indexed: 11/17/2022]
Abstract
To understand the relationship between an enhancer DNA sequence and quantitative gene expression, thermodynamics-driven mathematical models of transcription are often employed. These "sequence-to-expression" models can describe an incomplete or even incorrect set of regulatory relationships if the parameter space is not searched systematically. Here, we focus on an enhancer of the Drosophila gene ind and demonstrate how a systematic search of parameter space can reveal a more comprehensive picture of a gene's regulatory mechanisms, resolve outstanding ambiguities, and suggest testable hypotheses. We describe an approach that generates an ensemble of ind models; all of these models are technically acceptable solutions to the sequence-to-expression problem in light of wild-type data, and some represent mechanistically distinct hypotheses about the regulation of ind. This ensemble can be restricted to biologically plausible models using requirements gleaned from in vivo perturbation experiments. Biologically plausible models make unique predictions about how specific ind enhancer sequences affect ind expression; we validate these predictions in vivo through site mutagenesis in transgenic Drosophila embryos.
Affiliation(s)
- Md Abul Hassan Samee
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Bomyi Lim
- Department of Chemical and Biological Engineering and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Núria Samper
- Department of Developmental Biology, Instituto de Biología Molecular de Barcelona, Consejo Superior de Investigaciones Científicas (CSIC), Barcelona 08208, Spain
- Hang Lu
- School of Chemical and Biomolecular Engineering and Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Gerardo Jiménez
- Department of Developmental Biology, Instituto de Biología Molecular de Barcelona, Consejo Superior de Investigaciones Científicas (CSIC), Barcelona 08208, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
- Stanislav Y Shvartsman
- Department of Chemical and Biological Engineering and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
11
Papatsenko D, Lemischka IR. NetExplore: a web server for modeling small network motifs. Bioinformatics 2015; 31:1860-2. [PMID: 25637559 DOI: 10.1093/bioinformatics/btv058] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/15/2014] [Accepted: 01/26/2015] [Indexed: 01/28/2023]
Abstract
Motivation: Quantitative and qualitative assessment of biological data often produces small essential recurrent networks containing 3-5 components, called network motifs. In this context, model solutions for small network motifs are of very high interest.
Results: The software package NetExplore was created to generate, classify and analyze solutions for network motifs including up to six network components. NetExplore allows plotting and visualization of the solution's phase spaces and bifurcation diagrams.
Availability and implementation: The current version of NetExplore has been implemented in Perl-CGI and is accessible at the following locations: http://line.bioinfolab.net/nex/NetExplore.htm and http://nex.autosome.ru/nex/NetExplore.htm.
Affiliation(s)
- Dmitri Papatsenko
- Department of Regenerative and Developmental Biology, Black Family Stem Cell Institute and Department of Pharmacology and System Therapeutics, Icahn School of Medicine at Mount Sinai, Systems Biology Center New York, One Gustave L. Levy Place, New York, NY 10029, USA
- Ihor R Lemischka
- Department of Regenerative and Developmental Biology, Black Family Stem Cell Institute and Department of Pharmacology and System Therapeutics, Icahn School of Medicine at Mount Sinai, Systems Biology Center New York, One Gustave L. Levy Place, New York, NY 10029, USA
12
Analytic approaches to stochastic gene expression in multicellular systems. Biophys J 2014; 105:2629-40. [PMID: 24359735 DOI: 10.1016/j.bpj.2013.10.033] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 10/12/2013] [Accepted: 10/16/2013] [Indexed: 11/22/2022]
Abstract
Deterministic thermodynamic models of the complex systems, which control gene expression in metazoa, are helping researchers identify fundamental themes in the regulation of transcription. However, quantitative single cell studies are increasingly identifying regulatory mechanisms that control variability in expression. Such behaviors cannot be captured by deterministic models and are poorly suited to contemporary stochastic approaches that rely on continuum approximations, such as Langevin methods. Fortunately, theoretical advances in the modeling of transcription have assembled some general results that can be readily applied to systems being explored only through a deterministic approach. Here, I review some of the recent experimental evidence for the importance of genetically regulating stochastic effects during embryonic development and discuss key results from Markov theory that can be used to model this regulation. I then discuss several pairs of regulatory mechanisms recently investigated through a Markov approach. In each case, a deterministic treatment predicts no difference between the mechanisms, but the statistical treatment reveals the potential for substantially different distributions of transcriptional activity. In this light, features of gene regulation that seemed needlessly complex evolutionary baggage may be appreciated for their key contributions to reliability and precision of gene expression.
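The contrast drawn in this abstract, between mechanisms that a deterministic treatment cannot tell apart but a Markov treatment can, may be illustrated with the standard two-state ("telegraph") promoter model. Choosing this model is an assumption made for illustration; the review may discuss other mechanisms. The closed-form steady-state mean and Fano factor used below are textbook results for that model, and the rate values are arbitrary.

```python
def telegraph_mean(k_on, k_off, synthesis, degradation):
    """Steady-state mean mRNA count for a promoter switching between on and off."""
    return (synthesis / degradation) * k_on / (k_on + k_off)


def telegraph_fano(k_on, k_off, synthesis, degradation):
    """Steady-state Fano factor (variance / mean); 1.0 would be Poisson noise."""
    switching = k_on + k_off
    return 1.0 + synthesis * k_off / (switching * (switching + degradation))


# Two hypothetical promoters with the same on-fraction but different switching speeds.
slow = dict(k_on=0.1, k_off=0.1, synthesis=20.0, degradation=1.0)
fast = dict(k_on=10.0, k_off=10.0, synthesis=20.0, degradation=1.0)

for label, params in (("slow", slow), ("fast", fast)):
    print(label, telegraph_mean(**params), telegraph_fano(**params))
```

Both parameter sets give the same mean expression, so a mean-only description cannot distinguish them, while the slowly switching promoter is considerably noisier.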
13
Munteanu A, Cotterell J, Solé RV, Sharpe J. Design principles of stripe-forming motifs: the role of positive feedback. Sci Rep 2014; 4:5003. [PMID: 24830352 PMCID: PMC4023129 DOI: 10.1038/srep05003] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 01/17/2014] [Accepted: 04/28/2014] [Indexed: 02/07/2023]
Abstract
Interpreting a morphogen gradient into a single stripe of gene-expression is a fundamental unit of patterning in early embryogenesis. From both experimental data and computational studies the feed-forward motifs stand out as minimal networks capable of this patterning function. Positive feedback within gene networks has been hypothesised to enhance the sharpness and precision of gene-expression borders, however a systematic analysis has not yet been reported. Here we set out to assess this hypothesis, and find an unexpected result. The addition of positive-feedback can have different effects on two different designs of feed-forward motif– it increases the parametric robustness of one design, while being neutral or detrimental to the other. These results shed light on the abundance of the former motif and especially of mutual-inhibition positive feedback in developmental networks.
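How a feed-forward motif can turn a graded morphogen input into a single stripe can be sketched with a steady-state toy model. The incoherent feed-forward wiring, the Hill functions, and every threshold below are assumptions chosen for illustration; they are not the designs or parameters analyzed in the paper, and positive feedback is not modeled.

```python
def hill_activation(x, threshold, n=4):
    """Fraction of maximal activation at input level x."""
    return x ** n / (threshold ** n + x ** n)


def hill_repression(x, threshold, n=4):
    """Fraction of activity remaining under repressor level x."""
    return threshold ** n / (threshold ** n + x ** n)


def stripe_output(morphogen, act_threshold=0.2, rep_on_threshold=0.6,
                  rep_threshold=0.5):
    """Morphogen activates the output directly and, at higher levels, its repressor."""
    repressor = hill_activation(morphogen, rep_on_threshold)
    return hill_activation(morphogen, act_threshold) * hill_repression(
        repressor, rep_threshold
    )


# Output is low at low morphogen, high at intermediate levels, low again at high levels.
for level in (0.05, 0.4, 1.5):
    print(level, round(stripe_output(level), 3))
```

Because the output gene responds at a lower morphogen level than its repressor does, expression is confined to an intermediate band of the gradient.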
Affiliation(s)
- Andreea Munteanu
- [1] EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain [2] Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003 Barcelona, Spain
- James Cotterell
- [1] EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain [2] Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003 Barcelona, Spain
- Ricard V Solé
- [1] Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003 Barcelona, Spain [2] Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA [3] Institució Catalana de Recerca i Estudis Avancats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
- James Sharpe
- [1] EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain [2] Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003 Barcelona, Spain [3] Institució Catalana de Recerca i Estudis Avancats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
|
14
|
Kim AR, Martinez C, Ionides J, Ramos AF, Ludwig MZ, Ogawa N, Sharp DH, Reinitz J. Rearrangements of 2.5 kilobases of noncoding DNA from the Drosophila even-skipped locus define predictive rules of genomic cis-regulatory logic. PLoS Genet 2013; 9:e1003243. [PMID: 23468638 PMCID: PMC3585115 DOI: 10.1371/journal.pgen.1003243] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 03/15/2012] [Accepted: 11/30/2012] [Indexed: 01/19/2023] Open
Abstract
Rearrangements of about 2.5 kilobases of regulatory DNA located 5′ of the transcription start site of the Drosophila even-skipped locus generate large-scale changes in the expression of even-skipped stripes 2, 3, and 7. The most radical effects are generated by juxtaposing the minimal stripe enhancers MSE2 and MSE3 for stripes 2 and 3 with and without small “spacer” segments less than 360 bp in length. We placed these fusion constructs in a targeted transformation site and obtained quantitative expression data for these transformants together with their controlling transcription factors at cellular resolution. These data demonstrated that the rearrangements can alter expression levels in stripe 2 and the 2–3 interstripe by a factor of more than 10. We reasoned that this behavior would place tight constraints on possible rules of genomic cis-regulatory logic. To find these constraints, we confronted our new expression data together with previously obtained data on other constructs with a computational model. The model contained representations of thermodynamic protein–DNA interactions including steric interference and cooperative binding, short-range repression, direct repression, activation, and coactivation. The model was highly constrained by the training data, which it described within the limits of experimental error. The model, so constrained, was able to correctly predict expression patterns driven by enhancers for other Drosophila genes; even-skipped enhancers not included in the training set; stripe 2, 3, and 7 enhancers from various Drosophilid and Sepsid species; and long segments of even-skipped regulatory DNA that contain multiple enhancers. The model further demonstrated that elevated expression driven by a fusion of MSE2 and MSE3 was a consequence of the recruitment of a portion of MSE3 to become a functional component of MSE2, demonstrating that cis-regulatory “elements” are not elementary objects. 
Metazoan genes, including those of humans, contain large noncoding regions that are required for viability. Sequence variations in these regions are statistically associated with human disease, but the mechanisms underlying these associations are not well understood. These regions regulate transcription and are frequently larger than the gene's transcript by an order of magnitude. In this paper we attempt to elucidate the regulatory code of these noncoding segments of DNA by means of quantitative spatially resolved gene expression data and a computational model. The expression data comes from the early embryo of the fruit fly Drosophila melanogaster. We chose a family of DNA constructs to analyze that drive very different patterns of expression when very small changes in DNA sequence are made, reasoning that this sensitivity would reveal important properties of the regulatory code. The model reproduced the training data with precision greater than the expected accuracy of the training data itself. It was able to correctly predict from DNA sequence the expression of 44 segments of DNA from many genes and species.
Affiliation(s)
- Ah-Ram Kim
- Department of Ecology and Evolution, Chicago Center for Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, New York, United States of America
- Carlos Martinez
- Department of Ecology and Evolution, Chicago Center for Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- John Ionides
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Alexandre F. Ramos
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, Brazil
- Michael Z. Ludwig
- Department of Ecology and Evolution, Chicago Center for Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Nobuo Ogawa
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- David H. Sharp
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- John Reinitz
- Department of Ecology and Evolution, Chicago Center for Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Department of Statistics, Department of Molecular Genetics and Cell Biology, and Institute of Genomics and Systems Biology, University of Chicago, Chicago, Illinois, United States of America
|
15
|
Sokolowski TR, Erdmann T, ten Wolde PR. Mutual repression enhances the steepness and precision of gene expression boundaries. PLoS Comput Biol 2012; 8:e1002654. [PMID: 22956897 PMCID: PMC3431325 DOI: 10.1371/journal.pcbi.1002654] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 11/25/2011] [Accepted: 07/07/2012] [Indexed: 11/18/2022] Open
Abstract
Embryonic development is driven by spatial patterns of gene expression that determine the fate of each cell in the embryo. While gene expression is often highly erratic, embryonic development is usually exceedingly precise. In particular, gene expression boundaries are robust not only against intra-embryonic fluctuations such as noise in gene expression and protein diffusion, but also against embryo-to-embryo variations in the morphogen gradients, which provide positional information to the differentiating cells. How development is robust against intra- and inter-embryonic variations is not understood. A common motif in the gene regulation networks that control embryonic development is mutual repression between pairs of genes. To assess the role of mutual repression in the robust formation of gene expression patterns, we have performed large-scale stochastic simulations of a minimal model of two mutually repressing gap genes in Drosophila, hunchback (hb) and knirps (kni). Our model includes not only mutual repression between hb and kni, but also the stochastic and cooperative activation of hb by the anterior morphogen Bicoid (Bcd) and of kni by the posterior morphogen Caudal (Cad), as well as the diffusion of Hb and Kni between neighboring nuclei. Our analysis reveals that mutual repression can markedly increase the steepness and precision of the gap gene expression boundaries. In contrast to other mechanisms such as spatial averaging and cooperative gene activation, mutual repression thus allows for gene-expression boundaries that are both steep and precise. Moreover, mutual repression dramatically enhances their robustness against embryo-to-embryo variations in the morphogen levels. 
Finally, our simulations reveal that diffusion of the gap proteins plays a critical role not only in reducing the width of the gap gene expression boundaries via the mechanism of spatial averaging, but also in repairing patterning errors that could arise because of the bistability induced by mutual repression.
Affiliation(s)
- Thorsten Erdmann
- University of Heidelberg, Institute for Theoretical Physics, Heidelberg, Germany
|
16
|
Frank TD, Carmody AM, Kholodenko BN. Versatility of cooperative transcriptional activation: a thermodynamical modeling analysis for greater-than-additive and less-than-additive effects. PLoS One 2012; 7:e34439. [PMID: 22506020 PMCID: PMC3323628 DOI: 10.1371/journal.pone.0034439] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 07/22/2011] [Accepted: 03/02/2012] [Indexed: 11/20/2022] Open
Abstract
We derive a statistical model of transcriptional activation using equilibrium thermodynamics of chemical reactions. We examine to what extent this statistical model predicts synergy effects of cooperative activation of gene expression. We determine parameter domains in which greater-than-additive and less-than-additive effects are predicted for cooperative regulation by two activators. We show that the statistical approach can be used to identify different causes of synergistic greater-than-additive effects: nonlinearities of the thermostatistical transcriptional machinery and three-body interactions between RNA polymerase and two activators. In particular, our model-based analysis suggests that at low transcription factor concentrations cooperative activation cannot yield synergistic greater-than-additive effects, i.e., DNA transcription can only exhibit less-than-additive effects. Accordingly, transcriptional activity turns from synergistic greater-than-additive responses at relatively high transcription factor concentrations into less-than-additive responses at relatively low concentrations. In addition, two types of re-entrant phenomena are predicted. First, our analysis predicts that under particular circumstances transcriptional activity will feature a sequence of less-than-additive, greater-than-additive, and eventually less-than-additive effects when for fixed activator concentrations the regulatory impact of activators on the binding of RNA polymerase to the promoter increases from weak, to moderate, to strong. Second, for appropriate promoter conditions when activator concentrations are increased then the aforementioned re-entrant sequence of less-than-additive, greater-than-additive, and less-than-additive effects is predicted as well. 
Finally, our model-based analysis suggests that even for weak activators that individually induce only negligible increases in promoter activity, promoter activity can exhibit greater-than-additive responses when transcription factors and RNA polymerase interact by means of three-body interactions. Overall, we show that versatility of transcriptional activation is brought about by nonlinearities of transcriptional response functions and interactions between transcription factors, RNA polymerase and DNA.
Affiliation(s)
- Till D Frank
- Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland.
|
17
|
Gene length may contribute to graded transcriptional responses in the Drosophila embryo. Dev Biol 2011; 360:230-40. [PMID: 21920356 DOI: 10.1016/j.ydbio.2011.08.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 08/04/2011] [Accepted: 08/28/2011] [Indexed: 01/22/2023]
Abstract
An important question in developmental biology is how relatively shallow gradients of morphogens can reliably establish a series of distinct transcriptional readouts. Current models emphasize interactions between transcription factors binding in distinct modes to cis-acting sequences of target genes. Another recent idea is that the cis-acting interactions may amplify preexisting biases or prepatterns to establish robust transcriptional responses. In this study, we examine the possible contribution of one such source of prepattern, namely gene length. We developed quantitative imaging tools to measure gene expression levels for several loci at a time on a single-cell basis and applied these quantitative imaging tools to dissect the establishment of a gene expression border separating the mesoderm and neuroectoderm in the early Drosophila embryo. We first characterized the formation of a transient ventral-to-dorsal gradient of the Snail (Sna) repressor and then examined the relationship between this gradient and repression of neural target genes in the mesoderm. We found that neural genes are repressed in a nested pattern within a zone of the mesoderm abutting the neuroectoderm, where Sna levels are graded. While several factors may contribute to the transient graded response to the Sna gradient, our analysis suggests that gene length may play an important, albeit transient, role in establishing these distinct transcriptional responses. One prediction of the gene-length-dependent transcriptional patterning model is that the co-regulated genes knirps (a short gene) and knirps-related (a long gene) should be transiently expressed in domains of differing widths, which we confirmed experimentally. These findings suggest that gene length may contribute to establishing graded responses to morphogen gradients by providing transient prepatterns that are subsequently amplified and stabilized by traditional cis-regulatory interactions.
|
18
|
Papatsenko D, Levine M. The Drosophila gap gene network is composed of two parallel toggle switches. PLoS One 2011; 6:e21145. [PMID: 21747931 PMCID: PMC3128594 DOI: 10.1371/journal.pone.0021145] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 04/11/2011] [Accepted: 05/20/2011] [Indexed: 11/30/2022] Open
Abstract
Drosophila “gap” genes provide the first response to maternal gradients in the early fly embryo. Gap genes are expressed in a series of broad bands across the embryo during the first hours of development. The gene network controlling the gap gene expression patterns includes inputs from maternal gradients and mutual repression between the gap genes themselves. In this study we propose a modular design for the gap gene network, involving two relatively independent network domains. The core of each network domain includes a toggle switch corresponding to a pair of mutually repressive gap genes, operated in space by maternal inputs. The toggle switches present in the gap network are evocative of the phage lambda switch, but they are operated positionally (in space) by the maternal gradients, so the synthesis rates for the competing components change along the embryo anterior-posterior axis. A dynamic model constructed on the proposed principle, with elements of fractional site occupancy, required 5–7 parameters to fit quantitative spatial expression data for gap gradients. The identified model solutions (parameter combinations) reproduced major dynamic features of the gap gradient system and explained gap expression in a variety of segmentation mutants.
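The toggle-switch idea can be sketched with the textbook two-gene mutual-repression model integrated by forward Euler. This is a generic illustration and not the authors' fitted model: the equations, the parameter names `a_u` and `a_v` (standing in for position-dependent synthesis rates), and all numerical values are assumptions.

```python
def simulate_toggle(u0, v0, a_u=4.0, a_v=4.0, n=2, dt=0.01, steps=5000):
    """Integrate du/dt = a_u / (1 + v**n) - u and dv/dt = a_v / (1 + u**n) - v
    with forward Euler and return the final (u, v).

    u and v are the levels of two mutually repressing genes; a_u and a_v
    are their maximal synthesis rates (hypothetical values).
    """
    u, v = u0, v0
    for _ in range(steps):
        du = a_u / (1.0 + v ** n) - u
        dv = a_v / (1.0 + u ** n) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


# Same parameters, different initial bias: the switch settles in
# opposite states, which is the bistability a toggle switch provides.
u_first = simulate_toggle(u0=3.0, v0=0.1)
v_first = simulate_toggle(u0=0.1, v0=3.0)
print("start u-high -> u=%.3f, v=%.3f" % u_first)
print("start v-high -> u=%.3f, v=%.3f" % v_first)
```

Making `a_u` and `a_v` functions of position along the anterior-posterior axis would be one way to mimic the positional operation described in the abstract; that extension is left out here to keep the sketch short.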
Affiliation(s)
- Dmitri Papatsenko
- Department of Gene and Cell Medicine, Mount Sinai School of Medicine, Black Family Stem Cell Institute, New York, New York, United States of America.
|
19
|
Abstract
Gap genes are involved in segment determination during the early development of the fruit fly Drosophila melanogaster as well as in other insects. This review attempts to synthesize the current knowledge of the gap gene network through a comprehensive survey of the experimental literature. I focus on genetic and molecular evidence, which provides us with an almost-complete picture of the regulatory interactions responsible for trunk gap gene expression. I discuss the regulatory mechanisms involved, and highlight the remaining ambiguities and gaps in the evidence. This is followed by a brief discussion of molecular regulatory mechanisms for transcriptional regulation, as well as precision and size-regulation provided by the system. Finally, I discuss evidence on the evolution of gap gene expression from species other than Drosophila. My survey concludes that studies of the gap gene system continue to reveal interesting and important new insights into the role of gene regulatory networks in development and evolution.
Affiliation(s)
- Johannes Jaeger
- Centre de Regulació Genòmica, Universitat Pompeu Fabra, Barcelona, Spain.
|
20
|
Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. PLoS Comput Biol 2010; 6. [PMID: 20862354 PMCID: PMC2940721 DOI: 10.1371/journal.pcbi.1000935] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Received: 03/31/2010] [Accepted: 08/17/2010] [Indexed: 01/08/2023] Open
Abstract
Quantitative models of cis-regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled, or heuristic approximations of the underlying regulatory mechanisms. We have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence, as a function of transcription factor concentrations and their DNA-binding specificities. It uses statistical thermodynamics theory to model not only protein-DNA interaction, but also the effect of DNA-bound activators and repressors on gene expression. In addition, the model incorporates mechanistic features such as synergistic effect of multiple activators, short range repression, and cooperativity in transcription factor-DNA binding, allowing us to systematically evaluate the significance of these features in the context of available expression data. Using this model on segmentation-related enhancers in Drosophila, we find that transcriptional synergy due to simultaneous action of multiple activators helps explain the data beyond what can be explained by cooperative DNA-binding alone. We find clear support for the phenomenon of short-range repression, where repressors do not directly interact with the basal transcriptional machinery. We also find that the binding sites contributing to an enhancer's function may not be conserved during evolution, and a noticeable fraction of these undergo lineage-specific changes. Our implementation of the model, called GEMSTAT, is the first publicly available program for simultaneously modeling the regulatory activities of a given set of sequences.
The development of complex multicellular organisms requires genes to be expressed at specific stages and in specific tissues. Regulatory DNA sequences, often called cis-regulatory modules, drive the desired gene expression patterns by integrating information about the environment in the form of the activities of transcription factors. The rules by which regulatory sequences read this type of information, however, are unclear. In this work, we developed quantitative models based on physicochemical principles that directly map regulatory sequences to the expression profiles they generate. We evaluated these models on the segmentation network of the model organism Drosophila melanogaster. Our models incorporate mechanistic features that attempt to capture how activating and repressing transcription factors work in the segmentation system. By evaluating the importance of these features, we were able to gain insights on the quantitative regulatory rules. We found that two different mechanisms may contribute to cooperative gene activation and that repressors often have a short range of influence in DNA sequences. Combining the quantitative modeling with comparative sequence analysis, we also found that even functional sequences may be lost during evolution.
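The statistical-thermodynamics idea behind models of this kind can be sketched for the smallest non-trivial case, two binding sites with an optional cooperativity term. This is a generic textbook-style calculation and not the GEMSTAT implementation or its interface: the function name, the state labels, and the numerical weights are all hypothetical.

```python
def two_site_probabilities(q1, q2, omega=1.0):
    """Equilibrium probabilities of the four occupancy states of two sites.

    q1 and q2 are the statistical weights of binding at each site (roughly
    concentration times affinity); omega is a cooperativity factor applied
    when both sites are occupied (omega = 1 means independent binding).
    """
    weights = {
        "empty": 1.0,
        "site1": q1,
        "site2": q2,
        "both": omega * q1 * q2,
    }
    z = sum(weights.values())             # partition function
    return {state: w / z for state, w in weights.items()}


independent = two_site_probabilities(0.5, 0.5)
cooperative = two_site_probabilities(0.5, 0.5, omega=10.0)
print("P(both bound), independent: %.3f" % independent["both"])
print("P(both bound), cooperative: %.3f" % cooperative["both"])
```

With `omega = 1` the doubly bound probability equals the product of the single-site occupancies; raising `omega` shifts weight toward the doubly bound state, which is the effect usually meant by cooperative DNA binding.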
|
21
|
Challenges for modeling global gene regulatory networks during development: Insights from Drosophila. Dev Biol 2010; 340:161-9. [DOI: 10.1016/j.ydbio.2009.10.032] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 07/21/2009] [Revised: 10/14/2009] [Accepted: 10/21/2009] [Indexed: 12/26/2022]
|
22
|
He X, Chen CC, Hong F, Fang F, Sinha S, Ng HH, Zhong S. A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. PLoS One 2009; 4:e8155. [PMID: 19956545 PMCID: PMC2780727 DOI: 10.1371/journal.pone.0008155] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 09/30/2009] [Accepted: 11/10/2009] [Indexed: 11/19/2022] Open
Abstract
Background: How transcription factors (TFs) interact with cis-regulatory sequences and interact with each other is a fundamental, but not well understood, aspect of gene regulation. Methodology/Principal Findings: We present a computational method to address this question, relying on the established biophysical principles. This method, STAP (sequence to affinity prediction), takes into account all combinations and configurations of strong and weak binding sites to analyze large scale transcription factor (TF)-DNA binding data to discover cooperative interactions among TFs, infer sequence rules of interaction and predict TF target genes in new conditions with no TF-DNA binding data. The distinctions between STAP and other statistical approaches for analyzing cis-regulatory sequences include the utility of physical principles and the treatment of the DNA binding data as quantitative representation of binding strengths. Applying this method to the ChIP-seq data of 12 TFs in mouse embryonic stem (ES) cells, we found that the strength of TF-DNA binding could be significantly modulated by cooperative interactions among TFs with adjacent binding sites. However, further analysis on five putatively interacting TF pairs suggests that such interactions may be relatively insensitive to the distance and orientation of binding sites. Testing a set of putative Nanog motifs, STAP showed that a novel Nanog motif could better explain the ChIP-seq data than previously published ones. We then experimentally tested and verified the new Nanog motif. A series of comparisons showed that STAP has more predictive power than several state-of-the-art methods for cis-regulatory sequence analysis. We took advantage of this power to study the evolution of TF-target relationship in Drosophila. By learning the TF-DNA interaction models from the ChIP-chip data of D. melanogaster (Mel) and applying them to the genome of D. pseudoobscura (Pse), we found that only about half of the sequences strongly bound by TFs in Mel have high binding affinities in Pse. We show that prediction of functional TF targets from ChIP-chip data can be improved by using the conservation of STAP predicted affinities as an additional filter. Conclusions/Significance: STAP is an effective method to analyze binding site arrangements, TF cooperativity, and TF target genes from genome-wide TF-DNA binding data.
Affiliation(s)
- Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Chieh-Chun Chen
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Feng Hong
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Fang Fang
- Gene Regulation Laboratory, Genome Institute of Singapore, Singapore, Singapore
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Huck-Hui Ng
- Gene Regulation Laboratory, Genome Institute of Singapore, Singapore, Singapore
- Sheng Zhong
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
|
23
|
Papatsenko D. Stripe formation in the early fly embryo: principles, models, and networks. Bioessays 2009; 31:1172-80. [DOI: 10.1002/bies.200900096] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
|
24
|
Abstract
I provide a historical overview on the use of mathematical models to gain insight into pattern formation during early development of the fruit fly Drosophila melanogaster. It is my intention to illustrate how the aims and methodology of modelling have changed from the early beginnings of a theoretical developmental biology in the 1960s to modern-day systems biology. I show that even early modelling attempts addressed interesting and relevant questions, which were not tractable by experimental approaches. Unfortunately, their validation was severely hampered by a lack of specificity and appropriate experimental evidence. There is a simple lesson to be learned from this: we cannot deduce general rules for pattern formation from first principles or spurious reproduction of developmental phenomena. Instead, we must infer such rules (if any) from detailed and accurate studies of specific developmental systems. To achieve this, mathematical modelling must be closely integrated with experimental approaches. I report on progress that has been made in this direction in the past few years and illustrate the kind of novel insights that can be gained from such combined approaches. These insights demonstrate the great potential (and some pitfalls) of an integrative, systems-level investigation of pattern formation.
Affiliation(s)
- Johannes Jaeger
- EMBL/CRG Research Unit in Systems Biology, CRG-Centre de Regulació Genòmica, Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona, Spain.
|
25
|
Papatsenko D, Goltsev Y, Levine M. Organization of developmental enhancers in the Drosophila embryo. Nucleic Acids Res 2009; 37:5665-77. [PMID: 19651877 PMCID: PMC2761283 DOI: 10.1093/nar/gkp619] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 01/06/2023] Open
Abstract
Most cell-specific enhancers are thought to lack an inherent organization, with critical binding sites distributed in a more or less random fashion. However, there are examples of fixed arrangements of binding sites, such as helical phasing, that promote the formation of higher-order protein complexes on the enhancer DNA template. Here, we investigate the regulatory ‘grammar’ of nearly 100 characterized enhancers for developmental control genes active in the early Drosophila embryo. The conservation of grammar is examined in seven divergent Drosophila genomes. Linked binding sites are observed for particular combinations of binding motifs, including Bicoid–Bicoid, Hunchback–Hunchback, Bicoid–Dorsal, Bicoid–Caudal and Dorsal–Twist. Direct evidence is presented for the importance of Bicoid–Dorsal linkage in the integration of the anterior–posterior and dorsal–ventral patterning systems. Hunchback–Hunchback interactions help explain unresolved aspects of segmentation, including the differential regulation of the eve stripe 3 + 7 and stripe 4 + 6 enhancers. We also present evidence that there is an under-representation of nucleosome positioning sequences in many enhancers, raising the possibility for a subtle higher-order structure extending across certain enhancers. We conclude that grammar of gene control regions is pervasively used in the patterning of the Drosophila embryo.
Affiliation(s)
- Dmitri Papatsenko
- Department of Molecular Cell Biology, Division of Genetics, Genomics & Development, Center for Integrative Genomics, University of California, Berkeley, CA 94720-200, USA.
|
26
|
Chopra VS, Levine M. Combinatorial patterning mechanisms in the Drosophila embryo. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS 2009; 8:243-9. [PMID: 19651703 DOI: 10.1093/bfgp/elp026] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 02/04/2023]
Abstract
The classical concept of the morphogen gradient proposes that small differences in the levels of a signalling molecule or transcription factor are responsible for producing a continuous spectrum of distinctive cellular identities across a naïve field of cells. In this review, we discuss how the Dorsal gradient controls the dorsal-ventral patterning of the early Drosophila embryo. This gradient extends from the ventral midline of the embryo into dorso-lateral regions, encompassing a cross-sectional field of approximately 20 cells. There is no evidence that these cells acquire distinctive identities due to subtle changes in the nuclear concentrations of the Dorsal protein. Rather, a variety of evidence suggests that the Dorsal gradient generates just three primary thresholds of gene activity. High levels activate gene expression in the presumptive mesoderm, while intermediate and low levels activate gene expression in the ventral and dorsal neurogenic ectoderm, respectively. We discuss how these primary readouts of the gradient establish localized domains of cell signalling, which work in a combinatorial manner with transcriptional networks to produce complex patterns of gene expression and tissue differentiation.
Affiliation(s)
- Vivek S Chopra
- Department of Molecular & Cell Biology, University of California, Berkeley, CA 94720, USA
|
27
|
Abstract
Motivation: Thermodynamic modeling has become increasingly relevant as a way to gain a detailed understanding of transcriptional regulation. Thermodynamic models are able to model the interactions between transcription factors (TFs) and DNA that lead to a specific transcriptional output of the target gene. Such models can be ‘trained’ by fitting their free parameters to data on the transcription rate of a gene and the concentrations of its regulating factors. However, the parameter fitting process is computationally very expensive and this limits the number of alternative types of model that can be explored. Results: In this study, we evaluate the ‘optimization landscape’ of a class of static, quantitative models of regulation and explore the efficiency of a range of optimization methods. We evaluate eight optimization methods: two variants of simulated annealing (SA), four variants of gradient descent (GD), a hybrid SA/GD algorithm and a genetic algorithm. We show that the optimization landscape has numerous local optima, resulting in poor performance for the GD methods. SA with a simple geometric cooling schedule performs best among all tested methods. In particular, we see no advantage to using the more sophisticated ‘LAM’ cooling schedule. Overall, a good approximate solution is achievable in minutes using SA with a simple cooling schedule. Contact: d.bauer@uq.edu.au; t.bailey@imb.uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
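For readers unfamiliar with the method, simulated annealing with a geometric cooling schedule can be sketched as follows. This is a generic one-dimensional illustration, not the authors' implementation or objective: the test function `rugged`, the proposal step, the cooling factor `alpha`, and the other defaults are assumptions chosen only to show the schedule T ← alpha · T on a landscape with several local optima.

```python
import math
import random


def rugged(x):
    """Hypothetical 1-D objective with several local minima."""
    return x ** 2 + 10.0 * math.sin(3.0 * x)


def anneal(f, x0, t0=1.0, alpha=0.99, steps=2000, step_size=0.5, seed=0):
    """Minimise f by simulated annealing with geometric cooling.

    Returns (best_x, best_f, final_temperature). After each step the
    temperature is multiplied by alpha (0 < alpha < 1).
    """
    rng = random.Random(seed)             # fixed seed for reproducibility
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        cand = x + rng.uniform(-step_size, step_size)
        fc = f(cand)
        # Always accept improvements; accept worse moves with the
        # Metropolis probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= alpha                        # geometric cooling
    return best_x, best_f, t


x_start = 4.0
best_x, best_f, t_final = anneal(rugged, x_start)
print("best x=%.3f, f=%.3f, final T=%.2e" % (best_x, best_f, t_final))
```

Early on, the high temperature lets the search climb out of local minima; as T shrinks geometrically, the search becomes effectively greedy, which is what a plain gradient method does from the start.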
Affiliation(s)
- Denis C Bauer
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia.
|
28
|
Modelling of the activation of G-protein coupled receptors: drug free constitutive receptor activity. J Math Biol 2009; 60:313-46. [PMID: 19347339 DOI: 10.1007/s00285-009-0268-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/28/2008] [Revised: 03/13/2009] [Indexed: 10/20/2022]
Abstract
G-protein coupled receptors (GPCRs) form a crucial component of approximately 80% of hormone pathways. In this paper, the most popular mechanism for activation of GPCRs, the shuttling mechanism, is modelled mathematically. An asymptotic analysis of this model clarifies the dynamics of the system in the absence of drug, in particular which reactions dominate during the different timescales. Equilibrium analysis of the model demonstrates the model's ability to predict constitutive receptor activity.
|
29
|
How the Dorsal gradient works: insights from postgenome technologies. Proc Natl Acad Sci U S A 2008; 105:20072-6. [PMID: 19104040 DOI: 10.1073/pnas.0806476105] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022] Open
Abstract
Gradients of extracellular signaling molecules and transcription factors are used in a variety of developmental processes, including the patterning of the Drosophila embryo, the establishment of diverse neuronal cell types in the vertebrate neural tube, and the anterior-posterior patterning of vertebrate limbs. Here, we discuss how a gradient of the maternal transcription factor Dorsal produces complex patterns of gene expression across the dorsal-ventral (DV) axis of the early Drosophila embryo. The identification of 60-70 Dorsal target genes, along with the characterization of approximately 35 associated regulatory DNAs, suggests that there are at least six different regulatory codes driving diverse DV expression profiles.
|
30
|
Bauer DC, Bailey TL. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics 2008; 9:220. [PMID: 18442418 PMCID: PMC2386823 DOI: 10.1186/1471-2105-9-220] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/20/2007] [Accepted: 04/29/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cis-regulatory modules (CRMs) are distinct genomic regions surrounding the target gene that can independently activate the promoter to drive transcription. The activation of a CRM is controlled by the binding of a certain combination of transcription factors (TFs). It would be of great benefit if the transcriptional output mediated by a specific CRM could be predicted. Of equal benefit would be identifying in silico a specific CRM as the driver of the expression in a specific tissue or situation. We extend a recently developed biochemical modeling approach to manage both prediction tasks. Given a set of TFs, their protein concentrations, and the positions and binding strengths of each of the TFs in a putative CRM, the model predicts the transcriptional output of the gene. Our approach predicts the location of the regulating CRM by using predicted TF binding sites in regions near the gene as input to the model and searching for the region that yields a predicted transcription rate most closely matching the known rate. RESULTS We demonstrate the model on one of the CRMs regulating the eve gene, MSE2. A model trained on the MSE2 in D. melanogaster was applied to the surrounding sequence of the eve gene in seven other Drosophila species. The model successfully predicts the correct MSE2 location and output in six out of the eight Drosophila species we examine. CONCLUSION The model is able to generalize from D. melanogaster to other Drosophila species and accurately predicts the location and transcriptional output of MSE2 in those species. However, we also show that the current model is not specific enough to function as a genome-wide CRM scanner, because it incorrectly predicts other genomic regions to be MSE2s.
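The search described in this abstract, scanning regions near a gene and keeping the one whose predicted transcription rate best matches the known rate, can be sketched as follows. This is a toy under stated assumptions, not the authors' biochemical model: the saturating occupancy term, the logistic output, the `effect` weights, and all site positions and strengths are hypothetical.

```python
import math


def occupancy(strength, conc):
    """Assumed saturating occupancy of one site: K*c / (1 + K*c)."""
    x = strength * conc
    return x / (1.0 + x)


def predict_output(sites, conc, effect):
    """Toy output: logistic of the summed, signed site occupancies."""
    total = sum(effect[tf] * occupancy(k, conc[tf]) for _, tf, k in sites)
    return 1.0 / (1.0 + math.exp(-total))


def best_window(sites, conditions, known, effect, region_len, width, step):
    """Return (start, error) of the window that best reproduces the known rates."""
    best = None
    for start in range(0, region_len - width + 1, step):
        inside = [s for s in sites if start <= s[0] < start + width]
        err = sum((predict_output(inside, c, effect) - y) ** 2
                  for c, y in zip(conditions, known))
        if best is None or err < best[1]:
            best = (start, err)
    return best


# Hypothetical binding sites: (position, TF name, binding strength).
sites = [(120, "A", 2.0), (150, "A", 2.0), (160, "R", 3.0), (700, "A", 2.0)]
effect = {"A": 4.0, "R": -6.0}          # activator vs. repressor weights (assumed)
conditions = [{"A": 1.0, "R": 0.0}, {"A": 1.0, "R": 1.0}]

# "Known" rates are generated from the true module (the first three sites).
known = [predict_output(sites[:3], c, effect) for c in conditions]

start, err = best_window(sites, conditions, known, effect,
                         region_len=1000, width=100, step=50)
print(start, err)
```

In this toy setting only the window starting at position 100 contains all three sites of the true module, so it reproduces the known rates exactly. With real sequence, several regions can fit almost equally well, which is the specificity problem the abstract reports.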
Affiliation(s)
- Denis C Bauer
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Qld. 4072 Australia.
|
31
|
Surkova S, Myasnikova E, Janssens H, Kozlov KN, Samsonova AA, Reinitz J, Samsonova M. Pipeline for acquisition of quantitative data on segmentation gene expression from confocal images. Fly (Austin) 2008; 2:58-66. [PMID: 18820476 PMCID: PMC2803333 DOI: 10.4161/fly.6060] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 01/18/2023] Open
Abstract
We describe a data pipeline developed to extract quantitative data on segmentation gene expression from confocal images of gene expression patterns in Drosophila. The pipeline consists of five steps: image segmentation, background removal, temporal characterization of an embryo, data registration and data averaging. This pipeline was successfully applied to obtain quantitative gene expression data at cellular resolution in space and at 6.5-minute resolution in time, as well as to construct a spatiotemporal atlas of segmentation gene expression. Each data pipeline step can be easily adapted to process a wide range of images of gene expression patterns.
Affiliation(s)
- Svetlana Surkova
- Department of Computational Biology; Center for Advanced Studies; St. Petersburg State Polytechnical University; St. Petersburg, Russia
- Ekaterina Myasnikova
- Department of Computational Biology; Center for Advanced Studies; St. Petersburg State Polytechnical University; St. Petersburg, Russia
- Hilde Janssens
- FlyMine; Department of Genetics; University of Cambridge; Cambridge, United Kingdom
- Konstantin N. Kozlov
- Department of Computational Biology; Center for Advanced Studies; St. Petersburg State Polytechnical University; St. Petersburg, Russia
- John Reinitz
- Department of Applied Mathematics and Statistics and Center for Developmental Genetics; Stony Brook University; Stony Brook, New York USA
- Maria Samsonova
- Department of Computational Biology; Center for Advanced Studies; St. Petersburg State Polytechnical University; St. Petersburg, Russia
|
32
|
Abstract
The regulation of segmentation gene expression is investigated by computational modeling using quantitative expression data. Previous tissue culture assays and transgene analyses raised the possibility that Hunchback (Hb) might function as both an activator and repressor of transcription. At low concentrations, Hb activates gene expression, whereas at high concentrations it mediates repression. Under the same experimental conditions, transcription factors encoded by other gap genes appear to function as dedicated repressors. Models based on dual regulation suggest that the Hb gradient can be sufficient for establishing the initial Kruppel (Kr) expression pattern in central regions of the precellular embryo. The subsequent refinement of the Kr pattern depends on the combination of Hb and the Giant (Gt) repressor. The dual-regulation models developed for Kr also explain some of the properties of the even-skipped (eve) stripe 3+7 enhancer. Computational simulations suggest that repression results from the dimerization of Hb monomers on the DNA template.
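One simple way to see how a single factor could activate at low concentration and repress at high concentration is a two-site occupancy sketch in which a lone bound monomer activates and a cooperatively formed dimer represses. The Python example below is an assumption-laden illustration, not the model from this study: the binding constant `K`, the cooperativity `omega`, and the function name are hypothetical.

```python
import math


def hb_response(c, K=1.0, omega=10.0):
    """Probability of the activating (single-monomer) state at Hb concentration c.

    States of a two-site element and their statistical weights (assumed):
      empty            -> 1
      one monomer      -> 2*K*c           (activating)
      dimer on the DNA -> omega*(K*c)**2  (repressing)
    """
    monomer = 2.0 * K * c
    dimer = omega * (K * c) ** 2
    return monomer / (1.0 + monomer + dimer)


# The response rises, peaks near c = 1 / (K * sqrt(omega)), then falls.
peak = 1.0 / math.sqrt(10.0)
for c in (0.01, 0.1, peak, 1.0, 10.0):
    print(round(c, 3), round(hb_response(c), 4))
```

Under these assumptions the output is non-monotonic in concentration, which is the qualitative behaviour that dual regulation requires.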
|
33
|
Zartman JJ, Shvartsman SY. Enhancer Organization: Transistor with a Twist or Something in a Different Vein? Curr Biol 2007; 17:R1048-50. [DOI: 10.1016/j.cub.2007.10.036] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
|