1
Tabe-Bordbar S, Song YJ, Lunt BJ, Alavi Z, Prasanth KV, Sinha S. Mechanistic analysis of enhancer sequences in the estrogen receptor transcriptional program. Commun Biol 2024; 7:719. [PMID: 38862711 PMCID: PMC11167054 DOI: 10.1038/s42003-024-06400-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Accepted: 05/30/2024] [Indexed: 06/13/2024]
Abstract
Estrogen Receptor α (ERα) is a major lineage-determining transcription factor (TF) in mammary gland development. Dysregulation of the ERα-mediated transcriptional program results in cancer. Transcriptomic and epigenomic profiling of breast cancer cell lines has revealed large numbers of enhancers involved in this regulatory program, but how these enhancers encode function in their sequence remains poorly understood. A subset of ERα-bound enhancers is transcribed into short bidirectional RNA (enhancer RNA or eRNA), and this property is believed to be a reliable marker of active enhancers. We therefore analyze thousands of ERα-bound enhancers and build quantitative, mechanism-aware models to discriminate eRNAs from non-transcribing enhancers based on their sequence. Our thermodynamics-based models provide insights into the roles of specific TFs in the ERα-mediated transcriptional program, many of which are supported by the literature. We use in silico perturbations to predict TF-enhancer regulatory relationships and integrate these findings with experimentally determined enhancer-promoter interactions to construct a gene regulatory network. We also demonstrate that the model can prioritize breast cancer-related sequence variants while providing mechanistic explanations for their function. Finally, we experimentally validate the model-proposed mechanisms underlying three such variants.
Affiliation(s)
- Shayan Tabe-Bordbar
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- You Jin Song
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Bryan J Lunt
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Zahra Alavi
- Department of Physics, Loyola Marymount University, Los Angeles, CA, USA
- Kannanganattu V Prasanth
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Saurabh Sinha
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, USA
2
Bhogale S, Sinha S. Thermodynamics-based modeling reveals regulatory effects of indirect transcription factor-DNA binding. iScience 2022; 25:104152. [PMID: 35465052 PMCID: PMC9018382 DOI: 10.1016/j.isci.2022.104152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 12/28/2021] [Accepted: 03/21/2022] [Indexed: 11/30/2022]
Abstract
Transcription factors (TFs) influence gene expression by binding to DNA, yet experimental data suggest that they also frequently bind regulatory DNA indirectly by interacting with other DNA-bound proteins. Here, we used a data modeling approach to test whether such indirect binding by TFs plays a significant role in gene regulation. We first incorporated the regulatory function of indirectly bound TFs into a thermodynamics-based model for predicting enhancer-driven expression from sequence. We then fit the new model to a rich data set comprising hundreds of enhancers and their regulatory activities during mesoderm specification in Drosophila embryogenesis and showed that the newly incorporated mechanism results in significantly better agreement with data. In the process, we derived the first sequence-level model of this extensively characterized regulatory program. We further showed that allowing indirect binding of a TF explains its localization at enhancers more accurately than direct binding alone. Our model also provided a simple explanation of how a TF may switch between activating and repressive roles depending on context.
Highlights
- Inclusion of indirect DNA binding of transcription factors improves enhancer function prediction
- Context-specific activating or repressive roles of TFs
- Indirect binding improves fits to experimental TF-DNA binding data
- Role of Tinman depends on its DNA-binding mode (direct or indirect)
3
Koopmans L, Youk H. Predictive landscapes hidden beneath biological cellular automata. J Biol Phys 2021; 47:355-369. [PMID: 34739687 PMCID: PMC8603977 DOI: 10.1007/s10867-021-09592-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/04/2021] [Accepted: 10/14/2021] [Indexed: 11/11/2022]
Abstract
To celebrate Hans Frauenfelder's achievements, we examine energy(-like) "landscapes" for complex living systems. Energy landscapes summarize all possible dynamics of some physical systems. Energy(-like) landscapes can explain some biomolecular processes, including gene expression and, as Frauenfelder showed, protein folding. But energy-like landscapes and existing frameworks like statistical mechanics seem impractical for describing many living systems. The difficulties stem from living systems being high dimensional, nonlinear, and governed by many tightly coupled, noisy constituents. The predominant modeling approach is to devise differential equations tailored to each living system. This ad hoc approach faces the notorious "parameter problem": models have numerous nonlinear mathematical functions with unknown parameter values, even for describing just a few intracellular processes. Many intracellular parameters cannot be measured at all, or can only be measured as snapshots in time. Another modeling approach uses cellular automata to represent living systems as discrete dynamical systems with binary variables. Quantitative (Hamiltonian-based) rules can dictate cellular automata (e.g., the Cellular Potts Model). But in current practice, numerous biological features are described qualitatively rather than quantitatively (e.g., a gene is or is not (highly) expressed). Cellular automata governed by verbal rules are useful representations of living systems and can mitigate the parameter problem. However, they can yield complex dynamics that are difficult to understand, because the automata-governing rules are not quantitative and many of the existing mathematical tools and theorems apply to continuous, not discrete, dynamical systems. Recent studies found ways to overcome this challenge. These studies either discovered, or suggest the existence of, predictive "landscapes" whose shapes are described by Lyapunov functions and that yield "equations of motion" for a "pseudo-particle." The pseudo-particle represents the entire cellular lattice and moves on the landscape, thereby giving a low-dimensional representation of the cellular automata dynamics. We outline this promising modeling strategy.
Affiliation(s)
- Lars Koopmans
- Program in Applied Physics, Delft University of Technology, Delft, The Netherlands
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Hyun Youk
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
4
Dibaeinia P, Sinha S. Deciphering enhancer sequence using thermodynamics-based models and convolutional neural networks. Nucleic Acids Res 2021; 49:10309-10327. [PMID: 34508359 PMCID: PMC8501998 DOI: 10.1093/nar/gkab765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Revised: 08/18/2021] [Accepted: 08/25/2021] [Indexed: 11/18/2022]
Abstract
Deciphering the sequence-function relationship encoded in enhancers holds the key to interpreting non-coding variants and understanding mechanisms of transcriptomic variation. Several quantitative models exist for predicting enhancer function and underlying mechanisms; however, there has been no systematic comparison of these models characterizing their relative strengths and shortcomings. Here, we interrogated a rich data set of neuroectodermal enhancers in Drosophila, representing cis- and trans- sources of expression variation, with a suite of biophysical and machine learning models. We performed rigorous comparisons of thermodynamics-based models implementing different mechanisms of activation, repression and cooperativity. Moreover, we developed a convolutional neural network (CNN) model, called CoNSEPT, that learns enhancer 'grammar' in an unbiased manner. CoNSEPT is the first general-purpose CNN tool for predicting enhancer function in varying conditions, such as different cell types and experimental conditions, and we show that such complex models can suggest interpretable mechanisms. We found model-based evidence for mechanisms previously established for the studied system, including cooperative activation and short-range repression. The data also favored one hypothesized activation mechanism over another and suggested an intriguing role for a direct, distance-independent repression mechanism. Our modeling shows that while fundamentally different models can yield similar fits to data, they vary in their utility for mechanistic inference. CoNSEPT is freely available at: https://github.com/PayamDiba/CoNSEPT.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
5
Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021; 12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Marc S. Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA
6
Abstract
Determining whether and how a gene is transcribed are two of the central processes of life. The conceptual basis for understanding such gene regulation arose from pioneering biophysical studies in eubacteria. However, eukaryotic genomes exhibit vastly greater complexity, which raises questions not addressed by this bacterial paradigm. First, how is information integrated from many widely separated binding sites to determine how a gene is transcribed? Second, does the presence of multiple energy-expending mechanisms, which are absent from eubacterial genomes, indicate that eukaryotes are capable of improved forms of genetic information processing? An updated biophysical foundation is needed to answer such questions. We describe the linear framework, a graph-based approach to Markov processes, and show that it can accommodate many previous studies in the field. Under the assumption of thermodynamic equilibrium, we introduce a language of higher-order cooperativities and show how it can rigorously quantify gene regulatory properties suggested by experiment. We point out that fundamental limits to information processing arise at thermodynamic equilibrium and can only be bypassed through energy expenditure. Finally, we outline some of the mathematical challenges that must be overcome to construct an improved biophysical understanding of gene regulation.
Affiliation(s)
- Felix Wong
- Institute for Medical Engineering & Science, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Jeremy Gunawardena
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA
7
Rivera J, Keränen SVE, Gallo SM, Halfon MS. REDfly: the transcriptional regulatory element database for Drosophila. Nucleic Acids Res 2019; 47:D828-D834. [PMID: 30329093 PMCID: PMC6323911 DOI: 10.1093/nar/gky957] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 09/06/2018] [Accepted: 10/04/2018] [Indexed: 12/21/2022]
Abstract
The REDfly database provides a comprehensive curation of experimentally validated Drosophila transcriptional cis-regulatory elements and includes information on DNA sequence, experimental evidence, patterns of regulated gene expression, and more. Now in its thirteenth year, REDfly has grown to over 23 000 records of tested reporter gene constructs and 2200 tested transcription factor binding sites. Recent developments include the start of curation of predicted cis-regulatory modules in addition to experimentally verified ones, improved search and filtering, and increased interaction with the authors of curated papers. An expanded data model that will capture information on temporal aspects of gene regulation, regulation in response to environmental and other non-developmental cues, sexually dimorphic gene regulation, and non-endogenous (ectopic) aspects of reporter gene expression is under development and expected to be in place within the coming year. REDfly is freely accessible at http://redfly.ccr.buffalo.edu, and news about database updates and new features can be followed on Twitter at @REDfly_database.
Affiliation(s)
- John Rivera
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Steven M Gallo
- Center for Computational Research, State University of New York at Buffalo, Buffalo, NY 14203, USA
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Marc S Halfon
- New York State Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology and Program in Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
8
Abstract
There is now compelling evidence that many arthropods pattern their segments using a clock-and-wavefront mechanism, analogous to that operating during vertebrate somitogenesis. In this Review, we discuss how the arthropod segmentation clock generates a repeating sequence of pair-rule gene expression, and how this is converted into a segment-polarity pattern by ‘timing factor’ wavefronts associated with axial extension. We argue that the gene regulatory network that patterns segments may be relatively conserved, although the timing of segmentation varies widely, and double-segment periodicity appears to have evolved at least twice. Finally, we describe how the repeated evolution of a simultaneous (Drosophila-like) mode of segmentation within holometabolan insects can be explained by heterochronic shifts in timing factor expression plus extensive pre-patterning of the pair-rule genes.
Affiliation(s)
- Erik Clark
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK
- Andrew D. Peel
- School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Michael Akam
- Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ, UK
9
Dutta S, Djabrayan NJV, Torquato S, Shvartsman SY, Krajnc M. Self-Similar Dynamics of Nuclear Packing in the Early Drosophila Embryo. Biophys J 2019; 117:743-750. [PMID: 31378311 DOI: 10.1016/j.bpj.2019.07.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/30/2019] [Revised: 06/18/2019] [Accepted: 07/09/2019] [Indexed: 10/26/2022]
Abstract
Embryonic development starts with cleavages, a rapid sequence of reductive divisions that result in an exponential increase in cell number without changing the overall size of the embryo. In Drosophila, the final four rounds of cleavages occur at the surface of the embryo and give rise to ∼6000 nuclei under a common plasma membrane. We use live imaging to study the dynamics of this process and to characterize the emergent nuclear packing in this system. We show that the characteristic length scale of the internuclear interaction scales with the density, which allows the densifying embryo to sustain the level of structural order at progressively smaller length scales. This is different from nonliving materials, which typically undergo a disorder-order transition upon compression. To explain these dynamics, we use a particle-based model that accounts for density-dependent nuclear interactions and synchronous divisions. We reproduce the pair statistics of the disordered packings observed in embryos and recover the scaling relation between the characteristic length scale and the density in both real and reciprocal space. This result reveals how the embryo can robustly preserve the nuclear-packing structure while being densified. In addition to providing a quantitative description of the self-similar dynamics of nuclear packing, this model generates dynamic meshes for the computational analysis of pattern formation and tissue morphogenesis.
Affiliation(s)
- Sayantan Dutta
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey
- Nareg J-V Djabrayan
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey
- Salvatore Torquato
- Department of Chemistry, Princeton University, Princeton, New Jersey
- Department of Physics, Princeton University, Princeton, New Jersey
- Princeton Institute for the Science and Technology of Materials, Princeton University, Princeton, New Jersey
- Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey
- Stanislav Y Shvartsman
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey
- Department of Molecular Biology, Princeton University, Princeton, New Jersey
- Matej Krajnc
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey
10
Combs PA, Fraser HB. Spatially varying cis-regulatory divergence in Drosophila embryos elucidates cis-regulatory logic. PLoS Genet 2018; 14:e1007631. [PMID: 30383747 PMCID: PMC6211617 DOI: 10.1371/journal.pgen.1007631] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/15/2018] [Accepted: 08/14/2018] [Indexed: 12/30/2022]
Abstract
Spatial patterning of gene expression is a key process in development, yet how it evolves is still poorly understood. Both cis- and trans-acting changes could participate in complex interactions, so to isolate the cis-regulatory component of patterning evolution, we measured allele-specific spatial gene expression patterns in D. melanogaster × simulans hybrid embryos. RNA-seq of cryo-sectioned slices revealed 66 genes with strong spatially varying allele-specific expression. We found that hunchback, a major regulator of developmental patterning, had reduced expression of the D. simulans allele specifically in the anterior tip of hybrid embryos. Mathematical modeling of hunchback cis-regulation suggested a candidate transcription factor binding site variant, which we verified as causal using CRISPR-Cas9 genome editing. In sum, even comparing morphologically near-identical species we identified surprisingly extensive spatial variation in gene expression, suggesting not only that development is robust to many such changes, but also that natural selection may have ample raw material for evolving new body plans via changes in spatial patterning.
Affiliation(s)
- Peter A. Combs
- Department of Biology, Stanford University, Stanford, California, United States of America
- Hunter B. Fraser
- Department of Biology, Stanford University, Stanford, California, United States of America