1
Lally P, Gómez-Romero L, Tierrafría VH, Aquino P, Rioualen C, Zhang X, Kim S, Baniulyte G, Plitnick J, Smith C, Babu M, Collado-Vides J, Wade JT, Galagan JE. Predictive Biophysical Neural Network Modeling of a Compendium of in vivo Transcription Factor DNA Binding Profiles for Escherichia coli. bioRxiv 2024:2024.05.23.594371. [PMID: 38826350 PMCID: PMC11142182 DOI: 10.1101/2024.05.23.594371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The DNA binding of most Escherichia coli Transcription Factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We used these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We used BoltzNet to quantitatively design novel binding sites, which we validated with biophysical experiments on purified protein. We have generated models for 125 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
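The idea of predicting a binding energy from sequence and reading it through a biophysical model can be illustrated with a minimal Python sketch. This is not BoltzNet itself: it assumes a simple additive per-position energy matrix and a two-state Boltzmann occupancy, and the matrix values, the chemical-potential parameter `mu`, and all function names are hypothetical.

```python
import math

# Hypothetical per-position energy contributions (units of kT) for a 4-bp site.
# The numbers are invented for illustration and are not taken from the paper.
ENERGY_MATRIX = [
    {"A": -1.0, "C": 0.5, "G": 0.5, "T": 0.0},
    {"A": 0.5, "C": -1.0, "G": 0.0, "T": 0.5},
    {"A": 0.0, "C": 0.5, "G": -1.0, "T": 0.5},
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": -1.0},
]


def binding_energy(site):
    """Sum the per-position energies for one candidate site (additive model)."""
    return sum(column[base] for column, base in zip(ENERGY_MATRIX, site))


def occupancy(energy, mu=0.0):
    """Two-state Boltzmann occupancy 1 / (1 + exp(E - mu)), energies in kT."""
    return 1.0 / (1.0 + math.exp(energy - mu))


def scan(sequence, mu=0.0):
    """Predicted occupancy for every window of the sequence, left to right."""
    width = len(ENERGY_MATRIX)
    return [
        occupancy(binding_energy(sequence[i:i + width]), mu)
        for i in range(len(sequence) - width + 1)
    ]


if __name__ == "__main__":
    for position, value in enumerate(scan("ACGTACGT")):
        print(position, round(value, 3))
```

Lower (more negative) energy gives higher predicted occupancy, so the windows matching the lowest-energy bases at every position stand out in the scan.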
Affiliation(s)
- Patrick Lally
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Laura Gómez-Romero
  - Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México 14610, México
  - Escuela de Medicina y Ciencias de la Salud, Tecnológico de Monterrey, Ciudad de México, México
- Víctor H. Tierrafría
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Patricia Aquino
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Claire Rioualen
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
- Xiaoman Zhang
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
- Sunyoung Kim
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Jonathan Plitnick
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Carol Smith
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
- Mohan Babu
  - Department of Biochemistry, University of Regina, Regina, Saskatchewan, SK S4S 0A2, Canada
- Julio Collado-Vides
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, México
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Joseph T. Wade
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
  - Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- James E. Galagan
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215
  - Bioinformatics Program, Boston University, 24 Cummington Mall, Boston, MA 02215
2
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
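The statistical-mechanics argument that many weakly preferred microstates can add up to a large effect can be sketched in a few lines of Python. This is a generic partition-function calculation, not the authors' model: the energies (in units of kT), the number of microstates, and the function name are hypothetical choices for illustration.

```python
import math


def effective_free_energy(microstate_energies):
    """Effective binding free energy -ln(sum_i exp(-E_i)), energies in kT.

    Each entry is the energy of one binding register (microstate) near the
    target. More microstates, or lower energies, make the result more negative.
    """
    partition_sum = sum(math.exp(-energy) for energy in microstate_energies)
    return -math.log(partition_sum)


if __name__ == "__main__":
    # Hypothetical comparison: one strong site versus twenty weak registers.
    one_strong_site = effective_free_energy([-4.0])
    many_weak_sites = effective_free_energy([-1.0] * 20)
    print(one_strong_site, many_weak_sites)
```

With these made-up numbers, twenty registers at -1 kT give nearly the same effective free energy as a single site at -4 kT, because the count of microstates enters as an entropic term of -ln(20).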
Affiliation(s)
- Connor A Horton
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
  - The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
3
Perez-Borrajero C, Heinkel F, Gsponer J, McIntosh LP. Conformational Plasticity and DNA-Binding Specificity of the Eukaryotic Transcription Factor Pax5. Biochemistry 2021; 60:104-117. [PMID: 33398994 DOI: 10.1021/acs.biochem.0c00737] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
The eukaryotic transcription factor Pax5 has a DNA-binding Paired domain composed of two independent helical bundle subdomains joined by a flexible linker. Previously, we showed distinct biophysical properties of the N-terminal (NTD) and C-terminal (CTD) subdomains, with implications for how these two regions cooperate to distinguish nonspecific and cognate DNA sites [Perez-Borrajero, C., et al. (2016) J. Mol. Biol. 428, 2372-2391]. In this study, we combined experimental methods and molecular dynamics (MD) simulations to dissect the mechanisms underlying the functional differences between the Pax5 subdomains. Both subdomains showed a similar dependence of DNA-binding affinity on ionic strength. However, due to a greater contribution of non-ionic interactions, the NTD bound its cognate DNA half-site with an affinity approximately 10-fold higher than that of the CTD with its half-site. These interactions involve base-mediated contacts as evidenced by nuclear magnetic resonance spectroscopy-monitored chemical shift perturbations. Isothermal titration calorimetry revealed that favorable enthalpic and compensating unfavorable entropic changes were substantially larger for DNA binding by the NTD than by the CTD. Complementary MD simulations indicated that the DNA recognition helix H3 of the NTD is particularly flexible in the absence of DNA and undergoes the largest changes in conformational dynamics upon binding. Overall, these data suggest that the differences observed for the subdomains of Pax5 are due to the coupling of DNA binding with dampening of motions in the NTD required for specific base contacts. Thus, the conformational plasticity of the Pax5 Paired domain underpins the differing roles of its subdomains in association with nonspecific versus cognate DNA sites.
Affiliation(s)
- Cecilia Perez-Borrajero
  - Genome Sciences and Technology Program, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
  - Department of Biochemistry and Molecular Biology, Life Sciences Institute, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
- Florian Heinkel
  - Genome Sciences and Technology Program, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
  - Department of Biochemistry and Molecular Biology, Life Sciences Institute, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
- Jörg Gsponer
  - Department of Biochemistry and Molecular Biology, Life Sciences Institute, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Lawrence P McIntosh
  - Department of Biochemistry and Molecular Biology, Life Sciences Institute, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada
4
Flickinger R. Polymorphism of simple sequence repeats may quantitatively regulate gene transcription. Exp Cell Res 2020; 390:111969. [PMID: 32199920 DOI: 10.1016/j.yexcr.2020.111969] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Revised: 02/15/2020] [Accepted: 03/17/2020] [Indexed: 02/07/2023]
Abstract
The degree of polymorphism, i.e., DNA sequence divergence, of short AT-rich tandemly arranged simple sequence repeats at or near promoters and 5'-untranslated regions of mRNA may quantitatively regulate transcription of tissue-specific genes. Less polymorphic repeats allow greater gene expression. Preferential binding of hypophosphorylated H1 histone to these repeats may diminish binding of transcription factors. Preferential binding of hypophosphorylated high mobility group chromatin proteins would increase this binding. Shorter simple sequence repeats have undergone fewer point mutations than longer repeats; hence they are less polymorphic and more conserved. The role of transcribed simple sequence repeats in frog embryo germ layer determination is considered.
Affiliation(s)
- Reed Flickinger
  - Department of Biological Sciences, State University of New York, Buffalo, N.Y. 14260. Mailing Address: P.O. Box 741 Captain Cook, HI, 96704, USA.
5
Mechanisms of Protein Search for Targets on DNA: Theoretical Insights. Molecules 2018; 23:molecules23092106. [PMID: 30131459 PMCID: PMC6225296 DOI: 10.3390/molecules23092106] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 07/30/2018] [Revised: 08/13/2018] [Accepted: 08/17/2018] [Indexed: 11/17/2022]
Abstract
Protein-DNA interactions are critical for the successful functioning of all natural systems. The key role in these interactions is played by processes of protein search for specific sites on DNA. Although this search has been studied for many years, only recently have the microscopic aspects of these processes become clearer. In this work, we review the current theoretical understanding of the molecular mechanisms of the protein target search. A comprehensive discrete-state stochastic method to explain the dynamics of the protein search phenomena is introduced and explained. Our theoretical approach utilizes a first-passage analysis and takes into account the most relevant physical-chemical processes. It is able to describe many fascinating features of the protein search, including unusually high effective association rates, high selectivity and specificity, and robustness in the presence of crowders and sequence heterogeneity.
6
Imashimizu M, Lukatsky DB. Transcription pausing: biological significance of thermal fluctuations biased by repetitive genomic sequences. Transcription 2017; 9:196-203. [PMID: 29105534 PMCID: PMC5927657 DOI: 10.1080/21541264.2017.1393492] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023]
Abstract
Transcription of DNA by RNA polymerase (RNAP) takes place in a cell environment dominated by thermal fluctuations. How are transcription reactions including initiation, elongation, and termination on genomic DNA so well-controlled during such fluctuations? A recent statistical mechanical approach using high-throughput sequencing data reveals that repetitive DNA sequence elements embedded into a genomic sequence provide the key mechanism to functionally bias the fluctuations of transcription elongation complexes. In particular, during elongation pausing, such repetitive sequence elements can increase the magnitude of one-dimensional diffusion of the RNAP enzyme on the DNA upstream of the pausing site, generating a large variation in the dwell times of RNAP pausing under the control of these genomic signals.
Affiliation(s)
- Masahiko Imashimizu
  - Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan
- David B Lukatsky
  - Department of Chemistry, Ben-Gurion University of the Negev, Be'er Sheva, Israel
7
Kasinathan S, Zentner GE, Xin B, Rohs R, Henikoff S. Correspondence: Reply to 'DNA shape is insufficient to explain binding'. Nat Commun 2017; 8:15644. [PMID: 28580953 PMCID: PMC5465350 DOI: 10.1038/ncomms15644] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Affiliation(s)
- Sivakanthan Kasinathan
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
  - Medical Scientist Training Program, University of Washington School of Medicine, Seattle, Washington 98195, USA
  - Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, Washington 98195, USA
- Gabriel E Zentner
  - Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
- Beibei Xin
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, California 90089, USA
- Remo Rohs
  - Molecular and Computational Biology Program, University of Southern California, Los Angeles, California 90089, USA
- Steven Henikoff
  - Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
8
Control of transcriptional pausing by biased thermal fluctuations on repetitive genomic sequences. Proc Natl Acad Sci U S A 2016; 113:E7409-E7417. [PMID: 27830653 DOI: 10.1073/pnas.1607760113] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
In the process of transcription elongation, RNA polymerase (RNAP) pauses at highly nonrandom positions across genomic DNA, broadly regulating transcription; however, molecular mechanisms responsible for the recognition of such pausing positions remain poorly understood. Here, using a combination of statistical mechanical modeling and high-throughput sequencing and biochemical data, we evaluate the effect of thermal fluctuations on the regulation of RNAP pausing. We demonstrate that diffusive backtracking of RNAP, which is biased by repetitive DNA sequence elements, causes transcriptional pausing. This effect stems from the increased microscopic heterogeneity of an elongation complex, and thus is entropy-dominated. This report shows a linkage between repetitive sequence elements encoded in the genome and regulation of RNAP pausing driven by thermal fluctuations.
9
Shvets AA, Kolomeisky AB. Sequence heterogeneity accelerates protein search for targets on DNA. J Chem Phys 2016; 143:245101. [PMID: 26723711 DOI: 10.1063/1.4937938] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 12/12/2022]
Abstract
The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome.
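A first-passage calculation on a discrete-state model can be illustrated with a deliberately stripped-down Python sketch. It is not the model of the paper: it assumes a homogeneous 1D lattice with sliding only (no dissociation, no sequence heterogeneity), a target at site 0, a reflecting far end, and a hypothetical symmetric hopping rate `hop_rate`.

```python
def mean_first_passage_times(num_sites, hop_rate):
    """Mean first-passage time to the target at site 0 from sites 0..num_sites.

    The walker hops left or right with rate hop_rate each; site num_sites is
    reflecting. The backward equations are
        hop_rate * (T[n+1] - 2*T[n] + T[n-1]) = -1   for 0 < n < num_sites
        hop_rate * (T[n-1] - T[n])            = -1   for n = num_sites
    with T[0] = 0. Writing d[n] = T[n] - T[n-1], the boundary equation gives
    d[num_sites] = 1 / hop_rate, and each interior equation gives
    d[n] = d[n+1] + 1 / hop_rate, so d[n] = (num_sites - n + 1) / hop_rate.
    """
    times = [0.0] * (num_sites + 1)
    for n in range(1, num_sites + 1):
        increment = (num_sites - n + 1) / hop_rate
        times[n] = times[n - 1] + increment
    return times


if __name__ == "__main__":
    # Hypothetical lattice of 10 sites beyond the target, unit hopping rate.
    print(mean_first_passage_times(10, 1.0))
```

In this simplified setting the search time from the far end grows roughly quadratically with lattice length, which is the baseline against which effects such as sequence heterogeneity would be compared.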
Affiliation(s)
- Alexey A Shvets
  - Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Anatoly B Kolomeisky
  - Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
10
Parmar JJ, Das D, Padinhateeri R. Theoretical estimates of exposure timescales of protein binding sites on DNA regulated by nucleosome kinetics. Nucleic Acids Res 2016; 44:1630-41. [PMID: 26553807 PMCID: PMC4770213 DOI: 10.1093/nar/gkv1153] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 05/14/2015] [Revised: 09/29/2015] [Accepted: 10/19/2015] [Indexed: 12/14/2022]
Abstract
It is being increasingly realized that nucleosome organization on DNA crucially regulates DNA-protein interactions and the resulting gene expression. While the spatial character of the nucleosome positioning on DNA has been experimentally and theoretically studied extensively, the temporal character is poorly understood. Accounting for ATPase activity and DNA-sequence effects on nucleosome kinetics, we develop a theoretical method to estimate the time of continuous exposure of binding sites of non-histone proteins (e.g. transcription factors and TATA binding proteins) along any genome. Applying the method to Saccharomyces cerevisiae, we show that the exposure timescales are determined by cooperative dynamics of multiple nucleosomes, and their behavior is often different from expectations based on static nucleosome occupancy. Examining exposure times in the promoters of GAL1 and PHO5, we show that our theoretical predictions are consistent with known experiments. We apply our method genome-wide and discover huge gene-to-gene variability of mean exposure times of TATA boxes and patches adjacent to TSS (+1 nucleosome region); the resulting timescale distributions have non-exponential tails.
Affiliation(s)
- Jyotsana J Parmar
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
- Dibyendu Das
  - Department of Physics, Indian Institute of Technology Bombay, Mumbai 400076, India
- Ranjith Padinhateeri
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
11
Scheidegger A, Nechaev S. RNA polymerase II pausing as a context-dependent reader of the genome. Biochem Cell Biol 2015; 94:82-92. [PMID: 26555214 DOI: 10.1139/bcb-2015-0045] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/14/2023]
Abstract
RNA polymerase II (Pol II) transcribes all mRNA genes in eukaryotes and is among the most highly regulated enzymes in the cell. The classic model of mRNA gene regulation involves recruitment of the RNA polymerase to gene promoters in response to environmental signals. Higher eukaryotes have an additional ability to generate multiple cell types. This extra level of regulation enables each cell to interpret the same genome by committing to one of the many possible transcription programs and executing it in a precise and robust manner. Whereas multiple mechanisms are implicated in cell type-specific transcriptional regulation, how one genome can give rise to distinct transcriptional programs and what mechanisms activate and maintain the appropriate program in each cell remain unclear. This review focuses on the process of promoter-proximal Pol II pausing during early transcription elongation as a key step in context-dependent interpretation of the metazoan genome. We highlight aspects of promoter-proximal Pol II pausing, including its interplay with epigenetic mechanisms, that may enable cell type-specific regulation, and emphasize some of the pertinent questions that remain unanswered and open for investigation.
Affiliation(s)
- Adam Scheidegger
  - Department of Basic Sciences, University of North Dakota School of Medicine, Grand Forks, ND 58201, USA
- Sergei Nechaev
  - Department of Basic Sciences, University of North Dakota School of Medicine, Grand Forks, ND 58201, USA
12
Afek A, Cohen H, Barber-Zucker S, Gordân R, Lukatsky DB. Nonconsensus Protein Binding to Repetitive DNA Sequence Elements Significantly Affects Eukaryotic Genomes. PLoS Comput Biol 2015; 11:e1004429. [PMID: 26285121 PMCID: PMC4540582 DOI: 10.1371/journal.pcbi.1004429] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/29/2014] [Accepted: 06/30/2015] [Indexed: 01/10/2023]
Abstract
Recent genome-wide experiments in different eukaryotic genomes provide an unprecedented view of transcription factor (TF) binding locations and of nucleosome occupancy. These experiments revealed that a large fraction of TF binding events occur in regions where only a small number of specific TF binding sites (TFBSs) have been detected. Furthermore, in vitro protein-DNA binding measurements performed for hundreds of TFs indicate that TFs are bound with a wide range of affinities to different DNA sequences that lack known consensus motifs. These observations have thus challenged the classical picture of specific protein-DNA binding and strongly suggest the existence of additional recognition mechanisms that affect protein-DNA binding preferences. We have previously demonstrated that repetitive DNA sequence elements characterized by certain symmetries statistically affect protein-DNA binding preferences. We call this binding mechanism nonconsensus protein-DNA binding in order to emphasize the point that specific consensus TFBSs do not contribute to this effect. In this paper, using the simple statistical mechanics model developed previously, we calculate the nonconsensus protein-DNA binding free energy for the entire C. elegans and D. melanogaster genomes. Using the available chromatin immunoprecipitation followed by sequencing (ChIP-seq) results on TF-DNA binding preferences for ~100 TFs, we show that DNA sequences characterized by low predicted free energy of nonconsensus binding have statistically higher experimental TF occupancy and lower nucleosome occupancy than sequences characterized by high free energy of nonconsensus binding. This is in agreement with our previous analysis performed for the yeast genome. We suggest therefore that nonconsensus protein-DNA binding assists the formation of nucleosome-free regions, as TFs outcompete nucleosomes at genomic locations with enhanced nonconsensus binding. In addition, here we perform a new, large-scale analysis using in vitro TF-DNA preferences obtained from the universal protein binding microarrays (PBM) for ~90 eukaryotic TFs belonging to 22 different DNA-binding domain types. As a result of this new analysis, we conclude that nonconsensus protein-DNA binding is a widespread phenomenon that significantly affects protein-DNA binding preferences and need not require the presence of consensus (specific) TFBSs in order to achieve genome-wide TF-DNA binding specificity.
Interactions between proteins and DNA trigger many important biological processes. Therefore, to fully understand how the information encoded on the DNA transcribes into RNA, which in turn translates into proteins in the cell, we need to unravel the molecular design principles of protein-DNA interactions. It is known that many interactions occur when a protein is attracted to a specific short segment on the DNA called a specific protein-DNA binding motif. Strikingly, recent experiments revealed that many regulatory proteins reproducibly bind to different regions on the DNA lacking such specific motifs. This suggests that fundamental molecular mechanisms responsible for protein-DNA recognition specificity are not fully understood. Here, using high-throughput protein-DNA binding data obtained by two entirely different methods for ~100 TFs in each case, we show that DNA regions possessing certain repetitive sequence elements exert a statistical attractive potential on DNA-binding proteins, and as a result, such DNA regions are enriched in bound proteins. This is in agreement with our previous analysis performed for the yeast genome. We use the term nonconsensus protein-DNA binding in order to describe protein-DNA interactions that occur in the absence of specific protein-DNA binding motifs. Here we demonstrate that the identified nonconsensus effect is highly significant for a variety of organismal genomes and that it affects protein-DNA binding preferences and nucleosome occupancy at the genome-wide level.
Affiliation(s)
- Ariel Afek
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Hila Cohen
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Raluca Gordân
  - Center for Genomic and Computational Biology, Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- David B. Lukatsky
  - Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
13
Teif VB, Kepper N, Yserentant K, Wedemann G, Rippe K. Affinity, stoichiometry and cooperativity of heterochromatin protein 1 (HP1) binding to nucleosomal arrays. J Phys Condens Matter 2015; 27:064110. [PMID: 25563825 DOI: 10.1088/0953-8984/27/6/064110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 06/04/2023]
Abstract
Heterochromatin protein 1 (HP1) participates in establishing and maintaining heterochromatin via its histone-modification-dependent chromatin interactions. In recent papers HP1 binding to nucleosomal arrays was measured in vitro and interpreted in terms of nearest-neighbour cooperative binding. This mode of chromatin interaction could lead to the spreading of HP1 along the nucleosome chain. Here, we reanalysed previous data by representing the nucleosome chain as a 1D binding lattice and showed how the experimental HP1 binding isotherms can be explained by a simpler model without cooperative interactions between neighboring HP1 dimers. Based on these calculations and spatial models of dinucleosomes and nucleosome chains, we propose that binding stoichiometry depends on the nucleosome repeat length (NRL) rather than protein interactions between HP1 dimers. According to our calculations, more open nucleosome arrays with long DNA linkers are characterized by a larger number of binding sites in comparison to chains with a short NRL. Furthermore, we demonstrate by Monte Carlo simulations that the NRL dependent folding of the nucleosome chain can induce allosteric changes of HP1 binding sites. Thus, HP1 chromatin interactions can be modulated by the change of binding stoichiometry and the type of binding to condensed (methylated) and non-condensed (unmethylated) nucleosome arrays in the absence of direct interactions between HP1 dimers.
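The notion of a binding isotherm without cooperative interactions can be sketched in Python as the simplest possible case: independent, identical sites following a Langmuir isotherm. This is an assumption-laden illustration, not the lattice model used in the paper; `assoc_constant` and `sites_per_nucleosome` are hypothetical parameters, the latter standing in for a stoichiometry that would depend on the nucleosome repeat length.

```python
def fraction_bound(concentration, assoc_constant):
    """Langmuir occupancy K*c / (1 + K*c) of a single independent site."""
    kc = assoc_constant * concentration
    return kc / (1.0 + kc)


def bound_per_nucleosome(concentration, assoc_constant, sites_per_nucleosome):
    """Average number of bound ligands per nucleosome for independent sites."""
    return sites_per_nucleosome * fraction_bound(concentration, assoc_constant)


if __name__ == "__main__":
    # Hypothetical values: K = 0.5 per concentration unit, 2 or 4 sites.
    for sites in (2, 4):
        curve = [bound_per_nucleosome(c, 0.5, sites) for c in (0.5, 2.0, 8.0)]
        print(sites, [round(value, 3) for value in curve])
```

In this picture a change in the plateau of the isotherm reflects a change in the number of available sites rather than any interaction between bound ligands.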
Affiliation(s)
- Vladimir B Teif
  - Deutsches Krebsforschungszentrum & BioQuant, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
14
Abstract
The nucleosome is a histone-DNA complex known as the fundamental repeating unit of chromatin. Up to 90% of eukaryotic DNA is wrapped around consecutive octamers made of the core histones H2A, H2B, H3 and H4. Nucleosome positioning affects numerous cellular processes that require robust and timely access to genomic DNA, which is packaged into the tight confines of the cell nucleus. In living cells, nucleosome positions are determined by intrinsic histone-DNA sequence preferences, competition between histones and other DNA-binding proteins for genomic sequence, and ATP-dependent chromatin remodelers. We discuss the major energetic contributions to nucleosome formation and remodeling, focusing especially on partial DNA unwrapping off the histone octamer surface. DNA unwrapping enables efficient access to nucleosome-buried binding sites and mediates rapid nucleosome removal through concerted action of two or more DNA-binding factors. High-resolution, genome-scale maps of distances between neighboring nucleosomes have shown that DNA unwrapping and nucleosome crowding (mutual invasion of nucleosome territories) are much more common than previously thought. Ultimately, constraints imposed by nucleosome energetics on the rates of ATP-dependent and spontaneous chromatin remodeling determine nucleosome occupancy genome-wide, and shape pathways of cellular response to environmental stresses.
|
15
|
Beshnova DA, Cherstvy AG, Vainshtein Y, Teif VB. Regulation of the nucleosome repeat length in vivo by the DNA sequence, protein concentrations and long-range interactions. PLoS Comput Biol 2014; 10:e1003698. [PMID: 24992723 PMCID: PMC4081033 DOI: 10.1371/journal.pcbi.1003698] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 03/07/2014] [Accepted: 05/16/2014] [Indexed: 12/12/2022]
Abstract
The nucleosome repeat length (NRL) is an integral chromatin property important for its biological functions. Recent experiments revealed several conflicting trends of the NRL dependence on the concentrations of histones and other architectural chromatin proteins, both in vitro and in vivo, but a systematic theoretical description of NRL as a function of DNA sequence and epigenetic determinants is currently lacking. To address this problem, we have performed an integrative biophysical and bioinformatics analysis in species ranging from yeast to frog to mouse where NRL was studied as a function of various parameters. We show that in simple eukaryotes such as yeast, a lower limit for the NRL value exists, determined by internucleosome interactions and remodeler action. For higher eukaryotes, an upper limit also exists, since NRL is an increasing but saturating function of the linker histone concentration. Counterintuitively, smaller H1 variants or non-histone architectural proteins can initiate larger effects on the NRL for entropic reasons. Furthermore, we demonstrate that different regimes of the NRL dependence on histone concentrations exist depending on whether DNA sequence-specific effects dominate over boundary effects or vice versa. We consider several classes of genomic regions with apparently different regimes of the NRL variation. As one extreme, our analysis reveals that the period of oscillations of the nucleosome density around bound RNA polymerase coincides with the period of oscillations of positioning sites of the corresponding DNA sequence. At another extreme, we show that although mouse major satellite repeats intrinsically encode well-defined nucleosome preferences, they have no unique nucleosome arrangement and can undergo a switch between two distinct types of nucleosome positioning.
Affiliation(s)
- Daria A. Beshnova
- Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, Heidelberg, Germany
- Andrey G. Cherstvy
- Institute for Physics and Astronomy, University of Potsdam, Potsdam-Golm, Germany
- Yevhen Vainshtein
- Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, Heidelberg, Germany
- Vladimir B. Teif
- Deutsches Krebsforschungszentrum (DKFZ) and BioQuant, Heidelberg, Germany
|
16
|
Afek A, Lukatsky DB. Positive and negative design for nonconsensus protein-DNA binding affinity in the vicinity of functional binding sites. Biophys J 2013; 105:1653-60. [PMID: 24094406 DOI: 10.1016/j.bpj.2013.08.033] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/28/2013] [Revised: 08/04/2013] [Accepted: 08/26/2013] [Indexed: 01/01/2023]
Abstract
Recent experiments provide an unprecedented view of protein-DNA binding in yeast and human genomes at single-nucleotide resolution. These measurements, performed over large cell populations, show quite generally that sequence-specific transcription regulators with well-defined protein-DNA consensus motifs bind only a fraction of all consensus motifs present in the genome. Conversely, proteins in vivo often bind DNA regions lacking known consensus sequences. The rules determining whether a consensus motif is functional remain incompletely understood. Here we predict that genomic background surrounding specific protein-DNA binding motifs statistically modulates the binding of sequence-specific transcription regulators to these motifs. In particular, we show that nonconsensus protein-DNA binding in yeast is statistically enhanced, on average, around functional Reb1 motifs that are bound as compared to nonfunctional Reb1 motifs that are unbound. The landscape of nonconsensus protein-DNA binding around functional CTCF motifs in human demonstrates a more complex behavior. In particular, human genomic regions characterized by the highest CTCF occupancy show a statistically reduced level of nonconsensus protein-DNA binding. Our findings suggest that nonconsensus protein-DNA binding is fine-tuned around functional binding sites using a variety of design strategies.
Affiliation(s)
- Ariel Afek
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
|
17
|
Genome-wide organization of eukaryotic preinitiation complex is influenced by nonconsensus protein-DNA binding. Biophys J 2013; 104:1107-15. [PMID: 23473494 DOI: 10.1016/j.bpj.2013.01.038] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 11/01/2012] [Revised: 01/14/2013] [Accepted: 01/28/2013] [Indexed: 01/24/2023]
Abstract
Genome-wide binding preferences of the key components of the eukaryotic preinitiation complex (PIC) have been recently measured at high resolution in Saccharomyces cerevisiae by Rhee and Pugh. However, the rules determining the PIC binding specificity remain poorly understood. In this study, we show that nonconsensus protein-DNA binding significantly influences PIC binding preferences. We estimate that such nonconsensus binding contributes statistically at least 2-3 kcal/mol (on average) of additional attractive free energy per protein per core-promoter region. The predicted attractive effect is particularly strong at repeated poly(dA:dT) and poly(dC:dG) tracts. Overall, the computed free-energy landscape of nonconsensus protein-DNA binding shows strong correlation with the measured genome-wide PIC occupancy. Remarkably, statistical PIC preferences of binding to both TFIID-dominated and SAGA-dominated genes correlate with the nonconsensus free-energy landscape, yet these two groups of genes are distinguishable based on the average free-energy profiles. We suggest that the predicted nonconsensus binding mechanism provides a genome-wide background for specific promoter elements, such as transcription-factor binding sites, TATA-like elements, and specific binding of the PIC components to nucleosomes. We also show that nonconsensus binding has genome-wide influence on transcriptional frequency.
|
18
|
Afek A, Lukatsky DB. Nonspecific protein-DNA binding is widespread in the yeast genome. Biophys J 2012; 102:1881-8. [PMID: 22768944 DOI: 10.1016/j.bpj.2012.03.044] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 01/23/2012] [Revised: 03/13/2012] [Accepted: 03/20/2012] [Indexed: 11/16/2022]
Abstract
Recent genome-wide measurements of binding preferences of ~200 transcription regulators in the vicinity of transcription start sites in yeast have provided a unique insight into the cis-regulatory code of a eukaryotic genome. Here, we show that nonspecific transcription factor (TF)-DNA binding significantly influences binding preferences of the majority of transcription regulators in promoter regions of the yeast genome. We show that promoters of SAGA-dominated and TFIID-dominated genes can be statistically distinguished based on the landscape of nonspecific protein-DNA binding free energy. In particular, we predict that promoters of SAGA-dominated genes possess wider regions of reduced free energy compared to promoters of TFIID-dominated genes. We also show that specific and nonspecific TF-DNA binding are functionally linked and cooperatively influence gene expression in yeast. Our results suggest that nonspecific TF-DNA binding is intrinsically encoded into the yeast genome, and it may play a more important role in transcriptional regulation than previously thought.
Affiliation(s)
- Ariel Afek
- Department of Chemistry, Ben-Gurion University of the Negev, Be'er-Sheva, Israel
|
19
|
Singh V, Owen-Hughes T. Evolutionary insights into genome-wide nucleosome positioning. Genome Biol 2012. [PMID: 23017016 PMCID: PMC3491388 DOI: 10.1186/gb-2012-13-9-170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/30/2023]
|
20
|
Kolomeisky AB, Veksler A. How to accelerate protein search on DNA: location and dissociation. J Chem Phys 2012; 136:125101. [PMID: 22462896 DOI: 10.1063/1.3697763] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
One of the most important features of biological systems that controls their functioning is the ability of protein molecules to quickly find and recognize specific target sites on DNA. Although these phenomena have been studied extensively, detailed mechanisms of protein-DNA interactions during the search are still not well understood. Experiments suggest that proteins typically find their targets fast by combining three-dimensional and one-dimensional motions, and that for most of the search time proteins are non-specifically bound to DNA. However, these observations are surprising since proteins diffuse very slowly on DNA, and it seems that the observed fast search cannot be achieved under these conditions for single proteins. Here we propose two simple mechanisms that might explain some of these controversial observations. Using first-passage time analysis, it is shown explicitly that the search can be accelerated by changing the location of the target and by effectively irreversible dissociations of proteins. Our theoretical predictions are supported by Monte Carlo computer simulations.
|
21
|
Singh V, Owen-Hughes T. Evolutionary insights into genome-wide nucleosome positioning. Genome Biol 2012; 13:170. [DOI: 10.1186/gb4046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|