1
Xu C, Kleinschmidt H, Yang J, Leith EM, Johnson J, Tan S, Mahony S, Bai L. Systematic dissection of sequence features affecting binding specificity of a pioneer factor reveals binding synergy between FOXA1 and AP-1. Mol Cell 2024; 84:2838-2855.e10. [PMID: 39019045 PMCID: PMC11334613 DOI: 10.1016/j.molcel.2024.06.022] [Received: 01/09/2024] [Revised: 04/23/2024] [Accepted: 06/21/2024] [Indexed: 07/19/2024]
Abstract
Despite the unique ability of pioneer factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called chromatin immunoprecipitation with integrated synthetic oligonucleotides (ChIP-ISO) to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1, in human A549 cells. Combining ChIP-ISO with in vitro and neural network analyses, we find that (1) FOXA1 binding is strongly affected by co-binding transcription factors (TFs) AP-1 and CEBPB; (2) FOXA1 and AP-1 show binding cooperativity in vitro; (3) FOXA1's binding is determined more by local sequences than chromatin context, including eu-/heterochromatin; and (4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.
Affiliation(s)
- Cheng Xu
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Holly Kleinschmidt
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Jianyu Yang
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Erik M Leith
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Jenna Johnson
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Song Tan
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Shaun Mahony
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Lu Bai
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
  - Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA
2
Murthy S, Dey U, Olymon K, Abbas E, Yella VR, Kumar A. Discerning the Role of DNA Sequence, Shape, and Flexibility in Recognition by Drosophila Transcription Factors. ACS Chem Biol 2024; 19:1533-1543. [PMID: 38902964 DOI: 10.1021/acschembio.4c00202] [Indexed: 06/22/2024]
Abstract
The precise spatial and temporal orchestration of gene expression is crucial for the ontogeny of an organism and is mainly governed by transcription factors (TFs). The mechanism of recognition of cognate sites amid millions of base pairs in the genome by TFs is still incompletely understood. In this study, we focus on DNA sequence composition, shape, and flexibility preferences of 28 quintessential TFs from Drosophila melanogaster that are critical to development and body patterning mechanisms. Our study finds that TFs exhibit distinct predilections for DNA shape, flexibility, and sequence compositions in the proximity of transcription factor binding sites (TFBSs). Notably, certain zinc finger proteins prefer GC-rich areas with less negative propeller twist, while homeodomains mainly seek AT-rich regions with a more negative propeller twist at their sites. Intriguingly, while numerous cofactors share similar binding site preferences and bind closer to each other in the genome, some cofactors that have different preferences bind farther apart. These findings shed light on TF DNA recognition and provide novel insights into possible cofactor binding and transcriptional regulation mechanisms.
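A minimal sketch of the kind of sequence-composition measurement this abstract refers to: the GC fraction of the regions flanking a binding site. The helper names (`gc_fraction`, `flank_gc`), the flank size, and the coordinates are hypothetical choices for illustration, not the authors' pipeline; shape features such as propeller twist would additionally require a shape predictor and are not computed here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def flank_gc(genome: str, site_start: int, site_end: int, flank: int) -> float:
    """GC fraction of up to `flank` bases on each side of the site
    [site_start, site_end), excluding the site itself.
    Windows are clipped at the ends of the sequence."""
    left = genome[max(0, site_start - flank):site_start]
    right = genome[site_end:site_end + flank]
    return gc_fraction(left + right)
```

For example, `flank_gc("GGGGTAATCCCC", 4, 8, 4)` looks only at `GGGG` and `CCCC` around the `TAAT` core and returns `1.0`.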
Affiliation(s)
- Smrithi Murthy
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
- Upalabdha Dey
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
- Kaushika Olymon
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
- Eshan Abbas
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
- Venkata Rajesh Yella
  - Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 520002, India
- Aditya Kumar
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
3
Li J, Rohs R. Deep DNAshape webserver: prediction and real-time visualization of DNA shape considering extended k-mers. Nucleic Acids Res 2024; 52:W7-W12. [PMID: 38801070 PMCID: PMC11223853 DOI: 10.1093/nar/gkae433] [Received: 02/26/2024] [Revised: 04/30/2024] [Accepted: 05/08/2024] [Indexed: 05/29/2024]
Abstract
Sequence-dependent DNA shape plays an important role in understanding protein-DNA binding mechanisms. High-throughput prediction of DNA shape features has become a valuable tool in the field of protein-DNA recognition, transcription factor-DNA binding specificity, and gene regulation. However, our widely used webserver, DNAshape, relies on statistically summarized pentamer query tables to query DNA shape features. These query tables do not consider flanking regions longer than two base pairs, and acquiring a query table for hexamers or higher-order k-mers is currently still unrealistic due to limitations in achieving sufficient statistical coverage in molecular simulations or structural biology experiments. A recent deep-learning method, Deep DNAshape, can predict DNA shape features at the core of a DNA fragment considering flanking regions of up to seven base pairs, trained on limited simulation data. However, Deep DNAshape is rather complicated to install, and it must run locally compared to the pentamer-based DNAshape webserver, creating a barrier for users. Here, we present the Deep DNAshape webserver, which has the benefits of both methods while being accurate, fast, and accessible to all users. Additional improvements of the webserver include real-time detection of user input, interactive visualization tools, and different modes of analysis. URL: https://deepdnashape.usc.edu.
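The pentamer query-table approach described above can be sketched as a sliding 5-bp window: the shape value at a position is looked up from the pentamer centred on it, so only two flanking base pairs on each side are considered. A table covering every pentamer needs 4^5 = 1,024 entries; a window with seven flanking base pairs on each side of a core position spans 15 bp, giving 4^15 (about 1.07 × 10^9) possible sequences, ignoring reverse-complement symmetry. This is a minimal sketch under stated assumptions: `predict_shape` and `table_size` are hypothetical names, and the table values are made-up placeholders, not real DNAshape values.

```python
from typing import Dict, List, Optional


def predict_shape(seq: str, table: Dict[str, float]) -> List[Optional[float]]:
    """Pentamer sliding-window lookup: the value at position i comes from the
    5-mer centred on i. The two positions at each end lack a full pentamer and
    are left as None, as are pentamers missing from the table."""
    seq = seq.upper()
    values: List[Optional[float]] = [None] * len(seq)
    for i in range(2, len(seq) - 2):
        values[i] = table.get(seq[i - 2:i + 3])
    return values


def table_size(window: int) -> int:
    """Number of distinct DNA sequences of the given length (4 ** window)."""
    return 4 ** window
```

With a toy table, `predict_shape("AAAAAT", {"AAAAA": 3.0, "AAAAT": 3.5})` returns `[None, None, 3.0, 3.5, None, None]`.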
Affiliation(s)
- Jinsen Li
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
4
Manghrani A, Rangadurai AK, Szekely O, Liu B, Guseva S, Al-Hashimi HM. Quantitative and systematic NMR measurements of sequence-dependent A-T Hoogsteen dynamics uncovers unique conformational specificity in the DNA double helix. bioRxiv [Preprint] 2024:2024.05.15.594415. [PMID: 38798635 PMCID: PMC11118333 DOI: 10.1101/2024.05.15.594415] [Indexed: 05/29/2024]
Abstract
The propensities to form lowly-populated short-lived conformations of DNA could vary with sequence, providing an important source of sequence-specificity in biochemical reactions. However, comprehensively measuring how these dynamics vary with sequence is challenging. Using ¹H CEST and ¹³C R₁ρ NMR, we measured Watson-Crick to Hoogsteen dynamics for an A-T base pair in thirteen trinucleotide sequence contexts. The Hoogsteen population and exchange rate varied 4-fold and 16-fold, respectively, and were dependent on both the 3'- and 5'-neighbors but only weakly dependent on monovalent ion concentration (25 versus 100 mM NaCl) and pH (6.8 versus 8.0). Flexible TA and CA dinucleotide steps exhibited the highest Hoogsteen populations, and their kinetic rates strongly depended on the 3'-neighbor. In contrast, the stiffer AA and GA steps had the lowest Hoogsteen population, and their kinetics were weakly dependent on the 3'-neighbor. The Hoogsteen lifetime was especially short when G-C neighbors flanked the A-T base pair. The Hoogsteen dynamics had a distinct sequence-dependence compared to duplex stability and minor groove width. Thus, our results uncover a unique source of sequence-specificity hidden within the DNA double helix in the form of A-T Hoogsteen dynamics and establish the utility of ¹H CEST to quantitatively measure sequence-dependent DNA dynamics.
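The Hoogsteen population and exchange rate are tied to the underlying rate constants by the standard two-state exchange relations k_ex = k_forward + k_backward and p_B = k_forward / k_ex, where B is the minor (Hoogsteen) state. A minimal sketch, assuming simple two-state exchange; the function name `two_state_rates` and the example numbers are hypothetical, not values reported in the study.

```python
from typing import Tuple


def two_state_rates(p_b: float, k_ex: float) -> Tuple[float, float, float]:
    """Two-state exchange A <-> B (e.g. Watson-Crick <-> Hoogsteen).

    p_b:  equilibrium population of the minor state B
    k_ex: exchange rate, k_forward + k_backward (s^-1)
    Returns (k_forward, k_backward, lifetime of B in seconds).
    """
    if not 0.0 < p_b < 1.0 or k_ex <= 0.0:
        raise ValueError("need 0 < p_b < 1 and k_ex > 0")
    k_forward = p_b * k_ex            # A -> B, since p_b = k_forward / k_ex
    k_backward = (1.0 - p_b) * k_ex   # B -> A
    return k_forward, k_backward, 1.0 / k_backward
```

For instance, a 1% minor-state population with k_ex = 1,000 s⁻¹ corresponds to k_forward = 10 s⁻¹, k_backward = 990 s⁻¹, and a Hoogsteen lifetime of about 1 ms.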
Affiliation(s)
- Akanksha Manghrani
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Atul Kaushik Rangadurai
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
  - Program in Molecular Medicine, Hospital for Sick Children Research Institute, Toronto, ON, M5G 0A4, Canada
- Or Szekely
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Bei Liu
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina 27705, United States
- Serafima Guseva
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States
- Hashim M. Al-Hashimi
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States
5
Chen N, Yu J, Liu Z, Meng L, Li X, Wong KC. Discovering DNA shape motifs with multiple DNA shape features: generalization, methods, and validation. Nucleic Acids Res 2024; 52:4137-4150. [PMID: 38572749 PMCID: PMC11077088 DOI: 10.1093/nar/gkae210] [Received: 07/06/2023] [Revised: 03/06/2024] [Accepted: 03/12/2024] [Indexed: 04/05/2024]
Abstract
DNA motifs are crucial patterns in gene regulation. DNA-binding proteins (DBPs), including transcription factors, can bind to specific DNA motifs to regulate gene expression and other cellular activities. Past studies suggest that DNA shape features could be subtly involved in DNA-DBP interactions. Therefore, the shape motif annotations based on intrinsic DNA topology can deepen the understanding of DNA-DBP binding. Nevertheless, high-throughput tools for DNA shape motif discovery that incorporate multiple features altogether remain insufficient. To address this gap, we propose a series of methods to discover non-redundant DNA shape motifs with the generalization to multiple motifs in multiple shape features. Specifically, an existing Gibbs sampling method is generalized to multiple DNA motif discovery with multiple shape features. Meanwhile, an expectation-maximization (EM) method and a hybrid method coupling EM with Gibbs sampling are proposed and developed with promising performance, convergence capability, and efficiency. The discovered DNA shape motif instances reveal insights into low-signal ChIP-seq peak summits, complementing the existing sequence motif discovery works. Additionally, our modelling captures the potential interplays across multiple DNA shape features. We provide a valuable platform of tools for DNA shape motif discovery. An R package is built for open accessibility and long-lasting impact: https://zenodo.org/doi/10.5281/zenodo.10558980.
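To illustrate the basic Gibbs sampling idea that this work generalizes, here is a toy single-motif sampler on one numeric shape track. It is a minimal sketch under stated assumptions, not the authors' R package: each motif position is modelled as an independent Gaussian, there is one motif and one shape feature, and the function name `gibbs_shape_motif`, the variance floor `min_var`, and the example profiles are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gibbs_shape_motif(profiles: Sequence[Sequence[float]], width: int,
                      iters: int = 100, seed: int = 0,
                      min_var: float = 1e-2) -> List[int]:
    """Toy single-motif Gibbs sampler on one numeric DNA shape track.

    Each profile holds one shape value per base (e.g. hypothetical minor
    groove widths). Returns the sampled motif start for every profile.
    """
    if len(profiles) < 2:
        raise ValueError("need at least two profiles")
    if any(len(p) < width for p in profiles):
        raise ValueError("every profile must be at least `width` long")
    rng = random.Random(seed)
    starts = [rng.randrange(len(p) - width + 1) for p in profiles]
    for _ in range(iters):
        for i, prof in enumerate(profiles):
            # Leave profile i out; model the other profiles' current windows
            # as independent Gaussians, one per motif position.
            windows = [profiles[j][starts[j]:starts[j] + width]
                       for j in range(len(profiles)) if j != i]
            n = len(windows)
            means = [sum(w[k] for w in windows) / n for k in range(width)]
            variances = [max(sum((w[k] - means[k]) ** 2 for w in windows) / n,
                             min_var) for k in range(width)]
            # Gaussian log-likelihood of each candidate window in profile i;
            # the normalising term is identical for all windows and omitted.
            loglik = [sum(-0.5 * (prof[s + k] - means[k]) ** 2 / variances[k]
                          for k in range(width))
                      for s in range(len(prof) - width + 1)]
            top = max(loglik)
            weights = [math.exp(v - top) for v in loglik]
            # Resample the start of profile i in proportion to its likelihood.
            starts[i] = rng.choices(range(len(loglik)), weights=weights)[0]
    return starts
```

Like any Gibbs sampler, this one can settle in shifted local optima; the cited work additionally develops EM and hybrid EM-Gibbs methods and handles multiple motifs and shape features jointly.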
Affiliation(s)
- Nanjun Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Jixiang Yu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zhe Liu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Lingkuan Meng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Changchun City, Jilin Province, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Hong Kong Institute of Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
6
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. Genetics 2024; 226:iyae013. [PMID: 38298127 PMCID: PMC10990422 DOI: 10.1093/genetics/iyae013] [Received: 08/11/2023] [Revised: 08/11/2023] [Accepted: 01/05/2024] [Indexed: 02/02/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than polymerase slippage in replicating progenitor cells. These results echo the recent finding that DNA damage in oocytes is a significant source of de novo single nucleotide variants and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to known hotspots of oocyte mutagenesis, nor are postzygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at G/C-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and contradict prior attribution of replication slippage as the primary mechanism of STR mutagenesis.
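The contrast drawn above (a replication-driven model predicting a linear rise with paternal age but no dependence on maternal age) can be made concrete as a slope estimate. A minimal sketch, assuming hypothetical per-child de novo STR mutation counts paired with parental age; `ols_fit` is an illustrative name, and this ordinary least-squares fit is not the statistical model used in the study.

```python
from typing import Sequence, Tuple


def ols_fit(ages: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of mutation count against parental age."""
    n = len(ages)
    if n != len(counts) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(ages) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("ages must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Under a slippage-only model one would expect a positive slope when fitting against paternal age and a slope near zero against maternal age; the abstract reports that STR mutation rates covary with both.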
Affiliation(s)
- Michael E Goldberg
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Michelle D Noyes
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Aaron R Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, Salt Lake City, UT 84112, USA
- Kelley Harris
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
7
Li J, Chiu TP, Rohs R. Predicting DNA structure using a deep learning method. Nat Commun 2024; 15:1243. [PMID: 38336958 PMCID: PMC10858265 DOI: 10.1038/s41467-024-45191-5] [Received: 04/25/2023] [Accepted: 01/17/2024] [Indexed: 02/12/2024]
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
Affiliation(s)
- Jinsen Li
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Tsu-Pei Chiu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA, 90089, USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, 90089, USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
8
Jiang Y, Chiu TP, Mitra R, Rohs R. Probing the role of the protonation state of a minor groove-linker histidine in Exd-Hox-DNA binding. Biophys J 2024; 123:248-259. [PMID: 38130056 PMCID: PMC10808038 DOI: 10.1016/j.bpj.2023.12.013] [Received: 04/29/2023] [Revised: 09/22/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
DNA recognition and targeting by transcription factors (TFs) through specific binding are fundamental in biological processes. Furthermore, the histidine protonation state at the TF-DNA binding interface can significantly influence the binding mechanism of TF-DNA complexes. Nevertheless, the role of histidine in TF-DNA complexes remains underexplored. Here, we employed all-atom molecular dynamics simulations using AlphaFold2-modeled complexes based on previously solved co-crystal structures to probe the role of the His-12 residue in the Extradenticle (Exd)-Sex combs reduced (Scr)-DNA complex when binding to Scr and Ultrabithorax (Ubx) target sites. Our results demonstrate that the protonation state of histidine notably affected the DNA minor-groove width profile and binding free energy. Examining flanking sequences of various binding affinities derived from SELEX-seq experiments, we analyzed the relationship between binding affinity and specificity. We uncovered how histidine protonation leads to increased binding affinity but can lower specificity. Our findings provide new mechanistic insights into the role of histidine in modulating TF-DNA binding.
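As background to the protonation states compared in these simulations, the equilibrium fraction of a titratable group in its protonated form at a given pH follows the Henderson-Hasselbalch equation. A minimal sketch; the pKa of about 6 for a free histidine side chain is a textbook approximation, the effective pKa at a protein-DNA interface can differ, and the function name and values are illustrative rather than taken from the study.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: equilibrium fraction of a titratable group
    in its protonated (acid) form at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))
```

At pH 7.4 with pKa 6.0 this gives roughly 4% protonated, while at pH equal to the pKa the two forms are equally populated.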
Affiliation(s)
- Yibei Jiang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
- Tsu-Pei Chiu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
- Raktim Mitra
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
  - Department of Chemistry, University of Southern California, Los Angeles, California
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, California
9
Vishnevsky OV, Bocharnikov AV, Ignatieva EV. Peak Scores Significantly Depend on the Relationships between Contextual Signals in ChIP-Seq Peaks. Int J Mol Sci 2024; 25:1011. [PMID: 38256085 PMCID: PMC10816497 DOI: 10.3390/ijms25021011] [Received: 10/31/2023] [Revised: 12/13/2023] [Accepted: 01/09/2024] [Indexed: 01/24/2024]
Abstract
Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) is a central genome-wide method for in vivo analyses of DNA-protein interactions in various cellular conditions. Numerous studies have demonstrated the complex contextual organization of ChIP-seq peak sequences and the presence of binding sites for transcription factors in them. We assessed the dependence of the ChIP-seq peak score on the presence of different contextual signals in the peak sequences by analyzing these sequences from several ChIP-seq experiments using our fully enumerative GPU-based de novo motif discovery method, Argo_CUDA. Analysis revealed sets of significant IUPAC motifs corresponding to the binding sites of the target and partner transcription factors. For these ChIP-seq experiments, multiple regression models were constructed, demonstrating a significant dependence of the peak scores on the presence in the peak sequences of not only highly significant target motifs but also less significant motifs corresponding to the binding sites of the partner transcription factors. A significant correlation was shown between the presence of the target motifs FOXA2 and the partner motifs HNF4G, which found experimental confirmation in the scientific literature, demonstrating the important contribution of the partner transcription factors to the binding of the target transcription factor to DNA and, consequently, their important contribution to the peak score.
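A multiple regression of peak score on motif presence, as described above, can be sketched as an ordinary least-squares fit with one 0/1 indicator column per motif. This is a minimal sketch under stated assumptions, not the authors' Argo_CUDA workflow: the function name is hypothetical, the toy data are invented, and no significance testing is performed.

```python
import numpy as np


def fit_peak_score_model(motif_presence: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Ordinary least squares fit of score ~ b0 + sum_j b_j * presence_j.

    motif_presence: (n_peaks, n_motifs) matrix of 0/1 indicators
    scores:         (n_peaks,) vector of peak scores
    Returns the coefficients [b0, b_1, ..., b_m].
    """
    design = np.column_stack([np.ones(len(scores)), motif_presence.astype(float)])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef
```

With the columns read as, say, a target motif and a partner motif, a positive coefficient on the partner column would indicate that peaks containing the partner motif tend to score higher, which is the kind of dependence the abstract reports.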
Affiliation(s)
- Oleg V. Vishnevsky
  - Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
  - Department of Natural Science, Novosibirsk State University, 630090 Novosibirsk, Russia
- Andrey V. Bocharnikov
  - Department of Natural Science, Novosibirsk State University, 630090 Novosibirsk, Russia
- Elena V. Ignatieva
  - Institute of Cytology and Genetics, 630090 Novosibirsk, Russia
  - Department of Natural Science, Novosibirsk State University, 630090 Novosibirsk, Russia
10
Salomone J, Farrow E, Gebelein B. Homeodomain complex formation and biomolecular condensates in Hox gene regulation. Semin Cell Dev Biol 2024; 152-153:93-100. [PMID: 36517343 PMCID: PMC10258226 DOI: 10.1016/j.semcdb.2022.11.016] [Received: 07/30/2022] [Revised: 10/21/2022] [Accepted: 11/30/2022] [Indexed: 12/15/2022]
Abstract
Hox genes are a family of homeodomain transcription factors that regulate specialized morphological structures along the anterior-posterior axis of metazoans. Over the past few decades, researchers have focused on defining how Hox factors with similar in vitro DNA binding activities achieve sufficient target specificity to regulate distinct cell fates in vivo. In this review, we highlight how protein interactions with other transcription factors, many of which are also homeodomain proteins, result in the formation of transcription factor complexes with enhanced DNA binding specificity. These findings suggest that Hox-regulated enhancers utilize distinct combinations of homeodomain binding sites, many of which are low-affinity, to recruit specific Hox complexes. However, low-affinity sites can only yield reproducible responses with high transcription factor concentrations. To overcome this limitation, recent studies revealed how transcription factors, including Hox factors, use intrinsically disordered regions (IDRs) to form biomolecular condensates that increase protein concentrations. Moreover, Hox factors with altered IDRs have been associated with altered transcriptional activity and human disease states, demonstrating the importance of IDRs in mediating essential Hox output. Collectively, these studies highlight how Hox factors use their DNA binding domains, protein-protein interaction domains, and IDRs to form specific transcription factor complexes that yield accurate gene expression.
Affiliation(s)
- Joseph Salomone
  - Graduate Program in Molecular and Developmental Biology, Cincinnati Children's Hospital Research Foundation, Cincinnati, OH 45229, USA
  - Medical-Scientist Training Program, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Edward Farrow
  - Graduate Program in Molecular and Developmental Biology, Cincinnati Children's Hospital Research Foundation, Cincinnati, OH 45229, USA
  - Medical-Scientist Training Program, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
- Brian Gebelein
  - Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Ave, MLC 7007, Cincinnati, OH 45229, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
11
Goldberg ME, Noyes MD, Eichler EE, Quinlan AR, Harris K. Effects of parental age and polymer composition on short tandem repeat de novo mutation rates. bioRxiv [Preprint] 2023:2023.12.22.573131. [PMID: 38187618 PMCID: PMC10769404 DOI: 10.1101/2023.12.22.573131] [Indexed: 01/09/2024]
Abstract
Short tandem repeats (STRs) are hotspots of genomic variability in the human germline because of their high mutation rates, which have long been attributed largely to polymerase slippage during DNA replication. This model suggests that STR mutation rates should scale linearly with a father's age, as progenitor cells continually divide after puberty. In contrast, it suggests that STR mutation rates should not scale with a mother's age at her child's conception, since oocytes spend a mother's reproductive years arrested in meiosis II and undergo a fixed number of cell divisions that are independent of the age at ovulation. Yet, mirroring recent findings, we find that STR mutation rates covary with paternal and maternal age, implying that some STR mutations are caused by DNA damage in quiescent cells rather than the classical mechanism of polymerase slippage in replicating progenitor cells. These results also echo the recent finding that DNA damage in quiescent oocytes is a significant source of de novo single nucleotide variants (SNVs) and corroborate evidence of STR expansion in postmitotic cells. However, we find that the maternal age effect is not confined to previously discovered hotspots of oocyte mutagenesis, nor are post-zygotic mutations likely to contribute significantly. STR nucleotide composition demonstrates divergent effects on de novo mutation (DNM) rates between sexes. Unlike the paternal lineage, maternally derived DNMs at A/T STRs display a significantly greater association with maternal age than DNMs at GC-containing STRs. These observations may suggest the mechanism and developmental timing of certain STR mutations and are especially surprising considering the prior belief in replication slippage as the dominant mechanism of STR mutagenesis.
Affiliation(s)
- Michael E. Goldberg
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
- Michelle D. Noyes
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
  - Howard Hughes Medical Institute, 3720 15 Ave NE, University of Washington, Seattle, WA, 98195
- Aaron R. Quinlan
  - Departments of Human Genetics and Biomedical Informatics, University of Utah, 15 S 2030 E, Salt Lake City, UT, 84112
  - These authors contributed equally to this work
- Kelley Harris
  - Department of Genome Sciences, University of Washington, 3720 15 Ave NE, Seattle, WA, 98195
  - Computational Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA, 98109
  - These authors contributed equally to this work
12
Mitra R, Li J, Sagendorf JM, Jiang Y, Chiu TP, Rohs R. DeepPBS: Geometric deep learning for interpretable prediction of protein-DNA binding specificity. bioRxiv [Preprint] 2023:2023.12.15.571942. [PMID: 38293168 PMCID: PMC10827229 DOI: 10.1101/2023.12.15.571942] [Indexed: 02/01/2024]
Abstract
Predicting specificity in protein-DNA interactions is a challenging yet essential task for understanding gene regulation. Here, we present Deep Predictor of Binding Specificity (DeepPBS), a geometric deep-learning model designed to predict binding specificity across protein families based on protein-DNA structures. The DeepPBS architecture allows investigation of different family-specific recognition patterns. DeepPBS can be applied to predicted structures, and can aid in the modeling of protein-DNA complexes. DeepPBS is interpretable and can be used to calculate protein heavy atom-level importance scores, as demonstrated in a case study of the p53-DNA interface. When aggregated at the protein residue level, these scores conform well with alanine scanning mutagenesis experimental data. The inference time for DeepPBS is sufficiently fast for analyzing simulation trajectories, as demonstrated on a molecular-dynamics simulation of a Drosophila Hox-DNA tertiary complex with its cofactor. DeepPBS and its corresponding data resources offer a foundation for machine-aided protein-DNA interaction studies, guiding experimental choices and complex design, as well as advancing our understanding of molecular interactions.
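The step from heavy-atom importance scores to residue-level scores can be sketched as a group-by over residue identifiers. A minimal sketch, assuming summation as the aggregation rule and a hypothetical `(chain, residue number, atom name, score)` input format; the abstract does not specify how DeepPBS aggregates, so this is not its implementation, and the example scores are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (chain, residue number, atom name, score)
AtomScore = Tuple[str, int, str, float]


def aggregate_by_residue(atom_scores: Iterable[AtomScore]) -> Dict[Tuple[str, int], float]:
    """Sum heavy-atom importance scores per (chain, residue number)."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for chain, resnum, _atom_name, score in atom_scores:
        totals[(chain, resnum)] += score
    return dict(totals)
```

The per-residue totals could then be ranked and compared against alanine-scanning results, which is the comparison the abstract describes.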
13
Moreno-Blanco A, Pluta R, Espinosa M, Ruiz-Cruz S, Bravo A. Promoter DNA recognition by the Enterococcus faecalis global regulator MafR. Front Mol Biosci 2023; 10:1294974. [PMID: 38192335 PMCID: PMC10773906 DOI: 10.3389/fmolb.2023.1294974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 11/30/2023] [Indexed: 01/10/2024]
Abstract
When Enterococcus faecalis is exposed to changing environmental conditions, the expression of many genes is regulated at the transcriptional level. We reported previously that the enterococcal MafR protein causes genome-wide changes in the transcriptome. Here we show that MafR directly activates transcription of the OG1RF_10478 gene, which encodes a hypothetical protein of 111 amino acid residues. We have identified the P10478 promoter and demonstrated that MafR enhances the efficiency of this promoter by binding to a DNA site that contains the -35 element. Moreover, our analysis of the OG1RF_10478 protein AlphaFold model indicates high similarity to 1) structures of EIIB components of the bacterial phosphoenolpyruvate:carbohydrate phosphotransferase system, and 2) structures of receiver domains that are found in response regulators of two-component signal transduction systems. However, unlike typical EIIB components, OG1RF_10478 lacks a Cys or His residue at the conserved phosphorylation site, and, unlike typical receiver domains, OG1RF_10478 lacks a conserved Asp residue at the position usually required for phosphorylation. In contrast to both EIIB components and receiver domains, OG1RF_10478 contains an insertion between residues 10 and 30 that, according to ColabFold prediction, may serve as a dimerization interface. We propose that OG1RF_10478 could participate in regulatory functions by protein-protein interactions.
Affiliation(s)
- Ana Moreno-Blanco
- Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Radoslaw Pluta
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Manuel Espinosa
- Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Sofía Ruiz-Cruz
- Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- Alicia Bravo
- Centro de Investigaciones Biológicas Margarita Salas, Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
14
Xu C, Kleinschmidt H, Yang J, Leith E, Johnson J, Tan S, Mahony S, Bai L. Systematic Dissection of Sequence Features Affecting the Binding Specificity of a Pioneer Factor Reveals Binding Synergy Between FOXA1 and AP-1. bioRxiv: the preprint server for biology 2023:2023.11.08.566246. [PMID: 37986839 PMCID: PMC10659273 DOI: 10.1101/2023.11.08.566246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Despite the unique ability of pioneer transcription factors (PFs) to target nucleosomal sites in closed chromatin, they only bind a small fraction of their genomic motifs. The underlying mechanism of this selectivity is not well understood. Here, we design a high-throughput assay called ChIP-ISO to systematically dissect sequence features affecting the binding specificity of a classic PF, FOXA1. Combining ChIP-ISO with in vitro and neural network analyses, we find that 1) FOXA1 binding is strongly affected by co-binding TFs AP-1 and CEBPB, 2) FOXA1 and AP-1 show binding cooperativity in vitro, 3) FOXA1's binding is determined more by local sequences than chromatin context, including eu-/heterochromatin, and 4) AP-1 is partially responsible for differential binding of FOXA1 in different cell types. Our study presents a framework for elucidating genetic rules underlying PF binding specificity and reveals a mechanism for context-specific regulation of its binding.
Affiliation(s)
- Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Jianyu Yang
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Erik Leith
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Jenna Johnson
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Song Tan
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Shaun Mahony
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Physics, The Pennsylvania State University, University Park, PA 16802, USA
15
Li J, Chiu TP, Rohs R. Deep DNAshape: Predicting DNA shape considering extended flanking regions using a deep learning method. bioRxiv: the preprint server for biology 2023:2023.10.22.563383. [PMID: 37961633 PMCID: PMC10634709 DOI: 10.1101/2023.10.22.563383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA shape plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer-based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, refined DNA shape features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing a deeper understanding of the effects of flanking regions on DNA shape in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
16
Memon AA, Fu X, Fan XY, Xu L, Xiao J, Rahman MU, Yang X, Yao YF, Deng Z, Ma W. Substrate DNA Promoting Binding of Mycobacterium tuberculosis MtrA by Facilitating Dimerization and Interpretation of Affinity by Minor Groove Width. Microorganisms 2023; 11:2505. [PMID: 37894163 PMCID: PMC10609481 DOI: 10.3390/microorganisms11102505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/21/2023] [Accepted: 09/28/2023] [Indexed: 10/29/2023]
Abstract
In order to deepen the understanding of the role and regulation mechanisms of prokaryotic global transcription regulators in complex processes, including virulence, the associations between the affinity and binding sequences of Mycobacterium tuberculosis MtrA have been explored extensively. Analysis of 294 diversified 26-bp MtrA binding sequences revealed that the sequence similarity of fragments was not simply associated with affinity. The unique variation patterns of GC content and periodical and sequential fluctuation of affinity contribution curves were observed along the sequence in this study. Furthermore, docking analysis demonstrated that the structure of the dimer MtrA-DNA (high affinity) was generally consistent with other OmpR family members, while Arg 219 and Gly 220 of the wing domain interacted with the minor groove. The results of the binding box replacement experiment proved that box 2 was essential for binding, which implied the differential roles of the two boxes in the binding process. Furthermore, the results of the substitution of the nucleotide at the 20th and/or 21st positions indicated that the affinity was negatively associated with the value of minor groove width precisely at the 21st position. The dimerization of the unphosphorylated MtrA facilitated by a low-affinity DNA fragment was observed for the first time. However, the proportion of the dimer was associated with the affinity of substrate DNA, which further suggested that the affinity was actually one characteristic of the stability of dimers. Based on the finding of 17 inter-molecule hydrogen bonds identified in the interface of the MtrA dimer, including 8 symmetric complementary ones in the conserved α4-β5-α5 face, we propose that hydrogen bonds should be considered just as important as salt bridges and the hydrophobic patch in the dimerization.
Our comprehensive study on a large number of binding fragments with quantitative affinity values provided new insight into the molecular mechanism of dimerization, binding specificity and affinity determination of MtrA and clues for solving the puzzle of how global transcription factors regulate a large quantity of target genes.
Affiliation(s)
- Aadil Ahmed Memon
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Xiang Fu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Xiao-Yong Fan
- Shanghai Institute of Infectious Diseases and Biosecurity, Shanghai Public Health Clinical Center, Fudan University, Shanghai 200032, China
- Lingyun Xu
- Shanghai Huaxin Biotechnology Co., Ltd., Room 604, Building 1, Tongji Chuangyuan, No. 99 South Changjiang Road, Baoshan District, Shanghai 200441, China
- Jihua Xiao
- Shanghai Huaxin Biotechnology Co., Ltd., Room 604, Building 1, Tongji Chuangyuan, No. 99 South Changjiang Road, Baoshan District, Shanghai 200441, China
- Mueed Ur Rahman
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Xiaoqi Yang
- Shanghai Huaxin Biotechnology Co., Ltd., Room 604, Building 1, Tongji Chuangyuan, No. 99 South Changjiang Road, Baoshan District, Shanghai 200441, China
- Yu-Feng Yao
- Laboratory of Bacterial Pathogenesis, Institutes of Medical Sciences, School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China
- Zixin Deng
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Wei Ma
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
17
Liu Z, Samee M. Structural underpinnings of mutation rate variations in the human genome. Nucleic Acids Res 2023; 51:7184-7197. [PMID: 37395403 PMCID: PMC10415140 DOI: 10.1093/nar/gkad551] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/19/2022] [Revised: 06/06/2023] [Accepted: 06/15/2023] [Indexed: 07/04/2023]
Abstract
Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Importantly, the rates vary substantially across the genome and the principles underlying such variations remain poorly understood. A recent model explained much of this variation by considering higher-order nucleotide interactions in the 7-mer sequence context around mutated nucleotides. This model's success implicates a connection between DNA shape and mutation rates. DNA shape, i.e. structural properties like helical twist and tilt, is known to capture interactions between nucleotides within a local context. Thus, we hypothesized that changes in DNA shape features at and around mutated positions can explain mutation rate variations in the human genome. Indeed, DNA shape-based models of mutation rates showed similar or improved performance over current nucleotide sequence-based models. These models accurately characterized mutation hotspots in the human genome and revealed the shape features whose interactions underlie mutation rate variations. DNA shape also impacts mutation rates within putative functional regions like transcription factor binding sites where we find a strong association between DNA shape and position-specific mutation rates. This work demonstrates the structural underpinnings of nucleotide mutations in the human genome and lays the groundwork for future models of genetic variations to incorporate DNA shape.
Affiliation(s)
- Zian Liu
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
18
Cooper BH, Dantas Machado AC, Gan Y, Aparicio O, Rohs R. DNA binding specificity of all four Saccharomyces cerevisiae forkhead transcription factors. Nucleic Acids Res 2023; 51:5621-5633. [PMID: 37177995 PMCID: PMC10287902 DOI: 10.1093/nar/gkad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 04/19/2023] [Accepted: 04/27/2023] [Indexed: 05/15/2023]
Abstract
Quantifying the nucleotide preferences of DNA binding proteins is essential to understanding how transcription factors (TFs) interact with their targets in the genome. High-throughput in vitro binding assays have been used to identify the inherent DNA binding preferences of TFs in a controlled environment isolated from confounding factors such as genome accessibility, DNA methylation, and TF binding cooperativity. Unfortunately, many of the most common approaches for measuring binding preferences are not sensitive enough for the study of moderate-to-low affinity binding sites, and are unable to detect small-scale differences between closely related homologs. The Forkhead box (FOX) family of TFs is known to play a crucial role in regulating a variety of key processes from proliferation and development to tumor suppression and aging. By using the high-sequencing depth SELEX-seq approach to study all four FOX homologs in Saccharomyces cerevisiae, we have been able to precisely quantify the contribution and importance of nucleotide positions all along an extended binding site. Essential to this process was the alignment of our SELEX-seq reads to a set of candidate core sequences determined using a recently developed tool for the alignment of enriched k-mers and a newly developed approach for the reprioritization of candidate cores.
Affiliation(s)
- Brendon H Cooper
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Ana Carolina Dantas Machado
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Yan Gan
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Oscar M Aparicio
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
- Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
19
Tan DS, Cheung SL, Gao Y, Weinbuch M, Hu H, Shi L, Ti SC, Hutchins AP, Cojocaru V, Jauch R. The homeodomain of Oct4 is a dimeric binder of methylated CpG elements. Nucleic Acids Res 2023; 51:1120-1138. [PMID: 36631980 PMCID: PMC9943670 DOI: 10.1093/nar/gkac1262] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/20/2022] [Revised: 12/14/2022] [Accepted: 12/19/2022] [Indexed: 01/13/2023]
Abstract
Oct4 is essential to maintain pluripotency and has a pivotal role in establishing the germline. Its DNA-binding POU domain was recently found to bind motifs with methylated CpG elements normally associated with epigenetic silencing. However, the mode of binding and the consequences of this capability have remained unclear. Here, we show that Oct4 binds to a compact palindromic DNA element with a methylated CpG core (CpGpal) in alternative states of pluripotency and during cellular reprogramming towards induced pluripotent stem cells (iPSCs). During cellular reprogramming, typical Oct4-bound enhancers are uniformly demethylated, with the prominent exception of the CpGpal sites where DNA methylation is often maintained. We demonstrate that Oct4 cooperatively binds the CpGpal element as a homodimer, which contrasts with the ectoderm-expressed POU factor Brn2. Indeed, binding to CpGpal is Oct4-specific as other POU factors expressed in somatic cells avoid this element. Binding assays combined with structural analyses and molecular dynamic simulations show that dimeric Oct4-binding to CpGpal is driven by the POU-homeodomain whilst the POU-specific domain is detached from DNA. Collectively, we report that Oct4 exerts parts of its regulatory function in the context of methylated DNA through a DNA recognition mechanism that solely relies on its homeodomain.
Affiliation(s)
- Daisylyn Senna Tan
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shun Lai Cheung
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Ya Gao
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Maike Weinbuch
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Institute for Molecular Medicine, Ulm University, Ulm, Germany
- Haoqing Hu
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Liyang Shi
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Shih-Chieh Ti
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Andrew P Hutchins
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Vlad Cojocaru
- STAR-UBB Institute, Babeş-Bolyai University, Cluj-Napoca, Romania; Computational Structural Biology Group, Utrecht University, The Netherlands; Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Ralf Jauch
- To whom correspondence should be addressed. Tel: +852 3917 9511; Fax: +852 28559730
20
The pioneering function of the Hox transcription factors. Semin Cell Dev Biol 2022:S1084-9521(22)00354-8. [PMID: 36517345 DOI: 10.1016/j.semcdb.2022.11.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 11/13/2022] [Accepted: 11/30/2022] [Indexed: 12/14/2022]
Abstract
Ever since the discovery that the Hox family of transcription factors establish morphological diversity in the developing embryo, major efforts have been directed towards understanding Hox-dependent patterning. This has led to important discoveries, notably on the mechanisms underlying the collinear expression of Hox genes and Hox binding specificity. More recently, several studies have provided evidence that Hox factors have the capacity to bind their targets in an inaccessible chromatin context and trigger the switch to an accessible, transcriptionally permissive chromatin state. In this review, we provide an overview of the evidence that Hox factors behave as pioneer factors and discuss the potential mechanisms implicated in Hox pioneer activity as well as the significance of this functional property in Hox-dependent patterning.
21
Lountos GT, Cherry S, Tropea JE, Wlodawer A, Miller M. Structural basis for cell type specific DNA binding of C/EBPβ: The case of cell cycle inhibitor p15INK4b promoter. J Struct Biol 2022; 214:107918. [PMID: 36343842 PMCID: PMC9909937 DOI: 10.1016/j.jsb.2022.107918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/22/2022] [Accepted: 10/31/2022] [Indexed: 11/06/2022]
Abstract
C/EBPβ is a key regulator of numerous cellular processes, but it can also contribute to tumorigenesis and viral diseases. It binds to specific DNA sequences (C/EBP sites) and interacts with other transcription factors to control expression of multiple eukaryotic genes in a tissue- and cell-type-dependent manner. A body of evidence has established that cell-type-specific regulatory information is contained in the local DNA sequence of the binding motif. In human epithelial cells, C/EBPβ is an essential cofactor for TGFβ signaling in the case of Smad2/3/4 and FoxO-dependent induction of the cell cycle inhibitor, p15INK4b. In the TGFβ-responsive region 2 of the p15INK4b promoter, the Smad binding site is flanked by a C/EBP site, CTTAA•GAAAG, which differs from the canonical, palindromic ATTGC•GCAAT motif. The X-ray crystal structure of C/EBPβ bound to the p15INK4b promoter fragment shows how GCGC-to-AAGA substitution generates changes in the intermolecular interactions in the protein-DNA interface that enhance C/EBPβ binding specificity, limit possible epigenetic regulation of the promoter, and generate a DNA element with a unique pattern of methyl groups in the major groove. Significantly, CT/GA dinucleotides located at the 5' ends of the double-stranded element maintain local narrowing of the DNA minor groove width that is necessary for DNA recognition. Our results suggest that C/EBPβ would accept all forms of modified cytosine in the context of the CpT site. This contrasts with the effect on the consensus motif, where C/EBPβ binding is modestly increased by cytosine methylation, but substantially decreased by hydroxymethylation.
Affiliation(s)
- George T Lountos
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Scott Cherry
- Protein Purification Core, Center for Structural Biology, National Cancer Institute, Frederick, MD 21702-1201, USA
- Joseph E Tropea
- Protein Purification Core, Center for Structural Biology, National Cancer Institute, Frederick, MD 21702-1201, USA
- Alexander Wlodawer
- Protein Structure Section, Center for Structural Biology, National Cancer Institute, Frederick, MD 21702-1201, USA
- Maria Miller
- Protein Structure Section, Center for Structural Biology, National Cancer Institute, Frederick, MD 21702-1201, USA
22
Pacesa M, Lin CH, Cléry A, Saha A, Arantes PR, Bargsten K, Irby MJ, Allain FHT, Palermo G, Cameron P, Donohoue PD, Jinek M. Structural basis for Cas9 off-target activity. Cell 2022; 185:4067-4081.e21. [PMID: 36306733 PMCID: PMC10103147 DOI: 10.1016/j.cell.2022.09.026] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 11/18/2021] [Revised: 07/01/2022] [Accepted: 09/15/2022] [Indexed: 11/06/2022]
Abstract
The target DNA specificity of the CRISPR-associated genome editor nuclease Cas9 is determined by complementarity to a 20-nucleotide segment in its guide RNA. However, Cas9 can bind and cleave partially complementary off-target sequences, which raises safety concerns for its use in clinical applications. Here, we report crystallographic structures of Cas9 bound to bona fide off-target substrates, revealing that off-target binding is enabled by a range of noncanonical base-pairing interactions within the guide:off-target heteroduplex. Off-target substrates containing single-nucleotide deletions relative to the guide RNA are accommodated by base skipping or multiple noncanonical base pairs rather than RNA bulge formation. Finally, PAM-distal mismatches result in duplex unpairing and induce a conformational change in the Cas9 REC lobe that perturbs its conformational activation. Together, these insights provide a structural rationale for the off-target activity of Cas9 and contribute to the improved rational design of guide RNAs and off-target prediction algorithms.
Affiliation(s)
- Martin Pacesa
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Chun-Han Lin
- Caribou Biosciences, 2929 Seventh Street Suite 105, Berkeley, CA 94710, USA
- Antoine Cléry
- Institute of Biochemistry, ETH Zurich, Hönggerbergring 64, 8093 Zurich, Switzerland
- Aakash Saha
- Department of Bioengineering, University of California Riverside, 900 University Avenue, Riverside, CA 92521, USA
- Pablo R Arantes
- Department of Bioengineering, University of California Riverside, 900 University Avenue, Riverside, CA 92521, USA
- Katja Bargsten
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Matthew J Irby
- Caribou Biosciences, 2929 Seventh Street Suite 105, Berkeley, CA 94710, USA
- Frédéric H-T Allain
- Institute of Biochemistry, ETH Zurich, Hönggerbergring 64, 8093 Zurich, Switzerland
- Giulia Palermo
- Department of Bioengineering, University of California Riverside, 900 University Avenue, Riverside, CA 92521, USA
- Peter Cameron
- Caribou Biosciences, 2929 Seventh Street Suite 105, Berkeley, CA 94710, USA
- Paul D Donohoue
- Caribou Biosciences, 2929 Seventh Street Suite 105, Berkeley, CA 94710, USA
- Martin Jinek
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
23
Chiu TP, Li J, Jiang Y, Rohs R. It is in the flanks: Conformational flexibility of transcription factor binding sites. Biophys J 2022; 121:3765-3767. [PMID: 36182667 PMCID: PMC9674972 DOI: 10.1016/j.bpj.2022.09.020] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/29/2022] [Revised: 09/15/2022] [Accepted: 09/16/2022] [Indexed: 11/23/2022]
Affiliation(s)
- Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
- Yibei Jiang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California; Departments of Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, California
24
Ghoshdastidar D, Bansal M. Flexibility of flanking DNA is a key determinant of transcription factor affinity for the core motif. Biophys J 2022; 121:3987-4000. [PMID: 35978548 PMCID: PMC9674967 DOI: 10.1016/j.bpj.2022.08.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/21/2022] [Revised: 07/28/2022] [Accepted: 08/15/2022] [Indexed: 11/02/2022]
Abstract
Selective gene regulation is mediated by recognition of specific DNA sequences by transcription factors (TFs). The extremely challenging task of searching out specific cognate DNA binding sites among several million putative sites within the eukaryotic genome is achieved by complex molecular recognition mechanisms. Elements of this recognition code include the core binding sequence, the flanking sequence context, and the shape and conformational flexibility of the composite binding site. To unravel the extent to which DNA flexibility modulates TF binding, in this study, we employed experimentally guided molecular dynamics simulations of ternary complexes of the closely related Hox heterodimers Exd-Ubx and Exd-Scr with DNA. Results demonstrate that flexibility signatures embedded in the flanking sequences impact TF binding at the cognate binding site. A DNA sequence has intrinsic shape and flexibility features. While shape features are localized, our analyses reveal that flexibility features of the flanking sequences percolate over several base pairs and allosterically modulate TF binding at the core. We also show that lack of flexibility in the motif context can render the cognate site resistant to protein-induced shape changes and subsequently lower TF binding affinity. Overall, this study suggests that flexibility-guided DNA shape, and not merely the static shape, is a key unexplored component of the complex DNA-TF recognition code.
Affiliation(s)
- Manju Bansal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, Karnataka, India.
25
Cooper BH, Chiu TP, Rohs R. Top-Down Crawl: a method for the ultra-rapid and motif-free alignment of sequences with associated binding metrics. Bioinformatics 2022; 38:5121-5123. [PMID: 36179084 PMCID: PMC9665867 DOI: 10.1093/bioinformatics/btac653] [Received: 04/02/2022] [Revised: 09/21/2022] [Accepted: 09/29/2022] [Indexed: 12/24/2022]
Abstract
Summary: Several high-throughput protein-DNA binding methods currently available produce highly reproducible measurements of binding affinity at the level of the k-mer. However, understanding where a k-mer is positioned along a binding site sequence depends on alignment. Here, we present Top-Down Crawl (TDC), an ultra-rapid tool designed for the alignment of k-mer level data in a rank-dependent and position weight matrix (PWM)-independent manner. As the framework only depends on the rank of the input, the method can accept input from many types of experiments (protein binding microarray, SELEX-seq, SMiLE-seq, etc.) without the need for specialized parameterization. Measuring the performance of the alignment using multiple linear regression with 5-fold cross-validation, we find TDC to perform as well as or better than computationally expensive PWM-based methods.
Availability and implementation: TDC can be run online at https://topdowncrawl.usc.edu or locally as a Python package available through pip at https://pypi.org/project/TopDownCrawl.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Brendon H Cooper
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Tsu-Pei Chiu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
- To whom correspondence should be addressed.
26
DNA methyltransferase DNMT3A forms interaction networks with the CpG site and flanking sequence elements for efficient methylation. J Biol Chem 2022; 298:102462. [PMID: 36067881 PMCID: PMC9530848 DOI: 10.1016/j.jbc.2022.102462] [Received: 08/04/2022] [Revised: 08/30/2022] [Accepted: 09/01/2022] [Indexed: 11/21/2022]
Abstract
Specific DNA methylation at CpG and non-CpG sites is essential for chromatin regulation. The DNA methyltransferase DNMT3A interacts with target sites surrounded by variable DNA sequences with its TRD and RD loops, but the functional necessity of these interactions is unclear. We investigated CpG and non-CpG methylation in a randomized sequence context using WT DNMT3A and several DNMT3A variants containing mutations at DNA-interacting residues. Our data revealed that the flanking sequence of target sites between positions −2 and +8 modulates methylation rates by more than 100-fold. Non-CpG methylation flanking preferences were even stronger and favored C(+1). R836 and N838 in concert mediate recognition of the CpG guanine. R836 changes its conformation in a flanking sequence-dependent manner and contacts either the CpG guanine or the +1/+2 flank, thereby coupling the interaction with both sequence elements. R836 suppresses activity at CNT sites but supports methylation of CAC substrates, the preferred target for non-CpG methylation of DNMT3A in cells. N838 helps to balance this effect and prevents the preference for C(+1) from becoming too strong. Surprisingly, we found that L883 reduces DNMT3A activity despite being highly conserved in evolution. However, mutations at L883 disrupt the DNMT3A-specific DNA interactions of the RD loop, leading to altered flanking sequence preferences. Similar effects occur after the R882H mutation in cancer cells. Our data reveal that DNMT3A forms flexible and interdependent interaction networks with the CpG guanine and flanking residues that ensure recognition of the CpG and efficient methylation of the cytosine in contexts of variable flanking sequences.
27
Tomaž Š, Gruden K, Coll A. TGA transcription factors: structural characteristics as basis for functional variability. Front Plant Sci 2022; 13:935819. [PMID: 35958211 PMCID: PMC9360754 DOI: 10.3389/fpls.2022.935819] [Received: 05/04/2022] [Accepted: 07/04/2022] [Indexed: 06/15/2023]
Abstract
TGA transcription factors are essential regulators of various cellular processes, and their activity is connected to different hormonal pathways, interacting proteins and regulatory elements. Belonging to the basic region leucine zipper (bZIP) family, TGAs operate by binding to their target DNA sequence as dimers through a conserved bZIP domain. Despite sharing the core DNA-binding sequence, the TGA paralogues exhibit somewhat different DNA-binding preferences. Sequence variability of their N- and C-terminal protein parts indicates their importance in defining TGA functional specificity through interactions with diverse proteins, affecting their DNA-binding properties. In this review, we provide a concise summary of plant TGA transcription factors from a structural point of view, including the relation of their structural characteristics to their functional roles in transcription regulation.
Affiliation(s)
- Špela Tomaž
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
- Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Anna Coll
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
28
Krieger G, Lupo O, Wittkopp P, Barkai N. Evolution of transcription factor binding through sequence variations and turnover of binding sites. Genome Res 2022; 32:1099-1111. [PMID: 35618416 PMCID: PMC9248875 DOI: 10.1101/gr.276715.122] [Received: 02/28/2022] [Accepted: 05/20/2022] [Indexed: 01/08/2023]
Abstract
Variations in noncoding regulatory sequences play a central role in evolution. Interpreting such variations, however, remains difficult even in the context of defined attributes such as transcription factor (TF) binding sites. Here, we systematically link variations in cis-regulatory sequences to TF binding by profiling the allele-specific binding of 27 TFs expressed in a yeast hybrid, in which two related genomes are present within the same nucleus. TFs localize preferentially to sites containing their known consensus motifs but occupy only a small fraction of the motif-containing sites available within the genomes. Differential binding of TFs to the orthologous alleles was well explained by variations that alter motif sequence, whereas differences in chromatin accessibility between alleles were of little apparent effect. Motif variations that abolished binding when present in only one allele were still bound when present in both alleles, suggesting evolutionary compensation, with a potential role for sequence conservation at the motif's vicinity. At the level of the full promoter, we identify cases of binding-site turnover, in which binding sites are reciprocally gained and lost, yet most interspecific differences remained uncompensated. Our results show the flexibility of TFs to bind imprecise motifs and the fast evolution of TF binding sites between related species.
Affiliation(s)
- Gat Krieger
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Offir Lupo
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Patricia Wittkopp
- Department of Ecology and Evolutionary Biology, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
29
Singh NP, Krumlauf R. Diversification and Functional Evolution of HOX Proteins. Front Cell Dev Biol 2022; 10:798812. [PMID: 35646905 PMCID: PMC9136108 DOI: 10.3389/fcell.2022.798812] [Received: 10/20/2021] [Accepted: 04/08/2022] [Indexed: 01/07/2023]
Abstract
Gene duplication and divergence is a major contributor to the generation of morphological diversity and the emergence of novel features in vertebrates during evolution. The availability of sequenced genomes has facilitated our understanding of the evolution of genes and regulatory elements. However, progress in understanding conservation and divergence in the function of proteins has been slow and mainly assessed by comparing protein sequences in combination with in vitro analyses. These approaches help to classify proteins into different families and sub-families, such as distinct types of transcription factors, but how protein function varies within a gene family is less well understood. Some studies have explored the functional evolution of closely related proteins and important insights have begun to emerge. In this review, we will provide a general overview of gene duplication and functional divergence and then focus on the functional evolution of HOX proteins to illustrate evolutionary changes underlying diversification and their role in animal evolution.
Affiliation(s)
- Robb Krumlauf
- Stowers Institute for Medical Research, Kansas City, MO, United States
- Department of Anatomy and Cell Biology, Kansas University Medical Center, Kansas City, KS, United States
- Correspondence: Robb Krumlauf
30
Murray JI, Preston E, Crawford JP, Rumley JD, Amom P, Anderson BD, Sivaramakrishnan P, Patel SD, Bennett BA, Lavon TD, Hsiao E, Peng F, Zacharias AL. The anterior Hox gene ceh-13 and elt-1/GATA activate the posterior Hox genes nob-1 and php-3 to specify posterior lineages in the C. elegans embryo. PLoS Genet 2022; 18:e1010187. [PMID: 35500030 PMCID: PMC9098060 DOI: 10.1371/journal.pgen.1010187] [Received: 07/20/2021] [Revised: 05/12/2022] [Accepted: 04/04/2022] [Indexed: 12/18/2022]
Abstract
Hox transcription factors play a conserved role in specifying positional identity during animal development, with posterior Hox genes typically repressing the expression of more anterior Hox genes. Here, we dissect the regulation of the posterior Hox genes nob-1 and php-3 in the nematode C. elegans. We show that nob-1 and php-3 are co-expressed in gastrulation-stage embryos in cells that previously expressed the anterior Hox gene ceh-13. This expression is controlled by several partially redundant transcriptional enhancers. These enhancers act in a ceh-13-dependent manner, providing a striking example of an anterior Hox gene positively regulating a posterior Hox gene. Several other regulators also act positively through nob-1/php-3 enhancers, including elt-1/GATA, ceh-20/ceh-40/Pbx, unc-62/Meis, pop-1/TCF, ceh-36/Otx, and unc-30/Pitx. We identified defects in both cell position and cell division patterns in ceh-13 and nob-1;php-3 mutants, suggesting that these factors regulate lineage identity in addition to positional identity. Together, our results highlight the complexity and flexibility of Hox gene regulation and function and the ability of developmental transcription factors to regulate different targets in different stages of development.
Affiliation(s)
- John Isaac Murray
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Elicia Preston
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jeremy P. Crawford
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Jonathan D. Rumley
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Prativa Amom
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Breana D. Anderson
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Priya Sivaramakrishnan
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Shaili D. Patel
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Barrington Alexander Bennett
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Teddy D. Lavon
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Erin Hsiao
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Felicia Peng
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Amanda L. Zacharias
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
31
Soffer A, Eisdorfer SA, Ifrach M, Ilic S, Afek A, Schussheim H, Vilenchik D, Akabayov B. Inferring primase-DNA specific recognition using a data driven approach. Nucleic Acids Res 2021; 49:11447-11458. [PMID: 34718733 PMCID: PMC8599759 DOI: 10.1093/nar/gkab956] [Received: 08/23/2021] [Revised: 10/01/2021] [Accepted: 10/04/2021] [Indexed: 12/11/2022]
Abstract
DNA–protein interactions play essential roles in all living cells. Understanding how features embedded in the DNA sequence affect specific interactions with proteins is both challenging and important, since it may help identify ways to regulate metabolic pathways involving DNA–protein interactions. Using a massive experimental benchmark dataset of binding scores for DNA sequences and a machine learning workflow, we describe the binding of T7 primase to DNA as a model system for specific DNA–protein interactions. Effective binding of T7 primase to its specific DNA recognition sequences triggers the formation of RNA primers that serve as Okazaki fragment start sites during DNA replication.
Affiliation(s)
- Adam Soffer
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Data Science Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel; School of Computer and Electrical Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Sarah A Eisdorfer
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Morya Ifrach
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Stefan Ilic
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Ariel Afek
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Hallel Schussheim
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Dan Vilenchik
- Data Science Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel; School of Computer and Electrical Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Barak Akabayov
- Department of Chemistry, Ben-Gurion University of the Negev, Beer-Sheva, Israel; Data Science Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel
32
Sielemann J, Wulf D, Schmidt R, Bräutigam A. Local DNA shape is a general principle of transcription factor binding specificity in Arabidopsis thaliana. Nat Commun 2021; 12:6549. [PMID: 34772949 PMCID: PMC8590021 DOI: 10.1038/s41467-021-26819-2] [Received: 03/08/2021] [Accepted: 10/21/2021] [Indexed: 11/20/2022]
Abstract
Understanding gene expression will require understanding where regulatory factors bind genomic DNA. The frequently used sequence-based motifs of protein-DNA binding are not predictive, since a genome contains many more binding sites than are actually bound and transcription factors of the same family share similar DNA-binding motifs. Traditionally, these motifs depict only sequence and neglect DNA shape. Since shape may contribute non-linearly and combinatorially to binding, machine learning approaches ought to be able to better predict transcription factor binding. Here we show that a random forest machine learning approach, which incorporates the 3D shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members which share the same binding motif. We observed that DNA shape features were individually weighted for each transcription factor, even if they shared the same binding sequence. Methods to predict transcription factor binding sites typically focus on sequence motifs without considering DNA shape. Here the authors use a random forest machine learning approach that incorporates DNA shape and improves binding site prediction for Arabidopsis thaliana transcription factors.
Affiliation(s)
- Janik Sielemann
- Computational Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615, Bielefeld, Germany; Computational Biology, Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany; Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany
- Donat Wulf
- Computational Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615, Bielefeld, Germany; Computational Biology, Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany; Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany
- Romy Schmidt
- Plant Biotechnology, Bielefeld University, 33615, Bielefeld, Germany
- Andrea Bräutigam
- Computational Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615, Bielefeld, Germany; Computational Biology, Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany; Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany
33
López-Vidriero I, Godoy M, Grau J, Peñuelas M, Solano R, Franco-Zorrilla JM. DNA features beyond the transcription factor binding site specify target recognition by plant MYC2-related bHLH proteins. Plant Commun 2021; 2:100232. [PMID: 34778747 PMCID: PMC8577090 DOI: 10.1016/j.xplc.2021.100232] [Received: 04/12/2021] [Revised: 07/09/2021] [Accepted: 08/10/2021] [Indexed: 05/22/2023]
Abstract
Transcription factors (TFs) regulate gene expression by binding to cis-regulatory sequences in the promoters of target genes. Recent research is helping to partially decipher the cis-regulatory code in eukaryotes, including plants, but it is not yet fully understood how paralogous TFs select their targets. Here we addressed this question by studying several proteins of the basic helix-loop-helix (bHLH) family of plant TFs, all of which recognize the same DNA motif. We focused on the MYC-related group of bHLHs, which redundantly regulate the jasmonate (JA) signaling pathway, and we observed a high correspondence between DNA-binding profiles in vitro and MYC function in vivo. We demonstrated that A/T-rich modules flanking the MYC-binding motif, conserved from bryophytes to higher plants, are essential for TF recognition. We observed particular DNA-shape features associated with A/T modules, indicating that DNA shape may contribute to MYC DNA binding. We extended this analysis to 20 additional bHLHs and observed correspondence between in vitro binding and protein function, but it could not be attributed to A/T modules as in MYCs. We conclude that different bHLHs may have their own codes for DNA binding and specific selection of targets that, at least in the case of MYCs, depend on the TF-DNA interplay.
Affiliation(s)
- Irene López-Vidriero
- Genomics Unit, Centro Nacional de Biotecnología, CSIC, C/Darwin 3, 28049 Madrid, Spain
- Marta Godoy
- Genomics Unit, Centro Nacional de Biotecnología, CSIC, C/Darwin 3, 28049 Madrid, Spain
- Joaquín Grau
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología, CSIC, C/Darwin 3, 28049 Madrid, Spain
- María Peñuelas
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología, CSIC, C/Darwin 3, 28049 Madrid, Spain
- Roberto Solano
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología, CSIC, C/Darwin 3, 28049 Madrid, Spain
- José M. Franco-Zorrilla
- Genomics Unit, Centro Nacional de Biotecnología, CSIC, C/Darwin 3, 28049 Madrid, Spain
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología, CSIC, C/Darwin 3, 28049 Madrid, Spain
- Corresponding author
34
Zhang H, Lu T, Liu S, Yang J, Sun G, Cheng T, Xu J, Chen F, Yen K. Comprehensive understanding of Tn5 insertion preference improves transcription regulatory element identification. NAR Genom Bioinform 2021; 3:lqab094. [PMID: 34729473 PMCID: PMC8557372 DOI: 10.1093/nargab/lqab094] [Received: 03/28/2021] [Revised: 09/20/2021] [Accepted: 09/29/2021] [Indexed: 12/11/2022]
Abstract
Tn5 transposase, which can efficiently tagment the genome, has been widely adopted as a molecular tool in next-generation sequencing, from short-read sequencing to more complex methods such as assay for transposase-accessible chromatin using sequencing (ATAC-seq). Here, we systematically map Tn5 insertion characteristics across several model organisms, finding critical parameters that affect its insertion. On naked genomic DNA, we found that Tn5 insertion is not uniformly distributed or random. To uncover drivers of these biases, we used a machine learning framework, which revealed that DNA shape cooperatively works with DNA motif to affect Tn5 insertion preference. These intrinsic insertion preferences can be modeled using nucleotide dependence information from DNA sequences, and we developed a computational pipeline to correct for these biases in ATAC-seq data. Using our pipeline, we show that bias correction improves the overall performance of ATAC-seq peak detection, recovering many potential false-negative peaks. Furthermore, we found that these peaks are bound by transcription factors, underscoring the biological relevance of capturing this additional information. These findings highlight the benefits of an improved understanding and precise correction of Tn5 insertion preference.
Affiliation(s)
- Houyu Zhang
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Ting Lu
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Shan Liu
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Jianyu Yang
- Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Guohuan Sun
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Tao Cheng
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
- Jin Xu
- Division of Cell, Developmental and Integrative Biology, School of Medicine, South China University of Technology, Guangzhou 510006, China
- Fangyao Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
- Kuangyu Yen
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
35
Marcos-Torres FJ, Maurer D, Juniar L, Griese JJ. The bacterial iron sensor IdeR recognizes its DNA targets by indirect readout. Nucleic Acids Res 2021; 49:10120-10135. [PMID: 34417623 PMCID: PMC8464063 DOI: 10.1093/nar/gkab711] [Received: 06/07/2021] [Revised: 07/19/2021] [Accepted: 08/02/2021] [Indexed: 01/11/2023]
Abstract
The iron-dependent regulator IdeR is the main transcriptional regulator controlling iron homeostasis genes in Actinobacteria, including species from the Corynebacterium, Mycobacterium and Streptomyces genera, as well as the erythromycin-producing bacterium Saccharopolyspora erythraea. Despite being a well-studied transcription factor since the identification of the Diphtheria toxin repressor DtxR three decades ago, the details of how IdeR proteins recognize their highly conserved 19-bp DNA target remain to be elucidated. IdeR makes few direct contacts with DNA bases in its target sequence, and we show here that these contacts are not required for target recognition. The results of our structural and mutational studies support a model wherein IdeR mainly uses an indirect readout mechanism, identifying its targets via the sequence-dependent DNA backbone structure rather than through specific contacts with the DNA bases. Furthermore, we show that IdeR efficiently recognizes a shorter palindromic sequence corresponding to a half binding site as compared to the full 19-bp target previously reported, expanding the number of potential target genes controlled by IdeR proteins.
Affiliation(s)
- Dirk Maurer
- Department of Cell and Molecular Biology, Uppsala University, SE-751 24 Uppsala, Sweden
- Linda Juniar
- Department of Cell and Molecular Biology, Uppsala University, SE-751 24 Uppsala, Sweden
- Julia J Griese
- Department of Cell and Molecular Biology, Uppsala University, SE-751 24 Uppsala, Sweden
36
Zhang Y, Mo Q, Xue L, Luo J. Evaluation of deep learning approaches for modeling transcription factor sequence specificity. Genomics 2021; 113:3774-3781. [PMID: 34534646 DOI: 10.1016/j.ygeno.2021.09.009] [Received: 04/07/2021] [Revised: 07/19/2021] [Accepted: 09/11/2021] [Indexed: 11/16/2022]
Abstract
As a key component of gene regulation, transcription factors (TFs) play an important role in a number of biological processes. To fully understand the underlying mechanism of TF-mediated gene regulation, it is therefore critical to accurately identify TF binding sites and predict their affinities. Recently, deep learning (DL) algorithms have achieved promising results in the prediction of DNA-TF binding; however, the various deep learning architectures have not been systematically compared, and the relative merit of each architecture remains unclear. To address this problem, we applied four different deep learning architectures to SELEX-seq and HT-SELEX data, covering three species and 35 TF families. We evaluated and compared the performance of the different deep neural models using 10-fold cross-validation. Our results indicate that the hybrid CNN + DNN model performs best. We expect that our study will be broadly applicable to modeling and predicting TF binding specificity as more high-throughput affinity data become available.
Affiliation(s)
- Yonglin Zhang
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
- Qi Mo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou 646000, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China; Department of Pharmacy, The Affiliated Hospital of Southwest Medical University, Luzhou 646000, China; Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, Southwest Medical University, Luzhou 646000, China.
37
Hombría JCG, García-Ferrés M, Sánchez-Higueras C. Anterior Hox Genes and the Process of Cephalization. Front Cell Dev Biol 2021; 9:718175. [PMID: 34422836 PMCID: PMC8374599 DOI: 10.3389/fcell.2021.718175] [Received: 05/31/2021] [Accepted: 07/16/2021] [Indexed: 11/13/2022]
Abstract
During evolution, bilateral animals have experienced a progressive process of cephalization with the anterior concentration of nervous tissue, sensory organs and the appearance of dedicated feeding structures surrounding the mouth. Cephalization has been achieved by the specialization of the unsegmented anterior end of the body (the acron) and the sequential recruitment to the head of adjacent anterior segments. Here we review the key developmental contribution of Hox1-5 genes to the formation of cephalic structures in vertebrates and arthropods and discuss how this evolved. The appearance of Hox cephalic genes preceded the evolution of a highly specialized head in both groups, indicating that Hox gene involvement in the control of cephalic structures was acquired independently during the evolution of vertebrates and invertebrates to regulate the genes required for head innovation.
Affiliation(s)
- James C-G Hombría
- Centro Andaluz de Biología del Desarrollo (Consejo Superior de Investigaciones Científicas/Junta de Andalucía/Universidad Pablo de Olavide), Seville, Spain
- Mar García-Ferrés
- Centro Andaluz de Biología del Desarrollo (Consejo Superior de Investigaciones Científicas/Junta de Andalucía/Universidad Pablo de Olavide), Seville, Spain
- Carlos Sánchez-Higueras
- Centro Andaluz de Biología del Desarrollo (Consejo Superior de Investigaciones Científicas/Junta de Andalucía/Universidad Pablo de Olavide), Seville, Spain
38
Shih CH, Fay J. Cis-regulatory variants affect gene expression dynamics in yeast. eLife 2021; 10:e68469. [PMID: 34369376 PMCID: PMC8367379 DOI: 10.7554/elife.68469] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/17/2021] [Accepted: 08/06/2021] [Indexed: 12/14/2022]
Abstract
Evolution of cis-regulatory sequences depends on how they affect gene expression and motivates both the identification and prediction of cis-regulatory variants responsible for expression differences within and between species. While much progress has been made in relating cis-regulatory variants to expression levels, the timing of gene activation and repression may also be important to the evolution of cis-regulatory sequences. We investigated allele-specific expression (ASE) dynamics within and between Saccharomyces species during the diauxic shift and found appreciable cis-acting variation in gene expression dynamics. Within-species ASE is associated with intergenic variants, and ASE dynamics are more strongly associated with insertions and deletions than ASE levels. To refine these associations, we used a high-throughput reporter assay to test promoter regions and individual variants. Within the subset of regions that recapitulated endogenous expression, we identified and characterized cis-regulatory variants that affect expression dynamics. Between species, chimeric promoter regions generate novel patterns and indicate constraints on the evolution of gene expression dynamics. We conclude that changes in cis-regulatory sequences can tune gene expression dynamics and that the interplay between expression dynamics and other aspects of expression is relevant to the evolution of cis-regulatory sequences.
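The abstract contrasts ASE levels with ASE dynamics. The following is a rough illustration only, assuming per-allele read counts for one heterozygous gene at several time points. ASE can be summarized as a per-time-point log2 allele ratio, and the change between consecutive samples gives a crude view of its dynamics. The function names and the pseudocount are hypothetical, and the authors' statistical model is not described in the abstract.

```python
import math
from typing import List, Sequence


def ase_log2_ratios(ref_counts: Sequence[int], alt_counts: Sequence[int],
                    pseudocount: float = 0.5) -> List[float]:
    """log2(ref/alt) read-count ratio at each time point for one gene.

    The pseudocount avoids division by zero when one allele has no reads.
    """
    if len(ref_counts) != len(alt_counts):
        raise ValueError("need one ref and one alt count per time point")
    return [math.log2((r + pseudocount) / (a + pseudocount))
            for r, a in zip(ref_counts, alt_counts)]


def ase_changes(ratios: Sequence[float]) -> List[float]:
    """Difference in log2 ratio between consecutive time points."""
    return [b - a for a, b in zip(ratios, ratios[1:])]
```

A gene can have no net ASE on average yet show large changes between time points. That is the distinction between ASE levels and ASE dynamics that the abstract draws.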
Affiliation(s)
- Ching-Hua Shih
- Department of Biology, University of Rochester, Rochester, United States
- Justin Fay
- Department of Biology, University of Rochester, Rochester, United States
39
Brodsky S, Jana T, Barkai N. Order through disorder: The role of intrinsically disordered regions in transcription factor binding specificity. Curr Opin Struct Biol 2021; 71:110-115. [PMID: 34303077 DOI: 10.1016/j.sbi.2021.06.011] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 04/18/2021] [Revised: 05/31/2021] [Accepted: 06/14/2021] [Indexed: 02/07/2023]
Abstract
Transcription factors (TFs) must bind at specific genomic locations to accurately regulate gene expression. The ability of TFs to recognize specific DNA sequence motifs arises from the inherent preferences of their globular DNA-binding domains (DBDs). Yet these preferences are insufficient to explain in vivo TF binding site selection. TFs are enriched with intrinsically disordered regions (IDRs), most of which are poorly characterized. While not generally considered determinants of TF binding specificity, IDRs guide protein-protein interactions within transcriptional condensates, and multiple examples exist in which short IDRs flanking the DBD contribute to binding specificity via direct contact with the DNA. We recently reported that long IDRs, located away from the DBD, act as major specificity determinants at the genomic scale. Here, we discuss mechanisms through which IDRs contribute to DNA binding specificity, highlighting the role of long IDRs in dictating in vivo binding site selection.
Affiliation(s)
- Sagie Brodsky
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Tamar Jana
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Naama Barkai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
40
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
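Many sequence-based deep learning models start by one-hot encoding DNA. A first convolutional layer then scores every window with a weight matrix, which is how motif-like features are picked up. The following is a minimal sketch of both steps in plain Python. The names `one_hot` and `conv1d_scores` are hypothetical, and a hand-set filter stands in for learned weights.

```python
from typing import List

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as an L x 4 matrix (columns A, C, G, T); N becomes all zeros."""
    return [[1 if base == b else 0 for b in ALPHABET] for base in seq.upper()]


def conv1d_scores(x: List[List[int]], filt: List[List[float]]) -> List[float]:
    """Slide a W x 4 filter along the encoded sequence ('valid' positions only).

    Each score is the dot product of the filter with one W-bp window,
    i.e. what a single first-layer convolution channel computes before bias
    and activation.
    """
    w = len(filt)
    return [sum(filt[k][j] * x[i + k][j] for k in range(w) for j in range(4))
            for i in range(len(x) - w + 1)]
```

If the one-hot encoding of a motif is used as the filter, the score counts matching bases. For example, `conv1d_scores(one_hot("GTATAG"), one_hot("TATA"))` gives `[0, 4, 0]`. A trained network learns such weights instead of having them set by hand.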
Affiliation(s)
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
- School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Science for Life Laboratory, Stockholm, Sweden
41
Insight into the sequence-specific elements leading to increased DNA bending and ligase-mediated circularization propensity by antitumor trabectedin. J Comput Aided Mol Des 2021; 35:707-719. [PMID: 34105031 DOI: 10.1007/s10822-021-00396-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Accepted: 06/04/2021] [Indexed: 12/23/2022]
Abstract
DNA curvature is the result of a combination of both intrinsic features of the double helix and external distortions introduced by the environment and the binding of proteins or drugs. The propensity of certain double-stranded DNA (dsDNA) sequences to bend is essential in crucial biological processes, such as replication and transcription, in which proteins are known to either recognize noncanonical DNA conformations or promote their formation upon DNA binding. Trabectedin (Yondelis®) is a clinically used antitumor drug which, following covalent bond formation with the 2-amino group of guanine, induces DNA curvature and enhances the circularization ratio, upon DNA ligation, of several dsDNA constructs but not others. By means of unrestrained molecular dynamics simulations using explicitly solvated all-atom models, we rationalize these experimental findings in structural terms and shed light on the crucial, albeit possibly underappreciated, role played by T4 DNA ligase in stabilizing a bent DNA conformation prior to cyclization. Taken together, our results expand our current understanding of how DNA shape modification by trabectedin may affect both the sequence-specific recognition of promoter sites by transcription factors and RNA polymerase II binding.
42
Epigenetic plasticity, selection, and tumorigenesis. Biochem Soc Trans 2021; 48:1609-1621. [PMID: 32794546 DOI: 10.1042/bst20191215] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/12/2020] [Revised: 07/17/2020] [Accepted: 07/21/2020] [Indexed: 12/11/2022]
Abstract
Epigenetic processes converge on chromatin in order to direct a cell's gene expression profile. This includes both maintaining a stable cell identity and priming the cell for specific controlled transitions, such as differentiation or response to stimuli. In cancer, this normally tight control is often disrupted, leading to widespread hyper-plasticity of the epigenome and allowing stochastic gene activation and silencing, cell state transitions, and potentiation of the effects of genetic lesions. Many of these epigenetic disruptions will confer a proliferative advantage to cells, allowing a selection process to occur and leading to tumorigenesis even in the case of reversible or unstable epigenetic states. This review seeks to highlight how the fundamental epigenetic shifts in cancer contribute to tumorigenesis, and how an integrated view of cancer genetics and epigenetics may more effectively guide research and treatment.
43
Spiegel J, Cuesta SM, Adhikari S, Hänsel-Hertsch R, Tannahill D, Balasubramanian S. G-quadruplexes are transcription factor binding hubs in human chromatin. Genome Biol 2021; 22:117. [PMID: 33892767 PMCID: PMC8063395 DOI: 10.1186/s13059-021-02324-z] [Citation(s) in RCA: 118] [Impact Index Per Article: 39.3] [Received: 09/30/2020] [Accepted: 03/24/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND The binding of transcription factors (TFs) to genomic targets is critical in the regulation of gene expression. Short, double-stranded DNA sequence motifs are routinely implicated in TF recruitment, but many questions remain on how binding site specificity is governed. RESULTS Herein, we reveal a previously unappreciated role for DNA secondary structures as key features for TF recruitment. In a systematic, genome-wide study, we discover that endogenous G-quadruplex secondary structures (G4s) are prevalent TF binding sites in human chromatin. Certain TFs bind G4s with affinities comparable to double-stranded DNA targets. We demonstrate that, in a chromatin context, this binding interaction is competed out with a small molecule. Notably, endogenous G4s are prominent binding sites for a large number of TFs, particularly at promoters of highly expressed genes. CONCLUSIONS Our results reveal a novel non-canonical mechanism for TF binding whereby G4s operate as common binding hubs for many different TFs to promote increased transcription.
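The study maps endogenous G4s experimentally in chromatin. A widely used sequence heuristic offers a quick computational proxy: it calls a putative G4 wherever four runs of at least three guanines are separated by loops of 1 to 7 nt. The following sketch applies that heuristic. It is not the method of this paper, and sequence alone does not guarantee that a G4 forms in cells. `find_putative_g4` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Heuristic putative-G4 pattern: four runs of >=3 G separated by 1-7 nt loops.
G4_PLUS = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
# A G4 on the opposite strand shows up as C-runs on the strand being scanned.
G4_MINUS = re.compile(r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")


def find_putative_g4(seq: str) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, strand) tuples for non-overlapping matches per strand."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)
```

The human telomeric repeat `GGGTTAGGGTTAGGGTTAGGG` is reported as one plus-strand hit spanning all 21 bp. Its reverse complement is reported as a minus-strand hit.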
Affiliation(s)
- Jochen Spiegel
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Sergio Martínez Cuesta
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Present Address: Data Sciences and Quantitative Biology, Discovery Sciences, AstraZeneca, Cambridge, UK
- Santosh Adhikari
- Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Robert Hänsel-Hertsch
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Present Address: Center for Molecular Medicine Cologne, University of Cologne, 50931, Cologne, Germany
- David Tannahill
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Shankar Balasubramanian
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
44
Käppel S, Eggeling R, Rümpler F, Groth M, Melzer R, Theißen G. DNA-binding properties of the MADS-domain transcription factor SEPALLATA3 and mutant variants characterized by SELEX-seq. Plant Mol Biol 2021; 105:543-557. [PMID: 33486697 PMCID: PMC7892521 DOI: 10.1007/s11103-020-01108-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/02/2020] [Accepted: 12/11/2020] [Indexed: 05/13/2023]
Abstract
We studied the DNA-binding profile of the MADS-domain transcription factor SEPALLATA3 and mutant variants by SELEX-seq. DNA-binding characteristics of SEPALLATA3 mutant proteins lead us to propose a novel DNA-binding mode. MIKC-type MADS-domain proteins, which function as essential transcription factors in plant development, bind as dimers to a 10-base-pair AT-rich motif termed CArG-box. However, this consensus motif cannot fully explain how the abundant family members in flowering plants can bind different target genes in specific ways. The aim of this study was to better understand the DNA-binding specificity of MADS-domain transcription factors. We also wanted to understand the role of a highly conserved arginine residue in the binding specificity of the MADS-domain transcription factor family. Here, we studied the DNA-binding profile of the floral homeotic MADS-domain protein SEPALLATA3 by performing SELEX followed by high-throughput sequencing (SELEX-seq). We found a diverse set of bound sequences and could estimate the in vitro binding affinities of SEPALLATA3 for a very large number of different sequences. We found evidence for a preference for AT-rich motifs as flanking sequences. Whereas different CArG-boxes can act as SEPALLATA3 binding sites, our findings suggest that the preferred flanking motifs are almost always the same and thus mostly independent of the identity of the central CArG-box motif. Analysis of SEPALLATA3 proteins with a single amino acid substitution at position 3 of the DNA-binding MADS-domain further revealed that the conserved arginine residue, which has been shown to be involved in a shape readout mechanism, is especially important for the recognition of nucleotides at positions 3 and 8 of the CArG-box motif. This leads us to propose a novel DNA-binding mode for SEPALLATA3 that differs from those of other known MADS-domain proteins.
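The abstract refers to nucleotides at positions 3 and 8 of the 10-bp CArG-box. The following is a minimal sketch that finds boxes matching the canonical CC(A/T)6GG consensus and reports those two positions. It assumes strict consensus only. As the abstract notes, SEPALLATA3 binds diverse CArG-box variants, which this pattern would miss. `find_carg_boxes` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Canonical CArG-box consensus CC(A/T)6GG; the lookahead lets overlapping
# 10-bp matches be reported.
CARG = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_boxes(seq: str) -> List[Tuple[int, str, str, str]]:
    """Return (start, box, nt_at_pos3, nt_at_pos8) for each CArG-box match.

    Positions 3 and 8 are 1-based within the 10-bp box, i.e. indices 2 and 7.
    """
    seq = seq.upper()
    out = []
    for m in CARG.finditer(seq):
        box = m.group(1)
        out.append((m.start(), box, box[2], box[7]))
    return out
```

Because positions 3 and 8 fall inside the A/T core, each can only be A or T under this consensus. This is the region where the abstract places the contribution of the conserved arginine.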
Affiliation(s)
- Sandra Käppel
- Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
- Ralf Eggeling
- Department of Computer Science, University of Helsinki, Pietari Kalmin katu 5, 00014, Helsinki, Finland
- Methods in Medical Informatics, Department of Computer Science, University of Tübingen, Sand 14, 72076, Tübingen, Germany
- Institute for Biomedical Informatics, University of Tübingen, Tübingen, Germany
- Florian Rümpler
- Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
- Marco Groth
- Leibniz Institute on Aging-Fritz Lipmann Institute (FLI), Core Facility DNA Sequencing, Beutenbergstraße 11, 07745, Jena, Germany
- Rainer Melzer
- School of Biology and Environmental Science and Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Günter Theißen
- Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Philosophenweg 12, 07743, Jena, Germany
45
Ainsworth HC, Howard TD, Langefeld CD. Intrinsic DNA topology as a prioritization metric in genomic fine-mapping studies. Nucleic Acids Res 2020; 48:11304-11321. [PMID: 33084892 PMCID: PMC7672465 DOI: 10.1093/nar/gkaa877] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2020] [Revised: 08/23/2020] [Accepted: 09/25/2020] [Indexed: 12/15/2022]
Abstract
In genomic fine-mapping studies, some approaches leverage annotation data to prioritize likely functional polymorphisms. However, existing annotation resources can present challenges, as many lack information for novel variants and/or may be uninformative for non-coding regions. We propose a novel annotation source, sequence-dependent DNA topology, as a prioritization metric for fine-mapping. DNA topology and function are closely intertwined, and because topology is an intrinsic DNA property, it is readily applicable to any genomic region. Here, we constructed and applied Minor Groove Width (MGW) as a prioritization metric. Using an established MGW-prediction method, we generated an MGW census for 199 038 197 SNPs across the human genome. When a SNP's change in MGW (ΔMGW) was summarized as a Euclidean distance, ΔMGW exhibited a strongly right-skewed distribution, highlighting the infrequency of SNPs that generate dissimilar shape profiles. We hypothesized that phenotypically-associated SNPs can be prioritized by ΔMGW. We tested this hypothesis in 116 regions analyzed by a Massively Parallel Reporter Assay and observed enrichment of large ΔMGW for functional polymorphisms (P = 0.0007). To illustrate application in fine-mapping studies, we applied our MGW-prioritization approach to three non-coding regions associated with systemic lupus erythematosus. Together, this study presents the first usage of sequence-dependent DNA topology as a prioritization metric in genomic association studies.
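ΔMGW is described as a Euclidean distance between a SNP's MGW profiles. The following is a minimal sketch. It assumes that the reference- and alternate-allele MGW profiles have already been predicted over the same window by some MGW-prediction tool, which is not reimplemented here. `delta_mgw` and `rank_by_delta_mgw` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def delta_mgw(mgw_ref: Sequence[float], mgw_alt: Sequence[float]) -> float:
    """Euclidean distance between reference- and alternate-allele MGW profiles.

    Both profiles must cover the same positions around the SNP.
    """
    if len(mgw_ref) != len(mgw_alt):
        raise ValueError("MGW profiles must cover the same positions")
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(mgw_ref, mgw_alt)))


def rank_by_delta_mgw(
        profiles: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> List[str]:
    """Order SNP IDs from largest to smallest ΔMGW (largest shape change first)."""
    return sorted(profiles, key=lambda snp: delta_mgw(*profiles[snp]), reverse=True)
```

Ranking by ΔMGW puts the SNPs with the largest predicted shape change first, which is the prioritization idea the abstract tests. A right-skewed ΔMGW distribution means that only a small head of this ranking stands out.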
Affiliation(s)
- Hannah C Ainsworth
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Timothy D Howard
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Carl D Langefeld
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA; Comprehensive Cancer Center of Wake Forest Baptist Medical Center, Winston-Salem, NC 27157, USA
46
Le Poul Y, Xin Y, Ling L, Mühling B, Jaenichen R, Hörl D, Bunk D, Harz H, Leonhardt H, Wang Y, Osipova E, Museridze M, Dharmadhikari D, Murphy E, Rohs R, Preibisch S, Prud'homme B, Gompel N. Regulatory encoding of quantitative variation in spatial activity of a Drosophila enhancer. Sci Adv 2020; 6:eabe2955. [PMID: 33268361 PMCID: PMC7821883 DOI: 10.1126/sciadv.abe2955] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/12/2020] [Accepted: 10/20/2020] [Indexed: 06/12/2023]
Abstract
Developmental enhancers control the expression of genes prefiguring morphological patterns. The activity of an enhancer varies among cells of a tissue, but collectively, expression levels in individual cells constitute a spatial pattern of gene expression. How the spatial and quantitative regulatory information is encoded in an enhancer sequence is elusive. To link spatial pattern and activity levels of an enhancer, we used systematic mutations of the yellow spot enhancer, active in developing Drosophila wings, and tested their effect in a reporter assay. Moreover, we developed an analytic framework based on the comprehensive quantification of spatial reporter activity. We show that the quantitative enhancer activity results from densely packed regulatory information along the sequence, and that a complex interplay between activators and multiple tiers of repressors carves the spatial pattern. Our results shed light on how an enhancer reads and integrates trans-regulatory landscape information to encode a spatial quantitative pattern.
Affiliation(s)
- Yann Le Poul
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Yaqun Xin
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Liucong Ling
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Bettina Mühling
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Rita Jaenichen
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- David Hörl
- Human Biology and Bioimaging, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- David Bunk
- Human Biology and Bioimaging, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Hartmann Harz
- Human Biology and Bioimaging, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Heinrich Leonhardt
- Human Biology and Bioimaging, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Yingfei Wang
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Elena Osipova
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Mariam Museridze
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Deepak Dharmadhikari
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Eamonn Murphy
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
- Remo Rohs
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Stephan Preibisch
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, 13092 Berlin, Germany
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA
- Benjamin Prud'homme
- Aix-Marseille Université, CNRS, IBDM, Institut de Biologie du Développement de Marseille, Campus de Luminy Case 907, 13288 Marseille Cedex 9, France
- Nicolas Gompel
- Evolutionary Ecology, Ludwig-Maximilians Universität München, Fakultät für Biologie, Biozentrum, Grosshaderner Strasse 2, 82152 Planegg-Martinsried, Germany
47
Zhu Y, Li F, Xiang D, Akutsu T, Song J, Jia C. Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks. Brief Bioinform 2020; 22:5998831. [PMID: 33227813 DOI: 10.1093/bib/bbaa299] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 08/24/2020] [Revised: 10/01/2020] [Accepted: 10/07/2020] [Indexed: 12/26/2022]
Abstract
A promoter is a DNA region, typically located proximal to the transcription start site (TSS), that defines where transcription of a gene by RNA polymerase initiates. Correctly identifying the TSS and core promoter of a gene is essential for understanding its transcriptional regulation. As a complement to conventional experimental methods, computational techniques with easy-to-use platforms can serve as essential bioinformatics tools for annotating the functions and physiological roles of promoters. In this work, we propose a deep learning-based method termed Depicter (Deep learning for predicting promoter) for identifying three specific types of promoters, i.e. promoter sequences with the TATA-box (TATA model), promoter sequences without the TATA-box (non-TATA model), and indistinguishable promoters (TATA and non-TATA model). Depicter is developed based on an up-to-date, species-specific dataset that includes Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana promoters. A convolutional neural network coupled with capsule layers is used to train and optimize the prediction model of Depicter. Extensive benchmarking and independent tests demonstrate that Depicter achieves improved predictive performance compared with several state-of-the-art methods. The webserver of Depicter is implemented and freely accessible at https://depicter.erc.monash.edu/.
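Depicter learns the TATA / non-TATA distinction with a neural network. The following sketch gives some intuition for what separates the two classes. It labels a promoter by scanning for the TATA-box consensus TATAWAWR (W = A/T, R = A/G) in a window upstream of the TSS. The window bounds (40 to 15 bp upstream) and the name `classify_promoter` are illustrative assumptions, not Depicter's dataset rules.

```python
import re

# TATA-box consensus TATAWAWR, with W = A/T and R = A/G.
TATA = re.compile(r"TATA[AT]A[AT][AG]")


def classify_promoter(seq: str, tss: int, far: int = 40, near: int = 15) -> str:
    """Label a promoter 'TATA' or 'non-TATA'.

    `tss` is the 0-based index of the TSS in `seq`; the consensus must lie
    entirely inside the window from `far` to `near` bp upstream of it.
    """
    window = seq.upper()[max(0, tss - far):max(0, tss - near)]
    return "TATA" if TATA.search(window) else "non-TATA"
```

A fixed consensus is brittle, because many functional TATA-like elements deviate from TATAWAWR. This brittleness is one motivation for learning the boundary from data, as Depicter does.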
Affiliation(s)
- Yan Zhu
- School of Science, Dalian Maritime University, China
- Fuyi Li
- Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
- Cangzhi Jia
- College of Science, Dalian Maritime University
48
Bulajić M, Srivastava D, Dasen JS, Wichterle H, Mahony S, Mazzoni EO. Differential abilities to engage inaccessible chromatin diversify vertebrate Hox binding patterns. Development 2020; 147:dev194761. [PMID: 33028607 PMCID: PMC7710020 DOI: 10.1242/dev.194761] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/06/2020] [Accepted: 09/25/2020] [Indexed: 12/17/2022]
Abstract
Although Hox genes encode conserved transcription factors (TFs), they are further divided into anterior, central and posterior groups based on their DNA-binding domain similarity. The posterior Hox group expanded in the deuterostome clade and patterns caudal and distal structures. We aimed to address how similar Hox TFs diverge to induce different positional identities. We studied Hox TF DNA binding and regulatory activity in an in vitro motor neuron differentiation system that recapitulates embryonic development. We found diversity in the genomic binding profiles of different Hox TFs, even among the posterior group paralogs that share similar DNA-binding domains. These differences in genomic binding were explained by differing abilities to bind to previously inaccessible sites. For example, the posterior group HOXC9 had a greater ability to bind occluded sites than the posterior HOXC10, producing different binding patterns and driving differential gene expression programs. From these results, we propose that the differential abilities of posterior Hox TFs to bind to previously inaccessible chromatin drive patterning diversification. This article has an associated 'The people behind the papers' interview.
Affiliation(s)
- Milica Bulajić
- Department of Biology, New York University, New York, NY 10003, USA
- Divyanshi Srivastava
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Jeremy S Dasen
- Neuroscience Institute, Department of Neuroscience and Physiology, New York University School of Medicine, New York, NY 10016, USA
- Hynek Wichterle
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Department of Neuroscience, Columbia University Irving Medical Center, New York, NY 10032, USA
- Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
49
Lara-Gonzalez S, Dantas Machado AC, Rao S, Napoli AA, Birktoft J, Di Felice R, Rohs R, Lawson CL. The RNA Polymerase α Subunit Recognizes the DNA Shape of the Upstream Promoter Element. Biochemistry 2020; 59:4523-4532. [PMID: 33205945 DOI: 10.1021/acs.biochem.0c00571] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/18/2022]
Abstract
We demonstrate here that the α subunit C-terminal domain of Escherichia coli RNA polymerase (αCTD) recognizes the upstream promoter (UP) DNA element via its characteristic minor groove shape and electrostatic potential. In two compositionally distinct crystallized assemblies, a pair of αCTD subunits bind in tandem to the UP element consensus A-tract that is 6 bp in length (A6-tract), each with their arginine 265 guanidinium group inserted into the minor groove. The A6-tract minor groove is significantly narrowed in these crystal structures, as well as in computationally predicted structures of free and bound DNA duplexes derived by Monte Carlo and molecular dynamics simulations, respectively. The negative electrostatic potential of free A6-tract DNA is substantially enhanced compared to that of generic DNA. Shortening the A-tract by 1 bp is shown to "knock out" binding of the second αCTD through widening of the minor groove. Furthermore, in computationally derived structures with arginine 265 mutated to alanine in either αCTD, either with or without the "knockout" DNA mutation, contact with the DNA is perturbed, highlighting the importance of arginine 265 in achieving αCTD-DNA binding. These results demonstrate that the importance of the DNA shape in sequence-dependent recognition of DNA by RNA polymerase is comparable to that of certain transcription factors.
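The "knockout" experiment turns on A-tract length (A6 versus A5). The following is a minimal sketch for measuring that length on a given strand. It assumes an A-tract is simply a run of consecutive A, or of T, which is an A-run on the complementary strand. Broader A-tract definitions that allow an ApT step (AnTn) are not handled, and `longest_a_tract` is a hypothetical name.

```python
import re


def longest_a_tract(seq: str) -> int:
    """Length of the longest run of A (this strand) or T (A on the complement)."""
    runs = re.findall(r"A+|T+", seq.upper())
    return max((len(r) for r in runs), default=0)
```

`longest_a_tract("GCAAAAAAGC")` returns 6, and the 1-bp-shortened `"GCAAAAAGC"` returns 5, mirroring the A6 to A5 change. The minor-groove narrowing itself requires a structure or shape predictor; run length alone does not capture it.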
Affiliation(s)
- Samuel Lara-Gonzalez
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, New Jersey 08854, United States
- Ana Carolina Dantas Machado
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Satyanarayan Rao
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Andrew A Napoli
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, New Jersey 08854, United States
- Jens Birktoft
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, New Jersey 08854, United States
- Rosa Di Felice
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States; Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States; CNR-NANO Modena, Via Campi 213/A, 41125 Modena, Italy
- Remo Rohs
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States; Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, United States; Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States; Department of Computer Science, University of Southern California, Los Angeles, California 90089, United States
- Catherine L Lawson
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, New Jersey 08854, United States; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, New Jersey 08854, United States
50
Dukatz M, Adam S, Biswal M, Song J, Bashtrykov P, Jeltsch A. Complex DNA sequence readout mechanisms of the DNMT3B DNA methyltransferase. Nucleic Acids Res 2020; 48:11495-11509. [PMID: 33105482 PMCID: PMC7672481 DOI: 10.1093/nar/gkaa938] [Received: 09/03/2020] [Revised: 10/02/2020] [Accepted: 10/06/2020] [Indexed: 12/14/2022]
Abstract
DNA methyltransferases interact with their CpG target sites in the context of variable flanking sequences. We investigated DNA methylation by the human DNMT3B catalytic domain using substrate pools containing CpX target sites in randomized flanking context and identified combined effects of CpG recognition and flanking sequence interaction together with complex contact networks involved in balancing the interaction with different flanking sites. DNA methylation rates were more affected by flanking sequences at non-CpG than at CpG sites. We show that T775 has an essential dynamic role in the catalytic mechanism of DNMT3B. Moreover, we identify six amino acid residues in the DNA-binding interface of DNMT3B (N652, N656, N658, K777, N779, and R823), which are involved in the equalization of methylation rates of CpG sites in favored and disfavored sequence contexts by forming compensatory interactions to the flanking residues, including a CpG-specific contact to an A at the +1 flanking site. Non-CpG flanking preferences of DNMT3B are highly correlated with non-CpG methylation patterns in human cells. Comparison of the flanking sequence preferences of human and mouse DNMT3B revealed subtle differences suggesting a co-evolution of flanking sequence preferences and cellular DNMT targets.
Affiliation(s)
- Michael Dukatz
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
- Sabrina Adam
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
- Mahamaya Biswal
- Department of Biochemistry, University of California, Riverside, CA 92521, USA
- Jikui Song
- Department of Biochemistry, University of California, Riverside, CA 92521, USA
- Pavel Bashtrykov
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
- Albert Jeltsch
- Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany