1
Vorontsov IE, Eliseeva IA, Zinkevich A, Nikonov M, Abramov S, Boytsov A, Kamenets V, Kasianova A, Kolmykov S, Yevshin I, Favorov A, Medvedeva YA, Jolma A, Kolpakov F, Makeev V, Kulakovskiy I. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res 2024; 52:D154-D163. [PMID: 37971293 PMCID: PMC10767914 DOI: 10.1093/nar/gkad1077] [Received: 09/22/2023] [Revised: 10/17/2023] [Accepted: 10/26/2023]
Abstract
We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
Affiliation(s)
- Ilya E Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Irina A Eliseeva
  - Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
- Arsenii Zinkevich
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Mikhail Nikonov
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Sergey Abramov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
- Alexandr Boytsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
- Vasily Kamenets
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
  - Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
- Alexandra Kasianova
  - Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
  - Institute for Information Transmission Problems of the Russian Academy of Sciences, 127051 Moscow, Russia
- Semyon Kolmykov
  - Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
- Alexander Favorov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Yulia A Medvedeva
  - Research Center of Biotechnology RAS, Russian Academy of Sciences, 119071 Moscow, Russia
- Arttu Jolma
  - Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Fedor Kolpakov
  - Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
  - Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, 630090 Novosibirsk, Russia
- Vsevolod J Makeev
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
  - Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
- Ivan V Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
  - Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, 420008 Kazan, Russia
2
Ray D, Laverty KU, Jolma A, Nie K, Samson R, Pour SE, Tam CL, von Krosigk N, Nabeel-Shah S, Albu M, Zheng H, Perron G, Lee H, Najafabadi H, Blencowe B, Greenblatt J, Morris Q, Hughes TR. RNA-binding proteins that lack canonical RNA-binding domains are rarely sequence-specific. Sci Rep 2023; 13:5238. [PMID: 37002329 PMCID: PMC10066285 DOI: 10.1038/s41598-023-32245-9] [Received: 10/28/2022] [Accepted: 03/23/2023]
Abstract
Thousands of RNA-binding proteins (RBPs) crosslink to cellular mRNA. Among these are numerous unconventional RBPs (ucRBPs), proteins that associate with RNA but lack known RNA-binding domains (RBDs). The vast majority of ucRBPs have uncharacterized RNA-binding specificities. We analyzed 492 human ucRBPs for intrinsic RNA-binding in vitro and identified 23 that bind specific RNA sequences. Most (17/23), including 8 ribosomal proteins, were previously associated with RNA-related function. We identified the RBDs responsible for sequence-specific RNA-binding for several of these 23 ucRBPs and surveyed whether corresponding domains from homologous proteins also display RNA sequence specificity. CCHC-zf domains from seven human proteins recognized specific RNA motifs, indicating that this is a major class of RBD. For Nudix, HABP4, TPR, RanBP2-zf, and L7Ae domains, however, only isolated members or closely related homologs yielded motifs, consistent with RNA-binding as a derived function. The lack of sequence specificity for most ucRBPs is striking, and we suggest that many may function analogously to chromatin factors, which often crosslink efficiently to cellular DNA, presumably via indirect recruitment. Finally, we show that ucRBPs tend to be highly abundant proteins and suggest their identification in RNA interactome capture studies could also result from weak nonspecific interactions with RNA.
Affiliation(s)
- Debashish Ray
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Kaitlin U Laverty
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Arttu Jolma
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Kate Nie
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Reuben Samson
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Sara E Pour
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Cyrus L Tam
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
- Niklas von Krosigk
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Syed Nabeel-Shah
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Mihai Albu
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Hong Zheng
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Gabrielle Perron
  - Department of Human Genetics, McGill University, Montréal, QC, H3A 0C7, Canada
  - McGill Genome Centre, Montréal, QC, H3A 0G1, Canada
- Hyunmin Lee
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Hamed Najafabadi
  - Department of Human Genetics, McGill University, Montréal, QC, H3A 0C7, Canada
  - McGill Genome Centre, Montréal, QC, H3A 0G1, Canada
- Benjamin Blencowe
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Jack Greenblatt
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Quaid Morris
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA
- Timothy R Hughes
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada
3
Laverty KU, Jolma A, Pour SE, Zheng H, Ray D, Morris Q, Hughes TR. PRIESSTESS: interpretable, high-performing models of the sequence and structure preferences of RNA-binding proteins. Nucleic Acids Res 2022; 50:e111. [PMID: 36018788 DOI: 10.1093/nar/gkac694] [Received: 11/17/2021] [Revised: 07/22/2022] [Accepted: 08/03/2022]
Abstract
Modelling both primary sequence and secondary structure preferences for RNA binding proteins (RBPs) remains an ongoing challenge. Current models use varied RNA structure representations and can be difficult to interpret and evaluate. To address these issues, we present a universal RNA motif-finding/scanning strategy, termed PRIESSTESS (Predictive RBP-RNA InterpretablE Sequence-Structure moTif regrESSion), that can be applied to diverse RNA binding datasets. PRIESSTESS identifies dozens of enriched RNA sequence and/or structure motifs that are subsequently reduced to a set of core motifs by logistic regression with LASSO regularization. Importantly, these core motifs are easily visualized and interpreted, and provide a measure of RBP secondary structure specificity. We used PRIESSTESS to interrogate new HTR-SELEX data for 23 RBPs with diverse RNA binding modes and captured known primary sequence and secondary structure preferences for each. Moreover, when applying PRIESSTESS to 144 RBPs across 202 RNA binding datasets, 75% showed an RNA secondary structure preference but only 10% had a preference besides unpaired bases, suggesting that most RBPs simply recognize the accessibility of primary sequences.
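The reduction step mentioned in this abstract, logistic regression with LASSO (L1) regularization, can be illustrated with a minimal sketch. This is not the PRIESSTESS implementation: the toy data, the function names, the penalty strength `lam`, and the proximal-gradient solver are all assumptions chosen for illustration. Each feature column stands for a hypothetical candidate motif (1 if the motif matches a sequence), and the label marks whether the sequence was bound.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_logistic(features, labels, lam, lr=0.5, steps=2000):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent (no intercept)."""
    n, d = len(features), len(features[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(features, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(d):
                grad[j] += (p - y) * x[j] / n
        # Gradient step on the log-loss, then shrinkage from the L1 penalty.
        w = [soft_threshold(w[j] - lr * grad[j], lr * lam) for j in range(d)]
    return w

# Hypothetical toy data: motif 0 matches only bound sequences,
# motif 1 matches bound and unbound sequences equally often.
features = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]
weights = lasso_logistic(features, labels, lam=0.13)
```

In this toy setting the informative motif keeps a positive weight while the uninformative one is shrunk to exactly zero, which is the sense in which an L1 penalty reduces many candidate motifs to a small core set.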
Affiliation(s)
- Kaitlin U Laverty
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Arttu Jolma
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Sara E Pour
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Hong Zheng
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Debashish Ray
  - Donnelly Centre, University of Toronto, Toronto, Canada
- Quaid Morris
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, USA
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Donnelly Centre, University of Toronto, Toronto, Canada
4
Zheng L, Liu J, Niu L, Kamran M, Yang AWH, Jolma A, Dai Q, Hughes TR, Patel DJ, Zhang L, Prasanth SG, Yu Y, Ren A, Lai EC. Distinct structural bases for sequence-specific DNA binding by mammalian BEN domain proteins. Genes Dev 2022; 36:225-240. [PMID: 35144965 PMCID: PMC8887127 DOI: 10.1101/gad.348993.121] [Received: 09/02/2021] [Accepted: 01/05/2022]
Abstract
The BEN domain is a recently recognized DNA binding module that is present in diverse metazoans and certain viruses. Several BEN domain factors are known as transcriptional repressors, but, overall, relatively little is known of how BEN factors identify their targets in humans. In particular, X-ray structures of BEN domain:DNA complexes are only known for Drosophila factors bearing a single BEN domain, which lack direct vertebrate orthologs. Here, we characterize several mammalian BEN domain (BD) factors, including from two NACC family BTB-BEN proteins and from BEND3, which has four BDs. In vitro selection data revealed sequence-specific binding activities of isolated BEN domains from all of these factors. We conducted detailed functional, genomic, and structural studies of BEND3. We show that BD4 is a major determinant for in vivo association and repression of endogenous BEND3 targets. We obtained a high-resolution structure of BEND3-BD4 bound to its preferred binding site, which reveals how BEND3 identifies cognate DNA targets and shows differences with one of its non-DNA-binding BEN domains (BD1). Finally, comparison with our previous invertebrate BEN structures, along with additional structural predictions using AlphaFold2 and RoseTTAFold, reveal distinct strategies for target DNA recognition by different types of BEN domain proteins. Together, these studies expand the DNA recognition activities of BEN factors and provide structural insights into sequence-specific DNA binding by mammalian BEN proteins.
Affiliation(s)
- Luqian Zheng
  - The Eighth Affiliated Hospital, Sun Yat-Sen University, Shenzhen, Guangdong 518033, China
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Jingjing Liu
  - State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- Lijie Niu
  - State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
- Mohammad Kamran
  - Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Ally W H Yang
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A1, Canada
  - Donnelly Centre, University of Toronto, Toronto, Ontario M5S 1A1, Canada
- Arttu Jolma
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A1, Canada
  - Donnelly Centre, University of Toronto, Toronto, Ontario M5S 1A1, Canada
- Qi Dai
  - Developmental Biology Program, Sloan Kettering Institute, New York, New York 10065, USA
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A1, Canada
  - Donnelly Centre, University of Toronto, Toronto, Ontario M5S 1A1, Canada
- Dinshaw J Patel
  - Structural Biology Program, Sloan Kettering Institute, New York, New York 10065, USA
- Long Zhang
  - The Eighth Affiliated Hospital, Sun Yat-Sen University, Shenzhen, Guangdong 518033, China
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Supriya G Prasanth
  - Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Yang Yu
  - State Key Laboratory of Medical Molecular Biology, Department of Molecular Biology and Biochemistry, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, China
  - Developmental Biology Program, Sloan Kettering Institute, New York, New York 10065, USA
- Aiming Ren
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Eric C Lai
  - Developmental Biology Program, Sloan Kettering Institute, New York, New York 10065, USA
5
Lovering RC, Gaudet P, Acencio ML, Ignatchenko A, Jolma A, Fornes O, Kuiper M, Kulakovskiy IV, Lægreid A, Martin MJ, Logie C. A GO catalogue of human DNA-binding transcription factors. Biochim Biophys Acta Gene Regul Mech 2021; 1864:194765. [PMID: 34673265 DOI: 10.1016/j.bbagrm.2021.194765] [Received: 12/28/2020] [Revised: 10/08/2021] [Accepted: 10/09/2021]
Abstract
To control gene transcription, DNA-binding transcription factors recognise specific sequence motifs in gene regulatory regions. A complete and reliable GO annotation of all DNA-binding transcription factors is key to investigating the delicate balance of gene regulation in response to environmental and developmental stimuli. The need for such information is demonstrated by the many lists of transcription factors that have been produced over the past decade. The COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC) Consortium brought together experts in the field of transcription with the aim of providing high quality and interoperable gene regulatory data. The Gene Ontology (GO) Consortium provides strict definitions for gene product function, including factors that regulate transcription. The collaboration between the GREEKC and GO Consortia has enabled the application of those definitions to produce a new curated catalogue of over 1400 human DNA-binding transcription factors, that can be accessed at https://www.ebi.ac.uk/QuickGO/targetset/dbTF. This catalogue has facilitated an improvement in the GO annotation of human DNA-binding transcription factors and led to the GO annotation of almost sixty thousand DNA-binding transcription factors in over a hundred species. Thus, this work will aid researchers investigating the regulation of transcription in both biomedical and basic science.
Affiliation(s)
- Ruth C Lovering
  - Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London WC1E 6BT, United Kingdom
- Pascale Gaudet
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, 1 Rue Michel-Servet, 1211 Geneve 4, Switzerland
- Marcio L Acencio
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim NO-7491, Norway
- Alex Ignatchenko
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Arttu Jolma
  - Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, British Columbia V5Z 4H4, Canada
- Martin Kuiper
  - Department of Biology, Norwegian University of Science and Technology, Trondheim NO-7491, Norway
- Ivan V Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Astrid Lægreid
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim NO-7491, Norway
- Maria J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Colin Logie
  - Molecular Biology Department, Faculty of Science, Radboud University, PO Box 9101, 6500HB Nijmegen, the Netherlands
6
Jolma A, Zhang J, Mondragón E, Morgunova E, Kivioja T, Laverty KU, Yin Y, Zhu F, Bourenkov G, Morris Q, Hughes TR, Maher LJ, Taipale J. Binding specificities of human RNA-binding proteins toward structured and linear RNA sequences. Genome Res 2020; 30:962-973. [PMID: 32703884 PMCID: PMC7397871 DOI: 10.1101/gr.258848.119] [Received: 11/07/2019] [Accepted: 06/23/2020]
Abstract
RNA-binding proteins (RBPs) regulate RNA metabolism at multiple levels by affecting splicing of nascent transcripts, RNA folding, base modification, transport, localization, translation, and stability. Despite their central role in RNA function, the RNA-binding specificities of most RBPs remain unknown or incompletely defined. To address this, we have assembled a genome-scale collection of RBPs and their RNA-binding domains (RBDs) and assessed their specificities using high-throughput RNA-SELEX (HTR-SELEX). Approximately 70% of RBPs for which we obtained a motif bound to short linear sequences, whereas ∼30% preferred structured motifs folding into stem-loops. We also found that many RBPs can bind to multiple distinctly different motifs. Analysis of the matches of the motifs in human genomic sequences suggested novel roles for many RBPs. We found that three cytoplasmic proteins (ZC3H12A, ZC3H12B, and ZC3H12C) bound to motifs resembling the splice donor sequence, suggesting that these proteins are involved in degradation of cytoplasmic viral and/or unspliced transcripts. Structural analysis revealed that the RNA motif was not bound by the conventional C3H1 RNA-binding domain of ZC3H12B. Instead, the RNA motif was bound by the ZC3H12B's PilT N terminus (PIN) RNase domain, revealing a potential mechanism by which unconventional RBDs containing active sites or molecule-binding pockets could interact with short, structured RNA molecules. Our collection containing 145 high-resolution binding specificity models for 86 RBPs is the largest systematic resource for the analysis of human RBPs and will greatly facilitate future analysis of the various biological roles of this important class of proteins.
Affiliation(s)
- Arttu Jolma
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77, Solna, Sweden
- Jilin Zhang
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77, Solna, Sweden
- Estefania Mondragón
  - Department of Biochemistry and Molecular Biology, Mayo Clinic Graduate School of Biomedical Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota 55905, USA
- Ekaterina Morgunova
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77, Solna, Sweden
- Teemu Kivioja
  - Genome-Scale Biology Program, University of Helsinki, FI-00014, Helsinki, Finland
- Kaitlin U Laverty
  - Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, Canada
- Yimeng Yin
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77, Solna, Sweden
- Fangjie Zhu
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77, Solna, Sweden
- Gleb Bourenkov
  - European Molecular Biology Laboratory (EMBL), Hamburg Unit c/o DESY, D-22603 Hamburg, Germany
- Quaid Morris
  - Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, Canada
  - Donnelly Centre, University of Toronto, M5S 3E1, Toronto, Canada
  - Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, M5S 3G4, Toronto, Canada
  - Department of Computer Science, University of Toronto, M5S 2E4, Toronto, Canada
  - Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Timothy R Hughes
  - Department of Molecular Genetics, University of Toronto, M5S 1A8, Toronto, Canada
  - Donnelly Centre, University of Toronto, M5S 3E1, Toronto, Canada
- Louis James Maher
  - Department of Biochemistry and Molecular Biology, Mayo Clinic Graduate School of Biomedical Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota 55905, USA
- Jussi Taipale
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE-171 77, Solna, Sweden
  - Genome-Scale Biology Program, University of Helsinki, FI-00014, Helsinki, Finland
  - Department of Biochemistry, University of Cambridge, CB2 1QW, Cambridge, United Kingdom
7
Toivonen J, Kivioja T, Jolma A, Yin Y, Taipale J, Ukkonen E. Modular discovery of monomeric and dimeric transcription factor binding motifs for large data sets. Nucleic Acids Res 2018; 46:e44. [PMID: 29385521 PMCID: PMC5934673 DOI: 10.1093/nar/gky027] [Received: 12/01/2017] [Accepted: 01/12/2018]
Abstract
In some dimeric cases of transcription factor (TF) binding, the specificity of dimeric motifs has been observed to differ notably from what would be expected were the two factors to bind to DNA independently of each other. Current motif discovery methods are unable to learn monomeric and dimeric motifs in modular fashion such that deviations from the expected motif would become explicit and the noise from dimeric occurrences would not corrupt monomeric models. We propose a novel modeling technique and an expectation maximization algorithm, implemented as software tool MODER, for discovering monomeric TF binding motifs and their dimeric combinations. Given training data and seeds for monomeric motifs, the algorithm learns in the same probabilistic framework a mixture model which represents monomeric motifs as standard position-specific probability matrices (PPMs), and dimeric motifs as pairs of monomeric PPMs, with associated orientation and spacing preferences. For dimers the model represents deviations from pure modular model of two independent monomers, thus making co-operative binding effects explicit. MODER can analyze in reasonable time tens of Mbps of training data. We validated the tool on HT-SELEX and ChIP-seq data. Our findings include some TFs whose expected model has palindromic symmetry but the observed model is directional.
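The "expected" dimer model described in this abstract, two monomeric PPMs combined with an orientation and a spacing, can be sketched as follows. This is only an illustration of the independent-binding baseline, not MODER's code: the function names, the two-column example matrix, and the restriction to non-negative gaps are assumptions, and the sketch leaves out both the EM learning and the deviation terms that the abstract says make co-operative effects explicit.

```python
def reverse_complement(ppm):
    """Reverse-complement a PPM whose columns hold [A, C, G, T] probabilities."""
    return [list(reversed(column)) for column in reversed(ppm)]

def expected_dimer(ppm_a, ppm_b, gap, flip_b=False):
    """Join two monomer PPMs with `gap` uniform spacer columns.

    flip_b=True places the second monomer in reverse-complement orientation.
    """
    if gap < 0:
        raise ValueError("this sketch only handles non-overlapping monomers")
    second = reverse_complement(ppm_b) if flip_b else ppm_b
    spacer = [[0.25] * 4 for _ in range(gap)]
    return [list(c) for c in ppm_a] + spacer + [list(c) for c in second]

def site_probability(ppm, site):
    """Probability of a DNA site under a PPM, assuming independent positions."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    prob = 1.0
    for column, base in zip(ppm, site):
        prob *= column[index[base]]
    return prob

# Hypothetical monomer with consensus "AC".
monomer = [[0.7, 0.1, 0.1, 0.1],
           [0.1, 0.7, 0.1, 0.1]]

# Head-to-head homodimer with a one-base spacer: consensus "AC", any base, "GT".
dimer = expected_dimer(monomer, monomer, gap=1, flip_b=True)
p_consensus = site_probability(dimer, "ACAGT")
```

Comparing a dimer matrix learned from data against a baseline like `dimer` is one way to see where observed dimeric specificity departs from independent binding.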
Affiliation(s)
- Jarkko Toivonen
  - Department of Computer Science, P.O. Box 68, FI-00014 University of Helsinki, Helsinki, Finland
- Teemu Kivioja
  - Genome-Scale Biology Program, P.O. Box 63, FI-00014 University of Helsinki, Helsinki, Finland
- Arttu Jolma
  - Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, and Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Yimeng Yin
  - Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, and Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Jussi Taipale
  - Genome-Scale Biology Program, P.O. Box 63, FI-00014 University of Helsinki, Helsinki, Finland
  - Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, and Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
  - Department of Biochemistry, University of Cambridge, CB2 1GA Cambridge, UK
- Esko Ukkonen
  - Department of Computer Science, P.O. Box 68, FI-00014 University of Helsinki, Helsinki, Finland
  - Helsinki Institute for Information Technology HIIT, University of Helsinki & Aalto University, Helsinki, Finland
8
Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT. The Human Transcription Factors. Cell 2018; 172:650-665. [PMID: 29425488 DOI: 10.1016/j.cell.2018.01.029] [Received: 10/23/2017] [Revised: 01/15/2018] [Accepted: 01/22/2018]
Abstract
Transcription factors (TFs) recognize specific DNA sequences to control chromatin and transcription, forming a complex system that guides expression of the genome. Despite keen interest in understanding how TFs control gene expression, it remains challenging to determine how the precise genomic binding sites of TFs are specified and how TF binding ultimately relates to regulation of transcription. This review considers how TFs are identified and functionally characterized, principally through the lens of a catalog of over 1,600 likely human TFs and binding motifs for two-thirds of them. Major classes of human TFs differ markedly in their evolutionary trajectories and expression patterns, underscoring distinct functions. TFs likewise underlie many different aspects of human physiology, disease, and variation, highlighting the importance of continued effort to understand TF-mediated gene regulation.
Collapse
Affiliation(s)
- Samuel A Lambert
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Laura F Campitelli
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Pratyush K Das
- Genome-Scale Biology Program, University of Helsinki, Helsinki, Finland
- Yimeng Yin
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Solna, Sweden
- Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Xiaoting Chen
- Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Jussi Taipale
- Genome-Scale Biology Program, University of Helsinki, Helsinki, Finland; Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Solna, Sweden; Department of Biochemistry, Cambridge University, Cambridge CB2 1GA, United Kingdom
- Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
|
9
|
Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT. The Human Transcription Factors. Cell 2018; 175:598-599. [DOI: 10.1016/j.cell.2018.09.045] [Citation(s) in RCA: 226] [Impact Index Per Article: 37.7] [Indexed: 11/16/2022]
|
10
|
Morgunova E, Yin Y, Das PK, Jolma A, Zhu F, Popov A, Xu Y, Nilsson L, Taipale J. Two distinct DNA sequences recognized by transcription factors represent enthalpy and entropy optima. eLife 2018; 7:e32963. [PMID: 29638214 PMCID: PMC5896879 DOI: 10.7554/elife.32963] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 10/20/2017] [Accepted: 02/12/2018] [Indexed: 11/17/2022] Open
Abstract
Most transcription factors (TFs) can bind to a population of sequences closely related to a single optimal site. However, some TFs can bind to two distinct sequences that represent two local optima in the Gibbs free energy of binding (ΔG). To determine the molecular mechanism behind this effect, we solved the structures of human HOXB13 and CDX2 bound to their two optimal DNA sequences, CAATAAA and TCGTAAA. Thermodynamic analyses by isothermal titration calorimetry revealed that both sites were bound with similar ΔG. However, the interaction with the CAA sequence was driven by change in enthalpy (ΔH), whereas the TCG site was bound with similar affinity due to smaller loss of entropy (ΔS). This thermodynamic mechanism that leads to at least two local optima likely affects many macromolecular interactions, as ΔG depends on two partially independent variables ΔH and ΔS according to the central equation of thermodynamics, ΔG = ΔH - TΔS.
Genes are sections of DNA that carry the instructions needed to build other molecules including all the proteins that the cell needs to fulfill its role. The information in the DNA is stored as a code consisting of four chemical bases, often referred to simply as “A”, “C”, “G” and “T”. The order or sequence of these bases determines the role of a protein. Many organisms – including humans – are built of many different types of cells that perform unique roles. Almost all cells carry the same genetic information, but proteins called transcription factors can regulate the activity of genes so that only a relevant subset of genes is switched on at a particular time. Transcription factors glide along DNA and bind to short DNA sequences by attaching to the DNA bases directly or through bridges made up of water molecules. Two physical concepts known as enthalpy and entropy determine the strength of the connection.
Enthalpy relates to how strong the chemical bonds that form between the transcription factors and the DNA bases are, compared to a situation where the transcription factor and DNA do not form a complex and bind to water molecules around them. Entropy measures the disorder of the system – the more disordered the solvent and protein-DNA complex are compared to solvent-containing free DNA and protein, the stronger the binding. A water molecule that bridges a DNA base with an amino-acid of a protein contributes to enthalpy, but results in loss of entropy, because the system becomes more ordered since the water molecule can no longer move freely. Most transcription factors can only bind to DNA sequences that are very similar to each other, but some transcription factors can recognize several different kinds of sequences, and until now it was not clear how they could do this. Morgunova et al. studied four different human transcription factors that can each bind to two distinct DNA sequences. The results showed that the transcription factors bound to both DNA sequences with similar strength, but via different mechanisms. For one DNA sequence, an enthalpy-based mechanism essentially ‘froze’ the transcription factor to the DNA through rigid water bridges. The other DNA sequence was bound equally strongly but through moving water molecules, because this increased the entropy of the system. It is possible that these mechanisms could also apply to many other molecules that interact with each other through water-molecule bridges. A better knowledge of the chemical bonds between transcription factors and DNA bases may in future help efforts to develop new treatments that depend on molecules being able to bind to other molecules. In addition, these findings may one day help scientists to predict how strongly two molecules will interact simply by knowing the structures of the molecules involved.
Affiliation(s)
- Ekaterina Morgunova
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Yimeng Yin
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Pratyush K Das
- Genome-Scale Biology Research Program, University of Helsinki, Helsinki, Finland
- Arttu Jolma
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Fangjie Zhu
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- You Xu
- Department of Bioscience and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Lennart Nilsson
- Department of Bioscience and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Jussi Taipale
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Genome-Scale Biology Research Program, University of Helsinki, Helsinki, Finland; Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
|
11
|
Yin Y, Morgunova E, Jolma A, Kaasinen E, Sahu B, Khund-Sayeed S, Das PK, Kivioja T, Dave K, Zhong F, Nitta KR, Taipale M, Popov A, Ginno PA, Domcke S, Yan J, Schübeler D, Vinson C, Taipale J. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 2017; 356:eaaj2239. [PMID: 28473536 DOI: 10.1126/science.aaj2239] [Citation(s) in RCA: 667] [Impact Index Per Article: 111.2] [Received: 09/15/2016] [Accepted: 03/09/2017] [Indexed: 12/17/2022]
Abstract
The majority of CpG dinucleotides in the human genome are methylated at cytosine bases. However, active gene regulatory elements are generally hypomethylated relative to their flanking regions, and the binding of some transcription factors (TFs) is diminished by methylation of their target sequences. By analysis of 542 human TFs with methylation-sensitive SELEX (systematic evolution of ligands by exponential enrichment), we found that there are also many TFs that prefer CpG-methylated sequences. Most of these are in the extended homeodomain family. Structural analysis showed that homeodomain specificity for methylcytosine depends on direct hydrophobic interactions with the methylcytosine 5-methyl group. This study provides a systematic examination of the effect of an epigenetic DNA modification on human TF binding specificity and reveals that many developmentally important proteins display preference for mCpG-containing sequences.
Affiliation(s)
- Yimeng Yin
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Ekaterina Morgunova
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Arttu Jolma
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Eevi Kaasinen
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Biswajyoti Sahu
- Genome-Scale Biology Program, Post Office Box 63, FI-00014 University of Helsinki, Helsinki, Finland
- Syed Khund-Sayeed
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 3128, Building 37, Bethesda, MD 20892, USA
- Pratyush K Das
- Genome-Scale Biology Program, Post Office Box 63, FI-00014 University of Helsinki, Helsinki, Finland
- Teemu Kivioja
- Genome-Scale Biology Program, Post Office Box 63, FI-00014 University of Helsinki, Helsinki, Finland
- Kashyap Dave
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Fan Zhong
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Kazuhiro R Nitta
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Minna Taipale
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Alexander Popov
- European Synchrotron Radiation Facility, 38043 Grenoble, France
- Paul A Ginno
- Friedrich-Miescher-Institute for Biomedical Research (FMI), Maulbeerstrasse 66, 4058 Basel, Switzerland
- Silvia Domcke
- Friedrich-Miescher-Institute for Biomedical Research (FMI), Maulbeerstrasse 66, 4058 Basel, Switzerland; Faculty of Science, University of Basel, Petersplatz 1, 4003 Basel, Switzerland
- Jian Yan
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Dirk Schübeler
- Friedrich-Miescher-Institute for Biomedical Research (FMI), Maulbeerstrasse 66, 4058 Basel, Switzerland; Faculty of Science, University of Basel, Petersplatz 1, 4003 Basel, Switzerland
- Charles Vinson
- Laboratory of Metabolism, National Cancer Institute, National Institutes of Health, Room 3128, Building 37, Bethesda, MD 20892, USA
- Jussi Taipale
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden; Genome-Scale Biology Program, Post Office Box 63, FI-00014 University of Helsinki, Helsinki, Finland
|
12
|
Yang L, Orenstein Y, Jolma A, Yin Y, Taipale J, Shamir R, Rohs R. Transcription factor family-specific DNA shape readout revealed by quantitative specificity models. Mol Syst Biol 2017; 13:910. [PMID: 28167566 PMCID: PMC5327724 DOI: 10.15252/msb.20167238] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Indexed: 12/19/2022] Open
Abstract
Transcription factors (TFs) achieve DNA‐binding specificity through contacts with functional groups of bases (base readout) and readout of structural properties of the double helix (shape readout). Currently, it remains unclear whether DNA shape readout is utilized by only a few selected TF families, or whether this mechanism is used extensively by most TF families. We resequenced data from previously published HT‐SELEX experiments, the most extensive mammalian TF–DNA binding data available to date. Using these data, we demonstrated the contributions of DNA shape readout across diverse TF families and its importance in core motif‐flanking regions. Statistical machine‐learning models combined with feature‐selection techniques helped to reveal the nucleotide position‐dependent DNA shape readout in TF‐binding sites and the TF family‐specific position dependence. Based on these results, we proposed novel DNA shape logos to visualize the DNA shape preferences of TFs. Overall, this work suggests a way of obtaining mechanistic insights into TF–DNA binding without relying on experimentally solved all‐atom structures.
Affiliation(s)
- Lin Yang
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA, USA
- Yaron Orenstein
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Arttu Jolma
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Yimeng Yin
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Jussi Taipale
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Ron Shamir
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA, USA
|
13
|
Schmitges FW, Radovani E, Najafabadi HS, Barazandeh M, Campitelli LF, Yin Y, Jolma A, Zhong G, Guo H, Kanagalingam T, Dai WF, Taipale J, Emili A, Greenblatt JF, Hughes TR. Multiparameter functional diversity of human C2H2 zinc finger proteins. Genome Res 2016; 26:1742-1752. [PMID: 27852650 PMCID: PMC5131825 DOI: 10.1101/gr.209643.116] [Citation(s) in RCA: 103] [Impact Index Per Article: 12.9] [Received: 05/10/2016] [Accepted: 10/24/2016] [Indexed: 11/24/2022]
Abstract
C2H2 zinc finger proteins represent the largest and most enigmatic class of human transcription factors. Their C2H2-ZF arrays are highly variable, indicating that most will have unique DNA binding motifs. However, most of the binding motifs have not been directly determined. In addition, little is known about whether or how these proteins regulate transcription. Most of the ∼700 human C2H2-ZF proteins also contain at least one KRAB, SCAN, BTB, or SET domain, suggesting that they may have common interacting partners and/or effector functions. Here, we report a multifaceted functional analysis of 131 human C2H2-ZF proteins, encompassing DNA binding sites, interacting proteins, and transcriptional response to genetic perturbation. We confirm the expected diversity in DNA binding motifs and genomic binding sites, and provide motif models for 78 previously uncharacterized C2H2-ZF proteins, most of which are unique. Surprisingly, the diversity in protein-protein interactions is nearly as high as diversity in DNA binding motifs: Most C2H2-ZF proteins interact with a unique spectrum of co-activators and co-repressors. Thus, multiparameter diversification likely underlies the evolutionary success of this large class of human proteins.
Affiliation(s)
- Frank W Schmitges
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Ernest Radovani
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Hamed S Najafabadi
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Marjan Barazandeh
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Laura F Campitelli
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Yimeng Yin
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83, Sweden
- Arttu Jolma
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83, Sweden
- Guoqing Zhong
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Hongbo Guo
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Tharsan Kanagalingam
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Wei F Dai
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Jussi Taipale
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83, Sweden
- Andrew Emili
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Jack F Greenblatt
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Timothy R Hughes
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
|
14
|
Jolma A, Yin Y, Nitta KR, Dave K, Popov A, Taipale M, Enge M, Kivioja T, Morgunova E, Taipale J. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 2016. [DOI: 10.1038/nature18912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
15
|
Morgunova E, Yin Y, Jolma A, Dave K, Schmierer B, Popov A, Eremina N, Nilsson L, Taipale J. Structural insights into the DNA-binding specificity of E2F family transcription factors. Nat Commun 2015; 6:10050. [PMID: 26632596 PMCID: PMC4686757 DOI: 10.1038/ncomms10050] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 07/03/2015] [Accepted: 10/29/2015] [Indexed: 11/09/2022] Open
Abstract
The mammalian cell cycle is controlled by the E2F family of transcription factors. Typical E2Fs bind to DNA as heterodimers with the related dimerization partner (DP) proteins, whereas the atypical E2Fs, E2F7 and E2F8 contain two DNA-binding domains (DBDs) and act as repressors. To understand the mechanism of repression, we have resolved the structure of E2F8 in complex with DNA at atomic resolution. We find that the first and second DBDs of E2F8 resemble the DBDs of typical E2F and DP proteins, respectively. Using molecular dynamics simulations, biochemical affinity measurements and chromatin immunoprecipitation, we further show that both atypical and typical E2Fs bind to similar DNA sequences in vitro and in vivo. Our results represent the first crystal structure of an E2F protein with two DBDs, and reveal the mechanism by which atypical E2Fs can repress canonical E2F target genes and exert their negative influence on cell cycle progression.
Affiliation(s)
- Ekaterina Morgunova
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Yimeng Yin
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Arttu Jolma
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Kashyap Dave
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Bernhard Schmierer
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Alexander Popov
- European Synchrotron Radiation Facility, Division of Experiments, 38 000 Grenoble, France
- Nadejda Eremina
- Department of Biochemistry and Biophysics, Stockholm University, SE 106 91, Sweden
- Lennart Nilsson
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Jussi Taipale
- Department of Biosciences and Nutrition, Karolinska Institutet, SE 141 83 Stockholm, Sweden; Genome-Scale Biology Research Program, Faculty of Medicine, University of Helsinki, PO Box 63, FI-00014 Helsinki, Finland
|
16
|
Jolma A, Yin Y, Nitta KR, Dave K, Popov A, Taipale M, Enge M, Kivioja T, Morgunova E, Taipale J. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 2015; 527:384-8. [DOI: 10.1038/nature15518] [Citation(s) in RCA: 369] [Impact Index Per Article: 41.0] [Received: 01/23/2015] [Accepted: 08/24/2015] [Indexed: 12/28/2022]
|
17
|
Nitta KR, Jolma A, Yin Y, Morgunova E, Kivioja T, Akhtar J, Hens K, Toivonen J, Deplancke B, Furlong EEM, Taipale J. Conservation of transcription factor binding specificities across 600 million years of bilateria evolution. eLife 2015; 4:e04837. [PMID: 25779349 PMCID: PMC4362205 DOI: 10.7554/elife.04837] [Citation(s) in RCA: 154] [Impact Index Per Article: 17.1] [Received: 09/19/2014] [Accepted: 02/09/2015] [Indexed: 02/07/2023] Open
Abstract
Divergent morphology of species has largely been ascribed to genetic differences in the tissue-specific expression of proteins, which could be achieved by divergence in cis-regulatory elements or by altering the binding specificity of transcription factors (TFs). The relative importance of the latter has been difficult to assess, as previous systematic analyses of TF binding specificity have been performed using different methods in different species. To address this, we determined the binding specificities of 242 Drosophila TFs, and compared them to human and mouse data. This analysis revealed that TF binding specificities are highly conserved between Drosophila and mammals, and that for orthologous TFs, the similarity extends even to the level of very subtle dinucleotide binding preferences. The few human TFs with divergent specificities function in cell types not found in fruit flies, suggesting that evolution of TF specificities contributes to emergence of novel types of differentiated cells.
Affiliation(s)
- Kazuhiro R Nitta
- Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
- Arttu Jolma
- Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden; Genome-Scale Biology Program, University of Helsinki, Helsinki, Finland
- Yimeng Yin
- Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
- Ekaterina Morgunova
- Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
- Teemu Kivioja
- Genome-Scale Biology Program, University of Helsinki, Helsinki, Finland
- Junaid Akhtar
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Korneel Hens
- Institute of Bioengineering, School of Life Sciences, Swiss Federal Institute of Technology, Lausanne, Switzerland
- Jarkko Toivonen
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Bart Deplancke
- Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Eileen E M Furlong
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Jussi Taipale
- Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden; Genome-Scale Biology Program, University of Helsinki, Helsinki, Finland
|
18
|
Yan J, Enge M, Whitington T, Dave K, Liu J, Sur I, Schmierer B, Jolma A, Kivioja T, Taipale M, Taipale J. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 2013; 154:801-13. [PMID: 23953112 DOI: 10.1016/j.cell.2013.07.034] [Citation(s) in RCA: 262] [Impact Index Per Article: 23.8] [Received: 02/13/2013] [Revised: 05/23/2013] [Accepted: 07/23/2013] [Indexed: 10/26/2022]
Abstract
During cell division, transcription factors (TFs) are removed from chromatin twice, during DNA synthesis and during condensation of chromosomes. How TFs can efficiently find their sites following these stages has been unclear. Here, we have analyzed the binding pattern of expressed TFs in human colorectal cancer cells. We find that binding of TFs is highly clustered and that the clusters are enriched in binding motifs for several major TF classes. Strikingly, almost all clusters are formed around cohesin, and loss of cohesin decreases both DNA accessibility and binding of TFs to clusters. We show that cohesin remains bound in S phase, holding the nascent sister chromatids together at the TF cluster sites. Furthermore, cohesin remains bound to the cluster sites when TFs are evicted in early M phase. These results suggest that cohesin-binding functions as a cellular memory that promotes re-establishment of TF clusters after DNA replication and chromatin condensation.
Affiliation(s)
- Jian Yan
- Science for Life Laboratory, Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm 14183, Sweden
|
19
|
Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, Palin K, Vaquerizas JM, Vincentelli R, Luscombe NM, Hughes TR, Lemaire P, Ukkonen E, Kivioja T, Taipale J. DNA-binding specificities of human transcription factors. Cell 2013; 152:327-39. [PMID: 23332764 DOI: 10.1016/j.cell.2012.12.009] [Citation(s) in RCA: 855] [Impact Index Per Article: 77.7] [Received: 05/25/2012] [Revised: 08/18/2012] [Accepted: 12/03/2012] [Indexed: 12/23/2022]
Abstract
Although the proteins that read the gene regulatory code, transcription factors (TFs), have been largely identified, it is not well known which sequences TFs can recognize. We have analyzed the sequence-specific binding of human TFs using high-throughput SELEX and ChIP sequencing. A total of 830 binding profiles were obtained, describing 239 distinctly different binding specificities. The models represent the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results reveal additional specificity determinants for a large number of factors for which a partial specificity was known, including a commonly observed A- or T-rich stretch that flanks the core motifs. Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model incorporating these features that is required to understand binding of TFs to DNA.
Affiliation(s)
- Arttu Jolma
- Science for Life Center, Department of Biosciences and Nutrition, Karolinska Institutet, 141 83 Huddinge, Sweden
|
20
|
Abstract
Cell differentiation during development is controlled by extracellular morphogens, which induce responding cells to differentiate into distinct cell fates based on the dose of morphogen they receive. Genes that specify the distinct cell fates are differentially responsive to morphogens, and the extracellular morphogen gradient is converted in responding cells to graded activity of transcription factors. In the case of Hedgehog, the gradient is converted to opposing gradients of transcriptional activator and repressor forms of the transcription factor Cubitus interruptus (Ci). It has been generally assumed that the balance between activator and repressor determines target gene responses within this gradient. However, new evidence shows that enhancers can respond selectively to the activator and repressor forms of Ci, and that this selectivity is determined by the affinity of Ci sites within the enhancers.
Affiliation(s)
- Thomas Whitington
- Department of Biosciences and Nutrition, Karolinska Institutet, SE-141 83 Stockholm, Sweden
|
21
|
Abstract
Transcription of genes during development and in response to environmental stimuli is determined by genomic DNA sequence. The DNA sequences regulating transcription are read by sequence-specific transcription factors (TFs) that recognize relatively short sequences, generally between four and twenty base pairs in length. Transcriptional regulation generally requires binding of multiple TFs in close proximity to each other. Mechanistic understanding of transcription in an organism thus requires detailed knowledge of binding affinities of all its TFs to all possible DNA sequences, and the co-operative interactions between the TFs. However, very little is known about such co-operative binding interactions, and even the simple TF-DNA binding information exists only for a very small proportion of all TFs - for example, mammals have approximately 1,300-2,000 TFs [1, 2], yet the largest public databases for TF binding specificity, Jaspar and Uniprobe [3, 4] currently list only approximately 500 moderate to high resolution profiles for human or mouse. This lack of knowledge is in part due to the fact that analysis of TF DNA binding has been laborious and expensive. In this chapter, we review methods that can be used to determine binding specificity of TFs to DNA, mainly focusing on recently developed assays that allow high-resolution analysis of TF binding specificity in relatively high throughput.
Affiliation(s)
- Arttu Jolma
- Department of Biosciences and Nutrition, SE-171 77, Stockholm, Sweden
|
22
|
Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpää MJ, Bonke M, Palin K, Talukder S, Hughes TR, Luscombe NM, Ukkonen E, Taipale J. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res 2010; 20:861-73. [PMID: 20378718 PMCID: PMC2877582 DOI: 10.1101/gr.100552.109] [Citation(s) in RCA: 307] [Impact Index Per Article: 21.9] [Received: 09/11/2009] [Accepted: 03/22/2010] [Indexed: 01/15/2023]
Abstract
The genetic code (the binding specificity of all transfer RNAs) defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the approximately 1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers.
Affiliation(s)
- Arttu Jolma
  - Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
  - Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
- Teemu Kivioja
  - Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
  - Department of Computer Science, FI-00014 University of Helsinki, Helsinki, Finland
- Jarkko Toivonen
  - Department of Computer Science, FI-00014 University of Helsinki, Helsinki, Finland
- Lu Cheng
  - Department of Computer Science, FI-00014 University of Helsinki, Helsinki, Finland
- Gonghong Wei
  - Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
- Martin Enge
  - Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
- Mikko Taipale
  - Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
- Juan M. Vaquerizas
  - EMBL–European Bioinformatics Institute, Cambridge CB10 1SD, United Kingdom
- Jian Yan
  - Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
- Mikko J. Sillanpää
  - Department of Mathematics and Statistics, FI-00014 University of Helsinki, Helsinki, Finland
- Martin Bonke
  - Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
- Kimmo Palin
  - Department of Computer Science, FI-00014 University of Helsinki, Helsinki, Finland
- Shaheynoor Talukder
  - Department of Molecular Genetics and Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M4T 2J4, Canada
- Timothy R. Hughes
  - Department of Molecular Genetics and Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M4T 2J4, Canada
- Esko Ukkonen
  - Department of Computer Science, FI-00014 University of Helsinki, Helsinki, Finland
- Jussi Taipale
  - Department of Molecular Medicine, National Public Health Institute (KTL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
  - Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
|
23
|
Jolma A, Ames D, Horning N, Mitasova H, Neteler M, Racicot A, Sutton T. Chapter Ten Free and Open Source Geospatial Tools for Environmental Modelling and Management. ACTA ACUST UNITED AC 2008. [DOI: 10.1016/s1574-101x(08)00610-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]
|