1
Wells JN, Chang NC, McCormick J, Coleman C, Ramos N, Jin B, Feschotte C. Transposable elements drive the evolution of metazoan zinc finger genes. Genome Res 2023; 33:1325-1339. [PMID: 37714714 PMCID: PMC10547256 DOI: 10.1101/gr.277966.123] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/07/2023] [Accepted: 06/15/2023] [Indexed: 09/17/2023]
Abstract
Cys2-His2 zinc finger genes (ZNFs) form the largest family of transcription factors in metazoans. ZNF evolution is highly dynamic and characterized by the rapid expansion and contraction of numerous subfamilies across the animal phylogeny. The forces and mechanisms underlying rapid ZNF evolution remain poorly understood, but there is growing evidence that, in tetrapods, the targeting and repression of lineage-specific transposable elements (TEs) plays a critical role in the evolution of the Krüppel-associated box ZNF (KZNF) subfamily. Currently, it is unknown whether this function and coevolutionary relationship is unique to KZNFs or is a broader feature of metazoan ZNFs. Here, we present evidence that genomic conflict with TEs has been a central driver of the diversification of ZNFs in animals. Sampling from 3221 genome assemblies, we show that the copy number of retroelements correlates with that of ZNFs across at least 750 million years of metazoan evolution. Using computational predictions, we show that ZNFs preferentially bind TEs in diverse animal species. We further investigate the largest ZNF subfamily found in cyprinid fish, which is characterized by a conserved sequence we dubbed the fish N-terminal zinc finger-associated (FiNZ) domain. Zebrafish possess approximately 700 FiNZ-ZNFs, many of which are evolving adaptively under positive selection. Like mammalian KZNFs, most zebrafish FiNZ-ZNFs are expressed at the onset of zygotic genome activation, and blocking their translation using morpholinos during early embryogenesis results in derepression of transcriptionally active TEs. Together, these data suggest that ZNF diversification has been intimately connected to TE expansion throughout animal evolution.
Affiliation(s)
- Jonathan N Wells
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850, USA
- Ni-Chen Chang
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850, USA
- John McCormick
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850, USA
- Caitlyn Coleman
  - Department of Cell Biology, Microbiology and Molecular Biology, University of South Florida, Tampa, Florida 33620, USA
- Nathalie Ramos
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850, USA
  - Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Bozhou Jin
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850, USA
- Cédric Feschotte
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14850, USA
2
Wetzel JL, Zhang K, Singh M. Learning probabilistic protein-DNA recognition codes from DNA-binding specificities using structural mappings. Genome Res 2022; 32:gr.276606.122. [PMID: 36123148 PMCID: PMC9528988 DOI: 10.1101/gr.276606.122] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/17/2022] [Accepted: 07/30/2022] [Indexed: 11/25/2022]
Abstract
Knowledge of how proteins interact with DNA is essential for understanding gene regulation. Although DNA-binding specificities for thousands of transcription factors (TFs) have been determined, the specific amino acid-base interactions comprising their structural interfaces are largely unknown. This lack of resolution hampers attempts to leverage these data in order to predict specificities for uncharacterized TFs or TFs mutated in disease. Here we introduce recognition code learning via automated mapping of protein-DNA structural interfaces (rCLAMPS), a probabilistic approach that uses DNA-binding specificities for TFs from the same structural family to simultaneously infer both which nucleotide positions are contacted by particular amino acids within the TF as well as a recognition code that relates each base-contacting amino acid to nucleotide preferences at the DNA positions it contacts. We apply rCLAMPS to homeodomains, the second largest family of TFs in metazoans and show that it learns a highly effective recognition code that can predict de novo DNA-binding specificities for TFs. Furthermore, we show that the inferred amino acid-nucleotide contacts reveal whether and how nucleotide preferences at individual binding site positions are altered by mutations within TFs. Our approach is an important step toward automatically uncovering the determinants of protein-DNA specificity from large compendia of DNA-binding specificities and inferring the altered functionalities of TFs mutated in disease.
Affiliation(s)
- Joshua L Wetzel
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Kaiqian Zhang
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Mona Singh
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
3
Biological databases and their application. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00021-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
4
Boisvert O, Létourneau D, Delattre P, Tremblay C, Jolibois É, Montagne M, Lavigne P. Zinc Fingers 10 and 11 of Miz-1 undergo conformational exchange to achieve specific DNA binding. Structure 2021; 30:623-636.e5. [PMID: 34963061 DOI: 10.1016/j.str.2021.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2021] [Revised: 10/08/2021] [Accepted: 12/01/2021] [Indexed: 11/18/2022]
Abstract
Miz-1 (ZBTB17) is a poly-zinc finger BTB/POZ transcription factor with 12 consecutive C2H2 zinc fingers (ZFs) that binds transcriptional start sites (TSSs) to regulate the expression of genes involved in cell development and proliferation. As of now, it is not known which of the 12 consecutive ZFs are responsible for the recognition of the 24 base pair consensus sequence found at these TSSs. Evidence suggests ZFs 7-12 play this role. We provide validation for this and describe the structural and dynamical characterization of unprecedented conformational exchange in the linker between ZFs 10 and 11. This conformational exchange uncouples ZFs 7-10 from 11 and 12 and promotes a scanning-recognition mechanism through which the two segments cooperate to bind two sub-sites at both ends of the consensus. We further show that this can result in the coiling of TSSs as part of Miz-1's mechanism of transcriptional transactivation.
Affiliation(s)
- Olivier Boisvert
  - Département de biochimie et de génomique fonctionnelle, Institut de Pharmacologie de Sherbrooke and PROTÉO, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001 12 Avenue N, Sherbrooke, Quebec J1H 5N4, Canada
- Danny Létourneau
  - Département de biochimie et de génomique fonctionnelle, Institut de Pharmacologie de Sherbrooke and PROTÉO, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001 12 Avenue N, Sherbrooke, Quebec J1H 5N4, Canada
- Patrick Delattre
  - Département de biochimie et de génomique fonctionnelle, Institut de Pharmacologie de Sherbrooke and PROTÉO, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001 12 Avenue N, Sherbrooke, Quebec J1H 5N4, Canada
- Cynthia Tremblay
  - Département de biochimie et de génomique fonctionnelle, Institut de Pharmacologie de Sherbrooke and PROTÉO, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001 12 Avenue N, Sherbrooke, Quebec J1H 5N4, Canada
- Émilie Jolibois
  - Département de biochimie et de génomique fonctionnelle, Institut de Pharmacologie de Sherbrooke and PROTÉO, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001 12 Avenue N, Sherbrooke, Quebec J1H 5N4, Canada
- Martin Montagne
  - Département de biochimie et de génomique fonctionnelle, Institut de Pharmacologie de Sherbrooke and PROTÉO, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001 12 Avenue N, Sherbrooke, Quebec J1H 5N4, Canada
- Pierre Lavigne
  - Département de biochimie et de génomique fonctionnelle, Institut de Pharmacologie de Sherbrooke and PROTÉO, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, 3001 12 Avenue N, Sherbrooke, Quebec J1H 5N4, Canada
5
Meseguer A, Årman F, Fornes O, Molina-Fernández R, Bonet J, Fernandez-Fuentes N, Oliva B. On the prediction of DNA-binding preferences of C2H2-ZF domains using structural models: application on human CTCF. NAR Genom Bioinform 2021; 2:lqaa046. [PMID: 33575598 PMCID: PMC7671317 DOI: 10.1093/nargab/lqaa046] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/13/2020] [Revised: 05/07/2020] [Accepted: 06/10/2020] [Indexed: 12/25/2022]
Abstract
Cys2-His2 zinc finger (C2H2-ZF) proteins are the largest family of transcription factors in human and higher metazoans. To date, the DNA-binding preferences of many members of this family remain unknown. We have developed a computational method to predict their DNA-binding preferences. We have computed theoretical position weight matrices (PWMs) of proteins composed of C2H2-ZF domains, with the only requirement being an input structure. We have predicted more than two-thirds of a single zinc-finger domain binding site for about 70% of the variants of Zif268, a classical member of this family. We have successfully matched between 60 and 90% of the binding-site motif of examples of proteins composed of three C2H2-ZF domains in JASPAR, a standard database of PWMs. The tests are used as a proof of the capacity to scan a DNA fragment and find the potential binding sites of transcription factors formed by C2H2-ZF domains. As an example, we have tested the approach to predict the DNA-binding preferences of the human chromatin binding factor CTCF. We offer a server to model the structure of a zinc-finger protein and predict its PWM.
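The scanning step mentioned in this abstract, sliding a PWM along a DNA fragment to locate candidate binding sites, can be illustrated with a minimal Python sketch. The three-column matrix, the uniform 0.25 background, and the function names `log_odds`, `scan`, and `best_site` are assumptions made for illustration only; they are not taken from the paper or its server.

```python
import math

# Hypothetical 3-position PWM (per-base probabilities); NOT from the paper.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(window, pwm=PWM, background=BACKGROUND):
    """Sum of log2(p_base / background) over the columns of the PWM."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def scan(sequence, pwm=PWM):
    """Score every window of PWM width along the sequence (forward strand only)."""
    width = len(pwm)
    return [
        (start, log_odds(sequence[start:start + width], pwm))
        for start in range(len(sequence) - width + 1)
    ]


def best_site(sequence, pwm=PWM):
    """Return (start, score) of the highest-scoring window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

A real scan would typically also score the reverse complement and apply a score threshold; both are omitted here to keep the sketch short.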
Affiliation(s)
- Alberto Meseguer
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Filip Årman
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Ruben Molina-Fernández
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Jaume Bonet
  - Laboratory of Protein Design & Immunoengineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, Lausanne 1015, Vaud, Switzerland
- Narcis Fernandez-Fuentes
  - Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Vic, Catalonia 08500, Spain
- Baldo Oliva
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
6
Wan H, Li JM, Ding H, Lin SX, Tu SQ, Tian XH, Hu JP, Chang S. An Overview of Computational Tools of Nucleic Acid Binding Site Prediction for Site-specific Proteins and Nucleases. Protein Pept Lett 2019; 27:370-384. [PMID: 31746287 DOI: 10.2174/0929866526666191028162302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/31/2019] [Revised: 05/24/2019] [Accepted: 09/24/2019] [Indexed: 12/26/2022]
Abstract
Understanding the interaction mechanism of proteins and nucleic acids is one of the most fundamental problems for genome editing with engineered nucleases. Due to some limitations of experimental investigations, computational methods have played an important role in obtaining the knowledge of protein-nucleic acid interaction. Over the past few years, dozens of computational tools have been used for identification of nucleic acid binding site for site-specific proteins and design of site-specific nucleases because of their significant advantages in genome editing. Here, we review existing widely-used computational tools for target prediction of site-specific proteins as well as off-target prediction of site-specific nucleases. This article provides a list of on-line prediction tools according to their features followed by the description of computational methods used by these tools, which range from various sequence mapping algorithms (like Bowtie, FetchGWI and BLAST) to different machine learning methods (such as Support Vector Machine, hidden Markov models, Random Forest, elastic network and deep neural networks). We also make suggestions on the further development in improving the accuracy of prediction methods. This survey will provide a reference guide for computational biologists working in the field of genome editing.
Affiliation(s)
- Hua Wan
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Jian-Ming Li
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Huang Ding
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Shuo-Xin Lin
  - Department of Electrical and Computer Engineering, James Clark School of Engineering, University of Maryland, College Park, MD 20742, United States
- Shu-Qin Tu
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Xu-Hong Tian
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
- Jian-Ping Hu
  - College of Pharmacy and Biological Engineering, Sichuan Industrial Institute of Antibiotics, Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, Antibiotics Research and Re-Evaluation Key Laboratory of Sichuan Province, Chengdu University, Chengdu 610106, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
7
Blanco JD, Radusky L, Climente-González H, Serrano L. FoldX accurate structural protein-DNA binding prediction using PADA1 (Protein Assisted DNA Assembly 1). Nucleic Acids Res 2019; 46:3852-3863. [PMID: 29608705 PMCID: PMC5934639 DOI: 10.1093/nar/gky228] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/18/2018] [Accepted: 03/20/2018] [Indexed: 12/20/2022]
Abstract
The speed at which new genomes are being sequenced highlights the need for genome-wide methods capable of predicting protein–DNA interactions. Here, we present PADA1, a generic algorithm that accurately models structural complexes and predicts the DNA-binding regions of resolved protein structures. PADA1 relies on a library of protein and double-stranded DNA fragment pairs obtained from a training set of 2103 DNA–protein complexes. It includes a fast statistical force field computed from atom-atom distances, to evaluate and filter the 3D docking models. Using published benchmark validation sets and 212 DNA–protein structures published after 2016 we predicted the DNA-binding regions with an RMSD of <1.8 Å per residue in >95% of the cases. We show that the quality of the docked templates is compatible with the FoldX protein design tool suite to identify the crystallized DNA molecule sequence as the most energetically favorable in 80% of the cases. We highlighted the biological potential of PADA1 by reconstituting DNA and protein conformational changes upon protein mutagenesis of a meganuclease and its variants, and by predicting DNA-binding regions and nucleotide sequences in proteins crystallized without DNA. These results open up new perspectives for the engineering of DNA–protein interfaces.
Affiliation(s)
- Javier Delgado Blanco
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Leandro Radusky
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Héctor Climente-González
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain
8
Zamanighomi M, Lin Z, Wang Y, Jiang R, Wong WH. Predicting transcription factor binding motifs from DNA-binding domains, chromatin accessibility and gene expression data. Nucleic Acids Res 2017; 45:5666-5677. [PMID: 28472398 PMCID: PMC5449588 DOI: 10.1093/nar/gkx358] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/25/2016] [Accepted: 04/20/2017] [Indexed: 01/08/2023]
Abstract
Transcription factors (TFs) play crucial roles in regulating gene expression through interactions with specific DNA sequences. Recently, the sequence motifs of almost 400 human TFs have been identified using high-throughput SELEX sequencing. However, there remain a large number of TFs (∼800) with no high-throughput-derived binding motifs. Computational methods capable of associating known motifs to such TFs will avoid tremendous experimental efforts and enable deeper understanding of transcriptional regulatory functions. We present a method to associate known motifs to TFs (MATLAB code is available in Supplementary Materials). Our method is based on a probabilistic framework that not only exploits DNA-binding domains and specificities, but also integrates open chromatin, gene expression and genomic data to accurately infer monomeric and homodimeric binding motifs. Our analysis resulted in the assignment of motifs to 200 TFs with no SELEX-derived motifs, roughly a 50% increase compared to the existing coverage.
Affiliation(s)
- Mahdi Zamanighomi
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Zhixiang Lin
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Yong Wang
  - Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100190, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Wing Hung Wong
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
9
Ruffalo M, Bar-Joseph Z. Genome wide predictions of miRNA regulation by transcription factors. Bioinformatics 2017; 32:i746-i754. [PMID: 27587697 DOI: 10.1093/bioinformatics/btw452] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/28/2023]
Abstract
MOTIVATION: Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription-factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes making it hard to determine the specific way in which they are regulated. RESULTS: To enable genome wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance on both a labeled test set, and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. AVAILABILITY AND IMPLEMENTATION: Code and full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/ CONTACT: zivbj@cs.cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew Ruffalo
  - Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 15213
- Ziv Bar-Joseph
  - Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 15213
10
Farrel A, Guo JT. An efficient algorithm for improving structure-based prediction of transcription factor binding sites. BMC Bioinformatics 2017; 18:342. [PMID: 28715997 PMCID: PMC5514533 DOI: 10.1186/s12859-017-1755-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/11/2017] [Accepted: 07/12/2017] [Indexed: 01/07/2023]
Abstract
Background: Gene expression is regulated by transcription factors binding to specific target DNA sites. Understanding how and where transcription factors bind at genome scale represents an essential step toward our understanding of gene regulation networks. Previously we developed a structure-based method for prediction of transcription factor binding sites using an integrative energy function that combines a knowledge-based multibody potential and two atomic energy terms. While the method performs well, it is not computationally efficient due to the exponential increase in the number of binding sequences to be evaluated for longer binding sites. In this paper, we present an efficient pentamer algorithm by splitting DNA binding sequences into overlapping fragments along with a simplified integrative energy function for transcription factor binding site prediction. Results: A DNA binding sequence is split into overlapping pentamers (5 base pairs) for calculating transcription factor-pentamer interaction energy. To combine the results from overlapping pentamer scores, we developed two methods, Kmer-Sum and PWM (Position Weight Matrix) stacking, for full-length binding motif prediction. Our results show that both Kmer-Sum and PWM stacking in the new pentamer approach along with a simplified integrative energy function improved transcription factor binding site prediction accuracy and dramatically reduced computation time, especially for longer binding sites. Conclusion: Our new fragment-based pentamer algorithm and simplified energy function improve both efficiency and accuracy. To our knowledge, this is the first fragment-based method for structure-based transcription factor binding sites prediction. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1755-0) contains supplementary material, which is available to authorized users.
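The fragment idea in this abstract can be sketched in a few lines of Python: a binding sequence of length L has 4^L possible variants, whereas each 5-bp window has only 4^5 = 1024, so scoring overlapping pentamers keeps the search space small. The sketch below is an assumption-laden illustration, not the authors' implementation: `toy_energy` is a hypothetical placeholder for the paper's structure-based energy function, and `kmer_sum` simply adds the per-pentamer scores, which may differ from how the published Kmer-Sum method combines them.

```python
def overlapping_kmers(sequence, k=5):
    """Split a DNA sequence into overlapping k-mers (pentamers by default)."""
    if len(sequence) < k:
        raise ValueError("sequence is shorter than k")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def toy_energy(kmer):
    """Hypothetical stand-in for a TF-pentamer interaction energy.

    Lower is 'better'; here each G or C simply contributes -1.
    """
    return -sum(base in "GC" for base in kmer)


def kmer_sum(sequence, energy=toy_energy, k=5):
    """Score a full-length site as the sum of its overlapping k-mer energies."""
    return sum(energy(kmer) for kmer in overlapping_kmers(sequence, k))
```

For example, `overlapping_kmers("ACGTACG")` yields three pentamers, and `kmer_sum` scores the 7-bp site from those three fragment scores rather than from an evaluation of the whole sequence.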
Affiliation(s)
- Alvin Farrel
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Jun-Tao Guo
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
11
Bédard M, Roy V, Montagne M, Lavigne P. Structural Insights into c-Myc-interacting Zinc Finger Protein-1 (Miz-1) Delineate Domains Required for DNA Scanning and Sequence-specific Binding. J Biol Chem 2016; 292:3323-3340. [PMID: 28035002 DOI: 10.1074/jbc.m116.748699] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/20/2016] [Revised: 12/13/2016] [Indexed: 11/06/2022]
Abstract
c-Myc-interacting zinc finger protein-1 (Miz-1) is a poly-Cys2His2 zinc finger (ZF) transcriptional regulator of many cell cycle genes. A Miz-1 DNA sequence consensus has recently been identified and has also unveiled Miz-1 functions in other cellular processes, underscoring its importance in the cell. Miz-1 contains 13 ZFs, but it is unknown why Miz-1 has so many ZFs and whether they recognize and bind DNA sequences in a typical fashion. Here, we used NMR to deduce the role of Miz-1 ZFs 1-4 in detecting the Miz-1 consensus sequence and preventing nonspecific DNA binding. In the construct containing the first 4 ZFs, we observed that ZFs 3 and 4 form an unusual compact and stable structure that restricts their motions. Disruption of this compact structure by an electrostatically mismatched A86K mutation profoundly affected the DNA binding properties of the WT construct. On the one hand, Miz1-4WT was found to bind the Miz-1 DNA consensus sequence weakly and through ZFs 1-3 only. On the other hand, the four ZFs in the structurally destabilized Miz1-4A86K mutant bound to the DNA consensus with a 30-fold increase in affinity (100 nM). The formation of such a thermodynamically stable but nonspecific complex is expected to slow down the rate of DNA scanning by Miz-1 during the search for its consensus sequence. Interestingly, we found that the motif stabilizing the compact structure between ZFs 3 and 4 is conserved and enriched in other long poly-ZF proteins. As discussed in detail, our findings support a general role of compact inter-ZF structures in minimizing the formation of off-target DNA complexes.
Affiliation(s)
- Mikaël Bédard
  - Département de Biochimie, Institut de Pharmacologie de Sherbrooke, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke J1H 5N4, Canada
  - Regroupement Stratégique sur la Fonction, la Structure, et l'Ingénierie des Protéines (PROTEO), Université Laval, Québec G1V 0A6, Canada
  - Groupe de Recherche Axé sur la Structure des Protéines (GRASP), McGill University, Montréal, Québec H3G 0B1, Canada
- Vincent Roy
  - Département de Biochimie, Institut de Pharmacologie de Sherbrooke, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke J1H 5N4, Canada
  - Regroupement Stratégique sur la Fonction, la Structure, et l'Ingénierie des Protéines (PROTEO), Université Laval, Québec G1V 0A6, Canada
  - Groupe de Recherche Axé sur la Structure des Protéines (GRASP), McGill University, Montréal, Québec H3G 0B1, Canada
- Martin Montagne
  - Département de Biochimie, Institut de Pharmacologie de Sherbrooke, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke J1H 5N4, Canada
  - Regroupement Stratégique sur la Fonction, la Structure, et l'Ingénierie des Protéines (PROTEO), Université Laval, Québec G1V 0A6, Canada
  - Groupe de Recherche Axé sur la Structure des Protéines (GRASP), McGill University, Montréal, Québec H3G 0B1, Canada
- Pierre Lavigne
  - Département de Biochimie, Institut de Pharmacologie de Sherbrooke, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke J1H 5N4, Canada
  - Regroupement Stratégique sur la Fonction, la Structure, et l'Ingénierie des Protéines (PROTEO), Université Laval, Québec G1V 0A6, Canada
  - Groupe de Recherche Axé sur la Structure des Protéines (GRASP), McGill University, Montréal, Québec H3G 0B1, Canada
12
Corona RI, Guo JT. Statistical analysis of structural determinants for protein-DNA-binding specificity. Proteins 2016; 84:1147-61. [PMID: 27147539 DOI: 10.1002/prot.25061] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/11/2016] [Revised: 04/21/2016] [Accepted: 04/28/2016] [Indexed: 12/27/2022]
Abstract
DNA-binding proteins play critical roles in biological processes including gene expression, DNA packaging and DNA repair. They bind to DNA target sequences with different degrees of binding specificity, ranging from highly specific (HS) to nonspecific (NS). Alterations of DNA-binding specificity, due to either genetic variation or somatic mutations, can lead to various diseases. In this study, a comparative analysis of protein-DNA complex structures was carried out to investigate the structural features that contribute to binding specificity. Protein-DNA complexes were grouped into three general classes based on degrees of binding specificity: HS, multispecific (MS), and NS. Our results show a clear trend of structural features among the three classes, including amino acid binding propensities, simple and complex hydrogen bonds, major/minor groove and base contacts, and DNA shape. We found that aspartate is enriched in HS DNA binding proteins and predominately binds to a cytosine through a single hydrogen bond or two consecutive cytosines through bidentate hydrogen bonds. Aromatic residues, histidine and tyrosine, are highly enriched in the HS and MS groups and may contribute to specific binding through different mechanisms. To further investigate the role of protein flexibility in specific protein-DNA recognition, we analyzed the conformational changes between the bound and unbound states of DNA-binding proteins and structural variations. The results indicate that HS and MS DNA-binding domains have larger conformational changes upon DNA-binding and larger degree of flexibility in both bound and unbound states. Proteins 2016; 84:1147-1161. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Rosario I Corona
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, North Carolina, 28223
- Jun-Tao Guo
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, North Carolina, 28223

13
Determination of specificity influencing residues for key transcription factor families. Quantitative Biology 2015; 3:115-123. [PMID: 26753103 DOI: 10.1007/s40484-015-0045-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/31/2022]
Abstract
Transcription factors (TFs) are major modulators of transcription and subsequent cellular processes. The binding of TFs to specific regulatory elements is governed by their specificity. Considering the gap between known TF sequences and specificities, specificity prediction frameworks are highly desired. Key inputs to such frameworks are the protein residues that modulate the specificity of the TF under consideration. Simple measures like mutual information (MI) fail to delineate specificity influencing residues (SIRs) from alignments because of the constraints imposed by the three-dimensional structure of the protein. Structural restraints on the evolution of the amino-acid sequence lead to identification of false SIRs. In this manuscript we extended three methods (Direct Information, PSICOV and adjusted mutual information) that have been used to disentangle spurious indirect protein residue-residue contacts from direct contacts, to identify SIRs from joint alignments of amino-acids and specificity. We predicted SIRs for homeodomain (HD), helix-loop-helix, LacI and GntR families of TFs using these methods and compared to MI. Using various measures, we show that the performance of these three methods is comparable but better than MI. Implication of these methods in specificity prediction framework is discussed. The methods are implemented as an R package and available along with the alignments at stormo.wustl.edu/SpecPred.
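As a rough illustration of the plain MI baseline that this abstract compares against, the sketch below computes the mutual information between one alignment column (residues) and one specificity column (preferred bases). It is not the authors' R package: the function name `mutual_information` and the toy columns are assumptions made for this example, and the corrected methods named in the abstract (Direct Information, PSICOV, adjusted MI) are not implemented here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length symbol columns."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        p_indep = (count_x[x] / n) * (count_y[y] / n)
        mi += p_joint * log2(p_joint / p_indep)
    return mi


# Hypothetical toy data: one residue per aligned protein, paired with the
# base that protein prefers at one position of its binding site.
residue_col = ["N", "N", "K", "K", "N", "K"]
base_col = ["A", "A", "G", "G", "A", "G"]
invariant_col = ["L", "L", "L", "L", "L", "L"]

print(mutual_information(residue_col, base_col))    # residue tracks the base
print(mutual_information(invariant_col, base_col))  # invariant column carries no information
```

A column that covaries perfectly with the preferred base scores high, while an invariant column scores zero. What this simple score cannot do is tell a residue that contacts DNA from one that merely covaries with it for structural reasons, which is the failure mode the abstract describes.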
14
Nadimpalli S, Persikov AV, Singh M. Pervasive variation of transcription factor orthologs contributes to regulatory network evolution. PLoS Genet 2015; 11:e1005011. [PMID: 25748510 PMCID: PMC4351887 DOI: 10.1371/journal.pgen.1005011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/13/2014] [Accepted: 01/18/2015] [Indexed: 01/17/2023] Open
Abstract
Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~44% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, found in ~70% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~26% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve slower than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a subset of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.
Affiliation(s)
- Shilpa Nadimpalli
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Anton V. Persikov
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America

15
Persikov AV, Wetzel JL, Rowland EF, Oakes BL, Xu DJ, Singh M, Noyes MB. A systematic survey of the Cys2His2 zinc finger DNA-binding landscape. Nucleic Acids Res 2015; 43:1965-84. [PMID: 25593323 PMCID: PMC4330361 DOI: 10.1093/nar/gku1395] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Indexed: 11/13/2022] Open
Abstract
Cys2His2 zinc fingers (C2H2-ZFs) comprise the largest class of metazoan DNA-binding domains. Despite this domain's well-defined DNA-recognition interface, and its successful use in the design of chimeric proteins capable of targeting genomic regions of interest, much remains unknown about its DNA-binding landscape. To help bridge this gap in fundamental knowledge and to provide a resource for design-oriented applications, we screened large synthetic protein libraries to select binding C2H2-ZF domains for each possible three base pair target. The resulting data consist of >160 000 unique domain-DNA interactions and comprise the most comprehensive investigation of C2H2-ZF DNA-binding interactions to date. An integrated analysis of these independent screens yielded DNA-binding profiles for tens of thousands of domains and led to the successful design and prediction of C2H2-ZF DNA-binding specificities. Computational analyses uncovered important aspects of C2H2-ZF domain-DNA interactions, including the roles of within-finger context and domain position on base recognition. We observed the existence of numerous distinct binding strategies for each possible three base pair target and an apparent balance between affinity and specificity of binding. In sum, our comprehensive data help elucidate the complex binding landscape of C2H2-ZF domains and provide a foundation for efforts to determine, predict and engineer their DNA-binding specificities.
Affiliation(s)
- Anton V Persikov
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Joshua L Wetzel
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Elizabeth F Rowland
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Benjamin L Oakes
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Denise J Xu
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Marcus B Noyes
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
  - Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA

16
Slattery M, Zhou T, Yang L, Dantas Machado AC, Gordân R, Rohs R. Absence of a simple code: how transcription factors read the genome. Trends Biochem Sci 2014; 39:381-99. [PMID: 25129887 DOI: 10.1016/j.tibs.2014.07.002] [Citation(s) in RCA: 332] [Impact Index Per Article: 33.2] [Received: 06/03/2014] [Revised: 07/11/2014] [Accepted: 07/15/2014] [Indexed: 12/21/2022]
Abstract
Transcription factors (TFs) influence cell fate by interpreting the regulatory DNA within a genome. TFs recognize DNA in a specific manner; the mechanisms underlying this specificity have been identified for many TFs based on 3D structures of protein-DNA complexes. More recently, structural views have been complemented with data from high-throughput in vitro and in vivo explorations of the DNA-binding preferences of many TFs. Together, these approaches have greatly expanded our understanding of TF-DNA interactions. However, the mechanisms by which TFs select in vivo binding sites and alter gene expression remain unclear. Recent work has highlighted the many variables that influence TF-DNA binding, while demonstrating that a biophysical understanding of these many factors will be central to understanding TF function.
Affiliation(s)
- Matthew Slattery
  - Department of Biomedical Sciences, University of Minnesota Medical School, Duluth, MN 55812, USA; Developmental Biology Center, University of Minnesota, Minneapolis, MN 55455, USA
- Tianyin Zhou
  - Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Lin Yang
  - Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Ana Carolina Dantas Machado
  - Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Raluca Gordân
  - Center for Genomic and Computational Biology, Departments of Biostatistics and Bioinformatics, Computer Science, and Molecular Genetics and Microbiology, Duke University, Durham, NC 27708, USA
- Remo Rohs
  - Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA

17
Siggers T, Reddy J, Barron B, Bulyk ML. Diversification of transcription factor paralogs via noncanonical modularity in C2H2 zinc finger DNA binding. Mol Cell 2014; 55:640-8. [PMID: 25042805 DOI: 10.1016/j.molcel.2014.06.019] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 03/20/2014] [Revised: 05/27/2014] [Accepted: 06/09/2014] [Indexed: 12/25/2022]
Abstract
A major challenge in obtaining a full molecular description of evolutionary adaptation is to characterize how transcription factor (TF) DNA-binding specificity can change. To identify mechanisms of TF diversification, we performed detailed comparisons of yeast C2H2 ZF proteins with identical canonical recognition residues that are expected to bind the same DNA sequences. Unexpectedly, we found that ZF proteins can adapt to recognize new binding sites in a modular fashion whereby binding to common core sites remains unaffected. We identified two distinct mechanisms, conserved across multiple Ascomycota species, by which this molecular adaptation occurred. Our results suggest a route for TF evolution that alleviates negative pleiotropic effects by modularly gaining new binding sites. These findings expand our current understanding of ZF DNA binding and provide evidence for paralogous ZFs utilizing alternate modes of DNA binding to recognize unique sets of noncanonical binding sites.
Affiliation(s)
- Trevor Siggers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Biology, Boston University, Boston, MA 02215, USA
- Jessica Reddy
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Brian Barron
  - Department of Biology, Boston University, Boston, MA 02215, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA

18
Gupta A, Christensen RG, Bell HA, Goodwin M, Patel RY, Pandey M, Enuameh MS, Rayla AL, Zhu C, Thibodeau-Beganny S, Brodsky MH, Joung JK, Wolfe SA, Stormo GD. An improved predictive recognition model for Cys(2)-His(2) zinc finger proteins. Nucleic Acids Res 2014; 42:4800-12. [PMID: 24523353 PMCID: PMC4005693 DOI: 10.1093/nar/gku132] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Received: 12/05/2013] [Revised: 01/21/2014] [Accepted: 01/22/2014] [Indexed: 11/17/2022] Open
Abstract
Cys(2)-His(2) zinc finger proteins (ZFPs) are the largest family of transcription factors in higher metazoans. They also represent the most diverse family with regards to the composition of their recognition sequences. Although there are a number of ZFPs with characterized DNA-binding preferences, the specificity of the vast majority of ZFPs is unknown and cannot be directly inferred by homology due to the diversity of recognition residues present within individual fingers. Given the large number of unique zinc fingers and assemblies present across eukaryotes, a comprehensive predictive recognition model that could accurately estimate the DNA-binding specificity of any ZFP based on its amino acid sequence would have great utility. Toward this goal, we have used the DNA-binding specificities of 678 two-finger modules from both natural and artificial sources to construct a random forest-based predictive model for ZFP recognition. We find that our recognition model outperforms previously described determinant-based recognition models for ZFPs, and can successfully estimate the specificity of naturally occurring ZFPs with previously defined specificities.
Affiliation(s)
- Ankit Gupta, Ryan G. Christensen, Heather A. Bell, Mathew Goodwin, Ronak Y. Patel, Manishi Pandey, Metewo Selase Enuameh, Amy L. Rayla, Cong Zhu, Stacey Thibodeau-Beganny, Michael H. Brodsky, J. Keith Joung, Scot A. Wolfe, Gary D. Stormo
  - Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA, Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA, Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA, Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA, Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA and Department of Pathology, Harvard Medical School, Boston, MA 02115, USA

19
Koe CT, Li S, Rossi F, Wong JJL, Wang Y, Zhang Z, Chen K, Aw SS, Richardson HE, Robson P, Sung WK, Yu F, Gonzalez C, Wang H. The Brm-HDAC3-Erm repressor complex suppresses dedifferentiation in Drosophila type II neuroblast lineages. eLife 2014; 3:e01906. [PMID: 24618901 PMCID: PMC3944433 DOI: 10.7554/elife.01906] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 12/13/2022] Open
Abstract
The control of self-renewal and differentiation of neural stem and progenitor cells is a crucial issue in stem cell and cancer biology. Drosophila type II neuroblast lineages are prone to developing impaired neuroblast homeostasis if the limited self-renewing potential of intermediate neural progenitors (INPs) is unrestrained. Here, we demonstrate that Drosophila SWI/SNF chromatin remodeling Brahma (Brm) complex functions cooperatively with another chromatin remodeling factor, Histone deacetylase 3 (HDAC3) to suppress the formation of ectopic type II neuroblasts. We show that multiple components of the Brm complex and HDAC3 physically associate with Earmuff (Erm), a type II-specific transcription factor that prevents dedifferentiation of INPs into neuroblasts. Consistently, the predicted Erm-binding motif is present in most of known binding loci of Brm. Furthermore, brm and hdac3 genetically interact with erm to prevent type II neuroblast overgrowth. Thus, the Brm-HDAC3-Erm repressor complex suppresses dedifferentiation of INPs back into type II neuroblasts. DOI: http://dx.doi.org/10.7554/eLife.01906.001

Stem cells show great promise for repairing damaged tissue, and maybe even generating new organs, but stem cell therapies will only be successful if researchers can understand and control the behaviour of stem cells in the lab. Neural stem cells or ‘neuroblasts’ from the brains of larval fruit flies have become a popular model for studying these processes, and one type of neuroblast, known as a ‘type II’ neuroblast, is similar to mammalian neural stem cells in many ways. When type II neuroblasts divide, they generate another neuroblast and a second cell called an intermediate neural progenitor (INP) cell. This progenitor cell then matures and undergoes a limited number of divisions to generate more INP cells and cells called ganglion mother cells. The process by which stem cells and INP cells become specific types of cells is known as differentiation.
However, under certain circumstances, the INP cells can undergo the opposite process, which is called dedifferentiation, and become ‘ectopic neuroblasts’. This can give rise to tumors, so cells must employ a mechanism to prevent dedifferentiation. Researchers have known that a protein specifically expressed in INP cells called Earmuff is involved in this process, but many of the details have remained hidden. Now, Koe et al. have discovered that a multi-protein complex containing Earmuff and a number of other proteins—Brahma and HDAC3—have important roles in preventing dedifferentiation. All three proteins are involved in different aspects of gene expression: Earmuff is a transcription factor that controls the process by which the genes in DNA are transcribed to make molecules of messenger RNA; Brahma and HDAC3 are both involved in a process called chromatin remodeling. The DNA inside cells is packaged into a compact structure known as chromatin, and chromatin remodeling involves partially unpacking this structure so that transcription factors and other proteins can have access to the DNA. Koe et al. also showed that Earmuff, Brahma and HDAC3 combine to form a complex that prevents dedifferentiation. An immediate priority is to identify those genes whose expression is regulated by this complex in order to prevent dedifferentiation. DOI:http://dx.doi.org/10.7554/eLife.01906.002
Affiliation(s)
- Chwee Tat Koe
  - Neuroscience and Behavioral Disorders Program, Duke-NUS Graduate Medical School Singapore, Singapore, Singapore

20
Persikov AV, Rowland EF, Oakes BL, Singh M, Noyes MB. Deep sequencing of large library selections allows computational discovery of diverse sets of zinc fingers that bind common targets. Nucleic Acids Res 2013; 42:1497-508. [PMID: 24214968 PMCID: PMC3919609 DOI: 10.1093/nar/gkt1034] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022] Open
Abstract
The Cys2His2 zinc finger (ZF) is the most frequently found sequence-specific DNA-binding domain in eukaryotic proteins. The ZF's modular protein-DNA interface has also served as a platform for genome engineering applications. Despite decades of intense study, a predictive understanding of the DNA-binding specificities of either natural or engineered ZF domains remains elusive. To help fill this gap, we developed an integrated experimental-computational approach to enrich and recover distinct groups of ZFs that bind common targets. To showcase the power of our approach, we built several large ZF libraries and demonstrated their excellent diversity. As proof of principle, we used one of these ZF libraries to select and recover thousands of ZFs that bind several 3-nt targets of interest. We were then able to computationally cluster these recovered ZFs to reveal several distinct classes of proteins, all recovered from a single selection, to bind the same target. Finally, for each target studied, we confirmed that one or more representative ZFs yield the desired specificity. In sum, the described approach enables comprehensive large-scale selection and characterization of ZF specificities and should be a great aid in furthering our understanding of the ZF domain.
Affiliation(s)
- Anton V Persikov
- The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, Department of Computer Science, Princeton University, Princeton, NJ 08544, USA and Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
|
21
|
Persikov AV, Singh M. De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res 2013; 42:97-108. [PMID: 24097433 PMCID: PMC3874201 DOI: 10.1093/nar/gkt890] [Citation(s) in RCA: 134] [Impact Index Per Article: 12.2] [Indexed: 12/13/2022]
Abstract
Proteins with sequence-specific DNA binding function are important for a wide range of biological activities. De novo prediction of their DNA-binding specificities from sequence alone would be a great aid in inferring cellular networks. Here we introduce a method for predicting DNA-binding specificities for Cys2His2 zinc fingers (C2H2-ZFs), the largest family of DNA-binding proteins in metazoans. We develop a general approach, based on empirical calculations of pairwise amino acid–nucleotide interaction energies, for predicting position weight matrices (PWMs) representing DNA-binding specificities for C2H2-ZF proteins. We predict DNA-binding specificities on a per-finger basis and merge predictions for C2H2-ZF domains that are arrayed within sequences. We test our approach on a diverse set of natural C2H2-ZF proteins with known binding specificities and demonstrate that for >85% of the proteins, their predicted PWMs are accurate in 50% of their nucleotide positions. For proteins with several zinc finger isoforms, we show via case studies that this level of accuracy enables us to match isoforms with their known DNA-binding specificities. A web server for predicting a PWM given a protein containing C2H2-ZF domains is available online at http://zf.princeton.edu and can be used to aid in protein engineering applications and in genome-wide searches for transcription factor targets.
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton NJ 08544, USA and Department of Computer Science, Princeton University, Princeton NJ 08544, USA
|
22
|
Abstract
The specificity of protein-DNA interactions is most commonly modeled using position weight matrices (PWMs). First introduced in 1982, they have been adapted to many new types of data and many different approaches have been developed to determine the parameters of the PWM. New high-throughput technologies provide a large amount of data rapidly and offer an unprecedented opportunity to determine accurately the specificities of many transcription factors (TFs). But taking full advantage of the new data requires advanced algorithms that take into account the biophysical processes involved in generating the data. The new large datasets can also aid in determining when the PWM model is inadequate and must be extended to provide accurate predictions of binding sites. This article provides a general mathematical description of a PWM and how it is used to score potential binding sites, a brief history of the approaches that have been developed and the types of data that are used with an emphasis on algorithms that we have developed for analyzing high-throughput datasets from several new technologies. It also describes extensions that can be added when the simple PWM model is inadequate and further enhancements that may be necessary. It briefly describes some applications of PWMs in the discovery and modeling of in vivo regulatory networks.
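As an illustration of how a PWM scores potential binding sites, here is a minimal sketch. It is not taken from the article: the count matrix, the uniform background, the pseudocount, and the function names (counts_to_pwm, score_site, scan) are assumptions chosen for the example. It builds a log-odds matrix from aligned-site counts and sums one entry per position to score each window of a sequence.

```python
import math

# Hypothetical counts from 10 aligned binding sites (one dict per motif position).
COUNTS = [
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 1, "C": 8, "G": 0, "T": 1},
]

# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def counts_to_pwm(counts, background, pseudocount=1.0):
    """Convert per-position counts into log2-odds weights against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in column
        })
    return pwm


def score_site(pwm, site):
    """Score one candidate site as the sum of per-position weights."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(column[base] for column, base in zip(pwm, site))


def scan(pwm, sequence):
    """Return (offset, score) for every window of motif width in the sequence."""
    width = len(pwm)
    return [
        (i, score_site(pwm, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]


if __name__ == "__main__":
    pwm = counts_to_pwm(COUNTS, BACKGROUND)
    for offset, score in scan(pwm, "AATGACGG"):
        print(offset, round(score, 2))
```

The additive score assumes that motif positions contribute independently, which is the simplification the abstract says may need to be extended when the simple PWM model is inadequate.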
|
23
|
Sarkar A, Kumar S, Punetha A, Grover A, Sundar D. Analysis and Prediction of DNA-Recognition by Zinc Finger Proteins. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Zinc fingers are the most abundant class of DNA-binding proteins encoded in eukaryotic genomes. Custom-designed zinc finger proteins attached to various DNA-modifying domains can be used to achieve highly specific genome modification, which has tremendous applications in molecular therapeutics. Analysis of the sequence and structure of zinc finger proteins provides clues for understanding protein-DNA interactions and aids in the custom design of zinc finger proteins with tailor-made specificity. Computational methods for predicting recognition helices for C2H2 zinc fingers that bind to specific target DNA sites could provide valuable insights for researchers interested in designing specific zinc finger proteins for biological and biomedical applications. In this chapter, we describe the zinc finger protein-DNA interaction patterns, the challenges in engineering the recognition specificity of zinc finger proteins, the computational methods for predicting proteins that recognize specific target DNA sequences, and their applications in molecular therapeutics.
|
24
|
Hu ZP, Chen LS, Jia CY, Zhu HZ, Wang W, Zhong J. Screening of potential pseudo att sites of Streptomyces phage ΦC31 integrase in the human genome. Acta Pharmacol Sin 2013; 34:561-9. [PMID: 23416928 DOI: 10.1038/aps.2012.173] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Abstract
AIM: ΦC31 integrase mediates site-specific recombination between two short sequences, attP and attB, in phage and bacterial genomes, which is a promising tool in gene regulation-based therapy since the zinc finger structure is probably the DNA-recognizing domain that can further be engineered. The aim of this study was to screen potential pseudo att sites of ΦC31 integrase in the human genome and to evaluate the risks of its application in human gene therapy. METHODS: TFBS (transcription factor binding sites) were found on the basis of reported pseudo att sites using multiple motif-finding tools, including AlignACE, BioProspector, Consensus, MEME, and Weeder. The human genome was scanned with the proposed motif to find the potential pseudo att sites of ΦC31 integrase. RESULTS: The possible recognition motif of ΦC31 integrase was identified; it was composed of two co-occurring conserved elements, reverse complements of each other, flanking the core sequence TTG. In the human genome, a total of 27924 potential pseudo att sites of ΦC31 integrase were found, which were distributed across every human chromosome, with high-risk specificity values in chromosomes 16, 17, and 19. When the risks of the sites were evaluated more rigorously, 53 hits were discovered, some of which fell within vital functional genes or regulatory regions, such as ACYP2, AKR1B1, and DUSP4. CONCLUSION: The results provide clues for more comprehensive evaluation of the risks of using ΦC31 integrase in human gene therapy and for drug discovery.
|
25
|
Grau J, Wolf A, Reschke M, Bonas U, Posch S, Boch J. Computational predictions provide insights into the biology of TAL effector target sites. PLoS Comput Biol 2013; 9:e1002962. [PMID: 23526890 PMCID: PMC3597551 DOI: 10.1371/journal.pcbi.1002962] [Citation(s) in RCA: 82] [Impact Index Per Article: 7.5] [Received: 09/05/2012] [Accepted: 01/14/2013] [Indexed: 11/19/2022]
Abstract
Transcription activator-like (TAL) effectors are injected into host plant cells by Xanthomonas bacteria to function as transcriptional activators for the benefit of the pathogen. The DNA binding domain of TAL effectors is composed of conserved amino acid repeat structures containing repeat-variable diresidues (RVDs) that determine DNA binding specificity. In this paper, we present TALgetter, a new approach for predicting TAL effector target sites based on a statistical model. In contrast to previous approaches, the parameters of TALgetter are estimated from training data computationally. We demonstrate that TALgetter successfully predicts known TAL effector target sites and often yields a greater number of predictions that are consistent with up-regulation in gene expression microarrays than an existing approach, Target Finder of the TALE-NT suite. We study the binding specificities estimated by TALgetter and confirm that different RVDs differ in their importance for transcriptional activation. In subsequent studies, the predictions of TALgetter indicate a previously unreported positional preference of TAL effector target sites relative to the transcription start site. In addition, several TAL effectors are predicted to bind to the TATA-box, which might constitute one general mode of transcriptional activation by TAL effectors. Scrutinizing the predicted target sites of TALgetter, we propose several novel TAL effector virulence targets in rice and sweet orange. TAL-mediated induction of the candidates is supported by gene expression microarrays. Validity of these targets is also supported by functional analogy to known TAL effector targets, by an over-representation of TAL effector targets with similar function, or by a biological function related to pathogen infection. Hence, these predicted TAL effector virulence targets are promising candidates for studying the virulence function of TAL effectors. TALgetter is implemented as part of the open-source Java library Jstacs, and is freely available as a web application and a command-line program.
Affiliation(s)
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany.
|
26
|
Christensen RG, Enuameh MS, Noyes MB, Brodsky MH, Wolfe SA, Stormo GD. Recognition models to predict DNA-binding specificities of homeodomain proteins. Bioinformatics 2013; 28:i84-9. [PMID: 22689783 PMCID: PMC3371834 DOI: 10.1093/bioinformatics/bts202] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors the best available methods utilize k-nearest neighbor (KNN) algorithms to make specificity predictions based on the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes, and as a consequence an effective recognition model for this family would facilitate predictive models of many transcriptional regulatory networks within these genomes. Results: Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting position frequency matrices from protein sequence using an RF-based model. Contact: stormo@wustl.edu
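The KNN baseline described in the abstract (averaging the specificities of the k most similar proteins) can be sketched as follows. This is only an illustrative toy, not the authors' implementation: the training pairs, the use of fractional identity over a few aligned key residues as the similarity measure, and the function names are all assumptions.

```python
# Hypothetical training set: (aligned key residues, position frequency matrix).
# Each PFM row holds frequencies for A, C, G, T at one motif position.
TRAINING = [
    ("QNK", [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]),
    ("QNR", [[0.5, 0.1, 0.3, 0.1], [0.1, 0.1, 0.3, 0.5]]),
    ("KAT", [[0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]),
]


def identity(a, b):
    """Fraction of aligned positions with identical residues (assumed similarity)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict_pfm(query, training, k=2):
    """Average the PFMs of the k training proteins most similar to the query."""
    ranked = sorted(training, key=lambda item: identity(query, item[0]), reverse=True)
    neighbors = [pfm for _, pfm in ranked[:k]]
    width = len(neighbors[0])
    return [
        [sum(pfm[pos][base] for pfm in neighbors) / len(neighbors) for base in range(4)]
        for pos in range(width)
    ]


if __name__ == "__main__":
    for row in knn_predict_pfm("QNM", TRAINING, k=2):
        print([round(value, 2) for value in row])
```

Because the prediction is a plain average of neighbor matrices, it cannot produce a specificity outside the range of its neighbors, which is one intuition for why learned models such as the SVMs and random forests mentioned above may do better.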
Affiliation(s)
- Ryan G Christensen
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
|
27
|
Takeda T, Corona RI, Guo JT. A knowledge-based orientation potential for transcription factor-DNA docking. Bioinformatics 2012; 29:322-30. [DOI: 10.1093/bioinformatics/bts699] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
|
28
|
Maienschein-Cline M, Dinner AR, Hlavacek WS, Mu F. Improved predictions of transcription factor binding sites using physicochemical features of DNA. Nucleic Acids Res 2012; 40:e175. [PMID: 22923524 PMCID: PMC3526315 DOI: 10.1093/nar/gks771] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 01/07/2023]
Abstract
Typical approaches for predicting transcription factor binding sites (TFBSs) involve use of a position-specific weight matrix (PWM) to statistically characterize the sequences of the known sites. Recently, an alternative physicochemical approach, called SiteSleuth, was proposed. In this approach, a linear support vector machine (SVM) classifier is trained to distinguish TFBSs from background sequences based on local chemical and structural features of DNA. SiteSleuth appears to generally perform better than PWM-based methods. Here, we improve the SiteSleuth approach by considering both new physicochemical features and algorithmic modifications. New features are derived from Gibbs energies of amino acid-DNA interactions and hydroxyl radical cleavage profiles of DNA. Algorithmic modifications consist of inclusion of a feature selection step, use of a nonlinear kernel in the SVM classifier, and use of a consensus-based post-processing step for predictions. We also considered SVM classification based on letter features alone to distinguish performance gains from use of SVM-based models versus use of physicochemical features. The accuracy of each of the variant methods considered was assessed by cross validation using data available in the RegulonDB database for 54 Escherichia coli TFs, as well as by experimental validation using published ChIP-chip data available for Fis and Lrp.
|
29
|
AlQuraishi M, McAdams HH. Three enhancements to the inference of statistical protein-DNA potentials. Proteins 2012; 81:426-42. [PMID: 23042633 DOI: 10.1002/prot.24201] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/18/2012] [Revised: 09/17/2012] [Accepted: 10/02/2012] [Indexed: 12/28/2022]
Abstract
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94305, USA
|
30
|
Bitar M, Drummond MG, Costa MGS, Lobo FP, Calzavara-Silva CE, Bisch PM, Machado CR, Macedo AM, Pierce RJ, Franco GR. Modeling the zinc finger protein SmZF1 from Schistosoma mansoni: Insights into DNA binding and gene regulation. J Mol Graph Model 2012; 39:29-38. [PMID: 23220279 DOI: 10.1016/j.jmgm.2012.10.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/13/2012] [Revised: 10/09/2012] [Accepted: 10/13/2012] [Indexed: 10/27/2022]
Abstract
Zinc finger proteins are widely found in eukaryotes, representing an important class of DNA-binding proteins frequently involved in transcriptional regulation. Zinc finger motifs are composed of two antiparallel β-strands and one α-helix, stabilized by a zinc ion coordinated by conserved histidine and cysteine residues. In Schistosoma mansoni, these regulatory proteins are known to modulate morphological and physiological changes, having crucial roles in parasite development. A previously described C2H2 zinc finger protein, SmZF1, was shown to be present in cell nuclei of different life stages of S. mansoni and to activate gene transcription in a heterologous system. A high-quality SmZF1 three-dimensional structure was generated using comparative modeling. Molecular dynamics simulations of the obtained structure revealed stability of the zinc finger motifs and high flexibility at the termini, comparable to the profile observed for the template X-ray structure based on thermal B-factors. Based on the protein's three-dimensional features and amino acid composition, we were able to characterize four C2H2 zinc finger motifs, the first involved in protein-protein interactions and the other three involved in DNA binding. We defined a consensus DNA binding sequence using three distinct algorithms and further carried out docking calculations, which revealed the interaction of fingers 2-4 with the predicted DNA. A search for S. mansoni genes presenting putative SmZF1 binding sites revealed 415 genes hypothetically under SmZF1 control. Using an automatic annotation and GO assignment approach, we found that the majority of those genes code for proteins involved in developmental processes. Taken together, these results provide a consistent basis for the structural and functional characterization of SmZF1.
Affiliation(s)
- Mainá Bitar
- Laboratório de Física Biológica, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
|
31
|
Heil CSS, Noor MAF. Zinc finger binding motifs do not explain recombination rate variation within or between species of Drosophila. PLoS One 2012; 7:e45055. [PMID: 23028758 PMCID: PMC3445564 DOI: 10.1371/journal.pone.0045055] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 05/22/2012] [Accepted: 08/15/2012] [Indexed: 01/15/2023]
Abstract
In humans and mice, the Cys2His2 zinc finger protein PRDM9 binds to a DNA sequence motif enriched in hotspots of recombination, possibly modifying nucleosomes and recruiting recombination machinery to initiate double-strand breaks (DSBs). However, since its discovery, some researchers have suggested that the recombinational effect of PRDM9 is lineage or species specific. To test for a conserved role of PRDM9-like proteins across taxa, we use the Drosophila pseudoobscura species group in an attempt to identify recombination-associated zinc finger proteins and motifs. We leveraged the conserved amino acid motifs in Cys2His2 zinc fingers to predict nucleotide binding motifs for all Cys2His2 zinc finger proteins in Drosophila pseudoobscura and identified associations with empirical measures of recombination rate. Additionally, we utilized recombination maps from D. pseudoobscura and D. miranda to explore whether changes in the binding motifs between species can account for changes in the recombination landscape, analogous to the effect observed in PRDM9 among human populations. We identified a handful of potential recombination-associated sequence motifs, but the associations are generally tenuous and their biological relevance remains uncertain. Furthermore, we found no evidence that changes in zinc finger DNA binding explain variation in recombination rate between species. We therefore conclude that there is no protein with a DNA sequence-specific, human-PRDM9-like function in Drosophila. We suggest these findings could be explained by the existence of a different recombination initiation system in Drosophila.
Affiliation(s)
- Caiti S S Heil
- Department of Biology, Duke University, Durham, North Carolina, USA.
|
32
|
The KRAB zinc finger protein RSL1 regulates sex- and tissue-specific promoter methylation and dynamic hormone-responsive chromatin configuration. Mol Cell Biol 2012; 32:3732-42. [PMID: 22801370 DOI: 10.1128/mcb.00615-12] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022]
Abstract
Over 400 Krüppel-associated box zinc finger proteins (KRAB-ZFPs) are encoded in mammalian genomes. While KRAB-ZFPs strongly repress transcription in vitro, little is known about their biological function or gene targets in vivo. Regulator of sex limitation 1 (Rsl1), one of the first KRAB-Zfp genes assigned a physiological role, accentuates sex-biased liver gene expression, most dramatically for mouse sex-limited protein (Slp), which provides an in vivo reporter of KRAB-ZFP function. Slp is induced in males in the liver and kidney by growth hormone (GH) and androgen, respectively. In the liver but not kidney, the Rsl1 genotype correlates with methylation of a CpG dinucleotide in the Slp promoter that is demethylated at puberty. RSL1 binds 2 kb upstream of the Slp promoter, both in vitro and in vivo, within an enhancer containing response elements for STAT5b. Chromatin immunoprecipitation (ChIP) assays demonstrate that RSL1 recruits KAP1/TRIM28, the corepressor for KRAB action in vitro, to this enhancer. Slp induction requires rapid cycling of STAT5b in chromatin. Remarkably, RSL1 simultaneously binds adjacent to STAT5b with a reciprocal binding pattern that limits hormonal response. These experiments demonstrate a surprisingly dynamic interplay between a hormonal activator, STAT5b, and a KRAB-ZFP repressor and provide unique insights into KRAB-ZFP epigenetic mechanisms.
|
33
|
Wu J, Hong B, Takeda T, Guo JT. High performance transcription factor-DNA docking with GPU computing. Proteome Sci 2012; 10 Suppl 1:S17. [PMID: 22759575 PMCID: PMC3380734 DOI: 10.1186/1477-5956-10-s1-s17] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
Background: Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computationally demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods: In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results: The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions: We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem.
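The combination of Monte-Carlo sampling with simulated annealing mentioned in the Methods can be illustrated with a small CPU-only sketch. Everything here is an assumption made for illustration: the two-parameter toy_energy stands in for a real docking energy function, and the geometric cooling schedule, step size, and function names are not taken from the paper. The point is only the Metropolis acceptance rule applied under a falling temperature.

```python
import math
import random


def toy_energy(x, y):
    """Hypothetical stand-in for a docking energy; single minimum at (1, -2)."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def anneal(energy, start, steps=5000, t_start=5.0, t_end=0.01, step_size=0.5, seed=0):
    """Metropolis Monte Carlo search with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = tuple(start)
    current_e = energy(*current)
    best, best_e = current, current_e
    cooling = (t_end / t_start) ** (1.0 / steps)
    temperature = t_start
    for _ in range(steps):
        # Propose a random perturbation of the current "conformation".
        candidate = tuple(c + rng.uniform(-step_size, step_size) for c in current)
        candidate_e = energy(*candidate)
        delta = candidate_e - current_e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
        temperature *= cooling
    return best, best_e


if __name__ == "__main__":
    best, best_e = anneal(toy_energy, start=(5.0, 5.0))
    print(best, best_e)
```

Independent annealing runs from different seeds share no state, which is the kind of parallelism that a GPU implementation such as the one described could exploit.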
Affiliation(s)
- Jiadong Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, 30332, USA.
|
34
|
Chu SW, Noyes MB, Christensen RG, Pierce BG, Zhu LJ, Weng Z, Stormo GD, Wolfe SA. Exploring the DNA-recognition potential of homeodomains. Genome Res 2012; 22:1889-98. [PMID: 22539651 PMCID: PMC3460184 DOI: 10.1101/gr.139014.112] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/08/2023]
Abstract
The recognition potential of most families of DNA-binding domains (DBDs) remains relatively unexplored. Homeodomains (HDs), like many other families of DBDs, display limited diversity in their preferred recognition sequences. To explore the recognition potential of HDs, we utilized a bacterial selection system to isolate HD variants, from a randomized library, that are compatible with each of the 64 possible 3' triplet sites (i.e., TAANNN). The majority of these selections yielded sets of HDs with overrepresented residues at specific recognition positions, implying the selection of specific binders. The DNA-binding specificity of 151 representative HD variants was subsequently characterized, identifying HDs that preferentially recognize 44 of these target sites. Many of these variants contain novel combinations of specificity determinants that are uncommon or absent in extant HDs. These novel determinants, when grafted into different HD backbones, produce a corresponding alteration in specificity. This information was used to create more explicit HD recognition models, which can inform the prediction of transcriptional regulatory networks for extant HDs or the engineering of HDs with novel DNA-recognition potential. The diversity of recovered HD recognition sequences raises important questions about the fitness barrier that restricts the evolution of alternate recognition modalities in natural systems.
Affiliation(s)
- Stephanie W Chu
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
|
35
|
Hooghe B, Broos S, van Roy F, De Bleser P. A flexible integrative approach based on random forest improves prediction of transcription factor binding sites. Nucleic Acids Res 2012; 40:e106. [PMID: 22492513 PMCID: PMC3413102 DOI: 10.1093/nar/gks283] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 01/13/2023]
Abstract
Transcription factor binding sites (TFBSs) are DNA sequences of 6–15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
Affiliation(s)
- Bart Hooghe
- Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium
|
36
|
Benchmarks for flexible and rigid transcription factor-DNA docking. BMC Struct Biol 2011; 11:45. [PMID: 22044637 PMCID: PMC3262759 DOI: 10.1186/1472-6807-11-45] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/09/2011] [Accepted: 11/01/2011] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Structural insight from transcription factor-DNA (TF-DNA) complexes is of paramount importance to our understanding of the affinity and specificity of TF-DNA interaction, and to the development of structure-based prediction of TF binding sites. Yet the majority of the TF-DNA complexes remain unsolved despite the considerable experimental efforts being made. Computational docking represents a promising alternative to bridge the gap. To facilitate the study of TF-DNA docking, carefully designed benchmarks are needed for performance evaluation and identification of the strengths and weaknesses of docking algorithms. RESULTS: We constructed two benchmarks, for flexible and rigid TF-DNA docking respectively, using a unified non-redundant set of 38 test cases. The test cases encompass diverse fold families and are classified into easy and hard groups with respect to the degree of difficulty in TF-DNA docking. The major parameters used to classify expected docking difficulty in flexible docking are the conformational differences between bound and unbound TFs and the interaction strength between TFs and DNA. For rigid docking, in which the starting structure is a bound TF conformation, only interaction strength is considered. CONCLUSIONS: We believe these benchmarks are important for the development of better interaction potentials and TF-DNA docking algorithms, which has important implications for structure-based prediction of transcription factor binding sites and drug design.
|
37
|
Zhao X, Sze SH. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains. J Comput Biol 2011; 18:759-70. [PMID: 21554019 DOI: 10.1089/cmb.2010.0197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2^l) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.
Affiliation(s)
- Xiaoyan Zhao
- Department of Computer Science & Engineering, Texas A&M University, College Station, Texas 77843, USA
|
38
|
Nowick K, Fields C, Gernat T, Caetano-Anolles D, Kholina N, Stubbs L. Gain, loss and divergence in primate zinc-finger genes: a rich resource for evolution of gene regulatory differences between species. PLoS One 2011; 6:e21553. [PMID: 21738707 PMCID: PMC3126818 DOI: 10.1371/journal.pone.0021553] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 04/28/2011] [Accepted: 06/01/2011] [Indexed: 12/14/2022]
Abstract
The molecular changes underlying major phenotypic differences between humans and other primates are not well understood, but alterations in gene regulation are likely to play a major role. Here we performed a thorough evolutionary analysis of the largest family of primate transcription factors, the Krüppel-type zinc finger (KZNF) gene family. We identified and curated gene and pseudogene models for KZNFs in three primate species, chimpanzee, orangutan and rhesus macaque, to allow for a comparison with the curated set of human KZNFs. We show that the recent evolutionary history of primate KZNFs has been complex, including many lineage-specific duplications and deletions. We found 213 species-specific KZNFs, among them 7 human-specific and 23 chimpanzee-specific genes. Two human-specific genes were validated experimentally. Ten genes have been lost in humans and 13 in chimpanzees, either through deletion or pseudogenization. We also identified 30 KZNF orthologs with human-specific and 42 with chimpanzee-specific sequence changes that are predicted to affect DNA binding properties of the proteins. Eleven of these genes show signatures of accelerated evolution, suggesting positive selection between humans and chimpanzees. During primate evolution the most extensive re-shaping of the KZNF repertoire, including most gene additions, pseudogenizations, and structural changes, occurred within the subfamily Homininae. Using zinc finger (ZNF) binding predictions, we suggest the potential impact these changes have had on human gene regulatory networks. The large species differences in this family of TFs stand in stark contrast to the overall high conservation of primate genomes and potentially represent a potent driver of primate evolution.
Affiliation(s)
- Katja Nowick
- Institute for Genomic Biology, University of Illinois, Urbana, Illinois, United States of America
- Department of Cell and Developmental Biology, University of Illinois, Urbana, Illinois, United States of America
- Christopher Fields
- Institute for Genomic Biology, University of Illinois, Urbana, Illinois, United States of America
- Tim Gernat
- Institute for Genomic Biology, University of Illinois, Urbana, Illinois, United States of America
- Derek Caetano-Anolles
- Institute for Genomic Biology, University of Illinois, Urbana, Illinois, United States of America
- Department of Cell and Developmental Biology, University of Illinois, Urbana, Illinois, United States of America
- Nadezda Kholina
- Institute for Genomic Biology, University of Illinois, Urbana, Illinois, United States of America
- Department of Cell and Developmental Biology, University of Illinois, Urbana, Illinois, United States of America
- Lisa Stubbs
- Institute for Genomic Biology, University of Illinois, Urbana, Illinois, United States of America
- Department of Cell and Developmental Biology, University of Illinois, Urbana, Illinois, United States of America
|
39
|
Lam KN, van Bakel H, Cote AG, van der Ven A, Hughes TR. Sequence specificity is obtained from the majority of modular C2H2 zinc-finger arrays. Nucleic Acids Res 2011; 39:4680-90. [PMID: 21321018 PMCID: PMC3113560 DOI: 10.1093/nar/gkq1303] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Received: 08/04/2010] [Revised: 12/02/2010] [Accepted: 12/06/2010] [Indexed: 01/31/2023]
Abstract
C2H2 zinc fingers (C2H2-ZFs) are the most prevalent type of vertebrate DNA-binding domain, and typically appear in tandem arrays (ZFAs), with sequential C2H2-ZFs each contacting three (or more) sequential bases. C2H2-ZFs can be assembled in a modular fashion, providing one explanation for their remarkable evolutionary success. Given a set of modules with defined three-base specificities, modular assembly also presents a way to construct artificial proteins with specific DNA-binding preferences. However, a recent survey of a large number of three-finger ZFAs engineered by modular assembly reported high failure rates (∼70%), casting doubt on the generality of modular assembly. Here, we used protein-binding microarrays to analyze 28 ZFAs that failed in the aforementioned study. Most (17) preferred specific sequences, which in all but one case resembled the intended target sequence. Like natural ZFAs, the engineered ZFAs typically yielded degenerate motifs, binding dozens to hundreds of related individual sequences. Thus, the failure of these proteins in previous assays is not due to lack of sequence-specific DNA-binding activity. Our findings underscore the relevance of individual C2H2-ZF sequence specificities within tandem arrays, and support the general ability of modular assembly to produce ZFAs with sequence-specific DNA-binding activity.
Affiliation(s)
- Kathy N. Lam
- Department of Molecular Genetics and Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Harm van Bakel
- Department of Molecular Genetics and Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Atina G. Cote
- Department of Molecular Genetics and Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Anton van der Ven
- Department of Molecular Genetics and Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
- Timothy R. Hughes
- Department of Molecular Genetics and Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
|
40
|
Zeng T, Li J, Liu J. Distinct interfacial biclique patterns between ssDNA-binding proteins and those with dsDNAs. Proteins 2011; 79:598-610. [PMID: 21120860 DOI: 10.1002/prot.22908] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
We introduce a new motif, the interfacial biclique pattern, to study the difference between double-stranded DNA-binding proteins (DSBs, most of which are also known to act as transcription factors) and single-stranded DNA-binding proteins (SSBs), which have recently been found to be involved in many applications. An interfacial biclique pattern in a protein-DNA complex usually consists of a group of residues and a group of nucleotides such that every residue contacts all of the bases. The idea is based on a biological redundancy mechanism: a site mutation has little influence on the ability of the other residues to recognize the target nucleotides, and vice versa. The distribution of the residues on the interfacial motifs is investigated to identify distinct stable preferred residues, stable un-preferred residues and unstable preferred residues between SSBs and DSBs. We also examine residue co-occurrence and residue-base association rules in the interfacial motifs to uncover the different choices of residue combinations by SSBs and DSBs that have contacts with one or more bases. We found that DSBs and SSBs each have their own preferred residues at particular positions for binding preference and association with nucleotides. Some of our results are supported by the literature.
Affiliation(s)
- Tao Zeng
- School of Computer, Wuhan University, Wuhan, Hubei, China 430072
|
41
|
Persikov AV, Singh M. An expanded binding model for Cys2His2 zinc finger protein-DNA interfaces. Phys Biol 2011; 8:035010. [PMID: 21572177 DOI: 10.1088/1478-3975/8/3/035010] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/11/2022]
Abstract
Cys2His2 zinc finger (C2H2-ZF) proteins comprise the largest class of eukaryotic transcription factors. The 'canonical model' for C2H2-ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain, and this model has been the basis for several efforts for computationally predicting and experimentally designing protein-DNA interfaces. Here, we perform a systematic analysis of structural and experimental binding data and find that, in addition to the canonical contacts, several other amino acid and base pair combinations frequently play a role in C2H2-ZF protein-DNA binding. We suggest an expansion of the canonical C2H2-ZF model to include one to three additional contacts, and show that computational approaches including these additional contacts improve predictions of DNA targets of zinc finger proteins.
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, NJ, USA
|
42
|
Yanover C, Bradley P. Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res 2011; 39:4564-76. [PMID: 21343182 PMCID: PMC3113574 DOI: 10.1093/nar/gkr048] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 12/29/2022]
Abstract
Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences.
Affiliation(s)
- Chen Yanover
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
|
43
|
Re-programming DNA-binding specificity in zinc finger proteins for targeting unique address in a genome. Syst Synth Biol 2011; 4:323-9. [PMID: 22132059 DOI: 10.1007/s11693-011-9077-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/02/2010] [Accepted: 02/03/2011] [Indexed: 12/26/2022]
Abstract
Recent studies provide a glimpse of future potential therapeutic applications of custom-designed zinc finger proteins in achieving highly specific genomic manipulation. Custom-design of zinc finger proteins with tailor-made specificity is currently limited by the availability of information on recognition helices for all possible DNA targets. However, recent advances suggest that a combination of design and selection method is best suited to identify custom zinc finger DNA-binding proteins for known genome target sites. Design of functionally self-contained zinc finger proteins can be achieved by (a) modular protein engineering and (b) computational prediction. Here, we explore the novel functionality obtained by engineered zinc finger proteins and the computational approaches for prediction of recognition helices of zinc finger proteins that can raise our ability to re-program zinc finger proteins with desired novel DNA-binding specificities.
|
44
|
Abstract
Krüppel-type or C2H2 zinc fingers represent a dominant DNA-binding motif in eukaryotic transcription factor (TF) proteins. In Krüppel-type (KZNF) TFs, KZNF motifs are arranged in arrays of three to as many as 40 tandem units, which cooperate to define the unique DNA recognition properties of the protein. Each finger contains four amino acids located at specific positions, which are brought into direct contact with adjacent nucleotides in the DNA sequence as the KZNF array winds around the major groove of the DNA double helix. This arrangement creates an intimate and potentially predictable relationship between the amino acid sequence of KZNF arrays and the nucleotide sequence of target binding sites. The large number of possible combinations and arrangements of modular KZNF motifs, and the increasing lengths of KZNF arrays in vertebrate species, have created huge repertoires of functionally unique TF proteins. The properties of this versatile DNA-binding motif have been exploited independently many times over the course of evolution, through attachment to effector motifs that confer activating, repressing or other activities to the proteins. Once created, some of these novel inventions have expanded in specific evolutionary clades, creating large families of TFs that are lineage- or species-unique. This chapter reviews the properties and remarkable evolutionary history of eukaryotic KZNF TF proteins, with special focus on large families that dominate the TF landscapes in different metazoan species.
Affiliation(s)
- Lisa Stubbs
- Department of Cell and Developmental Biology, Institute for Genomic Biology, University of Illinois, Urbana, IL, 61801, USA
|
45
|
Jaimovich A, Friedman N. From large-scale assays to mechanistic insights: computational analysis of interactions. Curr Opin Biotechnol 2010; 22:87-93. [PMID: 21109421 DOI: 10.1016/j.copbio.2010.10.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/26/2010] [Accepted: 10/27/2010] [Indexed: 01/17/2023]
Abstract
The activity in the living cell is carried out by a myriad network of interactions between macromolecules. These include interactions between proteins that form a functional complex, a protein modifying another protein in a transient interaction, a transcription factor that binds a specific DNA locus triggering a change in chromatin or transcription, and so on. Characterization of these interactions in terms of timing, context, and function is crucial for understanding how cells carry out basic biological processes. The recent years have led to the introduction of many assays for probing these interactions in a systematic and large-scale manner. However, there is a large gap between assay results and understanding of biological systems. The challenge for computational methods is to bridge this gap by combining results of different assays and introducing statistical methodologies. In this review we discuss recent advances in approaches dealing with these challenges, and key directions for the future.
Affiliation(s)
- Ariel Jaimovich
- School of Computer Science & Engineering, Hebrew University of Jerusalem, Jerusalem, Israel
|
46
|
Bauer AL, Hlavacek WS, Unkefer PJ, Mu F. Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites. PLoS Comput Biol 2010; 6:e1001007. [PMID: 21124945 PMCID: PMC2987836 DOI: 10.1371/journal.pcbi.1001007] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 05/20/2010] [Accepted: 10/21/2010] [Indexed: 11/19/2022]
Abstract
An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
Affiliation(s)
- Amy L. Bauer
- Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- William S. Hlavacek
- Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Pat J. Unkefer
- National Stable Isotope Resource, Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Fangping Mu
- Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
|
47
|
Anand A, Pugalenthi G, Fogel GB, Suganthan P. Identification and analysis of transcription factor family-specific features derived from DNA and protein information. Pattern Recognit Lett 2010. [DOI: 10.1016/j.patrec.2009.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
48
|
Cai Y, He Z, Shi X, Kong X, Gu L, Xie L. A novel sequence-based method of predicting protein DNA-binding residues, using a machine learning approach. Mol Cells 2010; 30:99-105. [PMID: 20706794 DOI: 10.1007/s10059-010-0093-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/14/2009] [Revised: 04/06/2010] [Accepted: 04/22/2010] [Indexed: 11/29/2022]
Abstract
Protein-DNA interactions play an essential role in transcriptional regulation, DNA repair, and many vital biological processes. The mechanism of protein-DNA binding, however, remains unclear. For the study of many diseases, researchers must improve their understanding of the amino acid motifs that recognize DNA. Because identifying these motifs experimentally is expensive and time-consuming, it is necessary to devise an approach for computational prediction. Some in silico methods have been developed, but there are still considerable limitations. In this study, we used a machine learning approach to develop a new sequence-based method of predicting protein-DNA binding residues. To make these predictions, we used the properties of the micro-environment of each amino acid from the AAIndex as well as conservation scores. Testing by the cross-validation method, we obtained an overall accuracy of 94.89%. Our method shows that the amino acid micro-environment is important for DNA binding, and that it is possible to identify the protein-DNA binding sites with it.
Affiliation(s)
- Yudong Cai
- Institute of System Biology, Shanghai University, Shanghai, 200244, People's Republic of China
|
49
|
Lee MY, Lu A, Gudas LJ. Transcriptional regulation of Rex1 (zfp42) in normal prostate epithelial cells and prostate cancer cells. J Cell Physiol 2010; 224:17-27. [PMID: 20232320 DOI: 10.1002/jcp.22071] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
Rex1 (zfp42) was identified by our laboratory because of its reduced expression in F9 teratocarcinoma stem cells after retinoic acid (RA) treatment. The Rex1 (Zfp42) gene is currently widely used as a marker of embryonic stem cells. We compared the transcriptional regulation of the human Rex1 gene in NTera-2 (NT-2) human teratocarcinoma, normal human prostate epithelial cells (PrEC), and prostate cancer cells (PC-3) by promoter/luciferase analyses. Oct4, Sox2, Nanog, and Dax1 transcripts are expressed at higher levels in NT-2 and PrEC cells than in PC-3 cells. Co-transfection analyses showed that YY1 and Rex1 are positive regulators of hRex1 transcription in NT-2 and PrEC cells, whereas Nanog is not. Serial deletion constructs of the hRex1 promoter were created and analyzed, by which we identified a potential negative regulatory site that is located between -1 and -0.4 kb of the hRex1 promoter. We also delineated regions of the hRex1 promoter between -0.4 kb and the TSS that, when mutated, reduced transcriptional activation; these are putative Rex1 binding sites. Mutation of a putative Rex1 binding site in electrophoretic mobility shift assays (EMSA) resulted in reduced protein binding. Taken together, our results indicate that hRex1 binds to the hRex1 promoter region at -298 bp and positively regulates hRex1 transcription, but that this regulation is lost in PC-3 human prostate cancer cells. This lack of positive transcriptional regulation by the hRex1 protein may be responsible for the lack of Rex1 expression in PC-3 prostate cancer cells.
Affiliation(s)
- Mi-Young Lee
- Department of Pharmacology, Weill Cornell Medical College of Cornell University, New York, NY 10065, USA
|
50
|
Levy R, Edelman M, Sobolev V. Prediction of 3D metal binding sites from translated gene sequences based on remote-homology templates. Proteins 2010; 76:365-74. [PMID: 19173310 DOI: 10.1002/prot.22352] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 11/05/2022]
Abstract
Database-scale analysis was performed to determine whether structural models, based on remote homologues, are effective in predicting 3D transition metal binding sites in proteins directly from translated gene sequences. The extent by which side chain modeling alone reduces sensitivity and selectivity is shown to be <10%. Surprisingly, selectivity was not dependent on the level of sequence homology between template and target, or on the presence of a metal ion in the structural template. Applying a modification of the CHED algorithm (Babor et al., Proteins 2008;70:208-217) and machine learning filters, a selectivity of approximately 90% was achieved for protein sequences using unrelated structural templates over a sequence identity range of 18-100%. Below approximately 18% identity, the number of analyzable target-template pairs and predictability of metal binding sites falls off sharply. A full third of structural templates were found to have target partners only in the remote homology range of 18-30%. In this range, nonmetal-binding templates are calculated to be the majority and serve to predict with 50% sensitivity at the geometric level. Overall, sensitivity at the geometric level for targets having templates in the 18-30% sequence identity range is 73%, with an average of one false positive site per true site. Protein sequences described as "unknown" in the UniProt database and composed largely of unidentified genome project sequences were studied and metal binding sites predicted. A web server for prediction of metal binding sites from protein sequence is provided.
Affiliation(s)
- Ronen Levy
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot, Israel
|