1
Persikov AV, Singh M. An expanded binding model for Cys2His2 zinc finger protein-DNA interfaces. Phys Biol 2011; 8:035010. [PMID: 21572177 DOI: 10.1088/1478-3975/8/3/035010]
Abstract
Cys(2)His(2) zinc finger (C2H2-ZF) proteins comprise the largest class of eukaryotic transcription factors. The 'canonical model' for C2H2-ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain, and this model has been the basis for several efforts for computationally predicting and experimentally designing protein-DNA interfaces. Here, we perform a systematic analysis of structural and experimental binding data and find that, in addition to the canonical contacts, several other amino acid and base pair combinations frequently play a role in C2H2-ZF protein-DNA binding. We suggest an expansion of the canonical C2H2-ZF model to include one to three additional contacts, and show that computational approaches including these additional contacts improve predictions of DNA targets of zinc finger proteins.
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, NJ, USA
2
Yanover C, Bradley P. Extensive protein and DNA backbone sampling improves structure-based specificity prediction for C2H2 zinc fingers. Nucleic Acids Res 2011; 39:4564-76. [PMID: 21343182 PMCID: PMC3113574 DOI: 10.1093/nar/gkr048]
Abstract
Sequence-specific DNA recognition by gene regulatory proteins is critical for proper cellular functioning. The ability to predict the DNA binding preferences of these regulatory proteins from their amino acid sequence would greatly aid in reconstruction of their regulatory interactions. Structural modeling provides one route to such predictions: by building accurate molecular models of regulatory proteins in complex with candidate binding sites, and estimating their relative binding affinities for these sites using a suitable potential function, it should be possible to construct DNA binding profiles. Here, we present a novel molecular modeling protocol for protein-DNA interfaces that borrows conformational sampling techniques from de novo protein structure prediction to generate a diverse ensemble of structural models from small fragments of related and unrelated protein-DNA complexes. The extensive conformational sampling is coupled with sequence space exploration so that binding preferences for the target protein can be inferred from the resulting optimized DNA sequences. We apply the algorithm to predict binding profiles for a benchmark set of eleven C2H2 zinc finger transcription factors, five of known and six of unknown structure. The predicted profiles are in good agreement with experimental binding data; furthermore, examination of the modeled structures gives insight into observed binding preferences.
Affiliation(s)
- Chen Yanover
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
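The step this abstract describes, turning estimated relative binding affinities into a DNA binding profile, can be illustrated with a minimal sketch. It assumes (the abstract does not say this) that each position contributes an independent energy and that base frequencies follow Boltzmann weights. The function name `energies_to_profile`, the energy values, and the `kT` default are hypothetical and are not taken from the paper's protocol.

```python
import math

BASES = "ACGT"

def energies_to_profile(energies, kT=0.59):
    """Convert per-position base energies into base probabilities.

    energies: list of dicts, one per binding-site position, mapping each
    base to an estimated binding energy in kcal/mol (lower = more favorable).
    kT: thermal energy in kcal/mol (roughly 0.59 near room temperature).
    Assumption: positions are independent and Boltzmann-weighted.
    """
    profile = []
    for position in energies:
        # Shift by the minimum energy so the largest weight is exactly 1.
        e_min = min(position[b] for b in BASES)
        weights = {b: math.exp(-(position[b] - e_min) / kT) for b in BASES}
        z = sum(weights.values())
        profile.append({b: weights[b] / z for b in BASES})
    return profile

# Hypothetical energies for a three-position site (illustrative values only).
example = [
    {"A": 0.0, "C": 1.2, "G": -1.5, "T": 0.8},
    {"A": 0.5, "C": -0.9, "G": 0.4, "T": 0.6},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0},
]
profile = energies_to_profile(example)
# Most probable base at each position; ties resolve to the first base in BASES.
consensus = "".join(max(col, key=col.get) for col in profile)
print(consensus)
```

Each column of the resulting profile sums to one, so it can be compared directly with an experimentally derived frequency matrix. A position with equal energies for all four bases yields a uniform column, which is how an unconstrained position would appear in this simplified picture.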
3
4
Alibés A, Nadra AD, De Masi F, Bulyk ML, Serrano L, Stricher F. Using protein design algorithms to understand the molecular basis of disease caused by protein-DNA interactions: the Pax6 example. Nucleic Acids Res 2010; 38:7422-31. [PMID: 20685816 PMCID: PMC2995082 DOI: 10.1093/nar/gkq683]
Abstract
Quite often a single or a combination of protein mutations is linked to specific diseases. However, distinguishing from sequence information which mutations have real effects in the protein’s function is not trivial. Protein design tools are commonly used to explain mutations that affect protein stability, or protein–protein interaction, but not for mutations that could affect protein–DNA binding. Here, we used the protein design algorithm FoldX to model all known missense mutations in the paired box domain of Pax6, a highly conserved transcription factor involved in eye development and in several diseases such as aniridia. The validity of FoldX to deal with protein–DNA interactions was demonstrated by showing that high levels of accuracy can be achieved for mutations affecting these interactions. Also we showed that protein-design algorithms can accurately reproduce experimental DNA-binding logos. We conclude that 88% of the Pax6 mutations can be linked to changes in intrinsic stability (77%) and/or to its capabilities to bind DNA (30%). Our study emphasizes the importance of structure-based analysis to understand the molecular basis of diseases and shows that protein–DNA interactions can be analyzed to the same level of accuracy as protein stability, or protein–protein interactions.
Affiliation(s)
- Andreu Alibés
- EMBL/CRG Systems Biology Research Unit, Center for Genomic Regulation, UPF, Barcelona, Spain.
5
Abstract
Structure-based DNA-binding prediction is a powerful tool to infer protein-binding sites and design new specificities. It can limit experiments in scope and help focus them toward candidates with higher chances of success. The zinc finger domain is an excellent scaffold for design due to its small and robust fold and relatively simple interaction pattern. It presents some degree of modularity, and modeling can be used to guide experiments and help increase zinc finger module libraries. In this chapter we present a fast and simple but still powerful method for predicting and designing DNA-binding specificities applied to C(2)H(2) zinc finger proteins, based on FoldX, a semiautomatic protein design tool. Given a template structure, this method generates candidate mutants for a given target DNA sequence selected by energetic criteria.
6
Xu B, Yang Y, Liang H, Zhou Y. An all-atom knowledge-based energy function for protein-DNA threading, docking decoy discrimination, and prediction of transcription-factor binding profiles. Proteins 2009; 76:718-30. [PMID: 19274740 DOI: 10.1002/prot.22384]
Abstract
How to make an accurate representation of protein-DNA interaction by an energy function is a long-standing unsolved problem in structural biology. Here, we modified a statistical potential based on the distance-scaled, finite ideal-gas reference state so that it is optimized for protein-DNA interactions. The changes include a volume-fraction correction to account for unmixable atom types in proteins and DNA in addition to the usage of a low-count correction, residue/base-specific atom types, and a shorter cutoff distance for protein-DNA interactions. The new statistical energy functions are tested in threading and docking decoy discriminations and prediction of protein-DNA binding affinities and transcription-factor binding profiles. The results indicate that new proposed energy functions are among the best in existing energy functions for protein-DNA interactions. The new energy functions are available as a web-server called DDNA 2.0 at http://sparks.informatics.iupui.edu. The server version was trained by the entire 212 protein-DNA complexes.
Affiliation(s)
- Beisi Xu
- Department of Polymer Science and Engineering, University of Science and Technology of China, Hefei, Anhui, China
7
Persikov AV, Osada R, Singh M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics 2008; 25:22-9. [PMID: 19008249 DOI: 10.1093/bioinformatics/btn580]
Abstract
MOTIVATION: Cys(2)His(2) zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein-DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The 'canonical model' for ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain. RESULTS: We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein-DNA interactions, ours additionally incorporates information about protein-DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints about the relative binding affinities of protein-DNA pairs; this type of information has not been used previously in predicting ZF protein-DNA binding. Here, we build a high-quality literature-derived experimental database of ZF-DNA binding examples and utilize it to test both linear and polynomial kernels for predicting ZF protein-DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms previously published prediction procedures as well as the linear SVM. This may indicate the presence of dependencies between contacts in the canonical binding model and suggests that modification of the underlying structural model may result in further improved performance in predicting ZF protein-DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein-DNA pairs have great potential for effective prediction of protein-DNA interactions. AVAILABILITY: An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
8
Jamal Rahi S, Virnau P, Mirny LA, Kardar M. Predicting transcription factor specificity with all-atom models. Nucleic Acids Res 2008; 36:6209-17. [PMID: 18829719 PMCID: PMC2577325 DOI: 10.1093/nar/gkn589]
Abstract
The binding of a transcription factor (TF) to a DNA operator site can initiate or repress the expression of a gene. Computational prediction of sites recognized by a TF has traditionally relied upon knowledge of several cognate sites, rather than an ab initio approach. Here, we examine the possibility of using structure-based energy calculations that require no knowledge of bound sites but rather start with the structure of a protein–DNA complex. We study the PurR Escherichia coli TF, and explore to which extent atomistic models of protein–DNA complexes can be used to distinguish between cognate and noncognate DNA sites. Particular emphasis is placed on systematic evaluation of this approach by comparing its performance with bioinformatic methods, by testing it against random decoys and sites of homologous TFs. We also examine a set of experimental mutations in both DNA and the protein. Using our explicit estimates of energy, we show that the specificity for PurR is dominated by direct protein–DNA interactions, and weakly influenced by bending of DNA.
Affiliation(s)
- Sahand Jamal Rahi, Peter Virnau, Leonid A. Mirny, Mehran Kardar
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- *To whom correspondence should be addressed. Tel: +49 6131 392 3646; Fax: +49 6131 392 5441;
9
Cysewski P. A post-SCF complete basis set study on the recognition patterns of uracil and cytosine by aromatic and π–aromatic stacking interactions with amino acid residues. Phys Chem Chem Phys 2008; 10:2636-45. [DOI: 10.1039/b718394a]
10
Bussemaker HJ, Foat BC, Ward LD. Predictive modeling of genome-wide mRNA expression: from modules to molecules. Annu Rev Biophys Biomol Struct 2007; 36:329-47. [PMID: 17311525 DOI: 10.1146/annurev.biophys.36.040306.132725]
Abstract
Various algorithms are available for predicting mRNA expression and modeling gene regulatory processes. They differ in whether they rely on the existence of modules of coregulated genes or build a model that applies to all genes, whether they represent regulatory activities as hidden variables or as mRNA levels, and whether they implicitly or explicitly model the complex cis-regulatory logic of multiple interacting transcription factors binding the same DNA. The fact that functional genomics data of different types reflect the same molecular processes provides a natural strategy for integrative computational analysis. One promising avenue toward an accurate and comprehensive model of gene regulation combines biophysical modeling of the interactions among proteins, DNA, and RNA with the use of large-scale functional genomics data to estimate regulatory network connectivity and activity parameters. As the ability of these models to represent complex cis-regulatory logic increases, the need for approaches based on cross-species conservation may diminish.
Affiliation(s)
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA.
11
Abstract
Protein–DNA interactions are vital for many processes in living cells, especially transcriptional regulation and DNA modification. To further our understanding of these important processes on the microscopic level, it is necessary that theoretical models describe the macromolecular interaction energetics accurately. While several methods have been proposed, there has not been a careful comparison of how well the different methods are able to predict biologically important quantities such as the correct DNA binding sequence, total binding free energy and free energy changes caused by DNA mutation. In addition to carrying out the comparison, we present two important theoretical models developed initially in protein folding that have not yet been tried on protein–DNA interactions. In the process, we find that the results of these knowledge-based potentials show a strong dependence on the interaction distance and the derivation method. Finally, we present a knowledge-based potential that gives comparable or superior results to the best of the other methods, including the molecular mechanics force field AMBER99.
Affiliation(s)
- Jason E Donald
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford St. Cambridge, MA 02138, USA.
12
Becker NB, Wolff L, Everaers R. Indirect readout: detection of optimized subsequences and calculation of relative binding affinities using different DNA elastic potentials. Nucleic Acids Res 2006; 34:5638-49. [PMID: 17038333 PMCID: PMC1636474 DOI: 10.1093/nar/gkl683] [Received: 04/03/2006] [Revised: 09/05/2006] [Accepted: 09/06/2006]
Abstract
Essential biological processes require that proteins bind to a set of specific DNA sites with tuned relative affinities. We focus on the indirect readout mechanism and discuss its theoretical description in relation to the present understanding of DNA elasticity on the rigid base pair level. Combining existing parametrizations of elastic potentials for DNA, we derive elastic free energies directly related to competitive binding experiments, and propose a computationally inexpensive local marker for elastically optimized subsequences in protein-DNA co-crystals. We test our approach in an application to the bacteriophage 434 repressor. In agreement with known results we find that indirect readout dominates at the central, non-contacted bases of the binding site. Elastic optimization involves all deformation modes and is mainly due to the adapted equilibrium structure of the operator, while sequence-dependent elasticity plays a minor role. These qualitative observations are robust with respect to current parametrization uncertainties. Predictions for relative affinities mediated by indirect readout depend sensitively on the chosen parametrization. Their quantitative comparison with experimental data allows for a critical evaluation of DNA elastic potentials and of the correspondence between crystal and solution structures. The software written for the presented analysis is included as Supplementary Data.
Affiliation(s)
- Nils B Becker
- Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Strasse 38, 01187 Dresden, Germany.