1
Li J, Chiu TP, Rohs R. Predicting DNA structure using a deep learning method. Nat Commun 2024; 15:1243. [PMID: 38336958 PMCID: PMC10858265 DOI: 10.1038/s41467-024-45191-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 01/17/2024] [Indexed: 02/12/2024] Open
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA structure, also described as DNA shape, plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, DNA structural features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing an understanding of the effects of flanking regions on DNA structure in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
Affiliation(s)
- Jinsen Li
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Tsu-Pei Chiu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Remo Rohs
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA, 90089, USA
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, CA, 90089, USA
  - Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, USA
2
Li J, Chiu TP, Rohs R. Deep DNAshape: Predicting DNA shape considering extended flanking regions using a deep learning method. bioRxiv 2023:2023.10.22.563383. [PMID: 37961633 PMCID: PMC10634709 DOI: 10.1101/2023.10.22.563383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Understanding the mechanisms of protein-DNA binding is critical in comprehending gene regulation. Three-dimensional DNA shape plays a key role in these mechanisms. In this study, we present a deep learning-based method, Deep DNAshape, that fundamentally changes the current k-mer based high-throughput prediction of DNA shape features by accurately accounting for the influence of extended flanking regions, without the need for extensive molecular simulations or structural biology experiments. By using the Deep DNAshape method, refined DNA shape features can be predicted for any length and number of DNA sequences in a high-throughput manner, providing a deeper understanding of the effects of flanking regions on DNA shape in a target region of a sequence. The Deep DNAshape method provides access to the influence of distant flanking regions on a region of interest. Our findings reveal that DNA shape readout mechanisms of a core target are quantitatively affected by flanking regions, including extended flanking regions, providing valuable insights into the detailed structural readout mechanisms of protein-DNA binding. Furthermore, when incorporated in machine learning models, the features generated by Deep DNAshape improve the model prediction accuracy. Collectively, Deep DNAshape can serve as a versatile and powerful tool for diverse DNA structure-related studies.
3
Pal A, Levy Y. Balance between asymmetry and abundance in multi-domain DNA-binding proteins may regulate the kinetics of their binding to DNA. PLoS Comput Biol 2020; 16:e1007867. [PMID: 32453726 PMCID: PMC7274453 DOI: 10.1371/journal.pcbi.1007867] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/25/2019] [Revised: 06/05/2020] [Accepted: 04/11/2020] [Indexed: 11/19/2022] Open
Abstract
DNA sequences are often recognized by multi-domain proteins that may have higher affinity and specificity than single-domain proteins. However, the higher affinity to DNA might be coupled with slower recognition kinetics. In this study, we address this balance between stability and kinetics for multi-domain Cys2His2- (C2H2-) type zinc-finger (ZF) proteins. These proteins are the most prevalent DNA-binding domain in eukaryotes and C2H2 type zinc-finger proteins (C2H2-ZFPs) constitute nearly one-half of all known and predicted transcription factors in humans. Extensive contact with DNA via tandem ZF domains confers high stability on the sequence-specific complexes. However, this can limit target search efficiency, especially for low abundance ZFPs. Earlier, we found that asymmetrical distribution of electrostatic charge among the three ZF domains of the low abundance transcription factor Egr-1 facilitates its DNA search process. Here, on a diverse set of 273 human C2H2-ZFPs comprising 3–15 tandem ZF domains, we find that, in many cases, electrostatic charge and binding specificity are asymmetrically distributed among the ZF domains so that neighbouring domains have different DNA-binding properties. For proteins containing 3–6 ZF domains, we show that the low abundance proteins possess a higher degree of non-specific asymmetry and vice versa. Our findings suggest that where the electrostatics of tandem ZF domains are similar (i.e., symmetrical), the ZFPs are more abundant to optimize their DNA search efficiency. This study reveals new insights into the fundamental determinants of recognition by C2H2-ZFPs of their DNA binding sites in the cellular landscape. The importance of electrostatic asymmetry with respect to binding site recognition by C2H2-ZFPs suggests the possibility that it may also be important in other ZFP systems and reveals a new design feature for zinc finger engineering.
Optimal recognition of DNA by proteins is governed by various factors, among them the thermodynamics, kinetics and specificity of the protein-DNA complex. Multi-domain DNA-binding proteins are expected to have higher affinity and specificity due to the extensive interface they form with DNA. However, a larger interface may result in higher friction when these proteins scan the DNA for the target site via the sliding mechanism. A way to overcome this drawback is to have asymmetry in the protein so that the interface with DNA is smaller. Alternatively, higher abundance can also increase the search speed. Here, using computational analysis of a large data set of multi-domain zinc finger DNA-binding proteins, we report a trade-off between asymmetry and abundance.
Affiliation(s)
- Arumay Pal
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Yaakov Levy
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
4
Bozkurt E, Perez MAS, Hovius R, Browning NJ, Rothlisberger U. Genetic Algorithm Based Design and Experimental Characterization of a Highly Thermostable Metalloprotein. J Am Chem Soc 2018; 140:4517-4521. [DOI: 10.1021/jacs.7b10660] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Affiliation(s)
- Esra Bozkurt
  - Laboratory of Computational Chemistry and Biochemistry, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Marta A. S. Perez
  - Laboratory of Computational Chemistry and Biochemistry, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Ruud Hovius
  - Laboratory of Protein Engineering, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Nicholas J. Browning
  - Laboratory of Computational Chemistry and Biochemistry, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
- Ursula Rothlisberger
  - Laboratory of Computational Chemistry and Biochemistry, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
5
Ortiz-Lombardia M, Foos N, Maurel-Zaffran C, Saurin AJ, Graba Y. Hox functional diversity: Novel insights from flexible motif folding and plastic protein interaction. Bioessays 2017; 39. [PMID: 28092121 DOI: 10.1002/bies.201600246] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
How the formidable diversity of forms emerges from developmental and evolutionary processes is one of the most fascinating questions in biology. The homeodomain-containing Hox proteins were recognized early on as major actors in diversifying animal body plans. The molecular mechanisms underlying how this transcription factor family controls a large array of context- and cell-specific biological functions are, however, still poorly understood. Clues to functional diversity have emerged from studies exploring how Hox protein activity is controlled through interactions with PBC class proteins, which are also evolutionarily conserved HD-containing proteins. Recent structural data and molecular dynamics simulations add further mechanistic insights into Hox protein mode of action, suggesting that flexible folding of protein motifs allows for plastic protein interaction. As we discuss in this review, these findings define a novel type of Hox-PBC interaction, weak and dynamic instead of strong and static, hence providing novel clues to understanding Hox transcriptional specificity and diversity.
Affiliation(s)
- Miguel Ortiz-Lombardia
  - Aix-Marseille-Université, CNRS UMR 7257, AFMB, Marseille, France
  - Aix-Marseille-Université, CNRS UMR 7256, AFMB, Marseille, France
- Nicolas Foos
  - Aix-Marseille-Université, CNRS UMR 7257, AFMB, Marseille, France
- Andrew J Saurin
  - Aix-Marseille-Université, CNRS UMR 7288, case 907, IBDM, Marseille, France
- Yacine Graba
  - Aix-Marseille-Université, CNRS UMR 7288, case 907, IBDM, Marseille, France
6
Corona RI, Guo JT. Statistical analysis of structural determinants for protein-DNA-binding specificity. Proteins 2016; 84:1147-61. [PMID: 27147539 DOI: 10.1002/prot.25061] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/11/2016] [Revised: 04/21/2016] [Accepted: 04/28/2016] [Indexed: 12/27/2022]
Abstract
DNA-binding proteins play critical roles in biological processes including gene expression, DNA packaging and DNA repair. They bind to DNA target sequences with different degrees of binding specificity, ranging from highly specific (HS) to nonspecific (NS). Alterations of DNA-binding specificity, due to either genetic variation or somatic mutations, can lead to various diseases. In this study, a comparative analysis of protein-DNA complex structures was carried out to investigate the structural features that contribute to binding specificity. Protein-DNA complexes were grouped into three general classes based on degrees of binding specificity: HS, multispecific (MS), and NS. Our results show a clear trend of structural features among the three classes, including amino acid binding propensities, simple and complex hydrogen bonds, major/minor groove and base contacts, and DNA shape. We found that aspartate is enriched in HS DNA binding proteins and predominantly binds to a cytosine through a single hydrogen bond or two consecutive cytosines through bidentate hydrogen bonds. Aromatic residues, histidine and tyrosine, are highly enriched in the HS and MS groups and may contribute to specific binding through different mechanisms. To further investigate the role of protein flexibility in specific protein-DNA recognition, we analyzed the conformational changes between the bound and unbound states of DNA-binding proteins and structural variations. The results indicate that HS and MS DNA-binding domains have larger conformational changes upon DNA-binding and a larger degree of flexibility in both bound and unbound states.
Affiliation(s)
- Rosario I Corona
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, North Carolina, 28223
- Jun-Tao Guo
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, North Carolina, 28223
7
Feinauer CJ, Hofmann A, Goldt S, Liu L, Máté G, Heermann DW. Zinc finger proteins and the 3D organization of chromosomes. Adv Protein Chem Struct Biol 2013; 90:67-117. [PMID: 23582202 DOI: 10.1016/b978-0-12-410523-2.00003-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/23/2022]
Abstract
Zinc finger domains are one of the most common structural motifs in eukaryotic cells, which employ the motif in some of their most important proteins (including TFIIIA, CTCF, and Zif268). These DNA binding proteins contain up to 37 zinc finger domains connected by flexible linker regions. They have been shown to be important organizers of the 3D structure of chromosomes and as such are called the master weavers of the genome. Using NMR and numerical simulations, much progress has been made during the past few decades in understanding their various functions and their ways of binding to the DNA, but a large knowledge gap remains to be filled. One problem of the hitherto existing theoretical models of zinc finger protein DNA binding in this context is that they are aimed at describing specific binding. Furthermore, they exclusively focus on the microscopic details or approach the problem without considering such details at all. We present the Flexible Linker Model, which aims explicitly at describing nonspecific binding. It takes into account the most important effects of flexible linkers and allows a qualitative investigation of the effects of these linkers on the nonspecific binding affinity of zinc finger proteins to DNA. Our results indicate that the binding affinity is increased by the flexible linkers by several orders of magnitude. Moreover, they show that the binding map for proteins with more than one domain presents interesting structures, which have been neither observed nor described before, and can be interpreted to fit very well with existing theories of facilitated target location. The effect of the increased binding affinity is also in agreement with recent experiments that until now have lacked an explanation. We further explore the class of proteins with flexible linkers, which are unstructured until they bind. We have developed a methodology to characterize these flexible proteins. Employing the concept of barcodes, we propose a measure to compare such flexible proteins in terms of a similarity measure. This measure is validated by a comparison between a geometric similarity measure and the topological similarity measure that takes geometry as well as topology into account.
Affiliation(s)
- Christoph J Feinauer
  - Institute for Theoretical Physics, Heidelberg University, Philosophenweg, Heidelberg, Germany
8
Beierlein FR, Kneale GG, Clark T. Predicting the effects of basepair mutations in DNA-protein complexes by thermodynamic integration. Biophys J 2011; 101:1130-8. [PMID: 21889450 DOI: 10.1016/j.bpj.2011.07.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 05/20/2011] [Revised: 06/28/2011] [Accepted: 07/05/2011] [Indexed: 10/17/2022] Open
Abstract
Thermodynamically rigorous free energy methods in principle allow the exact computation of binding free energies in biological systems. Here, we use thermodynamic integration together with molecular dynamics simulations of a DNA-protein complex to compute relative binding free energies of a series of mutants of a protein-binding DNA operator sequence. A guanine-cytosine basepair that interacts strongly with the DNA-binding protein is mutated into adenine-thymine, cytosine-guanine, and thymine-adenine. It is shown that basepair mutations can be performed using a conservative protocol that gives error estimates of ∼10% of the change in free energy of binding. Despite the high CPU-time requirements, this work opens the exciting opportunity of being able to perform basepair scans to investigate protein-DNA binding specificity in great detail computationally.
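For orientation, the standard thermodynamic integration identities behind relative binding free energies can be sketched as follows. These are the textbook forms, not equations taken from the paper; the coupling parameter \lambda, the Hamiltonian H(\lambda), and the "complex" and "free DNA" labels are notation assumed here for a basepair mutation A \to B.

```latex
% Free energy change of an alchemical mutation A -> B (textbook TI form; notation assumed)
\Delta G_{A \to B} = \int_{0}^{1} \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} \, d\lambda

% Relative binding free energy from a thermodynamic cycle:
% the same mutation is carried out in the DNA-protein complex and in free DNA
\Delta\Delta G_{\mathrm{bind}} = \Delta G_{A \to B}^{\mathrm{complex}} - \Delta G_{A \to B}^{\mathrm{free\ DNA}}
```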
Affiliation(s)
- Frank R Beierlein
  - Computer-Chemie-Centrum, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
9
Luo Y, Zhang M, Zhang J, Zhang J, Chen C, Chen YE, Xiong JW, Zhu X. Platelet-derived growth factor induces Rad expression through Egr-1 in vascular smooth muscle cells. PLoS One 2011; 6:e19408. [PMID: 21559360 PMCID: PMC3084842 DOI: 10.1371/journal.pone.0019408] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/18/2010] [Accepted: 04/05/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Ras associated with diabetes (Rad) inhibits vascular lesion formation by reducing the attachment and migration of vascular smooth muscle cells (VSMCs). However, the transcriptional regulation of Rad in VSMCs is unclear. METHODOLOGY AND PRINCIPAL FINDINGS Using quantitative real-time PCR, we found that Platelet-Derived Growth Factor (PDGF) induced Rad expression in a time- and dose-dependent manner in rat aortic smooth muscle cells (RASMCs). By serial deletion analysis of the Rad promoter, we identified that two GC-rich early growth response-1 (Egr-1) binding sites are essential for PDGF-induced Rad promoter activation. Overexpression of Egr-1 in RASMCs strongly stimulated Rad expression while the Egr-1 corepressor, NGFI-A binding protein 2 (NAB2), repressed PDGF-induced Rad up-regulation in a dose-dependent manner. Direct binding of Egr-1 to the Rad promoter region was further confirmed by chromatin immunoprecipitation assays. CONCLUSIONS Our results demonstrate that Rad is regulated by PDGF through the transcription factor Egr-1 in RASMCs.
Affiliation(s)
- Yan Luo
  - The Institute of Molecular Medicine, Peking University, Beijing, China
- Meiling Zhang
  - The Institute of Molecular Medicine, Peking University, Beijing, China
- Ji Zhang
  - The Institute of Molecular Medicine, Peking University, Beijing, China
- Jifeng Zhang
  - The Cardiovascular Center, University of Michigan, Ann Arbor, Michigan, United States of America
- Chunlei Chen
  - The Institute of Molecular Medicine, Peking University, Beijing, China
- Y. Eugene Chen
  - The Cardiovascular Center, University of Michigan, Ann Arbor, Michigan, United States of America
- Jing-Wei Xiong
  - The Institute of Molecular Medicine, Peking University, Beijing, China
- Xiaojun Zhu
  - The Institute of Molecular Medicine, Peking University, Beijing, China
10
Carvalho AM, Oliveira AL. GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge. Algorithms Mol Biol 2011; 6:13. [PMID: 21513505 PMCID: PMC3112114 DOI: 10.1186/1748-7188-6-13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/10/2010] [Accepted: 04/22/2011] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Position-specific priors (PSP) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied. RESULTS We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote. CONCLUSIONS The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.
Affiliation(s)
- Alexandra M Carvalho
  - Department of Electrical Engineering, IST/TULisbon, KDBIO/INESC-ID, Lisboa, Portugal
- Arlindo L Oliveira
  - Department of Computer Science and Engineering, IST/TULisbon, KDBIO/INESC-ID, Lisboa, Portugal
11
De Masi F, Grove CA, Vedenko A, Alibés A, Gisselbrecht SS, Serrano L, Bulyk ML, Walhout AJM. Using a structural and logics systems approach to infer bHLH-DNA binding specificity determinants. Nucleic Acids Res 2011; 39:4553-63. [PMID: 21335608 PMCID: PMC3113581 DOI: 10.1093/nar/gkr070] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 12/22/2022] Open
Abstract
Numerous efforts are underway to determine gene regulatory networks that describe physical relationships between transcription factors (TFs) and their target DNA sequences. Members of paralogous TF families typically recognize similar DNA sequences. Knowledge of the molecular determinants of protein–DNA recognition by paralogous TFs is of central importance for understanding how small differences in DNA specificities can dictate target gene selection. Previously, we determined the in vitro DNA binding specificities of 19 Caenorhabditis elegans basic helix-loop-helix (bHLH) dimers using protein binding microarrays. These TFs bind E-box (CANNTG) and E-box-like sequences. Here, we combine these data with logics, bHLH–DNA co-crystal structures and computational modeling to infer which bHLH monomer can interact with which CAN E-box half-site and we identify a critical residue in the protein that dictates this specificity. Validation experiments using mutant bHLH proteins provide support for our inferences. Our study provides insights into the mechanisms of DNA recognition by bHLH dimers as well as a blueprint for system-level studies of the DNA binding determinants of other TF families in different model organisms and humans.
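The E-box consensus (CANNTG) mentioned in this abstract can be located in a sequence with a simple pattern match. The following is a minimal sketch for illustration, not the authors' method: the function name `find_eboxes` and the example sequence are hypothetical, and only the canonical consensus is matched (E-box-like variants are not covered).

```python
import re

# N in the consensus CANNTG stands for any of A, C, G, T.
# The pattern sits inside a lookahead so overlapping sites would also be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(sequence):
    """Return (0-based start, matched site) for every CANNTG site in `sequence`."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]

if __name__ == "__main__":
    # Hypothetical example sequence, for illustration only.
    for start, site in find_eboxes("ttCACGTGaaCAGCTGcc"):
        print(start, site)
```

Because the reverse complement of CANNTG is again CANNTG, scanning one strand is enough to find sites on both strands.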
Affiliation(s)
- Federico De Masi
  - Department of Medicine, Division of Genetics, Brigham & Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
12
13
Alibés A, Nadra AD, De Masi F, Bulyk ML, Serrano L, Stricher F. Using protein design algorithms to understand the molecular basis of disease caused by protein-DNA interactions: the Pax6 example. Nucleic Acids Res 2010; 38:7422-31. [PMID: 20685816 PMCID: PMC2995082 DOI: 10.1093/nar/gkq683] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 12/14/2022] Open
Abstract
Quite often a single or a combination of protein mutations is linked to specific diseases. However, distinguishing from sequence information which mutations have real effects in the protein’s function is not trivial. Protein design tools are commonly used to explain mutations that affect protein stability, or protein–protein interaction, but not for mutations that could affect protein–DNA binding. Here, we used the protein design algorithm FoldX to model all known missense mutations in the paired box domain of Pax6, a highly conserved transcription factor involved in eye development and in several diseases such as aniridia. The validity of FoldX to deal with protein–DNA interactions was demonstrated by showing that high levels of accuracy can be achieved for mutations affecting these interactions. Also we showed that protein-design algorithms can accurately reproduce experimental DNA-binding logos. We conclude that 88% of the Pax6 mutations can be linked to changes in intrinsic stability (77%) and/or to its capabilities to bind DNA (30%). Our study emphasizes the importance of structure-based analysis to understand the molecular basis of diseases and shows that protein–DNA interactions can be analyzed to the same level of accuracy as protein stability, or protein–protein interactions.
Affiliation(s)
- Andreu Alibés
  - EMBL/CRG Systems Biology Research Unit, Center for Genomic Regulation, UPF, Barcelona, Spain
14
An effective approach for generating a three-Cys2His2 zinc-finger-DNA complex model by docking. BMC Bioinformatics 2010; 11:334. [PMID: 20565873 PMCID: PMC2905368 DOI: 10.1186/1471-2105-11-334] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/21/2009] [Accepted: 06/18/2010] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Determination of protein-DNA complex structures with both NMR and X-ray crystallography remains challenging in many cases. High Ambiguity-Driven DOCKing (HADDOCK) is an information-driven docking program that has been used to successfully model many protein-DNA complexes. However, a protein-DNA complex model whereby the protein wraps around DNA has not been reported. Defining the ambiguous interaction restraints for the classical three-Cys2His2 zinc-finger proteins that wrap around DNA is critical because of the complicated binding geometry. In this study, we generated a Zif268-DNA complex model using three different sets of ambiguous interaction restraints (AIRs) to study the effect of the geometric distribution on the docking and used this approach to generate a newly reported Sp1-DNA complex model. RESULTS The complex models we generated on the basis of two AIRs with a good geometric distribution in each domain are reasonable in terms of the number of models with wrap-around conformation, interface root mean square deviation, AIR energy and fraction of native contacts. We derived the modeling approach for generating a three-Cys2His2 zinc-finger-DNA complex model according to the results of docking studies using the Zif268-DNA and three other crystal complex structures. Furthermore, the Sp1-DNA complex model was calculated with this approach, and the interactions between Sp1 and DNA are in good agreement with those previously reported. CONCLUSIONS Our docking data demonstrate that two AIRs with a reasonable geometric distribution in each of the three-Cys2His2 zinc-finger domains are sufficient to generate an accurate complex model with protein wrapping around DNA. This approach is efficient for generating a zinc-finger protein-DNA complex model for unknown complex structures in which the protein wraps around DNA. We provide a flowchart showing the detailed procedures of this approach.
15
Abstract
Structure-based DNA-binding prediction is a powerful tool to infer protein-binding sites and design new specificities. It can limit experiments in scope and help focus them toward candidates with higher chances of success. The zinc finger domain is an excellent scaffold for design due to its small and robust fold and relatively simple interaction pattern. It presents some degree of modularity, and modeling can be used to guide experiments and help increase zinc finger module libraries. In this chapter we present a fast and simple but still powerful method for predicting and designing DNA-binding specificities applied to C2H2 zinc finger proteins, based on FoldX, a semiautomatic protein design tool. Given a template structure, this method generates candidate mutants for a given target DNA sequence selected by energetic criteria.
16
Xu B, Yang Y, Liang H, Zhou Y. An all-atom knowledge-based energy function for protein-DNA threading, docking decoy discrimination, and prediction of transcription-factor binding profiles. Proteins 2009; 76:718-30. [PMID: 19274740 DOI: 10.1002/prot.22384] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 12/26/2022]
Abstract
How to make an accurate representation of protein-DNA interaction by an energy function is a long-standing unsolved problem in structural biology. Here, we modified a statistical potential based on the distance-scaled, finite ideal-gas reference state so that it is optimized for protein-DNA interactions. The changes include a volume-fraction correction to account for unmixable atom types in proteins and DNA in addition to the usage of a low-count correction, residue/base-specific atom types, and a shorter cutoff distance for protein-DNA interactions. The new statistical energy functions are tested in threading and docking decoy discriminations and prediction of protein-DNA binding affinities and transcription-factor binding profiles. The results indicate that the newly proposed energy functions are among the best of the existing energy functions for protein-DNA interactions. The new energy functions are available as a web-server called DDNA 2.0 at http://sparks.informatics.iupui.edu. The server version was trained on the entire set of 212 protein-DNA complexes.
Affiliation(s)
- Beisi Xu
  - Department of Polymer Science and Engineering, University of Science and Technology of China, Hefei, Anhui, China
17
Temiz NA, Camacho CJ. Experimentally based contact energies decode interactions responsible for protein-DNA affinity and the role of molecular waters at the binding interface. Nucleic Acids Res 2009; 37:4076-88. [PMID: 19429892 PMCID: PMC2709573 DOI: 10.1093/nar/gkp289] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022] Open
Abstract
A major obstacle towards understanding the molecular basis of transcriptional regulation is the lack of a recognition code for protein–DNA interactions. Using high-quality crystal structures and binding data on the promiscuous family of C2H2 zinc fingers (ZF), we decode 10 fundamental specific interactions responsible for protein–DNA recognition. The interactions include five hydrogen bond types, three atomic desolvation penalties, a favorable non-polar energy, and a novel water accessibility factor. We apply this code to three large datasets containing a total of 89 C2H2 transcription factor (TF) mutants on the three ZFs of EGR. Guided by molecular dynamics simulations of individual ZFs, we map the interactions into homology models that embody all feasible intra- and intermolecular bonds, selecting for each sequence the structure with the lowest free energy. These interactions reproduce the change in affinity of 35 mutants of finger I (R² = 0.998), 23 mutants of finger II (R² = 0.96) and 31 finger III human domains (R² = 0.94). Our findings reveal recognition rules that depend on DNA sequence/structure, molecular water at the interface and induced fit of the C2H2 TFs. Collectively, our method provides the first robust framework to decode the molecular basis of TFs binding to DNA.
Affiliation(s)
- N Alpay Temiz
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
18
Osawa Y, Ikebukuro K, Sode K. Zn finger-based direct detection system for PCR products of Salmonella spp. and the Influenza A virus. Biotechnol Lett 2009; 31:725-33. [DOI: 10.1007/s10529-009-9927-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 11/25/2008] [Accepted: 12/23/2008] [Indexed: 10/21/2022]
19
Jamal Rahi S, Virnau P, Mirny LA, Kardar M. Predicting transcription factor specificity with all-atom models. Nucleic Acids Res 2008; 36:6209-17. [PMID: 18829719 PMCID: PMC2577325 DOI: 10.1093/nar/gkn589] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 01/09/2023] Open
Abstract
The binding of a transcription factor (TF) to a DNA operator site can initiate or repress the expression of a gene. Computational prediction of sites recognized by a TF has traditionally relied upon knowledge of several cognate sites, rather than an ab initio approach. Here, we examine the possibility of using structure-based energy calculations that require no knowledge of bound sites but rather start with the structure of a protein–DNA complex. We study the Escherichia coli TF PurR, and explore to what extent atomistic models of protein–DNA complexes can be used to distinguish between cognate and noncognate DNA sites. Particular emphasis is placed on systematic evaluation of this approach by comparing its performance with bioinformatic methods and by testing it against random decoys and sites of homologous TFs. We also examine a set of experimental mutations in both DNA and the protein. Using our explicit estimates of energy, we show that the specificity for PurR is dominated by direct protein–DNA interactions, and weakly influenced by bending of DNA.
Affiliation(s)
- Sahand Jamal Rahi
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Peter Virnau
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- *To whom correspondence should be addressed. Tel: +49 6131 392 3646; Fax: +49 6131 392 5441
- Leonid A. Mirny
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Mehran Kardar
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
20
Zheng Y, Kief J, Auffarth K, Farfsing JW, Mahlert M, Nieto F, Basse CW. The Ustilago maydis Cys2His2-type zinc finger transcription factor Mzr1 regulates fungal gene expression during the biotrophic growth stage. Mol Microbiol 2008; 68:1450-70. [PMID: 18410495 DOI: 10.1111/j.1365-2958.2008.06244.x] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 12/26/2022]
Abstract
The smut fungus Ustilago maydis establishes a biotrophic relationship with its host plant maize to progress through sexual development. Here, we report the identification and characterization of the Cys(2)His(2)-type zinc finger protein Mzr1 that functions as a transcriptional activator during host colonization. Expression of the U. maydis mig2 cluster genes is tightly linked to this phase. Upon conditional overexpression, Mzr1 confers induction of a subset of mig2 genes during vegetative growth and this requires the same promoter elements that confer inducible expression in planta. Furthermore, expression of the mig2-4 and mig2-5 genes during biotrophic growth is strongly reduced in cells deleted in mzr1. DNA-array analysis led to the identification of additional Mzr1-induced genes. Some of these genes show a mig2-like plant-specific expression pattern and Mzr1 is responsible for their high-level expression during pathogenesis. Mzr1 function requires the b-dependently regulated Cys(2)His(2)-type cell cycle regulator Biz1, indicating that two stage-specific regulators mediate gene expression during host colonization. In spite of a role as transcriptional activator during biotrophic growth, mzr1 is not essential for pathogenesis; however, conditional overexpression interfered with proliferation during vegetative growth and mating ability, caused a cell separation defect, and triggered filamentous growth. We discuss the implications of these findings.
Affiliation(s)
- Yan Zheng
- Max-Planck-Institute for Terrestrial Microbiology, Department of Organismic Interactions, Karl-von-Frisch-Strasse, D-35043 Marburg, Germany
21
Egr-1 binds the GnRH promoter to mediate the increase in gene expression by insulin. Mol Cell Endocrinol 2007; 270:64-72. [PMID: 17379398 DOI: 10.1016/j.mce.2007.02.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 11/29/2006] [Revised: 02/15/2007] [Accepted: 02/21/2007] [Indexed: 12/26/2022]
Abstract
Insulin increases gonadotropin-releasing hormone (GnRH) gene expression in in vitro models of GnRH neurons. Early growth response-1 (Egr-1) is a transcription factor that mediates the effect of insulin on target genes. In the GN11 cell line, an immortalized GnRH-secreting neuronal cell line, insulin maximally increases Egr-1 mRNA after 30 min of treatment and Egr-1 protein and GnRH mRNA after 60 min of treatment. Egr-1 small interfering RNA blocks the insulin-induced increase in GnRH promoter activity, measured as luciferase expression. Chromatin immunoprecipitation using Egr-1 antibody precipitates DNA in a proximal region of the GnRH promoter but not DNA in a distal region. Mutagenesis of a putative Egr-1 binding site within the proximal region blocks the insulin-induced increase in GnRH promoter activity. Thus, Egr-1 binds the GnRH promoter at a site between -67 and -76 bp from the transcriptional start site to mediate the insulin-induced increase in GnRH gene transcription.
22
Siggers TW, Honig B. Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res 2007; 35:1085-97. [PMID: 17264128 PMCID: PMC1851644 DOI: 10.1093/nar/gkl1155] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 12/18/2022] Open
Abstract
Predicting the binding specificity of transcription factors is a critical step in the characterization and computational identification of cis-regulatory elements in genomic sequences. Here we use protein–DNA structures to predict binding specificity and consider the possibility of predicting position weight matrices (PWM) for an entire protein family based on the structures of just a few family members. A particular focus is the sensitivity of prediction accuracy to the docking geometry of the structure used. We investigate this issue with the goal of determining how similar two docking geometries must be for binding specificity predictions to be accurate. Docking similarity is quantified using our recently described interface alignment score (IAS). Using a molecular-mechanics force field, we predict high-affinity nucleotide sequences that bind to the second zinc-finger (ZF) domain from the Zif268 protein, using different C2H2 ZF domains as structural templates. We identify a strong relationship between IAS values and prediction accuracy, and define a range of IAS values for which accurate structure-based predictions of binding specificity are to be expected. The implications of our results for large-scale, structure-based prediction of PWMs are discussed.
Affiliation(s)
- Barry Honig
- *To whom correspondence should be addressed. Tel: +1 212 851 4651; Fax: +1 212 851 4650
23
Abstract
Protein–DNA interactions are vital for many processes in living cells, especially transcriptional regulation and DNA modification. To further our understanding of these important processes on the microscopic level, it is necessary that theoretical models describe the macromolecular interaction energetics accurately. While several methods have been proposed, there has not been a careful comparison of how well the different methods are able to predict biologically important quantities such as the correct DNA binding sequence, total binding free energy and free energy changes caused by DNA mutation. In addition to carrying out the comparison, we present two important theoretical models developed initially in protein folding that have not yet been tried on protein–DNA interactions. In the process, we find that the results of these knowledge-based potentials show a strong dependence on the interaction distance and the derivation method. Finally, we present a knowledge-based potential that gives comparable or superior results to the best of the other methods, including the molecular mechanics force field AMBER99.
Affiliation(s)
- Jason E Donald
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford St. Cambridge, MA 02138, USA.
24
Becker NB, Wolff L, Everaers R. Indirect readout: detection of optimized subsequences and calculation of relative binding affinities using different DNA elastic potentials. Nucleic Acids Res 2006; 34:5638-49. [PMID: 17038333 PMCID: PMC1636474 DOI: 10.1093/nar/gkl683] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 04/03/2006] [Revised: 09/05/2006] [Accepted: 09/06/2006] [Indexed: 01/23/2023] Open
Abstract
Essential biological processes require that proteins bind to a set of specific DNA sites with tuned relative affinities. We focus on the indirect readout mechanism and discuss its theoretical description in relation to the present understanding of DNA elasticity on the rigid base pair level. Combining existing parametrizations of elastic potentials for DNA, we derive elastic free energies directly related to competitive binding experiments, and propose a computationally inexpensive local marker for elastically optimized subsequences in protein-DNA co-crystals. We test our approach in an application to the bacteriophage 434 repressor. In agreement with known results we find that indirect readout dominates at the central, non-contacted bases of the binding site. Elastic optimization involves all deformation modes and is mainly due to the adapted equilibrium structure of the operator, while sequence-dependent elasticity plays a minor role. These qualitative observations are robust with respect to current parametrization uncertainties. Predictions for relative affinities mediated by indirect readout depend sensitively on the chosen parametrization. Their quantitative comparison with experimental data allows for a critical evaluation of DNA elastic potentials and of the correspondence between crystal and solution structures. The software written for the presented analysis is included as Supplementary Data.
Affiliation(s)
- Nils B Becker
- Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Strasse 38, 01187 Dresden, Germany.
25
Flor-Parra I, Vranes M, Kämper J, Pérez-Martín J. Biz1, a zinc finger protein required for plant invasion by Ustilago maydis, regulates the levels of a mitotic cyclin. Plant Cell 2006; 18:2369-87. [PMID: 16905655 PMCID: PMC1560913 DOI: 10.1105/tpc.106.042754] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 03/23/2006] [Revised: 06/16/2006] [Accepted: 07/17/2006] [Indexed: 05/11/2023]
Abstract
Plant invasion by pathogenic fungi involves regulated growth and highly organized fungal morphological changes. For instance, when the smut fungus Ustilago maydis infects maize (Zea mays), its dikaryotic infective filament is cell cycle arrested, and appressoria are differentiated prior to plant penetration. Once the filament enters the plant, the cell cycle block is released and fungal cells begin proliferation, suggesting a tight interaction between plant invasion and the cell cycle and morphogenesis control systems. We describe a novel factor, Biz1 (b-dependent zinc finger protein), which has two Cys(2)His(2) zinc finger domains and nuclear localization, suggesting a transcriptional regulatory function. The deletion of biz1 shows no detectable phenotypic alterations during axenic growth. However, mutant cells show a severe reduction in appressoria formation and plant penetration, and those hyphae that invade the plant arrest their pathogenic development directly after plant penetration. biz1 is induced via the b-mating-type locus, the key control instance for pathogenic development. The gene is expressed at high levels throughout pathogenic development, which induces a G2 cell cycle arrest that is a direct consequence of the downregulation of the mitotic cyclin Clb1. Our data support a model in which Biz1 is involved in cell cycle arrest preceding plant penetration as well as in the induction of appressoria.
Affiliation(s)
- Ignacio Flor-Parra
- Departamento de Biotecnología Microbiana, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Campus de Cantoblanco, Universidad Autonoma de Madrid, 28049 Madrid, Spain
26
Endres RG, Wingreen NS. Weight matrices for protein-DNA binding sites from a single co-crystal structure. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 73:061921. [PMID: 16906878 DOI: 10.1103/physreve.73.061921] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 11/21/2005] [Revised: 01/31/2006] [Indexed: 05/11/2023]
Abstract
Transcription-factor proteins bind to specific DNA sequences to regulate gene expression in cells. DNA-binding sites are often identified using weight matrices calculated from multiple known binding sites. However, in many cases the number of examples is limited. Here, we report on an atomistic method that starts from an x-ray co-crystal structure of the protein bound to one particular DNA sequence, and infers other binding sites, which are used to construct a weight matrix. The emphasis of the paper is on using the Wang-Landau Monte Carlo algorithm to efficiently sample high-affinity binding sites, which demonstrates that sampling can produce accurate weight matrices in analogy to bioinformatics approaches. For cases of low complexity, we compare to the exhaustive (but slow) dead-end elimination algorithm. To recover crystal binding sites, it is important to include bound water in the protein-DNA interface. Our approach can, in principle, even be applied when no native protein-DNA co-crystal structure is available, only the structure of a closely related homologous protein whose amino-acid sequence is changed to the protein of interest.
Affiliation(s)
- Robert G Endres
- NEC Laboratories America, Inc., Princeton, New Jersey 08540, USA.
27
Lee WP, Tzou WS. Molecular surface directionality of the DNA-binding protein surface on the earth map. Genet Mol Biol 2006. [DOI: 10.1590/s1415-47572006000200033] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022] Open
Affiliation(s)
- Wei-Po Lee
- National University of Kaohsiung, Taiwan
- Wen-Shyong Tzou
- National Taiwan Ocean University, Taiwan
28
Papworth M, Kolasinska P, Minczuk M. Designer zinc-finger proteins and their applications. Gene 2006; 366:27-38. [PMID: 16298089 DOI: 10.1016/j.gene.2005.09.011] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Received: 09/05/2005] [Accepted: 09/18/2005] [Indexed: 10/25/2022]
Abstract
The Cys(2)His(2) zinc finger is one of the most common DNA-binding motifs in Eukaryota. The simple mode of DNA recognition by the Cys(2)His(2) zinc finger domain provides an ideal scaffold for designing proteins with novel sequence specificities. The ability of these domains to bind specifically to virtually any DNA sequence, combined with the potential of fusing them with effector domains, has led to the technology of engineering chimeric DNA-modifying enzymes and transcription factors. This in turn has opened the possibility of using engineered zinc finger-based factors as novel human therapeutics. One such synthetic factor, a designer zinc finger transcription activator of the vascular endothelial growth factor A gene, has recently entered clinical trials to evaluate its ability to stimulate the growth of blood vessels in the treatment of peripheral arterial obstructive disease. This review concentrates on the evolution of natural Cys(2)His(2) zinc fingers and on the fundamental steps in the design of engineered zinc finger proteins. The applications of engineered zinc finger proteins are discussed in the context of the mechanism mediating their effect on the targeted DNA. Furthermore, the regulation of the expression of zinc finger proteins and their targeting to various cellular compartments and to chromatin and non-chromatin target templates are described. Possible future applications of designer zinc finger proteins are also discussed.
Affiliation(s)
- Monika Papworth
- MRC Laboratory of Molecular Biology, Hills Road, CB2 2QH, UK.
29
Morozov AV, Havranek JJ, Baker D, Siggia ED. Protein-DNA binding specificity predictions with structural models. Nucleic Acids Res 2005; 33:5781-98. [PMID: 16246914 PMCID: PMC1270944 DOI: 10.1093/nar/gki875] [Citation(s) in RCA: 153] [Impact Index Per Article: 8.1] [Indexed: 12/27/2022] Open
Abstract
Protein-DNA interactions play a central role in transcriptional regulation and other biological processes. Investigating the mechanism of binding affinity and specificity in protein-DNA complexes is thus an important goal. Here we develop a simple physical energy function, which uses electrostatics, solvation, hydrogen bonds and atom-packing terms to model direct readout and sequence-specific DNA conformational energy to model indirect readout of DNA sequence by the bound protein. The predictive capability of the model is tested against another model based only on the knowledge of the consensus sequence and the number of contacts between amino acids and DNA bases. Both models are used to carry out predictions of protein-DNA binding affinities which are then compared with experimental measurements. The nearly additive nature of protein-DNA interaction energies in our model allows us to construct position-specific weight matrices by computing base pair probabilities independently for each position in the binding site. Our approach is less data intensive than knowledge-based models of protein-DNA interactions, and is not limited to any specific family of transcription factors. However, native structures of protein-DNA complexes or their close homologs are required as input to the model. Use of homology modeling can significantly increase the extent of our approach, making it a useful tool for studying regulatory pathways in many organisms and cell types.
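The step described above, building a position-specific weight matrix by computing base pair probabilities independently at each position, can be illustrated with a minimal sketch. It assumes that each position contributes an additive energy per base and that probabilities follow Boltzmann weights; the conversion rule, the function name `pwm_from_energies`, the value of `KT`, and the example energies are all hypothetical illustrations and are not taken from the paper.

```python
import math

BASES = "ACGT"
KT = 0.593  # assumed thermal energy in kcal/mol (about 298 K); illustrative only


def pwm_from_energies(energies, kt=KT):
    """Convert per-position, per-base energies into a probability matrix.

    energies: list of dicts mapping each base to an energy at that position.
    Returns a list of dicts mapping each base to a probability.
    Assumes positions are independent (additive energies).
    """
    pwm = []
    for position in energies:
        # Shift by the minimum energy so the exponentials stay well scaled.
        e_min = min(position[b] for b in BASES)
        weights = {b: math.exp(-(position[b] - e_min) / kt) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm


# Hypothetical energies for a three-position site (not real data).
example_energies = [
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 2.0},
    {"A": 1.5, "C": 1.5, "G": 0.0, "T": 1.5},
    {"A": 0.3, "C": 0.3, "G": 0.3, "T": 0.3},
]

pwm = pwm_from_energies(example_energies)
for index, column in enumerate(pwm, start=1):
    print(index, {b: round(p, 3) for b, p in column.items()})
```

Because each column is normalized on its own, a position with identical energies for all four bases yields a uniform column, while a position with one strongly favored base yields a sharply peaked column.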
Affiliation(s)
- Alexandre V Morozov
- Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA.
30
Kaplan T, Friedman N, Margalit H. Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comput Biol 2005; 1:e1. [PMID: 16103898 PMCID: PMC1183507 DOI: 10.1371/journal.pcbi.0010001] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.7] [Received: 01/10/2005] [Accepted: 02/11/2005] [Indexed: 12/02/2022] Open
Abstract
Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys2His2 transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins.
Cells respond to dynamic changes in their environment by invoking various cellular processes, coordinated by a complex regulatory program. A main component of this program is the regulation of transcription, which is mainly accomplished by transcription factors that bind the DNA in the vicinity of genes. To better understand transcriptional regulation, advanced computational approaches are needed for linking between transcription factors and their targets. The authors describe a novel approach by which the binding site of a given transcription factor can be characterized without previous experimental binding data. This approach involves learning a set of context-specific amino acid–nucleotide recognition preferences that, when combined with the sequence and structure of the protein, can predict its specific binding preferences. Applying this approach to the Cys2His2 Zinc Finger protein family demonstrated its genome-wide potential by automatically predicting the direct targets of 29 regulators in the genome of the fruit fly Drosophila melanogaster.
At present, with the availability of many genome sequences, there are numerous proteins annotated as transcription factors based on their sequence alone. This approach offers a promising direction for revealing the targets of these factors and for understanding their roles in the cellular network.
Affiliation(s)
- Tommy Kaplan
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
- Nir Friedman
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- *To whom correspondence should be addressed. E-mail: (NF), (HM)
- Hanah Margalit
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
- *To whom correspondence should be addressed. E-mail: (NF), (HM)