101
Wunderlich Z, Mirny LA. Using genome-wide measurements for computational prediction of SH2-peptide interactions. Nucleic Acids Res 2009; 37:4629-41. [PMID: 19502496 PMCID: PMC2724268 DOI: 10.1093/nar/gkp394] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 12/25/2022]
Abstract
Peptide-recognition modules (PRMs) are used throughout biology to mediate protein–protein interactions, and many PRMs are members of large protein domain families. Recent genome-wide measurements describe networks of peptide–PRM interactions. In these networks, very similar PRMs recognize distinct sets of peptides, raising the question of how peptide-recognition specificity is achieved using similar protein domains. The analysis of individual protein complex structures often gives answers that are not easily applicable to other members of the same PRM family. Bioinformatics-based approaches, on the other hand, may be difficult to interpret physically. Here we integrate structural information with a large, quantitative data set of SH2 domain–peptide interactions to study the physical origin of domain–peptide specificity. We develop an energy model, inspired by protein folding, based on interactions between the amino-acid positions in the domain and peptide. We use this model to successfully predict which SH2 domains and peptides interact and uncover the positions in each that are important for specificity. The energy model is general enough that it can be applied to other members of the SH2 family or to new peptides, and the cross-validation results suggest that these energy calculations will be useful for predicting binding interactions. It can also be adapted to study other PRM families, predict optimal peptides for a given SH2 domain, or study other biological interactions, e.g. protein–DNA interactions.
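The kind of position-pair energy model described in this abstract can be sketched as a sum of pairwise terms over contacting domain and peptide positions. The sketch below is illustrative only: the contact list, the energy table, the residue letters and the binding threshold are all hypothetical placeholders, not values or functional forms taken from the paper.

```python
# Hypothetical contact map: (domain position, peptide position) pairs assumed to touch.
CONTACTS = [(0, 0), (1, 0), (2, 1)]

# Hypothetical pair energies keyed by (domain residue, peptide residue).
# Pairs not listed contribute zero. Negative values favor binding.
PAIR_ENERGY = {
    ("R", "Y"): -1.5,
    ("K", "E"): -1.0,
    ("L", "I"): -0.5,
    ("E", "E"): 0.8,
}


def binding_energy(domain, peptide, contacts=CONTACTS, table=PAIR_ENERGY):
    """Sum pairwise energies over the contacting position pairs."""
    return sum(table.get((domain[i], peptide[j]), 0.0) for i, j in contacts)


def predicts_binding(domain, peptide, threshold=-1.0):
    """Call a domain-peptide pair 'binding' when its energy is at or below the threshold."""
    return binding_energy(domain, peptide) <= threshold


if __name__ == "__main__":
    print(binding_energy("RKL", "YI"))
    print(predicts_binding("EKL", "EI"))
```

In a model of this shape, the entries of the energy table would be fitted to interaction data, and the position pairs with the largest fitted magnitudes would point to the positions that matter most for specificity.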
Affiliation(s)
- Zeba Wunderlich
- Biophysics Program, Harvard University, Cambridge, MA 02138, USA
102
Thermodynamic pathways to genome spatial organization in the cell nucleus. Biophys J 2009; 96:2168-77. [PMID: 19289043 DOI: 10.1016/j.bpj.2008.12.3919] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Received: 05/23/2008] [Revised: 11/16/2008] [Accepted: 12/09/2008] [Indexed: 12/17/2022]
Abstract
The architecture of the eukaryotic genome is characterized by a high degree of spatial organization. Chromosomes occupy preferred territories correlated to their state of activity and, yet, displace their genes to interact with remote sites in complex patterns requiring the orchestration of a huge number of DNA loci and molecular regulators. Far from random, this organization serves crucial functional purposes, but its governing principles remain elusive. By computer simulations of a statistical mechanics model, we show how architectural patterns spontaneously arise from the physical interaction between soluble binding molecules and chromosomes via collective thermodynamics mechanisms. Chromosomes colocalize, loops and territories form, and find their relative positions as stable thermodynamic states. These are selected by thermodynamic switches, which are regulated by concentrations/affinity of soluble mediators and by number/location of their attachment sites along chromosomes. Our thermodynamic switch model of nuclear architecture, thus, explains on quantitative grounds how well-known cell strategies of upregulation of DNA binding proteins or modification of chromatin structure can dynamically shape the organization of the nucleus.
103
Andrabi M, Mizuguchi K, Sarai A, Ahmad S. Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks. BMC Struct Biol 2009; 9:30. [PMID: 19439068 PMCID: PMC2693520 DOI: 10.1186/1472-6807-9-30] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 12/12/2008] [Accepted: 05/13/2009] [Indexed: 11/18/2022]
Abstract
Background: DNA recognition by proteins is one of the most important processes in living systems. Therefore, understanding the recognition process in general, and identifying mutual recognition sites in proteins and DNA in particular, carries great significance. The sequence and structural dependence of DNA-binding sites in proteins has led to the development of successful machine learning methods for their prediction. However, all existing machine learning methods predict DNA-binding sites irrespective of their target sequence, and hence none of them is helpful in identifying specific protein-DNA contacts. In this work, we formulate the problem of predicting specific DNA-binding sites in terms of contacts between the residue environments of proteins and the identity of a mononucleotide or a dinucleotide step in DNA. The aim of this work is to take a protein sequence or structural features as inputs and predict for each amino acid residue if it binds to DNA at locations identified by one of the four possible mononucleotides or one of the 10 unique dinucleotide steps. Contact predictions are made at various levels of resolution, viz. in terms of side chain, backbone and major or minor groove atoms of DNA.

Results: Significant differences in residue preferences for specific contacts are observed, which, combined with other features, lead to promising levels of prediction. In general, PSSM-based predictions, supported by secondary structure and solvent accessibility, achieve a good predictability of ~70–80%, measured by the area under the curve (AUC) of ROC graphs. The major and minor groove contact predictions stood out in terms of their poor predictability from sequences or PSSM, which was very strongly (>20 percentage points) compensated by the addition of secondary structure and solvent accessibility information, revealing a predominant role of local protein structure in the major/minor groove DNA-recognition. Following a detailed analysis of results, a web server to predict mononucleotide and dinucleotide-step contacts using PSSM was developed and made available at or .

Conclusion: Most residue-nucleotide contacts can be predicted with high accuracy using only sequence and evolutionary information. Major and minor groove contacts, however, depend profoundly on the local structure. Overall, this study takes us a step closer to the ultimate goal of predicting mutual recognition sites in protein and DNA sequences.
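The count of 10 unique dinucleotide steps mentioned above can be reproduced with a short enumeration. The sketch assumes the usual double-strand convention, in which a step and its reverse complement describe the same base-pair step; the abstract itself does not spell out the convention, so treat this as one plausible reading. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_step(step):
    """Pick one representative for a step and its reverse complement."""
    return min(step, reverse_complement(step))


def unique_steps():
    """Enumerate all 16 dinucleotides and merge reverse-complement pairs."""
    return sorted({canonical_step(a + b) for a in "ACGT" for b in "ACGT"})


if __name__ == "__main__":
    steps = unique_steps()
    print(len(steps), steps)
```

Four of the 16 dinucleotides (AT, TA, CG, GC) are their own reverse complement, and the remaining 12 pair up into 6, which gives 4 + 6 = 10 classes.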
Affiliation(s)
- Munazah Andrabi
- National Institute of Biomedical Innovation, Ibaraki-shi, Osaka, Japan.
104
Temiz NA, Camacho CJ. Experimentally based contact energies decode interactions responsible for protein-DNA affinity and the role of molecular waters at the binding interface. Nucleic Acids Res 2009; 37:4076-88. [PMID: 19429892 PMCID: PMC2709573 DOI: 10.1093/nar/gkp289] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
Abstract
A major obstacle towards understanding the molecular basis of transcriptional regulation is the lack of a recognition code for protein–DNA interactions. Using high-quality crystal structures and binding data on the promiscuous family of C2H2 zinc fingers (ZF), we decode 10 fundamental specific interactions responsible for protein–DNA recognition. The interactions include five hydrogen bond types, three atomic desolvation penalties, a favorable non-polar energy, and a novel water accessibility factor. We apply this code to three large datasets containing a total of 89 C2H2 transcription factor (TF) mutants on the three ZFs of EGR. Guided by molecular dynamics simulations of individual ZFs, we map the interactions into homology models that embody all feasible intra- and intermolecular bonds, selecting for each sequence the structure with the lowest free energy. These interactions reproduce the change in affinity of 35 mutants of finger I (R² = 0.998), 23 mutants of finger II (R² = 0.96) and 31 finger III human domains (R² = 0.94). Our findings reveal recognition rules that depend on DNA sequence/structure, molecular water at the interface and induced fit of the C2H2 TFs. Collectively, our method provides the first robust framework to decode the molecular basis of TFs binding to DNA.
Affiliation(s)
- N Alpay Temiz
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
105
Becker NB, Everaers R. DNA nanomechanics: How proteins deform the double helix. J Chem Phys 2009; 130:135102. [DOI: 10.1063/1.3082157] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
106
Gao M, Skolnick J. From nonspecific DNA-protein encounter complexes to the prediction of DNA-protein interactions. PLoS Comput Biol 2009; 5:e1000341. [PMID: 19343221 PMCID: PMC2659451 DOI: 10.1371/journal.pcbi.1000341] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 12/09/2008] [Accepted: 02/26/2009] [Indexed: 11/19/2022]
Abstract
DNA–protein interactions are involved in many essential biological
activities. Because there is no simple mapping code between DNA base pairs and
protein amino acids, the prediction of DNA–protein interactions is a
challenging problem. Here, we present a novel computational approach for
predicting DNA-binding protein residues and DNA–protein interaction
modes without knowing its specific DNA target sequence. Given the structure of a
DNA-binding protein, the method first generates an ensemble of complex
structures obtained by rigid-body docking with a nonspecific canonical B-DNA.
Representative models are subsequently selected through clustering and ranking
by their DNA–protein interfacial energy. Analysis of these encounter
complex models suggests that the recognition sites for specific DNA binding are
usually favorable interaction sites for the nonspecific DNA probe and that
nonspecific DNA–protein interaction modes exhibit some similarity to
specific DNA–protein binding modes. Although the method requires as
input the knowledge that the protein binds DNA, in benchmark tests, it achieves
better performance in identifying DNA-binding sites than three previously
established methods, which are based on sophisticated machine-learning
techniques. We further apply our method to protein structures predicted through
modeling and demonstrate that our method performs satisfactorily on protein
models whose root-mean-square Cα deviation from the native structure is up to
5 Å. This study provides valuable
structural insights into how a specific DNA-binding protein interacts with a
nonspecific DNA sequence. The similarity between the specific
DNA–protein interaction mode and nonspecific interaction modes may
reflect an important sampling step in search of its specific DNA targets by a
DNA-binding protein.

Many essential biological activities require interactions between DNA and
proteins. These proteins usually use certain amino acids, called DNA-binding
sites, to recognize their specific DNA targets. To facilitate the search of its
specific DNA targets, a DNA-binding protein often associates with nonspecific
DNA and then diffuses along the DNA. Due to the weak interactions between
nonspecific DNA and the protein, structural characterization of nonspecific
DNA–protein complexes is experimentally challenging. This paper
describes a computational modeling study on nonspecific DNA–protein
complexes and comparative analysis with respect to specific
DNA–protein complexes. The study found that the specific DNA-binding
sites on a protein are typically favorable for nonspecific DNA and that
nonspecific and specific DNA–protein interaction modes are quite
similar. This similarity may reflect an important sampling step in the search
for the specific DNA target sequence by a DNA-binding protein. On the basis of
these observations, a novel method was proposed for predicting DNA-binding sites
and binding modes of a DNA-binding protein without knowing its specific DNA
target sequence. Ultimately, the combination of this method and protein
structure prediction may lead the way to high throughput modeling of
DNA–protein interactions.
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia
Institute of Technology, Atlanta, Georgia, United States of America
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia
Institute of Technology, Atlanta, Georgia, United States of America
107
Rohs R, West SM, Liu P, Honig B. Nuance in the double-helix and its role in protein-DNA recognition. Curr Opin Struct Biol 2009; 19:171-7. [PMID: 19362815 PMCID: PMC2701566 DOI: 10.1016/j.sbi.2009.03.002] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Received: 02/02/2009] [Revised: 02/25/2009] [Accepted: 03/03/2009] [Indexed: 10/20/2022]
Abstract
It has been known for some time that the double-helix is not a uniform structure but rather exhibits sequence-specific variations that, combined with base-specific intermolecular interactions, offer the possibility of numerous modes of protein-DNA recognition. All-atom simulations have revealed mechanistic insights into the structural and energetic basis of various recognition mechanisms for a number of protein-DNA complexes while coarser grained simulations have begun to provide an understanding of the function of larger assemblies. Molecular simulations have also been applied to the prediction of transcription factor binding sites, while empirical approaches have been developed to predict nucleosome positioning. Studies that combine and integrate experimental, statistical and computational data offer the promise of rapid advances in our understanding of protein-DNA recognition mechanisms.
Affiliation(s)
- Remo Rohs
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
- Sean M. West
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
- Peng Liu
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
- Barry Honig
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA
108
Abstract
We present an all-atom molecular modeling method that can predict the binding specificity of a transcription factor based on its 3D structure, with no further information required. We use molecular dynamics and free energy calculations to compute the relative binding free energies for a transcription factor with multiple possible DNA sequences. These sequences are then used to construct a position weight matrix to represent the transcription factor-binding sites. Free energy differences are calculated by morphing one base pair into another using a multi-copy representation in which multiple base pairs are superimposed at a single DNA position. Water-mediated hydrogen bonds between transcription factor side chains and DNA bases are known to contribute to binding specificity for certain transcription factors. To account for this important effect, the simulation protocol includes an explicit molecular water solvent and counter-ions. For computational efficiency, we use a standard additive approximation for the contribution of each DNA base pair to the total binding free energy. The additive approximation is not strictly necessary, and more detailed computations could be used to investigate non-additive effects.
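Under the additive approximation described above, per-position relative free energies can be turned into a position weight matrix. The abstract does not say how that conversion is done; the sketch below assumes Boltzmann weighting, which is one common choice, and uses made-up free-energy values in kcal/mol. The function names and the value of kT (about 0.593 kcal/mol near 298 K) are assumptions for illustration.

```python
import math

KT = 0.593  # kcal/mol, approximate thermal energy near 298 K (assumed temperature)


def pwm_from_ddg(ddg_by_position, kt=KT):
    """Convert per-position relative binding free energies into base probabilities.

    ddg_by_position: list of dicts mapping base -> relative free energy.
    Each column is normalized independently (additive approximation).
    """
    pwm = []
    for ddg in ddg_by_position:
        weights = {base: math.exp(-g / kt) for base, g in ddg.items()}
        total = sum(weights.values())
        pwm.append({base: w / total for base, w in weights.items()})
    return pwm


def site_energy(site, ddg_by_position):
    """Additive relative free energy of a whole site: one term per position."""
    return sum(ddg[base] for base, ddg in zip(site, ddg_by_position))


if __name__ == "__main__":
    # Hypothetical free energies, not taken from any simulation.
    ddg = [
        {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
        {"A": 0.5, "C": 0.5, "G": 0.5, "T": 0.5},
    ]
    for column in pwm_from_ddg(ddg):
        print({base: round(p, 3) for base, p in column.items()})
    print(site_energy("AC", ddg))
```

Because each column is normalized on its own, only free-energy differences within a position matter; a position where all four bases have equal free energy yields a uniform column.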
Affiliation(s)
- L Angela Liu
- Department of Biomedical Engineering and Institute for Multiscale Modeling of Biological Interactions, Johns Hopkins University, Baltimore, MD, USA
109
Wichadakul D, McDermott J, Samudrala R. Prediction and integration of regulatory and protein-protein interactions. Methods Mol Biol 2009; 541:101-43. [PMID: 19381527 DOI: 10.1007/978-1-59745-243-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]
Abstract
Knowledge of transcriptional regulatory interactions (TRIs) is essential for exploring functional genomics and systems biology in any organism. While several results from genome-wide analysis of transcriptional regulatory networks are available, they are limited to model organisms such as yeast (1) and worm (2). Beyond these networks, experiments on TRIs study only individual genes and proteins of specific interest. In this chapter, we present a method for the integration of various data sets to predict TRIs for 54 organisms in the Bioverse (3). We describe how to compile and handle various formats and identifiers of data sets from different sources and how to predict TRIs using a homology-based approach, utilizing the compiled data sets. Integrated data sets include experimentally verified TRIs, binding sites of transcription factors, promoter sequences, protein subcellular localization, and protein families. Predicted TRIs expand the networks of gene regulation for a large number of organisms. The integration of experimentally verified and predicted TRIs with other known protein-protein interactions (PPIs) gives insight into specific pathways, network motifs, and the topological dynamics of an integrated network with gene expression under different conditions, essential for exploring functional genomics and systems biology.
110
Persikov AV, Osada R, Singh M. Predicting DNA recognition by Cys2His2 zinc finger proteins. Bioinformatics 2008; 25:22-9. [PMID: 19008249 DOI: 10.1093/bioinformatics/btn580] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022]
Abstract
Motivation: Cys2His2 zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein-DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The 'canonical model' for ZF protein-DNA interaction consists of only four amino acid-nucleotide contacts per zinc finger domain.

Results: We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein-DNA interactions, ours additionally incorporates information about protein-DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints about the relative binding affinities of protein-DNA pairs; this type of information has not been used previously in predicting ZF protein-DNA binding. Here, we build a high-quality literature-derived experimental database of ZF-DNA binding examples and utilize it to test both linear and polynomial kernels for predicting ZF protein-DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms previously published prediction procedures as well as the linear SVM. This may indicate the presence of dependencies between contacts in the canonical binding model and suggests that modification of the underlying structural model may result in further improved performance in predicting ZF protein-DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein-DNA pairs have great potential for effective prediction of protein-DNA interactions.

Availability: An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.
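The difference between the linear and polynomial kernels in this setting can be illustrated with a toy encoding of the canonical model. The sketch assumes each of the four contacts is one-hot encoded by contact slot, amino acid and base; this encoding, the function names and the example residues are hypothetical and are not stated in the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASES = "ACGT"
N_CONTACTS = 4  # four amino acid-nucleotide contacts per finger in the canonical model


def encode(contacts):
    """Map four (amino_acid, base) contacts to the set of active one-hot indices."""
    if len(contacts) != N_CONTACTS:
        raise ValueError("expected exactly four contacts")
    active = set()
    for slot, (aa, base) in enumerate(contacts):
        index = (slot * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)) * len(BASES)
        active.add(index + BASES.index(base))
    return active


def linear_kernel(x, y):
    """Dot product of two one-hot vectors: the number of identical contacts."""
    return len(x & y)


def polynomial_kernel(x, y, degree=2, coef0=1):
    """(x.y + coef0)^degree; expanding it yields products of pairs of contacts."""
    return (linear_kernel(x, y) + coef0) ** degree


if __name__ == "__main__":
    a = encode([("R", "G"), ("D", "C"), ("E", "C"), ("R", "G")])
    b = encode([("R", "G"), ("A", "T"), ("E", "C"), ("R", "G")])
    print(linear_kernel(a, b), polynomial_kernel(a, b))
```

With a linear kernel the decision function is a sum of independent per-contact terms, whereas the degree-2 expansion contains terms for pairs of contacts occurring together, which is one way to read the abstract's remark about dependencies between contacts.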
Affiliation(s)
- Anton V Persikov
- Lewis-Sigler Institute for Integrative Genomics and Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
111
Abstract
DNA is thought to behave as a stiff elastic rod with respect to the ubiquitous mechanical deformations inherent to its biology. To test this model at short DNA lengths, we measured the mean and variance of end-to-end length for a series of DNA double helices in solution, using small-angle x-ray scattering interference between gold nanocrystal labels. In the absence of applied tension, DNA is at least one order of magnitude softer than measured by single-molecule stretching experiments. Further, the data rule out the conventional elastic rod model. The variance in end-to-end length follows a quadratic dependence on the number of base pairs rather than the expected linear dependence, indicating that DNA stretching is cooperative over more than two turns of the DNA double helix. Our observations support the idea of long-range allosteric communication through DNA structure.
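The contrast between linear and quadratic scaling of the variance follows from a standard argument about sums of random variables. The notation here is an assumption made for illustration and does not come from the abstract: the end-to-end length $L$ is written as a sum of $N$ base-pair step lengths $\ell_i$, each with variance $\sigma^2$.

```latex
\begin{align}
L &= \sum_{i=1}^{N} \ell_i, \qquad
\operatorname{Var}(L) = \sum_{i=1}^{N} \operatorname{Var}(\ell_i)
  + \sum_{i \neq j} \operatorname{Cov}(\ell_i, \ell_j) \\
\text{independent steps:}\quad
  & \operatorname{Cov}(\ell_i, \ell_j) = 0
  \;\Rightarrow\; \operatorname{Var}(L) = N\sigma^2 \\
\text{fully correlated steps:}\quad
  & \operatorname{Cov}(\ell_i, \ell_j) = \sigma^2
  \;\Rightarrow\; \operatorname{Var}(L) = N\sigma^2 + N(N-1)\sigma^2 = N^2\sigma^2
\end{align}
```

Independent step fluctuations therefore give the linear dependence expected of an elastic rod, while a quadratic dependence is what results when the step fluctuations are correlated across the whole duplex, consistent with the cooperative stretching the abstract describes.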
Affiliation(s)
- Rhiju Das
- Department of Physics, Stanford University, Stanford, CA 94305
112
Angarica VE, Pérez AG, Vasconcelos AT, Collado-Vides J, Contreras-Moreira B. Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinformatics 2008; 9:436. [PMID: 18922190 PMCID: PMC2585596 DOI: 10.1186/1471-2105-9-436] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 05/12/2008] [Accepted: 10/16/2008] [Indexed: 11/10/2022]
Abstract
Background: The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence.

Results: Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models.

Conclusion: Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
Affiliation(s)
- Vladimir Espinosa Angarica
- Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, España.
113
Jamal Rahi S, Virnau P, Mirny LA, Kardar M. Predicting transcription factor specificity with all-atom models. Nucleic Acids Res 2008; 36:6209-17. [PMID: 18829719 PMCID: PMC2577325 DOI: 10.1093/nar/gkn589] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 01/09/2023]
Abstract
The binding of a transcription factor (TF) to a DNA operator site can initiate or repress the expression of a gene. Computational prediction of sites recognized by a TF has traditionally relied upon knowledge of several cognate sites, rather than an ab initio approach. Here, we examine the possibility of using structure-based energy calculations that require no knowledge of bound sites but rather start with the structure of a protein–DNA complex. We study the PurR Escherichia coli TF, and explore to what extent atomistic models of protein–DNA complexes can be used to distinguish between cognate and noncognate DNA sites. Particular emphasis is placed on systematic evaluation of this approach by comparing its performance with bioinformatic methods, and by testing it against random decoys and sites of homologous TFs. We also examine a set of experimental mutations in both DNA and the protein. Using our explicit estimates of energy, we show that the specificity for PurR is dominated by direct protein–DNA interactions, and weakly influenced by bending of DNA.
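Testing an energy function against random decoys, as described above, is often summarized as a z-score: how far a candidate site's energy lies below the decoy distribution. The sketch below is a generic illustration and not the authors' protocol. In particular, the simple additive `additive_energy` function is a stand-in for an all-atom energy calculation, and the energy matrix values, decoy count and seed are hypothetical.

```python
import random
import statistics


def additive_energy(site, energy_matrix):
    """Placeholder energy: one term per position (stand-in for an all-atom calculation)."""
    return sum(column[base] for base, column in zip(site, energy_matrix))


def decoy_z_score(site, energy_matrix, n_decoys=1000, seed=0):
    """Z-score of a site's energy relative to uniformly random sequences of the same length."""
    rng = random.Random(seed)
    decoys = ["".join(rng.choice("ACGT") for _ in site) for _ in range(n_decoys)]
    energies = [additive_energy(decoy, energy_matrix) for decoy in decoys]
    mean = statistics.mean(energies)
    spread = statistics.pstdev(energies)
    return (additive_energy(site, energy_matrix) - mean) / spread


if __name__ == "__main__":
    # Hypothetical matrix: base A is favored by 1 energy unit at each of 6 positions.
    matrix = [{"A": -1.0, "C": 0.0, "G": 0.0, "T": 0.0} for _ in range(6)]
    print(decoy_z_score("AAAAAA", matrix))
    print(decoy_z_score("CGTCGT", matrix))
```

A strongly negative z-score indicates a site that the energy function separates well from random sequence; the same comparison can be repeated with sites of homologous TFs in place of random decoys.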
Affiliation(s)
- Sahand Jamal Rahi
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Peter Virnau
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- *To whom correspondence should be addressed. Tel: +49 6131 392 3646; Fax: +49 6131 392 5441;
- Leonid A. Mirny
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Mehran Kardar
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, Staudinger Weg 7, Institut für Physik, 55099 Mainz, Germany and Harvard-MIT Division of Health Sciences and Technology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
114
Yang M, Teplow DB. Amyloid beta-protein monomer folding: free-energy surfaces reveal alloform-specific differences. J Mol Biol 2008; 384:450-64. [PMID: 18835397 DOI: 10.1016/j.jmb.2008.09.039] [Citation(s) in RCA: 199] [Impact Index Per Article: 12.4] [Received: 07/17/2008] [Revised: 09/09/2008] [Accepted: 09/12/2008] [Indexed: 12/22/2022]
Abstract
Alloform-specific differences in structural dynamics between amyloid beta-protein (Abeta) 40 and Abeta42 appear to underlie the pathogenesis of Alzheimer's disease. To elucidate these differences, we performed microsecond timescale replica-exchange molecular dynamics simulations to sample the conformational space of the Abeta monomer and constructed its free-energy surface. We find that neither peptide monomer is unstructured, but rather that each may be described as a unique statistical coil in which five relatively independent folding units exist, comprising residues 1-5, 10-13, 17-22, 28-37, and 39-42, which are connected by four turn structures. The free-energy surfaces of both peptides are characterized by two large basins, comprising conformers with either substantial alpha-helix or beta-sheet content. Conformational transitions within and between these basins are rapid. The two additional hydrophobic residues at the Abeta42 C-terminus, Ile41 and Ala42, significantly increase contacts within the C-terminus, and between the C-terminus and the central hydrophobic cluster (Leu17-Ala21). As a result, the beta-structure of Abeta42 is more stable than that of Abeta40, and the conformational equilibrium in Abeta42 shifts towards beta-structure. These results suggest that drugs stabilizing alpha-helical Abeta conformers (or destabilizing the beta-sheet state) would block formation of neurotoxic oligomers. The atomic-resolution conformer structures determined in our simulations may serve as useful targets for this purpose. The conformers also provide starting points for simulations of Abeta oligomerization, a process postulated to be the key pathogenetic event in Alzheimer's disease.
Affiliation(s)
- Mingfeng Yang
- Department of Neurology, David Geffen School of Medicine, and Molecular Biology Institute and Brain Research Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
115
Minakawa N, Kawano Y, Murata S, Inoue N, Matsuda A. Oligodeoxynucleotides containing 3-bromo-3-deazaadenine and 7-bromo-7-deazaadenine 2'-deoxynucleosides as chemical probes to investigate DNA-protein interactions. Chembiochem 2008; 9:464-70. [PMID: 18219644 DOI: 10.1002/cbic.200700580] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
We describe the design and proof of concept of a pair of chemical probes for investigating DNA-protein interactions, specifically the incorporation of 7-bromo-7-deazaadenine and 3-bromo-3-deazaadenine 2'-deoxynucleosides (Br(7)C(7)dA and Br(3)C(3)dA) into oligodeoxynucleotides (ODNs), and their utility. Whereas the bromo substituent of the Br(7)C(7)dA unit in an ODN duplex acts sterically to inhibit binding with NF-kappaB, which interacts with the duplex in its major groove, the bromo substituent of the Br(3)C(3)dA unit acts sterically to inhibit binding with RNase H, which interacts with the duplex in its minor groove. In addition, the utilization of ODNs containing 7-deazaadenine and 3-deazaadenine 2'-deoxynucleosides (C(7)dA and C(3)dA), together with the pair of chemical probes, afforded valuable information on the requirement for nitrogen atoms located in either the major or minor grooves. Accordingly, we were able to show the utility of ODNs containing Br(7)C(7)dA, Br(3)C(3)dA, C(7)dA, and C(3)dA for the investigation of DNA-protein interactions.
Affiliation(s)
- Noriaki Minakawa
- Graduate School of Pharmaceutical Sciences, Hokkaido University, Kita-12, Nishi-6, Kita-ku, Sapporo 060-0812, Japan

116
Chang YL, Tsai HK, Kao CY, Chen YC, Hu YJ, Yang JM. Evolutionary conservation of DNA-contact residues in DNA-binding domains. BMC Bioinformatics 2008; 9 Suppl 6:S3. [PMID: 18541056 PMCID: PMC2423444 DOI: 10.1186/1471-2105-9-s6-s3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/03/2022] Open
Abstract
Background DNA-binding proteins are of utmost importance to gene regulation. The identification of DNA-binding domains is useful for understanding the regulation mechanisms of DNA-binding proteins. In this study, we proposed a method to determine whether a domain or a protein has DNA-binding capability by considering evolutionary conservation of DNA-binding residues. Results Our method achieves high precision and recall for 66 families of DNA-binding domains, with a false positive rate less than 5% for 250 non-DNA-binding proteins. In addition, experimental results show that our method is able to identify the different DNA-binding behaviors of proteins in the same SCOP family based on the use of evolutionary conservation of DNA-contact residues. Conclusion This study shows the conservation of DNA-contact residues in DNA-binding domains. We conclude that the members in the same subfamily bind DNA specifically and the members in different subfamilies often recognize different DNA targets. Additionally, we observe the co-evolution of DNA-contact residues and interacting DNA base-pairs.
Affiliation(s)
- Yao-Lin Chang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan.

117
Tolstorukov MY, Choudhary V, Olson WK, Zhurkin VB, Park PJ. nuScore: a web-interface for nucleosome positioning predictions. Bioinformatics 2008; 24:1456-8. [PMID: 18445607 DOI: 10.1093/bioinformatics/btn212] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Indexed: 11/13/2022] Open
Abstract
Sequence-directed mapping of nucleosome positions is of major biological interest. Here, we present a web-interface for estimation of the affinity of the histone core to DNA and prediction of nucleosome arrangement on a given sequence. Our approach is based on assessment of the energy cost of imposing the deformations required to wrap DNA around the histone surface. The interface allows the user to specify a number of options such as selecting from several structural templates for threading calculations and adding random sequences to the analysis. AVAILABILITY The nuScore interface is freely available for use at http://compbio.med.harvard.edu/nuScore. CONTACT peter_park@harvard.edu; tolstorukov@gmail.com SUPPLEMENTARY INFORMATION The site contains a user manual, a description of the methodology, and examples.
Affiliation(s)
- Michael Y Tolstorukov
- Harvard-Partners Center for Genetics and Genomics, Brigham and Women's Hospital, Boston, MA 02115, USA.

118
Lintner RE, Mishra PK, Srivastava P, Martinez-Vaz BM, Khodursky AB, Blumenthal RM. Limited functional conservation of a global regulator among related bacterial genera: Lrp in Escherichia, Proteus and Vibrio. BMC Microbiol 2008; 8:60. [PMID: 18405378 PMCID: PMC2374795 DOI: 10.1186/1471-2180-8-60] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 11/16/2007] [Accepted: 04/11/2008] [Indexed: 02/03/2023] Open
Abstract
Background Bacterial genome sequences are being determined rapidly, but few species are physiologically well characterized. Predicting regulation from genome sequences usually involves extrapolation from better-studied bacteria, using the hypothesis that a conserved regulator, conserved target gene, and predicted regulator-binding site in the target promoter imply conserved regulation between the two species. However many compared organisms are ecologically and physiologically diverse, and the limits of extrapolation have not been well tested. In E. coli K-12 the leucine-responsive regulatory protein (Lrp) affects expression of ~400 genes. Proteus mirabilis and Vibrio cholerae have highly-conserved lrp orthologs (98% and 92% identity to E. coli lrp). The functional equivalence of Lrp from these related species was assessed. Results Heterologous Lrp regulated gltB, livK and lrp transcriptional fusions in an E. coli background in the same general way as the native Lrp, though with significant differences in extent. Microarray analysis of these strains revealed that the heterologous Lrp proteins significantly influence only about half of the genes affected by native Lrp. In P. mirabilis, heterologous Lrp restored swarming, though with some pattern differences. P. mirabilis produced substantially more Lrp than E. coli or V. cholerae under some conditions. Lrp regulation of target gene orthologs differed among the three native hosts. Strikingly, while Lrp negatively regulates its own gene in E. coli, and was shown to do so even more strongly in P. mirabilis, Lrp appears to activate its own gene in V. cholerae. Conclusion The overall similarity of regulatory effects of the Lrp orthologs supports the use of extrapolation between related strains for general purposes. 
However this study also revealed intrinsic differences even between orthologous regulators sharing >90% overall identity, and 100% identity for the DNA-binding helix-turn-helix motif, as well as differences in the amounts of those regulators. These results suggest that predicting regulation of specific target genes based on genome sequence comparisons alone should be done on a conservative basis.
Affiliation(s)
- Robert E Lintner
- Department of Medical Microbiology and Immunology, University of Toledo Health Sciences Center, Toledo, OH 43614-2598, USA.

119
Habib N, Kaplan T, Margalit H, Friedman N. A novel Bayesian DNA motif comparison method for clustering and retrieval. PLoS Comput Biol 2008; 4:e1000010. [PMID: 18463706 PMCID: PMC2265534 DOI: 10.1371/journal.pcbi.1000010] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 07/19/2007] [Accepted: 01/24/2008] [Indexed: 11/17/2022] Open
Abstract
Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors. 
Regulation of gene expression plays a central role in the activity of living cells and in their response to internal (e.g., cell division) or external (e.g., stress) stimuli. Key players in determining gene-specific regulation are transcription factors that bind sequence-specific sites on the DNA, modulating the expression of nearby genes. To understand the regulatory program of the cell, we need to identify these transcription factors, when they act, and on which genes. Transcription regulatory maps can be assembled by computational analysis of experimental data, by discovering the DNA recognition sequences (motifs) of transcription factors and their occurrences along the genome. Such an analysis usually results in a large number of overlapping motifs. To reconstruct regulatory maps, it is crucial to combine similar motifs and to relate them to transcription factors. To this end we developed an accurate, fully automated method, termed BLiC, based upon an improved similarity measure for comparing DNA motifs. By applying it to genome-wide data in yeast, we identified the DNA motifs of transcription factors and their putative target genes. Finally, we analyze motifs of transcription factors that alter their target genes under different conditions, and show how cells adjust their regulatory program in response to environmental changes.
Affiliation(s)
- Naomi Habib
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel

120
Moroni E, Caselle M, Fogolari F. Identification of DNA-binding protein target sequences by physical effective energy functions: free energy analysis of lambda repressor-DNA complexes. BMC STRUCTURAL BIOLOGY 2007; 7:61. [PMID: 17900341 PMCID: PMC2194778 DOI: 10.1186/1472-6807-7-61] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/20/2007] [Accepted: 09/27/2007] [Indexed: 11/26/2022]
Abstract
Background Specific binding of proteins to DNA is one of the most common ways gene expression is controlled. Although general rules for the DNA-protein recognition can be derived, the ambiguous and complex nature of this mechanism precludes a simple recognition code; therefore, the prediction of DNA target sequences is not straightforward. DNA-protein interactions can be studied using computational methods which can complement the current experimental methods and offer some advantages. In the present work we use physical effective potentials to evaluate the DNA-protein binding affinities for the λ repressor-DNA complex for which structural and thermodynamic experimental data are available. Results The binding free energy of two molecules can be expressed as the sum of an intermolecular energy (evaluated using a molecular mechanics forcefield), a solvation free energy term and an entropic term. Different solvation models are used including distance dependent dielectric constants, solvent accessible surface tension models and the Generalized Born model. The effect of conformational sampling by Molecular Dynamics simulations on the computed binding energy is assessed; results show that this effect is in general negative and the reproducibility of the experimental values decreases with the increase of simulation time considered. The free energy of binding for non-specific complexes, estimated using the best energetic model, agrees with earlier theoretical suggestions. As a result of these analyses, we propose a protocol for the prediction of DNA-binding target sequences. The possibility of searching regulatory elements within the bacteriophage λ genome using this protocol is explored. Our analysis shows good prediction capabilities, even in the absence of any thermodynamic data and information on the naturally recognized sequence. 
Conclusion This study supports the conclusion that physics-based methods can offer a completely complementary methodology to sequence-based methods for the identification of DNA-binding protein target sequences.
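The decomposition described in the Results can be written compactly. The symbols below are assumptions chosen for illustration (the abstract names the three terms but gives no notation), and the entropic term is written with the conventional sign:

```latex
\Delta G_{\mathrm{bind}} \;=\; \Delta E_{\mathrm{MM}} \;+\; \Delta G_{\mathrm{solv}} \;-\; T\,\Delta S
```

Here \Delta E_{\mathrm{MM}} stands for the intermolecular energy from the molecular mechanics force field, \Delta G_{\mathrm{solv}} for the solvation free energy (evaluated with whichever solvation model is being tested), and -T\,\Delta S for the entropic contribution.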
Affiliation(s)
- Elisabetta Moroni
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Dipartimento di Fisica G. Occhialini, Università di Milano-Bicocca and INFN, Piazza delle Scienze 3, 20156 Milano, Italy
- Michele Caselle
- Dipartimento di Fisica Teorica, Università di Torino and INFN, Via P. Giuria 1, 10125 Torino, Italy
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy

121
Eklund JL, Ulge UY, Eastberg J, Monnat RJ. Altered target site specificity variants of the I-PpoI His-Cys box homing endonuclease. Nucleic Acids Res 2007; 35:5839-50. [PMID: 17720708 PMCID: PMC2034468 DOI: 10.1093/nar/gkm624] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022] Open
Abstract
We used a yeast one-hybrid assay to isolate and characterize variants of the eukaryotic homing endonuclease I-PpoI that were able to bind a mutant, cleavage-resistant I-PpoI target or ‘homing’ site DNA in vivo. Native I-PpoI recognizes and cleaves a semi-palindromic 15-bp target site with high specificity in vivo and in vitro. This target site is present in the 28S or equivalent large subunit rDNA genes of all eukaryotes. I-PpoI variants able to bind mutant target site DNA had from 1 to 8 amino acid substitutions in the DNA–protein interface. Biochemical characterization of these proteins revealed a wide range of site–binding affinities and site discrimination. One-third of variants were able to cleave target site DNA, but there was no systematic relationship between site-binding affinity and site cleavage. Computational modeling of several variants provided mechanistic insight into how amino acid substitutions that contact, or are adjacent to, specific target site DNA base pairs determine I-PpoI site-binding affinity and site discrimination, and may affect cleavage efficiency.
Affiliation(s)
- Jennifer L. Eklund
- Department of Genome Sciences, Department of Pathology, the Molecular and Cellular Biology Program, University of Washington, Seattle, WA and Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Umut Y. Ulge
- Department of Genome Sciences, Department of Pathology, the Molecular and Cellular Biology Program, University of Washington, Seattle, WA and Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Jennifer Eastberg
- Department of Genome Sciences, Department of Pathology, the Molecular and Cellular Biology Program, University of Washington, Seattle, WA and Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Raymond J. Monnat
- Department of Genome Sciences, Department of Pathology, the Molecular and Cellular Biology Program, University of Washington, Seattle, WA and Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- *To whom correspondence should be addressed. 206 616 7392; 206 543 3967

122
Bussemaker HJ, Foat BC, Ward LD. Predictive modeling of genome-wide mRNA expression: from modules to molecules. Annu Rev Biophys Biomol Struct 2007; 36:329-47. [PMID: 17311525 DOI: 10.1146/annurev.biophys.36.040306.132725] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 11/09/2022]
Abstract
Various algorithms are available for predicting mRNA expression and modeling gene regulatory processes. They differ in whether they rely on the existence of modules of coregulated genes or build a model that applies to all genes, whether they represent regulatory activities as hidden variables or as mRNA levels, and whether they implicitly or explicitly model the complex cis-regulatory logic of multiple interacting transcription factors binding the same DNA. The fact that functional genomics data of different types reflect the same molecular processes provides a natural strategy for integrative computational analysis. One promising avenue toward an accurate and comprehensive model of gene regulation combines biophysical modeling of the interactions among proteins, DNA, and RNA with the use of large-scale functional genomics data to estimate regulatory network connectivity and activity parameters. As the ability of these models to represent complex cis-regulatory logic increases, the need for approaches based on cross-species conservation may diminish.
Affiliation(s)
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA.

123
Lippow SM, Tidor B. Progress in computational protein design. Curr Opin Biotechnol 2007; 18:305-11. [PMID: 17644370 PMCID: PMC3495006 DOI: 10.1016/j.copbio.2007.04.009] [Citation(s) in RCA: 161] [Impact Index Per Article: 9.5] [Received: 04/02/2007] [Accepted: 04/17/2007] [Indexed: 11/25/2022]
Abstract
Current progress in computational structure-based protein design is reviewed in the areas of methodology and applications. Foundational advances include new potential functions, more efficient ways of computing energetics, flexible treatments of solvent, and useful energy function approximations, as well as ensemble-based approaches to scoring designs for inclusion of entropic effects, improvements to guaranteed and to stochastic search techniques, and methods to design combinatorial libraries for screening and selection. Applications include new approaches and successes in the design of specificity for protein folding, binding, and catalysis, in the redesign of proteins for enhanced binding affinity, and in the application of design technology to study and alter enzyme catalysis. Computational protein design continues to mature and advance.
Affiliation(s)
- Shaun M Lippow
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.

124
Marabotti A, Colonna G, Facchiano A. New computational strategy to analyze the interactions of ERalpha and ERbeta with different ERE sequences. J Comput Chem 2007; 28:1031-41. [PMID: 17269124 DOI: 10.1002/jcc.20582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
The importance of computational methods for the simulation and analysis of biological systems has increased in recent years. In particular, methods to predict binding energies are being developed not only to rank the affinities between two or more complexes, but also to quantify the contribution of different types of interaction. In this work, we present the application of HINT, a non-Newtonian force field, to rank the affinities of complexes formed by estrogen receptors (ER) alpha and beta and different estrogen responsive elements (ERE) near the estrogen-regulated genes. We used the crystallographic coordinates of the DNA binding domain of ERalpha complexed to a consensus ERE as a starting point to simulate several complexes in which some nucleotides in the ERE sequence were mutated. Moreover, we used homology modeling methods to create the structure of the complexes between the DNA binding domain of ERbeta (for which no experimental structures are currently available) and the same ERE sequences. Our results show that HINT is able to rank the affinities of ERalpha and ERbeta for different ERE sequences, and to correctly identify the positions on the DNA sequence that are most important for binding affinity. Moreover, the HINT output gives us the opportunity to identify and quantify the role played by each single atom of amino acids and nucleotides in the binding event, as well as to predict the effect on the binding affinity for other nucleotide mutations.
Affiliation(s)
- Anna Marabotti
- Laboratory of Bioinformatics and Computational Biology, Institute of Food Science, National Research Council, Avellino, Italy.

125
Teif VB. General transfer matrix formalism to calculate DNA-protein-drug binding in gene regulation: application to OR operator of phage lambda. Nucleic Acids Res 2007; 35:e80. [PMID: 17526526 PMCID: PMC1920246 DOI: 10.1093/nar/gkm268] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 02/02/2007] [Revised: 04/09/2007] [Accepted: 04/09/2007] [Indexed: 11/24/2022] Open
Abstract
The transfer matrix methodology is proposed as a systematic tool for the statistical-mechanical description of DNA-protein-drug binding involved in gene regulation. We show that a genetic system of several cis-regulatory modules is calculable using this method, considering explicitly the site-overlapping, competitive, cooperative binding of regulatory proteins, their multilayer assembly and DNA looping. In the methodological section, the matrix models are solved for the basic types of short- and long-range interactions between DNA-bound proteins, drugs and nucleosomes. We apply the matrix method to gene regulation at the O(R) operator of phage lambda. The transfer matrix formalism allowed the description of the lambda-switch at a single-nucleotide resolution, taking into account the effects of a range of inter-protein distances. Our calculations confirm previously established roles of the contact CI-Cro-RNAP interactions. Concerning long-range interactions, we show that while the DNA loop between the O(R) and O(L) operators is important at the lysogenic CI concentrations, the interference between the adjacent promoters P(R) and P(RM) becomes more important at small CI concentrations. A large change in the expression pattern may arise in this regime due to anticooperative interactions between DNA-bound RNA polymerases. The applicability of the matrix method to more complex systems is discussed.
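To make the transfer matrix idea concrete, here is a minimal sketch for a toy lattice only. It assumes each ligand covers exactly one site, binds with statistical weight `kc` (binding constant times concentration), and gains a cooperativity factor `w` when two bound ligands are adjacent. The function names and the model itself are illustrative assumptions; the formalism in the paper covers far richer cases (multi-site ligands, overlapping sites, looping) that are not reproduced here.

```python
from itertools import product


def partition_function_tm(n_sites, kc, w):
    """Partition function of a toy 1D binding lattice via transfer matrices."""
    # vec[s] is the summed weight of all configurations of the sites
    # processed so far whose last site is in state s (0 = empty, 1 = bound).
    vec = [1.0, kc]
    # transfer[prev][curr]: weight added by a site in state `curr`
    # that follows a site in state `prev`.
    transfer = [[1.0, kc], [1.0, kc * w]]
    for _ in range(n_sites - 1):
        vec = [
            vec[0] * transfer[0][0] + vec[1] * transfer[1][0],
            vec[0] * transfer[0][1] + vec[1] * transfer[1][1],
        ]
    return vec[0] + vec[1]


def partition_function_brute(n_sites, kc, w):
    """Same quantity by explicit enumeration, usable only for small lattices."""
    total = 0.0
    for state in product((0, 1), repeat=n_sites):
        weight = kc ** sum(state)
        # One factor of w for every pair of adjacent bound sites.
        weight *= w ** sum(a and b for a, b in zip(state, state[1:]))
        total += weight
    return total
```

The matrix version scales linearly with lattice length, whereas the enumeration grows as 2^N, which is the practical reason a transfer matrix treatment is attractive for long DNA.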
Affiliation(s)
- Vladimir B Teif
- Institute of Bioorganic Chemistry, Belarus National Academy of Sciences, Street Kuprevich 5/2, 220141, Minsk, Belarus.

126
Morozov AV, Siggia ED. Connecting protein structure with predictions of regulatory sites. Proc Natl Acad Sci U S A 2007; 104:7068-73. [PMID: 17438293 PMCID: PMC1855371 DOI: 10.1073/pnas.0701356104] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022] Open
Abstract
A common task posed by microarray experiments is to infer the binding site preferences for a known transcription factor from a collection of genes that it regulates and to ascertain whether the factor acts alone or in a complex. The converse problem can also be posed: Given a collection of binding sites, can the regulatory factor or complex of factors be inferred? Both tasks are substantially facilitated by using relatively simple homology models for protein-DNA interactions, as well as the rapidly expanding protein structure database. For budding yeast, we are able to construct reliable structural models for 67 transcription factors and with them redetermine factor binding sites by using a Bayesian Gibbs sampling algorithm and an extensive protein localization data set. For 49 factors in common with a prior analysis of this data set (based largely on phylogenetic conservation), we find that half of the previously predicted binding motifs are in need of some revision. We also solve the inverse problem of ascertaining the factors from the binding sites by assigning a correct protein fold to 25 of the 49 cases from a previous study. Our approach is easily extended to other organisms, including higher eukaryotes. Our study highlights the utility of enlarging current structural genomics projects that exhaustively sample fold structure space to include all factors with significantly different DNA-binding specificities.
Affiliation(s)
- Alexandre V Morozov
- Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA.

127
Abnizova I, Subhankulova T, Gilks WR. Recent computational approaches to understand gene regulation: mining gene regulation in silico. Curr Genomics 2007; 8:79-91. [PMID: 18660846 PMCID: PMC2435357 DOI: 10.2174/138920207780368150] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/30/2006] [Revised: 12/13/2006] [Accepted: 12/15/2006] [Indexed: 01/03/2023] Open
Abstract
This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing; therefore, all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, not claiming it to be comprehensive and complete, of related computational biological topics covering gene regulation, which may shed light on the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas:
- gene regulatory network construction;
- evolution of regulatory DNA;
- studies of its structural and statistical informational properties;
- and finally, regulatory RNA.
Affiliation(s)
- T Subhankulova
- Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, Cambridge, UK

128
Siggers TW, Honig B. Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res 2007; 35:1085-97. [PMID: 17264128 PMCID: PMC1851644 DOI: 10.1093/nar/gkl1155] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 12/18/2022] Open
Abstract
Predicting the binding specificity of transcription factors is a critical step in the characterization and computational identification of cis-regulatory elements in genomic sequences. Here we use protein–DNA structures to predict binding specificity and consider the possibility of predicting position weight matrices (PWM) for an entire protein family based on the structures of just a few family members. A particular focus is the sensitivity of prediction accuracy to the docking geometry of the structure used. We investigate this issue with the goal of determining how similar two docking geometries must be for binding specificity predictions to be accurate. Docking similarity is quantified using our recently described interface alignment score (IAS). Using a molecular-mechanics force field, we predict high-affinity nucleotide sequences that bind to the second zinc-finger (ZF) domain from the Zif268 protein, using different C2H2 ZF domains as structural templates. We identify a strong relationship between IAS values and prediction accuracy, and define a range of IAS values for which accurate structure-based predictions of binding specificity are to be expected. The implication of our results for large-scale, structure-based prediction of PWMs is discussed.
Affiliation(s)
- Barry Honig
- *To whom correspondence should be addressed. Tel: +1 212 851 4651; Fax: +1 212 851 4650

129
Abstract
Protein–DNA interactions are vital for many processes in living cells, especially transcriptional regulation and DNA modification. To further our understanding of these important processes on the microscopic level, it is necessary that theoretical models describe the macromolecular interaction energetics accurately. While several methods have been proposed, there has not been a careful comparison of how well the different methods are able to predict biologically important quantities such as the correct DNA binding sequence, total binding free energy and free energy changes caused by DNA mutation. In addition to carrying out the comparison, we present two important theoretical models developed initially in protein folding that have not yet been tried on protein–DNA interactions. In the process, we find that the results of these knowledge-based potentials show a strong dependence on the interaction distance and the derivation method. Finally, we present a knowledge-based potential that gives comparable or superior results to the best of the other methods, including the molecular mechanics force field AMBER99.
Affiliation(s)
- Jason E Donald
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford St. Cambridge, MA 02138, USA.

130
Reddy TE, Shakhnovich BE, Roberts DS, Russek SJ, DeLisi C. Positional clustering improves computational binding site detection and identifies novel cis-regulatory sites in mammalian GABAA receptor subunit genes. Nucleic Acids Res 2007; 35:e20. [PMID: 17204484 PMCID: PMC1807961 DOI: 10.1093/nar/gkl1062] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/13/2006] [Revised: 10/18/2006] [Accepted: 11/20/2006] [Indexed: 11/12/2022] Open
Abstract
Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques predicting TF-binding site specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time consuming. We introduce a simple strategy that dramatically improves robustness and accuracy of computational binding site prediction. First, we evaluate the rate of recurrence of computational TFBS predictions by commonly used sampling procedures. We find that the vast majority of results are biologically meaningless. However, clustering results based on nucleotide position improves predictive power. Additionally, we find that positional clustering increases robustness to long or imperfectly selected input sequences. Positional clustering can also be used as a mechanism to integrate results from multiple sampling approaches for improvements in accuracy over each one alone. Finally, we predict and validate regulatory sequences partially responsible for transcriptional control of the mammalian type A gamma-aminobutyric acid receptor (GABA(A)R) subunit genes. Positional clustering is useful for improving computational binding site predictions, with potential application to improving our understanding of mammalian gene expression. In particular, predicted regulatory mechanisms in the mammalian GABA(A)R subunit gene family may open new avenues of research towards understanding this pharmacologically important neurotransmitter receptor system.
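One simple way to picture clustering predictions by nucleotide position is to pool the predicted sites from repeated sampling runs, merge those that overlap, and keep only regions hit more than once. The sketch below is an assumption made for illustration; the abstract does not specify the clustering rule, and the function name, the half-open interval convention, and the `min_support` threshold are all hypothetical.

```python
def cluster_by_position(sites, min_support=2):
    """Merge overlapping (start, end) predictions and keep recurrent clusters.

    `sites` holds half-open intervals pooled from repeated motif-finder runs.
    Returns a list of (start, end, count) tuples, sorted by start position.
    """
    clusters = []
    for start, end in sorted(sites):
        if clusters and start < clusters[-1][1]:
            # Overlaps the current cluster: extend it and count the prediction.
            prev_start, prev_end, count = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            # No overlap: start a new cluster with a single supporting site.
            clusters.append((start, end, 1))
    return [c for c in clusters if c[2] >= min_support]
```

A prediction that appears in only one run is dropped, which mirrors the idea that isolated hits are less likely to be meaningful than positions that recur across runs or across different sampling tools.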
Affiliation(s)
- Timothy E. Reddy
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA
- Boris E. Shakhnovich
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA
- Daniel S. Roberts
- Laboratory of Molecular Neurobiology, Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, 715 Albany St., Boston, MA 02118, USA
- Program in BioMedical Neuroscience, Boston University, 44 Cummington Street, Boston, MA 02215, USA
- Shelley J. Russek
- Laboratory of Molecular Neurobiology, Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, 715 Albany St., Boston, MA 02118, USA
- Charles DeLisi
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA
- Laboratory of Molecular Neurobiology, Department of Pharmacology and Experimental Therapeutics, Boston University School of Medicine, 715 Albany St., Boston, MA 02118, USA
- Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA
|
131
|
Liu LA, Bader JS. Decoding transcriptional regulatory interactions. Physica D: Nonlinear Phenomena 2006; 224:174-181. [PMID: 17364011 PMCID: PMC1827156 DOI: 10.1016/j.physd.2006.09.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2023]
Abstract
Transcription factor proteins control the temporal and spatial expression of genes by binding specific regulatory elements, or motifs, in DNA. Mapping a transcription factor to its motif is an important step towards defining the structure of transcriptional regulatory networks and understanding their dynamics. The information to map a transcription factor to its DNA binding specificity is in principle contained in the protein sequence. Nevertheless, methods that map directly from protein sequence to target DNA sequence have been lacking, and generation of regulatory maps has required experimental data. Here we describe a purely computational method for predicting transcription factor binding. The method calculates the free energy of binding between a transcription factor and possible target DNA sequences using thermodynamic integration. Approximations of additivity (each DNA basepair contributes independently to the binding energy) and linear response (the DNA-protein and DNA-solvent couplings are linear in an effective reaction coordinate representing the basepair character at a specific position) make the computations feasible and can be verified by more detailed simulations. Results obtained for MAT-alpha2, a yeast homeodomain transcription factor, are in good agreement with known results. This method promises to provide a general, computationally feasible route from a genome sequence to a gene regulatory network.
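The two approximations named in the abstract can be written out explicitly. The notation below is assumed for illustration ($s_i$ for the base pair at position $i$ of a site of length $L$, $\lambda$ for the coupling parameter, $H$ for the Hamiltonian) and is not taken from the paper itself.

```latex
% Additivity: each base pair contributes independently to the binding free energy
\Delta G(s_1,\dots,s_L) \;\approx\; \sum_{i=1}^{L} \Delta G_i(s_i)

% Thermodynamic integration along the coupling parameter \lambda
\Delta G_i \;=\; \int_{0}^{1} \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda} \, d\lambda

% Linear response: if the integrand is linear in \lambda, the endpoints suffice
\Delta G_i \;\approx\; \frac{1}{2}\left(
  \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda=0}
  + \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda=1}
\right)
```

Under these assumptions the cost of scoring a candidate site grows linearly with its length rather than with the number of possible sequences, which is presumably what makes the computation feasible at genome scale.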
Affiliation(s)
- Joel S. Bader
- Email address: (L. Angela Liu and Joel S. Bader). URL: www.jhubiomed.org (L. Angela Liu and Joel S. Bader)
|
132
|
Robertson TA, Varani G. An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure. Proteins 2006; 66:359-74. [PMID: 17078093 DOI: 10.1002/prot.21162] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
Abstract
We have developed an all-atom statistical potential function for the prediction of protein-DNA interactions from their structures, and show that this method outperforms similar, lower-resolution statistical potentials in a series of decoy discrimination experiments. The all-atom formalism appears to capture details of atomic interactions that are missed by the lower-resolution methods, with the majority of the discriminatory power arising from its description of short-range atomic contacts. We show that, on average, the method is able to identify 90% of near-native docking decoys within the best-scoring 10% of structures in a given decoy set, and it compares favorably with an optimized physical potential function in a test of structure-based identification of DNA-binding sequences. These results demonstrate that all-atom statistical functions specific to protein-DNA interactions can achieve great discriminatory power despite the limited size of the structural database. They also suggest that the statistical scores may soon be able to achieve accuracy on par with more complex, physical potential functions.
Affiliation(s)
- Timothy A Robertson
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
133
|
Affiliation(s)
- Carl O Pabo
- Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, WAB 536, Boston, MA 02115, USA.
|
134
|
Aeling KA, Opel ML, Steffen NR, Tretyachenko-Ladokhina V, Hatfield GW, Lathrop RH, Senear DF. Indirect recognition in sequence-specific DNA binding by Escherichia coli integration host factor: the role of DNA deformation energy. J Biol Chem 2006; 281:39236-48. [PMID: 17035240 DOI: 10.1074/jbc.m606363200] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022]
Abstract
Integration host factor (IHF) is a bacterial histone-like protein whose primary biological role is to condense the bacterial nucleoid and to constrain DNA supercoils. It does so by binding in a sequence-independent manner throughout the genome. However, unlike other structurally related bacterial histone-like proteins, IHF has evolved a sequence-dependent, high affinity DNA-binding motif. The high affinity binding sites are important for the regulation of a wide range of cellular processes. A remarkable feature of IHF is that it employs an indirect readout mechanism to bind and wrap DNA at both the nonspecific and high affinity (sequence-dependent) DNA sites. In this study we assessed the contributions of pre-formed and protein-induced DNA conformations to the energetics of IHF binding. Binding energies determined experimentally were compared with energies predicted for the IHF-induced deformation of the DNA helix (DNA deformation energy) in the IHF-DNA complex. Combinatorial sets of de novo DNA sequences were designed to systematically evaluate the influence of sequence-dependent structural characteristics of the conserved IHF recognition elements of the consensus DNA sequence. We show that IHF recognizes pre-formed conformational characteristics of the consensus DNA sequence at high affinity sites, whereas at all other sites relative affinity is determined by the deformational energy required for nearest-neighbor base pairs to adopt the DNA structure of the bound DNA-IHF complex.
Affiliation(s)
- Kimberly A Aeling
- Institute for Genomics and Bioinformatics, Department of Microbiology and Molecular Genetics, School of Medicine, University of California 92697, USA
|
135
|
Becker NB, Wolff L, Everaers R. Indirect readout: detection of optimized subsequences and calculation of relative binding affinities using different DNA elastic potentials. Nucleic Acids Res 2006; 34:5638-49. [PMID: 17038333 PMCID: PMC1636474 DOI: 10.1093/nar/gkl683] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 04/03/2006] [Revised: 09/05/2006] [Accepted: 09/06/2006] [Indexed: 01/23/2023]
Abstract
Essential biological processes require that proteins bind to a set of specific DNA sites with tuned relative affinities. We focus on the indirect readout mechanism and discuss its theoretical description in relation to the present understanding of DNA elasticity on the rigid base pair level. Combining existing parametrizations of elastic potentials for DNA, we derive elastic free energies directly related to competitive binding experiments, and propose a computationally inexpensive local marker for elastically optimized subsequences in protein-DNA co-crystals. We test our approach in an application to the bacteriophage 434 repressor. In agreement with known results we find that indirect readout dominates at the central, non-contacted bases of the binding site. Elastic optimization involves all deformation modes and is mainly due to the adapted equilibrium structure of the operator, while sequence-dependent elasticity plays a minor role. These qualitative observations are robust with respect to current parametrization uncertainties. Predictions for relative affinities mediated by indirect readout depend sensitively on the chosen parametrization. Their quantitative comparison with experimental data allows for a critical evaluation of DNA elastic potentials and of the correspondence between crystal and solution structures. The software written for the presented analysis is included as Supplementary Data.
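For orientation, a harmonic elastic energy at the rigid base-pair level is commonly written in the form below. This is a generic sketch with assumed symbols ($q_k$ for the six step parameters of step $k$, $\hat{q}_k(s)$ and $K_k(s)$ for the sequence-dependent equilibrium values and stiffness matrix); it shows only the energetic term and is not the elastic free energy derived in the paper.

```latex
% Harmonic deformation energy of a DNA sequence s forced into conformation q
E_s(q) \;=\; \frac{1}{2} \sum_{k}
  \bigl(q_k - \hat{q}_k(s)\bigr)^{\mathsf{T}} \, K_k(s) \, \bigl(q_k - \hat{q}_k(s)\bigr)

% Energy-only estimate of the indirect-readout preference between sequences s and s'
% when both must adopt the same bound conformation q^{\mathrm{bound}}
\Delta\Delta E(s, s') \;=\; E_s\bigl(q^{\mathrm{bound}}\bigr) - E_{s'}\bigl(q^{\mathrm{bound}}\bigr)
```

In this form the two contributions distinguished in the abstract are visible separately: the adapted equilibrium structure enters through $\hat{q}_k(s)$, and sequence-dependent elasticity enters through $K_k(s)$.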
Affiliation(s)
- Nils B Becker
- Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Strasse 38, 01187 Dresden, Germany.
|
136
|
GuhaThakurta D. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res 2006; 34:3585-98. [PMID: 16855295 PMCID: PMC1524905 DOI: 10.1093/nar/gkl372] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.4] [Indexed: 01/13/2023]
Abstract
Identification and annotation of all the functional elements in the genome, including genes and the regulatory sequences, is a fundamental challenge in genomics and computational biology. Since regulatory elements are frequently short and variable, their identification and discovery using computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequence from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, have contributed to the development of methods that utilize these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher order structures of the regulatory sequences, which is essential to the understanding of transcription regulation in the metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on the recent developments, and current challenges.
Affiliation(s)
- Debraj GuhaThakurta
- Research Genetics Division, Rosetta Inpharmatics LLC, Merck & Co., Inc, 401 Terry Avenue North, Seattle, WA 98109, USA.
|
137
|
Ashworth J, Havranek JJ, Duarte CM, Sussman D, Monnat RJ, Stoddard BL, Baker D. Computational redesign of endonuclease DNA binding and cleavage specificity. Nature 2006; 441:656-9. [PMID: 16738662 PMCID: PMC2999987 DOI: 10.1038/nature04818] [Citation(s) in RCA: 248] [Impact Index Per Article: 13.8] [Received: 01/09/2006] [Accepted: 04/21/2006] [Indexed: 11/09/2022]
Abstract
The reprogramming of DNA-binding specificity is an important challenge for computational protein design that tests current understanding of protein-DNA recognition, and has considerable practical relevance for biotechnology and medicine. Here we describe the computational redesign of the cleavage specificity of the intron-encoded homing endonuclease I-MsoI using a physically realistic atomic-level forcefield. Using an in silico screen, we identified single base-pair substitutions predicted to disrupt binding by the wild-type enzyme, and then optimized the identities and conformations of clusters of amino acids around each of these unfavourable substitutions using Monte Carlo sampling. A redesigned enzyme that was predicted to display altered target site specificity, while maintaining wild-type binding affinity, was experimentally characterized. The redesigned enzyme binds and cleaves the redesigned recognition site approximately 10,000 times more effectively than does the wild-type enzyme, with a level of target discrimination comparable to the original endonuclease. Determination of the structure of the redesigned nuclease-recognition site complex by X-ray crystallography confirms the accuracy of the computationally predicted interface. These results suggest that computational protein design methods can have an important role in the creation of novel highly specific endonucleases for gene therapy and other applications.
Affiliation(s)
- Justin Ashworth
- Howard Hughes Medical Institute and Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA.
|
138
|
Zhan H, Swint-Kruse L, Matthews KS. Extrinsic interactions dominate helical propensity in coupled binding and folding of the lactose repressor protein hinge helix. Biochemistry 2006; 45:5896-906. [PMID: 16669632 PMCID: PMC2701349 DOI: 10.1021/bi052619p] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 01/05/2023]
Abstract
A significant number of eukaryotic regulatory proteins are predicted to have disordered regions. Many of these proteins bind DNA, which may serve as a template for protein folding. Similar behavior is seen in the prokaryotic LacI/GalR family of proteins that couple hinge-helix folding with DNA binding. These hinge regions form short alpha-helices when bound to DNA but appear to be disordered in other states. An intriguing question is whether and to what degree intrinsic helix propensity contributes to the function of these proteins. In addition to its interaction with operator DNA, the LacI hinge helix interacts with the hinge helix of the homodimer partner as well as with the surface of the inducer-binding domain. To explore the hierarchy of these interactions, we made a series of substitutions in the LacI hinge helix at position 52, the only site in the helix that does not interact with DNA and/or the inducer-binding domain. The substitutions at V52 have significant effects on operator binding affinity and specificity, and several substitutions also impair functional communication with the inducer-binding domain. Results suggest that helical propensity of amino acids in the hinge region alone does not dominate function; helix-helix packing interactions appear to also contribute. Further, the data demonstrate that variation in operator sequence can overcome side chain effects on hinge-helix folding and/or hinge-hinge interactions. Thus, this system provides a direct example whereby an extrinsic interaction (DNA binding) guides internal events that influence folding and functionality.
Affiliation(s)
- Hongli Zhan
- Department of Biochemistry and Cell Biology, MS 140, Rice University, Houston, TX 77005
- Department of Biochemistry and Molecular Biology, MS 3030, The University of Kansas Medical Center, Kansas City, KS 66160
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, MS 3030, The University of Kansas Medical Center, Kansas City, KS 66160
- Kathleen Shive Matthews
- Department of Biochemistry and Cell Biology, MS 140, Rice University, Houston, TX 77005
- W. M. Keck Center for Computational Biology, MS 140, Rice University, Houston, TX 77005
- To whom correspondence should be addressed. Telephone: 713-348-4871; Fax: 713-348-6149
|
139
|
Endres RG, Wingreen NS. Weight matrices for protein-DNA binding sites from a single co-crystal structure. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 2006; 73:061921. [PMID: 16906878 DOI: 10.1103/physreve.73.061921] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 11/21/2005] [Revised: 01/31/2006] [Indexed: 05/11/2023]
Abstract
Transcription-factor proteins bind to specific DNA sequences to regulate gene expression in cells. DNA-binding sites are often identified using weight matrices calculated from multiple known binding sites. However, in many cases the number of examples is limited. Here, we report on an atomistic method that starts from an x-ray co-crystal structure of the protein bound to one particular DNA sequence, and infers other binding sites, which are used to construct a weight matrix. The emphasis of the paper is on using the Wang-Landau Monte Carlo algorithm to efficiently sample high-affinity binding sites, which demonstrates that sampling can produce accurate weight matrices in analogy to bioinformatics approaches. For cases of low complexity, we compare to the exhaustive (but slow) dead-end elimination algorithm. To recover crystal binding sites, it is important to include bound water in the protein-DNA interface. Our approach can, in principle, even be applied when no native protein-DNA co-crystal structure is available, only the structure of a closely related homologous protein whose amino-acid sequence is changed to the protein of interest.
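The final step described above, turning a set of binding sites into a weight matrix, can be sketched as follows. This shows only the generic log-odds construction from aligned sites; it does not reproduce the structure-based sampling that generates those sites in the paper. The function names, the uniform background, and the pseudocount of 1 are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def weight_matrix(sites, background=None, pseudocount=1.0):
    """Build a log2-odds weight matrix from equal-length, aligned binding sites.

    background  : base frequencies; uniform 0.25 is assumed if not given
    pseudocount : added to every count so unseen bases get a finite score (assumed value)
    Returns a list with one {base: score} dict per site position.
    """
    if not sites:
        raise ValueError("at least one binding site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    if background is None:
        background = {base: 0.25 for base in ALPHABET}

    matrix = []
    for i in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2(counts[base] / total / background[base]) for base in ALPHABET}
        )
    return matrix


def score(matrix, sequence):
    """Sum the per-position weights for a candidate sequence of matching length."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return sum(column[base] for column, base in zip(matrix, sequence))


if __name__ == "__main__":
    sites = ["ACGT", "ACGA", "ACGT"]  # hypothetical sampled binding sites
    pwm = weight_matrix(sites)
    print(score(pwm, "ACGT"), score(pwm, "TTTT"))
```

Whether the input sites come from experiment or from sampling against a co-crystal structure, the matrix construction is the same; only the source of the site set differs.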
Affiliation(s)
- Robert G Endres
- NEC Laboratories America, Inc., Princeton, New Jersey 08540, USA.
|