51
Dourado DFAR, Flores SC. A multiscale approach to predicting affinity changes in protein-protein interfaces. Proteins 2014; 82:2681-90. [PMID: 24975440 DOI: 10.1002/prot.24634] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 03/05/2014] [Revised: 06/12/2014] [Accepted: 06/18/2014] [Indexed: 11/07/2022]
Abstract
Substitution mutations in protein-protein interfaces can have a substantial effect on binding, which has consequences in basic and applied biomedical research. Experimental expression, purification, and affinity determination of protein complexes is an expensive and time-consuming means of evaluating the effect of mutations, making a fast and accurate in silico method highly desirable. When the structure of the wild-type complex is known, it is possible to economically evaluate the effect of point mutations with knowledge based potentials, which do not model backbone flexibility, but these have been validated only for single mutants. Substitution mutations tend to induce local conformational rearrangements only. Accordingly, ZEMu (Zone Equilibration of Mutants) flexibilizes only a small region around the site of mutation, then computes its dynamics under a physics-based force field. We validate with 1254 experimental mutants (with 1-15 simultaneous substitutions) in a wide variety of different protein environments (65 protein complexes), and obtain a significant improvement in the accuracy of predicted ΔΔG.
Affiliation(s)
- Daniel F A R Dourado
- Department of Cell and Molecular Biology, Computational and Systems Biology, Uppsala University, 751 24, Uppsala, Sweden
52
Abstract
By focusing on essential features, while averaging over less important details, coarse-grained (CG) models provide significant computational and conceptual advantages with respect to more detailed models. Consequently, despite dramatic advances in computational methodologies and resources, CG models enjoy surging popularity and are becoming increasingly equal partners to atomically detailed models. This perspective surveys the rapidly developing landscape of CG models for biomolecular systems. In particular, this review seeks to provide a balanced, coherent, and unified presentation of several distinct approaches for developing CG models, including top-down, network-based, native-centric, knowledge-based, and bottom-up modeling strategies. The review summarizes their basic philosophies, theoretical foundations, typical applications, and recent developments. Additionally, the review identifies fundamental inter-relationships among the diverse approaches and discusses outstanding challenges in the field. When carefully applied and assessed, current CG models provide highly efficient means for investigating the biological consequences of basic physicochemical principles. Moreover, rigorous bottom-up approaches hold great promise for further improving the accuracy and scope of CG models for biomolecular systems.
Affiliation(s)
- W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
53
Protein thermostability prediction within homologous families using temperature-dependent statistical potentials. PLoS One 2014; 9:e91659. [PMID: 24646884 PMCID: PMC3960129 DOI: 10.1371/journal.pone.0091659] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 01/12/2014] [Accepted: 02/12/2014] [Indexed: 11/28/2022] Open
Abstract
The ability to rationally modify targeted physical and biological features of a protein of interest holds promise in numerous academic and industrial applications and paves the way towards de novo protein design. In particular, bioprocesses that utilize the remarkable properties of enzymes would often benefit from mutants that remain active at temperatures that are either higher or lower than the physiological temperature, while maintaining the biological activity. Many in silico methods have been developed in recent years for predicting the thermodynamic stability of mutant proteins, but very few have focused on thermostability. To bridge this gap, we developed an algorithm for predicting the best descriptor of thermostability, namely the melting temperature Tm, from the protein's sequence and structure. Our method is applicable when the Tm of proteins homologous to the target protein are known. It is based on the design of several temperature-dependent statistical potentials, derived from datasets consisting of either mesostable or thermostable proteins. Linear combinations of these potentials have been shown to yield an estimation of the protein folding free energies at low and high temperatures, and the difference of these energies, a prediction of the melting temperature. This particular construction, that distinguishes between the interactions that contribute more than others to the stability at high temperatures and those that are more stabilizing at low T, gives better performances compared to the standard approach based on T-independent potentials which predict the thermal resistance from the thermodynamic stability. Our method has been tested on 45 proteins of known Tm that belong to 11 homologous families. The standard deviation between experimental and predicted Tm's is equal to 13.6°C in cross validation, and decreases to 8.3°C if the 6 worst predicted proteins are excluded. Possible extensions of our approach are discussed.
54
Li M, Petukh M, Alexov E, Panchenko AR. Predicting the Impact of Missense Mutations on Protein-Protein Binding Affinity. J Chem Theory Comput 2014; 10:1770-1780. [PMID: 24803870 PMCID: PMC3985714 DOI: 10.1021/ct401022c] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 11/25/2013] [Indexed: 01/22/2023]
Abstract
The crucial prerequisite for proper biological function is the protein's ability to establish highly selective interactions with macromolecular partners. A missense mutation that alters the protein binding affinity may cause significant perturbations or complete abolishment of the function, potentially leading to diseases. The availability of computational methods to evaluate the impact of mutations on protein-protein binding is critical for a wide range of biomedical applications. Here, we report an efficient computational approach for predicting the effect of single and multiple missense mutations on protein-protein binding affinity. It is based on a well-tested simulation protocol for structure minimization, modified MM-PBSA and statistical scoring energy functions with parameters optimized on experimental sets of several thousands of mutations. Our simulation protocol yields very good agreement between predicted and experimental values with Pearson correlation coefficients of 0.69 and 0.63 and root-mean-square errors of 1.20 and 1.90 kcal mol⁻¹ for single and multiple mutations, respectively. Compared with other available methods, our approach achieves high speed and prediction accuracy and can be applied to large datasets generated by modern genomics initiatives. In addition, we report a crucial role of water model and the polar solvation energy in estimating the changes in binding affinity. Our analysis also reveals that prediction accuracy and effect of mutations on binding strongly depends on the type of mutation and its location in a protein complex.
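The two accuracy measures quoted in this abstract can be illustrated with a minimal Python sketch that computes a Pearson correlation coefficient and a root-mean-square error for paired predicted and experimental values. The function name and the binding free energy changes below are invented for illustration and are not taken from the paper.

```python
import math


def pearson_and_rmse(predicted, experimental):
    """Return (Pearson r, RMSE) for two equally long sequences of numbers."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("Pearson r is undefined for a constant sequence")
    r = cov / math.sqrt(var_p * var_e)
    # Root-mean-square error between paired values.
    rmse = math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)
    return r, rmse


# Hypothetical changes in binding free energy (kcal/mol), for illustration only.
predicted = [0.0, 1.0, 2.0, 3.0]
experimental = [1.0, 2.0, 3.0, 4.0]
r, rmse = pearson_and_rmse(predicted, experimental)
print(f"Pearson r = {r:.2f}, RMSE = {rmse:.2f} kcal/mol")
```

In this toy data set the predictions are perfectly correlated with experiment yet systematically offset by 1 kcal/mol, which is why the two measures are usually reported together.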
Affiliation(s)
- Minghui Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
- Marharyta Petukh
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina 29634, United States
- Emil Alexov
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina 29634, United States
- Anna R Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
55
Shen H, Li Y, Ren P, Zhang D, Li G. An Anisotropic Coarse-Grained Model for Proteins Based On Gay-Berne and Electric Multipole Potentials. J Chem Theory Comput 2014; 10:731-750. [PMID: 24659927 PMCID: PMC3958967 DOI: 10.1021/ct400974z] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 01/25/2023]
Abstract
Gay–Berne anisotropic potential has been widely used to evaluate the nonbonded interactions between coarse-grained particles being described as elliptical rigid bodies. In this paper, we are presenting a coarse-grained model for twenty kinds of amino acids and proteins, based on the anisotropic Gay–Berne and point electric multipole (EMP) potentials. We demonstrate that the anisotropic coarse-grained model, namely GBEMP model, is able to reproduce many key features observed from experimental protein structures (Dunbrack Library), as well as from atomistic force field simulations (using AMOEBA, AMBER, and CHARMM force fields), while saving the computational cost by a factor of about 10–200 depending on specific cases and atomistic models. More importantly, unlike other coarse-grained approaches, our framework is based on the fundamental intermolecular forces with explicit treatment of electrostatic and repulsion-dispersion forces. As a result, the coarse-grained protein model presented an accurate description of nonbonded interactions (particularly electrostatic component) between hetero/homodimers (such as peptide–peptide, peptide–water). In addition, the encouraging performance of the model was reflected by the excellent correlation between GBEMP and AMOEBA models in the calculations of the dipole moment of peptides. In brief, the GBEMP model given here is general and transferable, suitable for simulating complex biomolecular systems.
Affiliation(s)
- Hujun Shen
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd. Dalian 116023, PR China
- Yan Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd. Dalian 116023, PR China
- Pengyu Ren
- Department of Biomedical Engineering, The University of Texas at Austin, Austin, TX 78712, USA
- Dinglin Zhang
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd. Dalian 116023, PR China
- Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd. Dalian 116023, PR China
56
Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A. Optimized atomic statistical potentials: assessment of protein interfaces and loops. Bioinformatics 2013; 29:3158-66. [PMID: 24078704 PMCID: PMC3842762 DOI: 10.1093/bioinformatics/btt560] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 06/19/2013] [Revised: 08/13/2013] [Accepted: 09/22/2013] [Indexed: 01/16/2023] Open
Abstract
MOTIVATION: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state. RESULTS: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven 'recovery' functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein-protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures. AVAILABILITY AND IMPLEMENTATION: SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Affiliation(s)
- Guang Qiang Dong
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry and California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, CA 94158, USA
57
Moretti R, Fleishman SJ, Agius R, Torchala M, Bates PA, Kastritis PL, Rodrigues JPGLM, Trellet M, Bonvin AMJJ, Cui M, Rooman M, Gillis D, Dehouck Y, Moal I, Romero-Durana M, Perez-Cano L, Pallara C, Jimenez B, Fernandez-Recio J, Flores S, Pacella M, Kilambi KP, Gray JJ, Popov P, Grudinin S, Esquivel-Rodríguez J, Kihara D, Zhao N, Korkin D, Zhu X, Demerdash ONA, Mitchell JC, Kanamori E, Tsuchiya Y, Nakamura H, Lee H, Park H, Seok C, Sarmiento J, Liang S, Teraguchi S, Standley DM, Shimoyama H, Terashi G, Takeda-Shitaka M, Iwadate M, Umeyama H, Beglov D, Hall DR, Kozakov D, Vajda S, Pierce BG, Hwang H, Vreven T, Weng Z, Huang Y, Li H, Yang X, Ji X, Liu S, Xiao Y, Zacharias M, Qin S, Zhou HX, Huang SY, Zou X, Velankar S, Janin J, Wodak SJ, Baker D. Community-wide evaluation of methods for predicting the effect of mutations on protein-protein interactions. Proteins 2013; 81:1980-7. [PMID: 23843247 PMCID: PMC4143140 DOI: 10.1002/prot.24356] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Received: 03/04/2013] [Revised: 06/13/2013] [Accepted: 06/18/2013] [Indexed: 12/25/2022]
Abstract
Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side-chain sampling and backbone relaxation, evaluated packing, electrostatic, and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of both existing and new prediction methodologies.
Affiliation(s)
- Rocco Moretti
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Sarel J. Fleishman
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Rudi Agius
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, WC2A 3LY, UK
- Mieczyslaw Torchala
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, WC2A 3LY, UK
- Paul A. Bates
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, WC2A 3LY, UK
- Panagiotis L. Kastritis
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CG, Utrecht, the Netherlands
- João P. G. L. M. Rodrigues
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CG, Utrecht, the Netherlands
- Mikaël Trellet
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CG, Utrecht, the Netherlands
- Alexandre M. J. J. Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CG, Utrecht, the Netherlands
- Meng Cui
- Department of Physiology and Biophysics, Virginia Commonwealth University, Richmond, VA 23298, USA
- Marianne Rooman
- Department of BioModelling, BioInformatics and BioProcesses, Université Libre de Bruxelles (ULB), 1050 Brussels, Belgium
- Dimitri Gillis
- Department of BioModelling, BioInformatics and BioProcesses, Université Libre de Bruxelles (ULB), 1050 Brussels, Belgium
- Yves Dehouck
- Department of BioModelling, BioInformatics and BioProcesses, Université Libre de Bruxelles (ULB), 1050 Brussels, Belgium
- Iain Moal
- Joint BSC-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Miguel Romero-Durana
- Joint BSC-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Laura Perez-Cano
- Joint BSC-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Chiara Pallara
- Joint BSC-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Brian Jimenez
- Joint BSC-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Juan Fernandez-Recio
- Joint BSC-IRB Research Program in Computational Biology, Life Sciences Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Samuel Flores
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, 75124, Sweden
- Michael Pacella
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Krishna Praneeth Kilambi
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Petr Popov
- NANO-D, INRIA Grenoble-Rhone-Alpes Research Center, 38334 Saint Ismier Cedex, Montbonnot, France; CNRS, Laboratoire Jean Kuntzmann, BP 53, Grenoble Cedex 9, France
- Sergei Grudinin
- NANO-D, INRIA Grenoble-Rhone-Alpes Research Center, 38334 Saint Ismier Cedex, Montbonnot, France; CNRS, Laboratoire Jean Kuntzmann, BP 53, Grenoble Cedex 9, France
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Nan Zhao
- Informatics Institute and Department of Computer Science, University of Missouri-Columbia, MO 65211, USA
- Dmitry Korkin
- Informatics Institute and Department of Computer Science, University of Missouri-Columbia, MO 65211, USA
- Xiaolei Zhu
- Departments of Mathematics and Biochemistry, University of Wisconsin, Madison, WI 53706, USA
- Omar N. A. Demerdash
- Departments of Mathematics and Biochemistry, University of Wisconsin, Madison, WI 53706, USA
- Julie C. Mitchell
- Departments of Mathematics and Biochemistry, University of Wisconsin, Madison, WI 53706, USA
- Eiji Kanamori
- Japan Biological Informatics Consortium, Tokyo, Japan
- Yuko Tsuchiya
- Division of Life Sciences, Graduate School of Humanities and Sciences, Ochanomizu University, Tokyo, Japan
- Haruki Nakamura
- Institute for Protein Research, Osaka University, Osaka, Japan
- Hasup Lee
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Hahnbeom Park
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Jamica Sarmiento
- Systems Immunology Lab, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan
- Shide Liang
- Systems Immunology Lab, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan
- Shusuke Teraguchi
- Systems Immunology Lab, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan
- Daron M. Standley
- Systems Immunology Lab, WPI Immunology Frontier Research Center (IFReC), Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan
- Mitsuo Iwadate
- Department of Biological Sciences, Faculty of Science and Engineering, Chuo University
- Hideaki Umeyama
- Department of Biological Sciences, Faculty of Science and Engineering, Chuo University
- Dmitri Beglov
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- David R. Hall
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Dima Kozakov
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- Brian G. Pierce
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Howook Hwang
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Thom Vreven
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Yangyu Huang
- Huazhong University of Science and Technology, China
- Haotian Li
- Huazhong University of Science and Technology, China
- Xiufeng Yang
- Huazhong University of Science and Technology, China
- Xiaofeng Ji
- Huazhong University of Science and Technology, China
- Shiyong Liu
- Huazhong University of Science and Technology, China
- Yi Xiao
- Huazhong University of Science and Technology, China
- Martin Zacharias
- Physics Department, Technical University Munich, 85748 Garching, Germany
- Sanbo Qin
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, FL 32306, USA
- Huan-Xiang Zhou
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, FL 32306, USA
- Sheng-You Huang
- Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, Informatics Institute; University of Missouri-Columbia; Columbia, MO 65211, USA
- Xiaoqin Zou
- Department of Physics and Astronomy, Department of Biochemistry, Dalton Cardiovascular Research Center, Informatics Institute; University of Missouri-Columbia; Columbia, MO 65211, USA
- Sameer Velankar
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Joël Janin
- IBBMC, Université Paris-Sud, 91405-Orsay, France
- Shoshana J. Wodak
- Department of Biochemistry, University of Toronto, Ontario, Canada M5S 1A8
- Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5K 1X8, Canada
- David Baker
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, United States
58
Dehouck Y, Mikhailov AS. Effective harmonic potentials: insights into the internal cooperativity and sequence-specificity of protein dynamics. PLoS Comput Biol 2013; 9:e1003209. [PMID: 24009495 PMCID: PMC3757084 DOI: 10.1371/journal.pcbi.1003209] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/08/2013] [Accepted: 07/19/2013] [Indexed: 11/18/2022] Open
Abstract
The proper biological functioning of proteins often relies on the occurrence of coordinated fluctuations around their native structure, or on their ability to perform wider and sometimes highly elaborated motions. Hence, there is considerable interest in the definition of accurate coarse-grained descriptions of protein dynamics, as an alternative to more computationally expensive approaches. In particular, the elastic network model, in which residue motions are subjected to pairwise harmonic potentials, is known to capture essential aspects of conformational dynamics in proteins, but has so far remained mostly phenomenological, and unable to account for the chemical specificities of amino acids. We propose, for the first time, a method to derive residue- and distance-specific effective harmonic potentials from the statistical analysis of an extensive dataset of NMR conformational ensembles. These potentials constitute dynamical counterparts to the mean-force statistical potentials commonly used for static analyses of protein structures. In the context of the elastic network model, they yield a strongly improved description of the cooperative aspects of residue motions, and give the opportunity to systematically explore the influence of sequence details on protein dynamics.
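For orientation, the conventional elastic network model mentioned in this abstract can be sketched in a few lines of Python: every residue pair closer than a cutoff in the reference structure is joined by a harmonic spring at its reference distance. The uniform spring constant, the 10 Å cutoff, the function name, and the three-point toy coordinates are all assumptions made for illustration; the paper's contribution is precisely to replace the uniform constant with residue- and distance-specific values, which this sketch does not attempt.

```python
import math


def enm_energy(ref_coords, coords, cutoff=10.0, k=1.0):
    """Harmonic elastic-network energy of `coords` relative to `ref_coords`.

    E = 0.5 * k * sum over pairs (d_ij - d0_ij)^2, restricted to pairs whose
    reference distance d0_ij does not exceed `cutoff`.
    """
    if len(ref_coords) != len(coords):
        raise ValueError("both structures must contain the same number of sites")
    energy = 0.0
    n = len(ref_coords)
    for i in range(n):
        for j in range(i + 1, n):
            d0 = math.dist(ref_coords[i], ref_coords[j])
            if d0 > cutoff:
                continue  # no spring between distant sites
            d = math.dist(coords[i], coords[j])
            energy += 0.5 * k * (d - d0) ** 2
    return energy


# Toy three-site "structure" (coordinates in angstroms, invented for illustration).
reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
displaced = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

e_ref = enm_energy(reference, reference)
e_disp = enm_energy(reference, displaced)
print(e_ref, e_disp)
```

By construction the reference structure is the energy minimum, so any displacement that changes an in-cutoff pair distance raises the energy.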
Affiliation(s)
- Yves Dehouck
- Department of Physical Chemistry, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Germany.
59
Dehouck Y, Kwasigroch JM, Rooman M, Gilis D. BeAtMuSiC: Prediction of changes in protein-protein binding affinity on mutations. Nucleic Acids Res 2013; 41:W333-9. [PMID: 23723246 PMCID: PMC3692068 DOI: 10.1093/nar/gkt450] [Citation(s) in RCA: 233] [Impact Index Per Article: 21.2] [Indexed: 01/11/2023] Open
Abstract
The ability of proteins to establish highly selective interactions with a variety of (macro)molecular partners is a crucial prerequisite to the realization of their biological functions. The availability of computational tools to evaluate the impact of mutations on protein–protein binding can therefore be valuable in a wide range of industrial and biomedical applications, and help rationalize the consequences of non-synonymous single-nucleotide polymorphisms. BeAtMuSiC (http://babylone.ulb.ac.be/beatmusic) is a coarse-grained predictor of the changes in binding free energy induced by point mutations. It relies on a set of statistical potentials derived from known protein structures, and combines the effect of the mutation on the strength of the interactions at the interface, and on the overall stability of the complex. The BeAtMuSiC server requires as input the structure of the protein–protein complex, and gives the possibility to assess rapidly all possible mutations in a protein chain or at the interface, with predictive performances that are in line with the best current methodologies.
Affiliation(s)
- Yves Dehouck
- Department of BioModelling, BioInformatics and BioProcesses, Université Libre de Bruxelles, CP165/61, Av. Fr. Roosevelt 50, 1050 Brussels, Belgium.
60
Structure-based mutant stability predictions on proteins of unknown structure. J Biotechnol 2012; 161:287-93. [DOI: 10.1016/j.jbiotec.2012.06.020] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 03/21/2012] [Revised: 06/19/2012] [Accepted: 06/22/2012] [Indexed: 11/23/2022]
61
Lu WW, Huang RB, Wei YT, Meng JZ, Du LQ, Du QS. Statistical energy potential: reduced representation of Dehouck–Gilis–Rooman function by selecting against decoy datasets. Amino Acids 2012; 42:2353-61. [DOI: 10.1007/s00726-011-0977-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2010] [Accepted: 07/06/2011] [Indexed: 11/24/2022]
62
Abstract
Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency of DEE. AVAILABILITY: Software is available under the Lesser GNU Public License v3. Contact the authors for source code.
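As background for the dead-end elimination (DEE) algorithm that this abstract builds on, here is a minimal Python sketch of the classic rigid-rotamer elimination criterion: a rotamer r at position i can be discarded when its best possible energy is still worse than the worst possible energy of some competing rotamer t at the same position. This is not iMinDEE and does not handle continuous rotamers; the data layout (`self_e`, `pair_e`), the function name, and the toy energies are assumptions made for illustration only.

```python
def dee_prune(self_e, pair_e):
    """One pass of the classic DEE criterion over rigid rotamers.

    self_e[i][r]       : self energy of rotamer r at position i
    pair_e[i][j][r][s] : pairwise energy of rotamer r at i with rotamer s at j (i != j)
    Returns the set of (position, rotamer) pairs that can be eliminated.
    """
    n = len(self_e)
    pruned = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(len(self_e[i])):
            # Best case for r: most favorable partner rotamer at every other position.
            best_r = self_e[i][r] + sum(min(pair_e[i][j][r]) for j in others)
            for t in range(len(self_e[i])):
                if t == r:
                    continue
                # Worst case for t: least favorable partner rotamer everywhere.
                worst_t = self_e[i][t] + sum(max(pair_e[i][j][t]) for j in others)
                if best_r > worst_t:
                    pruned.add((i, r))
                    break
    return pruned


# Toy system: 2 positions with 2 rotamers each (energies invented for illustration).
self_e = [[0.0, 5.0], [0.0, 0.0]]
pair_01 = [[0.0, 1.0], [0.0, 1.0]]                                   # pair_01[r][s]
pair_10 = [[pair_01[r][s] for r in range(2)] for s in range(2)]      # transpose
pair_e = [[None, pair_01], [pair_10, None]]

pruned = dee_prune(self_e, pair_e)
print(sorted(pruned))
```

In this toy system both second rotamers are eliminated, leaving the single combination of the first rotamer at each position, which is also the lowest-energy combination by direct enumeration.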
63
Shen H, Xia Z, Li G, Ren P. A Review of Physics-Based Coarse-Grained Potentials for the Simulations of Protein Structure and Dynamics. ANNUAL REPORTS IN COMPUTATIONAL CHEMISTRY VOLUME 8 2012. [DOI: 10.1016/b978-0-444-59440-2.00005-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 01/03/2023]
|
64
|
Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011; 51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/13/2023]
Abstract
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
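The early-enrichment measure EF(1) mentioned above can be made concrete with a small calculation. The sketch below uses the commonly quoted definition of an enrichment factor (fraction of actives in the top-ranked slice divided by the fraction of actives in the whole library). It assumes that lower scores rank better; the function name and toy library are hypothetical, and neither the logAUC measure nor the DOCK 3.6 protocol is reproduced.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked library.

    Assumes lower score = better rank. `is_active` holds 1 for known
    ligands and 0 for presumed non-binders.
    """
    n = len(scores)
    n_top = max(1, round(n * fraction))            # size of the top slice
    ranked = sorted(range(n), key=lambda k: scores[k])
    hits_top = sum(is_active[k] for k in ranked[:n_top])
    total_actives = sum(is_active)
    return (hits_top / n_top) / (total_actives / n)


# Toy library: 200 molecules, 4 actives, two of them ranked in the top 1%.
scores = [float(k) for k in range(200)]
is_active = [0] * 200
for k in (0, 1, 50, 120):
    is_active[k] = 1
ef1 = enrichment_factor(scores, is_active, fraction=0.01)
print(ef1)
```

An enrichment factor of 1 corresponds to random ranking; larger values mean that actives are concentrated at the top of the list.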
Affiliation(s)
- Hao Fan
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
|
65
|
Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 2011; 12:151. [PMID: 21569468 PMCID: PMC3113940 DOI: 10.1186/1471-2105-12-151] [Citation(s) in RCA: 367] [Impact Index Per Article: 28.2] [Received: 12/27/2010] [Accepted: 05/13/2011] [Indexed: 12/31/2022] Open
Abstract
Background: The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level. Results: PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single-site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC shows good prediction performance (correlation coefficient of 0.8 between predicted and measured stability changes, in cross-validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium-size protein in less than a minute. This unique functionality is implemented in a user-friendly way in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability.
Conclusion: The freely available PoPMuSiC-2.1 web server is highly useful for very rapidly identifying a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
Affiliation(s)
- Yves Dehouck
- Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av. Fr. Roosevelt 50, CP165/61, 1050 Brussels, Belgium.
|
66
|
Morin A, Meiler J, Mizoue LS. Computational design of protein-ligand interfaces: potential in therapeutic development. Trends Biotechnol 2011; 29:159-66. [PMID: 21295366 DOI: 10.1016/j.tibtech.2011.01.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/30/2010] [Revised: 12/22/2010] [Accepted: 01/05/2011] [Indexed: 01/16/2023]
Abstract
Computational design of protein-ligand interfaces finds optimal amino acid sequences within a small-molecule binding site of a protein for tight binding of a specific small molecule. It requires a search algorithm that can rapidly sample the vast sequence and conformational space, and a scoring function that can identify low energy designs. This review focuses on recent advances in computational design methods and their application to protein-small molecule binding sites. Strategies for increasing affinity, altering specificity, creating broad-spectrum binding, and building novel enzymes from scratch are described. Future prospects for applications in drug development are discussed, including limitations that will need to be overcome to achieve computational design of protein therapeutics with novel modes of action.
Affiliation(s)
- Andrew Morin
- Departments of Chemistry, Pharmacology, and Biomedical Informatics, Vanderbilt University, 7330 Stevenson Center, Station B 351822, Nashville, TN 37235, USA
|
67
|
Sodt AJ, Head-Gordon T. Driving forces for transmembrane alpha-helix oligomerization. Biophys J 2010; 99:227-37. [PMID: 20655851 DOI: 10.1016/j.bpj.2010.03.071] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/05/2009] [Revised: 03/24/2010] [Accepted: 03/29/2010] [Indexed: 11/25/2022] Open
Abstract
We present what we believe to be a novel statistical contact potential based on solved structures of transmembrane (TM) alpha-helical bundles, and we use this contact potential to investigate the amino acid likelihood of stabilizing helix-helix interfaces. To increase statistical significance, we have reduced the full contact energy matrix to a four-flavor alphabet of amino acids, automatically determined by our methodology, in which we find that polarity is a more dominant factor of group identity than is size, with charged or polar groups most often occupying the same face, whereas polar/apolar residue pairs tend to occupy opposite faces. We found that the most polar residues strongly influence interhelical contact formation, although they occur rarely in TM helical bundles. Two-body contact energies in the reduced letter code are capable of determining native structure from a large decoy set for a majority of test TM proteins, at the same time illustrating that certain higher-order sequence correlations are necessary for more accurate structure predictions.
Affiliation(s)
- Alex J Sodt
- Department of Bioengineering, University of California, Berkeley, California, USA.
|
68
|
Abstract
We extend PRIME, an intermediate-resolution protein model previously used in simulations of the aggregation of polyalanine and polyglutamine, to the description of the geometry and energetics of peptides containing all 20 amino acid residues. The 20 amino acid side chains are classified into 14 groups according to their hydrophobicity, polarity, size, charge, and potential for side chain hydrogen bonding. The parameters for extended PRIME, called PRIME 20, include hydrogen-bonding energies, side chain interaction range and energy, and excluded volume. The parameters are obtained by applying a perceptron-learning algorithm and a modified stochastic learning algorithm that optimizes the energy gap between 711 known native states from the PDB and decoy structures generated by gapless threading. The number of independent pair interaction parameters is chosen to be small enough to be physically meaningful yet large enough to give reasonably accurate results in discriminating decoys from native structures. The most physically meaningful results are obtained with 19 energy parameters.
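The parameter fitting described above rests on a perceptron-style update that pushes decoy energies above native energies. A minimal sketch of the plain perceptron rule follows, assuming the energy is linear in a feature vector (for example, counts of each interaction type). It is not the modified stochastic learning algorithm used for PRIME 20, and the function name, feature vectors, and learning rate are hypothetical.

```python
def train_perceptron(pairs, n_params, epochs=100, lr=0.1):
    """Fit weights w so that E(decoy) > E(native), with E = w . features.

    pairs: list of (native_features, decoy_features) tuples.
    """
    w = [0.0] * n_params
    for _ in range(epochs):
        updated = False
        for native, decoy in pairs:
            # Energy gap between the decoy and the native structure.
            gap = sum(wi * (d - n) for wi, d, n in zip(w, decoy, native))
            if gap <= 0:
                # Misranked pair: move w in the direction that widens the gap.
                w = [wi + lr * (d - n) for wi, d, n in zip(w, decoy, native)]
                updated = True
        if not updated:      # every native now scores below its decoy
            break
    return w


# Toy data (made up): two interaction types, three native/decoy pairs.
pairs = [([3, 1], [1, 3]), ([4, 0], [2, 1]), ([2, 2], [0, 3])]
weights = train_perceptron(pairs, n_params=2)
gaps = [sum(w * (d - n) for w, d, n in zip(weights, decoy, native))
        for native, decoy in pairs]
print(weights, gaps)
```

The loop stops as soon as a full pass produces no update, which for linearly separable data means every decoy lies above its native structure in energy.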
Affiliation(s)
- Mookyung Cheon
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA
|
69
|
Abstract
Knowledge-based approaches frequently employ empirical relations to determine effective potentials for coarse-grained protein models directly from protein databank structures. Although these approaches have enjoyed considerable success and widespread popularity in computational protein science, their fundamental basis has been widely questioned. It is well established that conventional knowledge-based approaches do not correctly treat many-body correlations between amino acids. Moreover, the physical significance of potentials determined by using structural statistics from different proteins has remained obscure. In the present work, we address both of these concerns by introducing and demonstrating a theory for calculating transferable potentials directly from a databank of protein structures. This approach assumes that the databank structures correspond to representative configurations sampled from equilibrium solution ensembles for different proteins. Given this assumption, this physics-based theory exactly treats many-body structural correlations and directly determines the transferable potentials that provide a variationally optimized approximation to the free energy landscape for each protein. We illustrate this approach by first constructing a databank of protein structures using a model potential and then quantitatively recovering this potential from the structure databank. The proposed framework will clarify the assumptions and physical significance of knowledge-based potentials, allow for their systematic improvement, and provide new insight into many-body correlations and cooperativity in folded proteins.
|
70
|
Klenin K, Strodel B, Wales DJ, Wenzel W. Modelling proteins: conformational sampling and reconstruction of folding kinetics. Biochimica et Biophysica Acta - Proteins and Proteomics 2010; 1814:977-1000. [PMID: 20851219 DOI: 10.1016/j.bbapap.2010.09.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 06/22/2010] [Revised: 09/03/2010] [Accepted: 09/05/2010] [Indexed: 01/08/2023]
Abstract
In the last decades biomolecular simulation has made tremendous inroads to help elucidate biomolecular processes in-silico. Despite enormous advances in molecular dynamics techniques and the available computational power, many problems involve long time scales and large-scale molecular rearrangements that are still difficult to sample adequately. In this review we therefore summarise recent efforts to fundamentally improve this situation by decoupling the sampling of the energy landscape from the description of the kinetics of the process. Recent years have seen the emergence of many advanced sampling techniques, which permit efficient characterisation of the relevant family of molecular conformations by dispensing with the details of the short-term kinetics of the process. Because these methods generate thermodynamic information at best, they must be complemented by techniques to reconstruct the kinetics of the process using the ensemble of relevant conformations. Here we review recent advances for both types of methods and discuss their perspectives to permit efficient and accurate modelling of large-scale conformational changes in biomolecules. This article is part of a Special Issue entitled: Protein Dynamics: Experimental and Computational Approaches.
Affiliation(s)
- Konstantin Klenin
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, P.O. Box 3640, D-76021 Karlsruhe, Germany
|
71
|
Thermo- and mesostabilizing protein interactions identified by temperature-dependent statistical potentials. Biophys J 2010; 98:667-77. [PMID: 20159163 DOI: 10.1016/j.bpj.2009.10.050] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 06/11/2009] [Revised: 10/26/2009] [Accepted: 10/28/2009] [Indexed: 11/24/2022] Open
Abstract
The goal of controlling protein thermostability is tackled here through establishing, by in silico analyses, the relative weight of residue-residue interactions in proteins as a function of temperature. We have designed for that purpose a (melting-) temperature-dependent, statistical distance potential, where the interresidue distances are computed between the side-chain geometric centers or their functional centers. Their separate derivation from proteins of either high or low thermal resistance reveals the interactions that contribute most to stability in different temperature ranges. Thermostabilizing interactions include salt bridges and cation-pi interactions (especially those involving arginine), aromatic interactions, and H-bonds between negatively charged and some aromatic residues. In contrast, H-bonds between two polar noncharged residues or between a polar noncharged residue and a negatively charged residue are relatively less stabilizing at high temperatures. An important observation is that it is necessary to consider both repulsive and attractive interactions in overall thermostabilization, as the degree of repulsion may also vary with temperature. These temperature-dependent potentials are not only useful for the identification of meso- and thermostabilizing pair interactions, but also exhibit predictive power, as illustrated by their ability to predict the melting temperature of a protein based on the melting temperature of homologous proteins.
|
72
|
Rata IA, Li Y, Jakobsson E. Backbone statistical potential from local sequence-structure interactions in protein loops. J Phys Chem B 2010; 114:1859-69. [PMID: 20070091 DOI: 10.1021/jp909874g] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Native proteins have been optimized by evolution simultaneously for structure and sequence. Structural databases reflect this interdependency. In this paper, we present a new statistical potential for a reduced backbone representation that has both structure and sequence characteristics as variables. We use information from structural data available in the Protein Coil Library, selected on the basis of resolution and refinement factor. In these structures, the nonlocal interactions are randomly distributed and, thus, average out in statistics, so structural propensities due to local backbone-based interactions can be studied separately. We collect data in the form of local sequence-specific phi-psi backbone dihedral pairs. From these data, we construct dihedral probability density functions (DPDFs) that quantify any adjacent phi-psi pair distribution in the context of all possible combinations of local residue types. We use a probabilistic analysis to deduce how the correlations encoded in the various DPDFs as well as in residue frequencies propagate along the sequence and can be cumulated in a statistical potential capable of efficiently scoring a loop by its backbone conformation and sequence only. Our potential is able to identify with high accuracy the native structure of a loop with a given sequence among possible alternative conformations from sets of well-constructed decoys. Conversely, the potential can also be used for sequence prediction problems and is shown to score the native sequence of a given loop structure among the most fit of the possible sequence combinations. Applications for both structure prediction and sequence design are discussed.
Affiliation(s)
- Ionel A Rata
- Department of Molecular and Integrative Physiology, UIUC Program in Biophysics, National Center for Supercomputing Applications, and Beckman Institute, University of Illinois, Urbana, Illinois 61801, USA.
|
73
|
Májek P, Elber R. A coarse-grained potential for fold recognition and molecular dynamics simulations of proteins. Proteins 2009; 76:822-36. [PMID: 19291741 DOI: 10.1002/prot.22388] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/11/2022]
Abstract
A coarse-grained potential for protein simulations and fold ranking is presented. The potential is based on a two-point model of individual amino acids and a specific implementation of hydrogen bonding. Parameters are determined for distance-dependent pair interactions, pseudo bonds, angles, and torsions. A scaling factor for a hydrogen bonding term is also determined. Iterative sampling for 4867 proteins reproduces distributions of internal coordinates and distances observed in the Protein Data Bank. The adjustment of the potential and resampling are in the spirit of the generalized ensemble approach. No native structure information (e.g., secondary structure) is used in the calculation of the potential or in the simulation of a particular protein. The potential is subject to two tests as follows: (i) simulations of 956 globular proteins in the neighborhood of their native folds (these proteins were not used in the training set) and (ii) discrimination between native and decoy structures for 2470 proteins with 305,000 decoys and the "Decoys 'R' Us" dataset. In the first test, 58% of tested proteins stay within 5 Å from the native fold in Molecular Dynamics simulations of more than 20 nanoseconds using the new potential. The potential is also useful in differentiating between correct and approximate folds providing significant signal for structure prediction algorithms. Sampling with the potential consistently regenerates the distribution of distances and internal coordinates it learned. Nevertheless, during Molecular Dynamics simulations structures are found that reproduce the learned distributions but are far from the native fold.
Affiliation(s)
- Peter Májek
- Department of Computer Science, Upson Hall 4130, Cornell University, Ithaca, New York 14853-7501, USA
|
74
|
Ma J. Explicit orientation dependence in empirical potentials and its significance to side-chain modeling. Acc Chem Res 2009; 42:1087-96. [PMID: 19445451 DOI: 10.1021/ar900009e] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Protein structure modeling and prediction have important applications throughout the biological sciences, from the design of pharmaceuticals to the elucidation of enzyme mechanisms. At the core of most protein modeling is an energy function, the minimum of which represents the free energy "cost" for forming a correct protein structure. The most commonly used energy functions are knowledge-based statistical potential functions; that is, they are empirically derived from statistical analysis of a set of high-resolution protein structures. When that kind of potential function is constructed, the anisotropic orientation dependence between the interacting groups is a critical component for accurately representing key molecular interactions, such as those involved in protein side-chain packing. In the literature, however, many potential functions are limited in their ability to describe orientation dependence. In all-atom potentials, they typically ignore heterogeneous chemical-bond connectivity. In coarse-grained potentials, such as (semi)-residue-based potentials, the simplified representation of residues often reduces the sensitivity of the potential to side-chain orientation. Recently, in an effort to maximally capture the orientation dependence in side-chain interactions, a new type of all-atom statistical potential was developed: OPUS-PSP (potential derived from side-chain packing). The key feature of this potential is its explicit description of orientation dependence in molecular interactions, which is achieved with a basis set of 19 rigid-body blocks extracted from the chemical structures of 20 amino acid residues. This basis set is specifically designed to maximally capture the essential elements of orientation dependence in molecular packing interactions. The potential is constructed from the orientation-specific packing statistics of pairs of those blocks in a nonredundant structural database. 
On decoy set tests, OPUS-PSP significantly outperforms most of the existing knowledge-based potentials in terms of both its ability to recognize native structures and its consistency in achieving high Z scores across decoy sets. The application of OPUS-PSP to conformational modeling of side chains has led to another method, called OPUS-Rota. In terms of combined speed and accuracy, OPUS-Rota outperforms all of the other methods in modeling side-chain conformation. In this Account, we briefly outline the basic scheme of the OPUS-PSP potential and its application to side-chain modeling via OPUS-Rota. Future perspectives on the modeling of orientation dependence are also discussed. The computer programs for OPUS-PSP and OPUS-Rota can be downloaded at http://sigler.bioch.bcm.tmc.edu/MaLab . They are free for academic users.
Affiliation(s)
- Jianpeng Ma
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, and Department of Bioengineering, Rice University, Houston, Texas 77005
|
75
|
Dehouck Y, Grosfils A, Folch B, Gilis D, Bogaerts P, Rooman M. Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics 2009; 25:2537-43. [PMID: 19654118 DOI: 10.1093/bioinformatics/btp445] [Citation(s) in RCA: 294] [Impact Index Per Article: 19.6] [Indexed: 01/09/2023]
Abstract
MOTIVATION: The rational design of proteins with modified properties, through amino acid substitutions, is of crucial importance in a large variety of applications. Given the huge number of possible substitutions, every protein engineering project would benefit strongly from the guidance of in silico methods able to predict rapidly, and with reasonable accuracy, the stability changes resulting from all possible mutations in a protein. RESULTS: We exploit newly developed statistical potentials, based on a formalism that highlights the coupling between four protein sequence and structure descriptors, and take into account the amino acid volume variation upon mutation. The stability change is expressed as a linear combination of these energy functions, whose proportionality coefficients vary with the solvent accessibility of the mutated residue and are identified with the help of a neural network. A correlation coefficient of R = 0.63 and a root mean square error of sigma(c) = 1.15 kcal/mol between measured and predicted stability changes are obtained upon cross-validation. These scores reach R = 0.79, and sigma(c) = 0.86 kcal/mol after exclusion of 10% outliers. The predictive power of our method is shown to be significantly higher than that of other programs described in the literature. AVAILABILITY: http://babylone.ulb.ac.be/popmusic
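The two figures of merit quoted above (correlation coefficient and root mean square error, before and after excluding 10% of the data) can be sketched as follows. The abstract does not say how outliers are chosen, so this example assumes they are the points with the largest absolute prediction error; the function names and the stability values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def drop_outliers(measured, predicted, fraction=0.10):
    """Remove the `fraction` of points with the largest absolute error (assumed rule)."""
    order = sorted(range(len(measured)), key=lambda k: abs(measured[k] - predicted[k]))
    keep = sorted(order[: len(order) - int(len(order) * fraction)])
    return [measured[k] for k in keep], [predicted[k] for k in keep]


# Made-up stability changes in kcal/mol; the ninth point is a gross misprediction.
measured = [0.5, 1.2, -0.3, 2.1, 0.0, 1.8, -1.0, 0.7, 3.0, 1.1]
predicted = [0.6, 1.0, -0.1, 1.9, 0.2, 1.5, -0.8, 0.9, -2.0, 1.0]
m_trim, p_trim = drop_outliers(measured, predicted)
print(pearson_r(measured, predicted), rmse(measured, predicted))
print(pearson_r(m_trim, p_trim), rmse(m_trim, p_trim))
```

A single badly mispredicted mutant can dominate both statistics on a small set, which is one reason results are often reported both with and without outliers.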
Affiliation(s)
- Yves Dehouck
- Bioinformatique génomique et structurale, Université Libre de Bruxelles. Av Fr. Roosevelt 50, CP165/61, 1050 Brussels, Belgium.
|
76
|
Cohen M, Potapov V, Schreiber G. Four distances between pairs of amino acids provide a precise description of their interaction. PLoS Comput Biol 2009; 5:e1000470. [PMID: 19680437 PMCID: PMC2715887 DOI: 10.1371/journal.pcbi.1000470] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 05/06/2009] [Accepted: 07/15/2009] [Indexed: 11/18/2022] Open
Abstract
The three-dimensional structures of proteins are stabilized by the interactions between amino acid residues. Here we report a method where four distances are calculated between any two side chains to provide an exact spatial definition of their bonds. The data were binned into a four-dimensional grid and compared to a random model, from which the preference for specific four-distances was calculated. A clear relation between the quality of the experimental data and the tightness of the distance distribution was observed, with crystal structure data providing far tighter distance distributions than NMR data. Since the four-distance data have higher information content than classical bond descriptions, we were able to identify many unique inter-residue features not found previously in proteins. For example, we found that the side chains of Arg, Glu, Val and Leu are not symmetrical with respect to the interactions of their head groups. The described method may be developed into a function that accurately models protein structures computationally.
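The binning-and-comparison step described above can be illustrated with a small log-odds calculation. This is a sketch under stated assumptions: the reference ("random") tuples are simply passed in, because the abstract does not say how the random model is built, and the bin width, pseudocount, function names, and distance values are all hypothetical.

```python
import math
from collections import Counter


def bin_tuple(distances, width=1.0):
    """Map a tuple of four distances (in Å) to a cell of a 4D grid."""
    return tuple(int(d // width) for d in distances)


def preferences(observed, reference, width=1.0, pseudo=1.0):
    """Log-odds preference of each grid cell: ln(f_observed / f_reference).

    A pseudocount keeps cells that are empty in one set from producing
    infinite values.
    """
    obs = Counter(bin_tuple(t, width) for t in observed)
    ref = Counter(bin_tuple(t, width) for t in reference)
    cells = set(obs) | set(ref)
    n_obs = len(observed) + pseudo * len(cells)
    n_ref = len(reference) + pseudo * len(cells)
    return {c: math.log(((obs[c] + pseudo) / n_obs) / ((ref[c] + pseudo) / n_ref))
            for c in cells}


# Toy data: one geometry is over-represented in the observed set.
close = (3.2, 4.1, 5.0, 6.3)
far = (7.5, 8.1, 9.9, 10.2)
observed = [close] * 3 + [far]
reference = [close] + [far] * 3
pref = preferences(observed, reference)
print(pref)
```

Cells with positive values are geometries seen more often than the reference predicts; negative values mark geometries that are avoided.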
Affiliation(s)
- Mati Cohen
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Vladimir Potapov
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Gideon Schreiber
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
|
77
|
Lappe M, Bagler G, Filippis I, Stehr H, Duarte JM, Sathyapriya R. Designing evolvable libraries using multi-body potentials. Curr Opin Biotechnol 2009; 20:437-46. [DOI: 10.1016/j.copbio.2009.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/06/2009] [Revised: 07/15/2009] [Accepted: 07/25/2009] [Indexed: 01/13/2023]
|
78
|
Gu J, Li H, Jiang H, Wang X. Optimizing energy potential for protein fold recognition with parametric evaluation function. J Comput Biol 2009; 16:427-42. [PMID: 19254182 DOI: 10.1089/cmb.2008.0128] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Abstract
In this paper, a new optimization method is proposed to determine a simplified energy potential for protein fold recognition, which consists of the residue-residue contact, hydrophobicity, and pseudodihedral potentials. With a parametric evaluation function method, the Z-scores of all the proteins in a training set are optimized simultaneously to obtain the best parameter set of the potential. For this multi-objective and multi-constraint problem, the new optimization scheme is very effective. The derived potential is then tested on two high-quality decoy sets and compared with other classical fold recognition potentials. With the simplified energy potential, we achieve a high level of discrimination capability between correct and incorrect folds.
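The Z-score that this optimization targets is, in its usual form, the distance of the native energy from the mean decoy energy in units of the decoy standard deviation. The sketch below evaluates it for one protein with made-up energies; the simultaneous optimization over a training set and the parametric evaluation function are not reproduced, and the function name is hypothetical.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Z-score of the native structure against a decoy ensemble.

    More negative values mean the native structure is better separated
    from the decoys.
    """
    mu = statistics.mean(decoy_energies)
    sd = statistics.pstdev(decoy_energies)   # population standard deviation
    return (native_energy - mu) / sd


# Toy energies in arbitrary units.
native = -12.0
decoys = [-8.0, -6.0, -4.0, -2.0]
z = z_score(native, decoys)
print(z)
```

A potential that discriminates well gives strongly negative Z-scores for the native structures of many proteins at once, which is what makes this a multi-objective problem.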
Affiliation(s)
- Junfeng Gu
- Department of Engineering Mechanics, State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian, China
|
79
|
Combelles C, Gracy J, Heitz A, Craik DJ, Chiche L. Structure and folding of disulfide-rich miniproteins: insights from molecular dynamics simulations and MM-PBSA free energy calculations. Proteins 2009; 73:87-103. [PMID: 18393393 DOI: 10.1002/prot.22054] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 02/02/2023]
Abstract
The fold of small disulfide-rich proteins largely relies on two or more disulfide bridges that are main components of the hydrophobic core. Because of the small size of these proteins and their high cystine content, the cysteine connectivity has been difficult to ascertain in some cases, leading to uncertainties and debates in the literature. Here, we use molecular dynamics simulations and MM-PBSA free energy calculations to compare similar folds with different disulfide pairings in two disulfide-rich miniprotein families, namely the knottins and the short-chain scorpion toxins, for which the connectivity has been discussed. We first show that the MM-PBSA approach is able to discriminate the correct knotted topology of knottins from the laddered one. Interestingly, a comparison of the free energy components for kalata B1 and MCoTI-II suggests that cyclotides and squash inhibitors, although sharing the same scaffold, are stabilized through different interactions. Application to short-chain scorpion toxins suggests that the conventional cysteine pairing found in many homologous toxins is significantly more stable than the unconventional pairing reported for maurotoxin and for spinoxin. This would mean that native maurotoxin and spinoxin are not at the lowest free energy minimum and might result from kinetically rather than thermodynamically driven oxidative folding processes. For both knottins and toxins, the correct or conventional disulfide connectivities provide lower flexibilities and smaller deviations from the initial conformations. Overall, our work suggests that molecular dynamics simulations and the MM-PBSA approach to estimate free energies are useful tools to analyze and compare disulfide bridge connectivities in miniproteins.
Affiliation(s)
- Cecil Combelles
- Université de Montpellier, CNRS, UMR5048, Centre de Biochimie Structurale, 34090 Montpellier, France
|
80
|
Solis AD, Rackovsky S. Information and discrimination in pairwise contact potentials. Proteins 2008; 71:1071-87. [PMID: 18004788 DOI: 10.1002/prot.21733] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/05/2022]
Abstract
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence on their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have a high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials. Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
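One way to see the link between quasi-chemical "energies" and information-theoretic quantities is the sketch below. It is an illustration under stated assumptions, not the authors' derivation: the reference state is random mixing by composition, the divergence shown is a plain Kullback-Leibler divergence (the paper's "total divergence" may be defined differently), and the function name and the toy counts are hypothetical. With these choices the divergence equals minus the observation-weighted mean contact energy.

```python
import math


def pair_statistics(contact_counts, composition):
    """Return (energies, divergence) for residue-type contact counts.

    contact_counts: {(type_a, type_b): observed count}, unordered pairs.
    composition:    {type: mole fraction}, fractions summing to 1.
    Energies are in units of kT; divergence is in nats.
    """
    # Merge (a, b) and (b, a) into one canonical unordered key.
    merged = {}
    for (a, b), n in contact_counts.items():
        key = tuple(sorted((a, b)))
        merged[key] = merged.get(key, 0) + n
    total = sum(merged.values())

    energies = {}
    divergence = 0.0
    for (a, b), n in merged.items():
        if n == 0:
            continue  # unobserved pairs carry no estimate in this sketch
        p_obs = n / total
        # Random-mixing expectation; factor 2 counts both orderings of a != b.
        p_ref = composition[a] * composition[b] * (1 if a == b else 2)
        energies[(a, b)] = -math.log(p_obs / p_ref)
        divergence += p_obs * math.log(p_obs / p_ref)
    return energies, divergence


# Toy two-letter alphabet with made-up contact counts.
counts = {("L", "L"): 50, ("K", "L"): 30, ("K", "K"): 20}
energies, divergence = pair_statistics(counts, {"L": 0.5, "K": 0.5})
```

In this toy case the over-represented L-L contact receives a negative (favourable) energy and the under-represented K-L contact a positive one.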
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA
|
81
|
Folch B, Rooman M, Dehouck Y. Thermostability of salt bridges versus hydrophobic interactions in proteins probed by statistical potentials. J Chem Inf Model 2007; 48:119-27. [PMID: 18161956 DOI: 10.1021/ci700237g] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Indexed: 11/29/2022]
Abstract
The temperature dependence of the interactions that stabilize protein structures is a long-standing issue, the elucidation of which would enable the prediction and the rational modification of the thermostability of a target protein. It is tackled here by deriving distance-dependent amino acid pair potentials from four datasets of proteins with increasing melting temperatures (Tm). The temperature dependence of the interactions is determined from the differences in the shape of the potentials derived from the four datasets. Note that, here, we use an unusual dataset definition, which is based on the Tm values, rather than on the living temperature of the host organisms. Our results show that the stabilizing weight of hydrophobic interactions (between Ile, Leu, and Val) remains constant as the temperature increases, compared to the other interactions. In contrast, the two minima of the Arg–Glu and Arg–Asp salt bridge potentials show a significant Tm dependence. These two minima correspond to two geometries: the fork–fork geometry, where the side chains point toward each other, and the fork–stick geometry, which involves the Nε side chain atom of Arg. These two types of salt bridges were determined to be significantly more stabilizing at high temperature. Moreover, a preference for more-compact salt bridges is noticeable in heat-resistant proteins, especially for the fork–fork geometry. The Tm-dependent potentials that have been defined here should be useful for predicting thermal stability changes upon mutation.
Affiliation(s)
- Benjamin Folch
- Unité de Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av. F. Roosevelt 50, CP 165/61, 1050 Bruxelles, Belgium.
|
82
|
OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing. J Mol Biol 2007; 376:288-301. [PMID: 18177896 DOI: 10.1016/j.jmb.2007.11.033] [Citation(s) in RCA: 148] [Impact Index Per Article: 8.7] [Received: 09/24/2007] [Revised: 11/06/2007] [Accepted: 11/13/2007] [Indexed: 11/22/2022]
Abstract
Here we report an orientation-dependent statistical all-atom potential derived from side-chain packing, named OPUS-PSP. It features a basis set of 19 rigid-body blocks extracted from the chemical structures of all 20 amino acid residues. The potential is generated from the orientation-specific packing statistics of pairs of those blocks in a non-redundant structural database. The purpose of such an approach is to capture the essential elements of orientation dependence in molecular packing interactions. Tests of OPUS-PSP on commonly used decoy sets demonstrate that it significantly outperforms most of the existing knowledge-based potentials in terms of both its ability to recognize native structures and consistency in achieving high Z-scores across decoy sets. As OPUS-PSP excludes interactions among main-chain atoms, its success highlights the crucial importance of side-chain packing in forming native protein structures. Moreover, OPUS-PSP does not explicitly include solvation terms, and thus the potential should perform well when the solvation effect is difficult to determine, such as in membrane proteins. Overall, OPUS-PSP is a generally applicable potential for protein structure modeling, especially for handling side-chain conformations, one of the most difficult steps in high-accuracy protein structure prediction and refinement.
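The two decoy-set criteria mentioned in this abstract, native recognition and Z-score, can be illustrated with a short sketch. It is not taken from the OPUS-PSP paper: the sign convention (lower energy is better, and a well-separated native gets a large positive Z-score), the use of the population standard deviation, the function names, and the scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def native_z_score(native_energy, decoy_energies):
    """Z-score of the native structure relative to a decoy set.

    Sign convention (an assumption): lower energy is better, and a
    well-discriminated native gets a large positive Z-score.
    """
    spread = pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (mean(decoy_energies) - native_energy) / spread


def native_rank(native_energy, decoy_energies):
    """1-based rank of the native among native + decoys (1 = lowest energy)."""
    return 1 + sum(e < native_energy for e in decoy_energies)


decoys = [-90.0, -100.0, -110.0]  # made-up scores
z = native_z_score(-130.0, decoys)
rank = native_rank(-130.0, decoys)
```

A potential "recognizes" the native when the rank is 1; the Z-score additionally measures how far the native sits from the bulk of the decoys.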
|
83
|
Lin MS, Fawzi NL, Head-Gordon T. Hydrophobic potential of mean force as a solvation function for protein structure prediction. Structure 2007; 15:727-40. [PMID: 17562319 DOI: 10.1016/j.str.2007.05.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 12/16/2006] [Revised: 05/04/2007] [Accepted: 05/07/2007] [Indexed: 10/23/2022]
Abstract
We have developed a solvation function that combines a Generalized Born model for polarization of protein charge by the high dielectric solvent, with a hydrophobic potential of mean force (HPMF) as a model for hydrophobic interaction, to aid in the discrimination of native structures from other misfolded states in protein structure prediction. We find that our energy function outperforms other reported scoring functions in terms of correct native ranking for 91% of proteins and low Z scores for a variety of decoy sets, including the challenging Rosetta decoys. This work shows that the stabilizing effect of hydrophobic exposure to aqueous solvent that defines the HPMF hydration physics is an apparent improvement over solvent-accessible surface area models that penalize hydrophobic exposure. Decoys generated by thermal sampling around the native-state basin reveal a potentially important role for side-chain entropy in the future development of even more accurate free energy surfaces.
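For orientation, the Generalized Born ingredient of such a solvation function can be sketched as below. The abstract does not say which GB variant or parameters were used, so this assumes the widely used Still-type interpolation function with fixed, user-supplied Born radii; the function name, the dielectric constants, and the unit conventions are illustrative assumptions, and the HPMF term is not reproduced here.

```python
import math

COULOMB = 332.0637  # kcal*Angstrom/(mol*e^2)


def gb_polarization_energy(charges, coords, born_radii, eps_in=1.0, eps_out=80.0):
    """Generalized Born polarization energy (kcal/mol), Still-type f_GB.

    charges in units of e, coords and born_radii in Angstrom.
    """
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    energy = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):  # full double sum, including i == j self terms
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            energy += charges[i] * charges[j] / f_gb
    return prefactor * energy


# A single monovalent ion of radius 2 A reduces to the Born solvation energy.
born_ion = gb_polarization_energy([1.0], [(0.0, 0.0, 0.0)], [2.0])
```

For a single charge the interpolation function reduces to the Born radius, so the expression recovers the classical Born formula, which is a convenient sanity check.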
Affiliation(s)
- Matthew S Lin
- UCSF/UCB Joint Graduate Group in Bioengineering, University of California-Berkeley, Berkeley, CA 94720, USA
|
84
|
Wu Y, Lu M, Chen M, Li J, Ma J. OPUS-Ca: a knowledge-based potential function requiring only Cα positions. Protein Sci 2007; 16:1449-63. [PMID: 17586777 PMCID: PMC2206690 DOI: 10.1110/ps.072796107] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 10/23/2022]
Abstract
In this paper, we report a knowledge-based potential function, named the OPUS-Ca potential, that requires only Cα positions as input. The contributions from other atomic positions were established from pseudo-positions artificially built from a Cα trace for auxiliary purposes. The potential function is formed based on seven major representative molecular interactions in proteins: distance-dependent pairwise energy with orientational preference, hydrogen bonding energy, short-range energy, packing energy, tri-peptide packing energy, three-body energy, and solvation energy. From the testing of decoy recognition on a number of commonly used decoy sets, it is shown that the new potential function outperforms all known Cα-based potentials and most other coarse-grained ones that require more information than Cα positions. We hope that this potential function adds a new tool for protein structural modeling.
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
|
85
|
Boas FE, Harbury PB. Potential energy functions for protein design. Curr Opin Struct Biol 2007; 17:199-204. [PMID: 17387014 DOI: 10.1016/j.sbi.2007.03.006] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Received: 01/24/2007] [Revised: 02/07/2007] [Accepted: 03/14/2007] [Indexed: 10/23/2022]
Abstract
Different potential energy functions have predominated in protein dynamics simulations, protein design calculations, and protein structure prediction. Clearly, the same physics applies in all three cases. The differences in potential energy functions reflect differences in how the calculations are performed. With improvements in computer power and algorithms, the same potential energy function should be applicable to all three problems. In this review, we examine energy functions currently used for protein design, and look to the molecular mechanics field for advances that could be used in the next generation of design algorithms. In particular, we focus on improved models of the hydrophobic effect, polarization and hydrogen bonding.
Affiliation(s)
- F Edward Boas
- Department of Biochemistry, Stanford University School of Medicine, Beckman B437, Stanford, CA 94305-5307, USA
|
86
|
Fogolari F, Pieri L, Dovier A, Bortolussi L, Giugliarelli G, Corazza A, Esposito G, Viglino P. Scoring predictive models using a reduced representation of proteins: model and energy definition. BMC Struct Biol 2007; 7:15. [PMID: 17378941 PMCID: PMC1854906 DOI: 10.1186/1472-6807-7-15] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/28/2006] [Accepted: 03/23/2007] [Indexed: 11/25/2022]
Abstract
Background: Reduced representations of proteins have been playing a key role in the study of protein folding. Many such models are available, with different representation detail. Although the usefulness of many such models for structural bioinformatics applications has been demonstrated in recent years, there are few intermediate resolution models endowed with an energy model capable, for instance, of detecting native or native-like structures among decoy sets. The aim of the present work is to provide a discrete empirical potential for a reduced protein model termed here PC2CA, because it employs a PseudoCovalent structure with only 2 Centers of interactions per Amino acid, suitable for protein model quality assessment. Results: All protein structures in the set top500H have been converted to reduced form. The distributions of pseudobonds, pseudoangles, pseudodihedrals and distances between centers of interactions have been converted into potentials of mean force. A suitable reference distribution has been defined for non-bonded interactions which takes into account excluded volume effects and protein finite size. The correlation between adjacent main chain pseudodihedrals has been converted into an additional energetic term which is able to account for cooperative effects in secondary structure elements. Local energy surface exploration is performed in order to increase the robustness of the energy function. Conclusion: The model and the energy definition proposed have been tested on all the multiple decoys' sets in the Decoys'R'us database. The energetic model is able to recognize, for almost all sets, native-like structures (RMSD less than 2.0 Å). These results and those obtained in the blind CASP7 quality assessment experiment suggest that the model compares well with scoring potentials with finer granularity and could be useful for fast exploration of conformational space. Parameters are available at the url: .
Affiliation(s)
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- Lidia Pieri
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- INAF – Astronomical Observatory of Padova, Vicolo dell'Osservatorio 5, I-35122 Padova, Italy
- Agostino Dovier
- Dipartimento di Matematica e Informatica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
- Luca Bortolussi
- Dipartimento di Matematica e Informatica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
- Gilberto Giugliarelli
- Dipartimento di Fisica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
- Alessandra Corazza
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- Gennaro Esposito
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- Paolo Viglino
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
|
87
|
Cheng J, Pei J, Lai L. A free-rotating and self-avoiding chain model for deriving statistical potentials based on protein structures. Biophys J 2007; 92:3868-77. [PMID: 17351015 PMCID: PMC1868969 DOI: 10.1529/biophysj.106.102152] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
Statistical potentials have been widely used in protein studies despite their much-debated theoretical basis. In this work, we have applied two physical reference states for deriving the statistical potentials based on protein structure features to achieve zero interaction and orthogonalization. The free-rotating chain-based potential applies a local free-rotating chain reference state, which could theoretically be described by the Gaussian distribution. The self-avoiding chain-based potential applies a reference state derived from a database of artificial self-avoiding backbones generated by Monte Carlo simulation. These physical reference states are independent of known protein structures and are based solely on the analytical formulation or simulation method. The new potentials performed better and yielded higher Z-scores and success rates compared to other statistical potentials. The end-to-end distance distribution produced by the self-avoiding chain model was similar to the distance distribution of protein atoms in the structure database. This fact may partly explain the basis of the reference states that depend on the atom pair frequency observed in the protein database. The current study showed that a more physical reference model improved the performance of statistical potentials in protein fold recognition, which could also be extended to other types of applications.
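The idea of an analytical, structure-independent reference state can be made concrete with textbook polymer physics. The sketch below is not the authors' implementation: it uses the standard exact mean-square end-to-end distance of a freely rotating chain and the Gaussian-chain radial density built from it, and the function names and the inverse-Boltzmann form of the score are assumptions made for illustration.

```python
import math


def mean_square_end_to_end(n_bonds, bond_length, cos_theta):
    """Exact <R^2> of a freely rotating chain.

    cos_theta is the cosine of the angle between consecutive bond vectors
    (must differ from 1).
    """
    c = cos_theta
    n = n_bonds
    bracket = (1 + c) / (1 - c) - (2 * c / n) * (1 - c ** n) / (1 - c) ** 2
    return n * bond_length ** 2 * bracket


def gaussian_radial_density(r, mean_sq):
    """Gaussian-chain probability density of the end-to-end distance r."""
    norm = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    return 4.0 * math.pi * r * r * norm * math.exp(-1.5 * r * r / mean_sq)


def reference_energy(observed_density, r, mean_sq):
    """Inverse-Boltzmann score (units of kT) against the Gaussian reference."""
    return -math.log(observed_density / gaussian_radial_density(r, mean_sq))
```

An observed distance density equal to the reference gives a score of zero, which is the "zero interaction" property the abstract asks of a reference state.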
Affiliation(s)
- Ji Cheng
- State Key Laboratory for Structural Chemistry of Stable and Unstable Species, College of Chemistry and Molecular Engineering, and Center for Theoretical Biology, Peking University, Beijing, China
|
88
|
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2007; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1765] [Impact Index Per Article: 103.8] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using the probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
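The geometric core of a "noninteracting atoms in a homogeneous sphere" reference state can be sketched as follows. This is only an illustration under stated assumptions, not the DOPE implementation in MODELLER: it uses the known closed-form density of the distance between two uniformly random points in a sphere of radius a, a midpoint approximation per distance bin, and a plain inverse-Boltzmann score; the function names and the choice of a single fixed radius are hypothetical simplifications.

```python
import math


def sphere_pair_density(r, a):
    """Density of the distance r between two random points in a sphere of radius a."""
    if r < 0.0 or r > 2.0 * a:
        return 0.0
    x = r / a
    return (3.0 * x * x / a) * (1.0 - 0.75 * x + x ** 3 / 16.0)


def distance_energy(observed_counts, bin_edges, a):
    """Per-bin score -ln(p_obs / p_ref) in units of kT; None for empty bins."""
    total = sum(observed_counts)
    scores = []
    for n, lo, hi in zip(observed_counts, bin_edges[:-1], bin_edges[1:]):
        mid = 0.5 * (lo + hi)
        p_ref = sphere_pair_density(mid, a) * (hi - lo)  # midpoint approximation
        if n == 0 or p_ref == 0.0:
            scores.append(None)
        else:
            scores.append(-math.log((n / total) / p_ref))
    return scores
```

Unlike an ideal-gas reference that grows as r squared without bound, this density falls to zero at r = 2a, which is how the finite size of a globular protein enters the reference state.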
Affiliation(s)
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
|