1
Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinformatics 2017; 18:303. [PMID: 28623886 PMCID: PMC5474060 DOI: 10.1186/s12859-017-1713-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 09/27/2016] [Accepted: 05/30/2017] [Indexed: 01/12/2023]
Abstract
BACKGROUND Accurately predicted contacts make it possible to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent methods combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are combated with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks. RESULTS On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. The improvement on 1.5L is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The size of the refined feature set decreased by 75%, which enabled a significant increase in training data for machine learning and contributed significantly to the observed improvements. CONCLUSIONS Exploiting as much and as diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/.
Affiliation(s)
- Kolja Stahl
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Michael Schneider
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
- Oliver Brock
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Marchstraße 23, Berlin, 10587 Germany
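The top-L/10 long-range precision reported in the abstract above can be illustrated with a minimal sketch. It assumes contacts are given as residue index pairs, that "long-range" means a sequence separation of at least 24 residues (a common CASP convention, not stated in the abstract), and that higher scores mean more confident predictions. The function name and parameters are hypothetical and are not taken from EPSILON-CP.

```python
def top_l_over_k_precision(scores, native, length, divisor=10, min_sep=24):
    """Precision of the top (length // divisor) predicted long-range contacts.

    scores:  dict mapping (i, j) residue index pairs (i < j) to a predicted score
    native:  set of (i, j) pairs that are contacts in the native structure
    length:  sequence length L
    min_sep: minimum |i - j| for a pair to count as long-range (assumed: 24)
    """
    # Keep only long-range pairs, then rank them by descending score.
    long_range = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    long_range.sort(key=lambda item: item[1], reverse=True)

    # Evaluate the top L/divisor predictions (at least one).
    k = max(1, length // divisor)
    top = [pair for pair, _ in long_range[:k]]
    if not top:
        return 0.0
    return sum(pair in native for pair in top) / len(top)
```

With `divisor=10` this gives the top-L/10 precision for a single target; the mean over all targets would correspond to the kind of figure quoted in the abstract.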
2
Schneider M, Brock O. Combining physicochemical and evolutionary information for protein contact prediction. PLoS One 2014; 9:e108438. [PMID: 25338092 PMCID: PMC4206277 DOI: 10.1371/journal.pone.0108438] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/11/2014] [Accepted: 07/28/2014] [Indexed: 11/18/2022]
Abstract
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information (evolutionary and physicochemical), we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.
Affiliation(s)
- Michael Schneider
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
- Oliver Brock
- Robotics and Biology Laboratory, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
3
Arab S, Sadeghi M, Eslahchi C, Pezeshk H, Sheari A. A pairwise residue contact area-based mean force potential for discrimination of native protein structure. BMC Bioinformatics 2010; 11:16. [PMID: 20064218 PMCID: PMC2821318 DOI: 10.1186/1471-2105-11-16] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 06/13/2009] [Accepted: 01/09/2010] [Indexed: 11/21/2022]
Abstract
Background An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation-dependent contact area, a new type of knowledge-based mean force potential has been developed. Results We developed a new approach to calculate a knowledge-based potential of mean force, using pairwise residue contact area. To test the performance of our approach, we applied it to several decoy sets to measure its ability to discriminate the native structure from decoys. This potential was able to distinguish native structures from the decoys in most cases. Further, the calculated Z-scores were quite high for all protein datasets. Conclusions This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield
Affiliation(s)
- Shahriar Arab
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
4
Handl J, Knowles J, Lovell SC. Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction. Bioinformatics 2009; 25:1271-9. [PMID: 19297350 PMCID: PMC2677743 DOI: 10.1093/bioinformatics/btp150] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 10/29/2008] [Revised: 03/06/2009] [Accepted: 03/14/2009] [Indexed: 11/15/2022]
Abstract
MOTIVATION Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies. RESULTS We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to both over- or under-estimation of the quality of scoring terms, functions or methods. AVAILABILITY Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.
Affiliation(s)
- Julia Handl
- Faculty of Life Sciences, University of Manchester, Manchester, UK
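The rank and Z-score of the native structure, which the abstract above describes as a weak test of scoring function performance, can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes that lower scores are better and that the Z-score is taken relative to the decoy score distribution; sign conventions differ between papers, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def native_z_score(native_score, decoy_scores):
    """Z-score of the native relative to the decoy score distribution.

    Assumes lower scores are better, so a strongly negative value means the
    native is well separated from the decoys.
    """
    sigma = pstdev(decoy_scores)
    if sigma == 0:
        raise ValueError("decoy scores have zero variance")
    return (native_score - mean(decoy_scores)) / sigma


def native_rank(native_score, decoy_scores):
    """1-based rank of the native: 1 means no decoy scores better (lower)."""
    return 1 + sum(score < native_score for score in decoy_scores)
```

Both measures look only at the native structure, which is why they say little about how well a function ranks near-native decoys against poor ones.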
5
Arnautova YA, Scheraga HA. Use of decoys to optimize an all-atom force field including hydration. Biophys J 2008; 95:2434-49. [PMID: 18502794 PMCID: PMC2517034 DOI: 10.1529/biophysj.108.133587] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 03/14/2008] [Accepted: 05/07/2008] [Indexed: 11/18/2022]
Abstract
A novel method of parameter optimization is proposed. It makes use of large sets of decoys generated for six nonhomologous proteins with different architecture. Parameter optimization is achieved by creating a free energy gap between sets of nativelike and nonnative conformations. The method is applied to optimize the parameters of a physics-based scoring function consisting of the all-atom ECEPP05 force field coupled with an implicit solvent model (a solvent-accessible surface area model). The optimized force field is able to discriminate near-native from nonnative conformations of the six training proteins when used either for local energy minimization or for short Monte Carlo simulated annealing runs after local energy minimization. The resulting force field is validated with an independent set of six nonhomologous proteins, and appears to be transferable to proteins not included in the optimization; i.e., for five out of the six test proteins, decoys with 1.7- to 4.0-Å all-heavy-atom root mean-square deviations emerge as those with the lowest energy. In addition, we examined the set of misfolded structures created by Park and Levitt using a four-state reduced model. The results from these additional calculations confirm the good discriminative ability of the optimized force field obtained with our decoy sets.
Affiliation(s)
- Yelena A Arnautova
- Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, New York 14853-1301, USA
6
Chiu YY, Hwang JK, Yang JM. Soft energy function and generic evolutionary method for discriminating native from nonnative protein conformations. J Comput Chem 2008; 29:1364-73. [PMID: 18181137 DOI: 10.1002/jcc.20897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
We have developed a soft energy function, termed GEMSCORE, for protein structure prediction, which is one of the emerging issues in computational biology. GEMSCORE consists of van der Waals, hydrogen-bonding, and solvent potentials with 12 parameters, which are optimized using a generic evolutionary method. GEMSCORE successfully identifies 86 native proteins among 96 target proteins on six decoy sets from more than 70,000 near-native structures. For these six benchmark datasets, the predictive performance of GEMSCORE, based on native structure ranking and Z-scores, was superior to eight other energy functions. Our method is based solely on a simple linear function and is thus considerably faster than other methods that rely on additional complex calculations. In addition, GEMSCORE recognized 17 and 2 native structures as the first and the second rank, respectively, among 21 targets in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction). These results suggest that GEMSCORE is fast and performs well in discriminating between native and nonnative structures among thousands of protein structure candidates. We believe that GEMSCORE is robust and should be a useful energy function for protein structure prediction.
Affiliation(s)
- Yi-yuan Chiu
- Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan
7
Panjkovich A, Melo F, Marti-Renom MA. Evolutionary potentials: structure specific knowledge-based potentials exploiting the evolutionary record of sequence homologs. Genome Biol 2008; 9:R68. [PMID: 18397517 PMCID: PMC2643939 DOI: 10.1186/gb-2008-9-4-r68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 03/13/2008] [Revised: 04/02/2008] [Accepted: 04/08/2008] [Indexed: 11/10/2022]
Abstract
We introduce a new type of knowledge-based potentials for protein structure prediction, called 'evolutionary potentials', which are derived using a single experimental protein structure and all three-dimensional models of its homologous sequences. The new potentials have been benchmarked against other knowledge-based potentials, resulting in a significant increase in accuracy for model assessment. In contrast to standard knowledge-based potentials, we propose that evolutionary potentials capture key determinants of thermodynamic stability and specific sequence constraints required for fast folding.
Affiliation(s)
- Alejandro Panjkovich
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
8
Strodel B, Wales DJ. Implicit Solvent Models and the Energy Landscape for Aggregation of the Amyloidogenic KFFE Peptide. J Chem Theory Comput 2008; 4:657-72. [DOI: 10.1021/ct700305w] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Birgit Strodel
- University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW, U.K
- David J. Wales
- University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW, U.K
9
Lin MS, Fawzi NL, Head-Gordon T. Hydrophobic potential of mean force as a solvation function for protein structure prediction. Structure 2007; 15:727-40. [PMID: 17562319 DOI: 10.1016/j.str.2007.05.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 12/16/2006] [Revised: 05/04/2007] [Accepted: 05/07/2007] [Indexed: 10/23/2022]
Abstract
We have developed a solvation function that combines a Generalized Born model for polarization of protein charge by the high dielectric solvent, with a hydrophobic potential of mean force (HPMF) as a model for hydrophobic interaction, to aid in the discrimination of native structures from other misfolded states in protein structure prediction. We find that our energy function outperforms other reported scoring functions in terms of correct native ranking for 91% of proteins and low Z scores for a variety of decoy sets, including the challenging Rosetta decoys. This work shows that the stabilizing effect of hydrophobic exposure to aqueous solvent that defines the HPMF hydration physics is an apparent improvement over solvent-accessible surface area models that penalize hydrophobic exposure. Decoys generated by thermal sampling around the native-state basin reveal a potentially important role for side-chain entropy in the future development of even more accurate free energy surfaces.
Affiliation(s)
- Matthew S Lin
- UCSF/UCB Joint Graduate Group in Bioengineering, University of California-Berkeley, Berkeley, CA 94720, USA
10
Staritzbichler R, Gu W, Helms V. Are solvation free energies of homogeneous helical peptides additive? J Phys Chem B 2005; 109:19000-7. [PMID: 16853446 DOI: 10.1021/jp052403x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
We investigated the additivity of the solvation free energy of amino acids in homogeneous helices of different length in water and in chloroform. Solvation free energies were computed by multiconfiguration thermodynamic integration involving extended molecular dynamics simulations and by applying the generalized-born surface area solvation model to static helix geometries. The investigation focused on homogeneous peptides composed of uncharged amino acids, where the backbone atoms are kept fixed in an ideal helical conformation. We found nonlinearity especially for short peptides, which does not allow a simple treatment of the interaction of amino acids with their surroundings. For homogeneous peptides longer than five residues, the results from both methods are in quite good agreement and solvation energies are to a good extent additive.
11
Zhu J, Alexov E, Honig B. Comparative study of generalized born models: Born radii and peptide folding. J Phys Chem B 2005; 109:3008-22. [PMID: 16851315 DOI: 10.1021/jp046307s] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
In this study, we have implemented four analytical generalized Born (GB) models and investigated their performance in conjunction with the GROMOS96 force field. The four models include that of Still and co-workers, the HCT model of Cramer, Truhlar, and co-workers, a modified form of the AGB model of Levy and co-workers, and the GBMV2 model of Brooks and co-workers. The models were coded independently and implemented in the GROMOS software package and in TINKER. They were compared in terms of their ability to reproduce the results of Poisson-Boltzmann (PB) calculations and in their performance in the ab initio peptide folding of two peptides, one that forms a beta-hairpin in solution and one that forms an alpha-helix. In agreement with previous work, the GBMV2 model is most successful in reproducing PB results while the other models tend to underestimate the effective Born radii of buried atoms. In contrast, stochastic dynamics simulations on the folding of the two peptides, the C-terminus beta-hairpin of the B1 domain of protein G and the alanine-based alpha-helical peptide 3K(I), suggest that the simpler GB models are more effective in sampling conformational space. Indeed, the Still model used in conjunction with the GROMOS96 force field is able to fold the hairpin peptide to a native-like structure without the benefit of enhanced sampling techniques. This is due in part to the properties of the united-atom GROMOS96 force field which appears to be more flexible, and hence to sample more efficiently, than force fields such as OPLSAA. Our results suggest a general strategy which involves using different combinations of force fields and solvent models in different applications, for example, using GROMOS96 and a simple GB model in sampling and OPLSAA and a more accurate GB model in refinement. 
The fact that various methods have been implemented in a unified way should facilitate the testing and subsequent use of different methods to evaluate conformational free energies in different applications. Our results also bear on some general issues involved in peptide folding and structure prediction which are addressed in the Discussion.
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, New York 10032, USA
12
Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006; 15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 10/24/2022]
Abstract
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r = 0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA
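The two evaluation measures named in the abstract above, the RMSD difference between the selected model and the best available model and the Pearson correlation between score and RMSD, can be sketched as below. This is an illustration under the assumption that lower scores indicate better models. The function names are hypothetical and do not come from SVMod.

```python
from math import sqrt


def delta_rmsd(scores, rmsds):
    """RMSD of the model the score ranks best, minus the lowest RMSD in the set.

    Assumes lower scores are better; 0.0 means the score picked the best model.
    """
    picked = min(range(len(scores)), key=scores.__getitem__)
    return rmsds[picked] - min(rmsds)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Averaging `delta_rmsd` over all targets would give a per-benchmark figure comparable in kind to the 0.63 Å and 0.45 Å values quoted in the abstract.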
13
Graña O, Baker D, MacCallum RM, Meiler J, Punta M, Rost B, Tress ML, Valencia A. CASP6 assessment of contact prediction. Proteins 2005; 61 Suppl 7:214-224. [PMID: 16187364 DOI: 10.1002/prot.20739] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Indexed: 11/11/2022]
Abstract
Here we present the evaluation results of the Critical Assessment of Protein Structure Prediction (CASP6) contact prediction category. Contact prediction was assessed with standard measures well known in the field and the performance of specialist groups was evaluated alongside groups that submitted models with 3D coordinates. The evaluation was mainly focused on long range contact predictions for the set of new fold targets, although we analyzed predictions for all targets. Three groups with similar levels of accuracy and coverage performed a little better than the others. Comparisons of the predictions of the three best methods with those of CASP5/CAFASP3 suggested some improvement, although there were not enough targets in the comparisons to make this statistically significant.
Affiliation(s)
- Osvaldo Graña
- Protein Design Group, Centro Nacional de Biotecnologia (CNB-CSIC), C/Darwin 3, Cantoblanco, Madrid, Spain
14
Narang P, Bhushan K, Bose S, Jayaram B. Protein Structure Evaluation using an All-Atom Energy Based Empirical Scoring Function. J Biomol Struct Dyn 2006; 23:385-406. [PMID: 16363875 DOI: 10.1080/07391102.2006.10531234] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 10/28/2022]
Abstract
Arriving at the native conformation of a polypeptide chain, characterized by the lowest free energy, is a problem of long-standing interest in protein structure prediction. Owing to the computational requirements of developing free energy estimates, scoring functions (energy based or statistical) have received considerable renewed attention in recent years for distinguishing native structures of proteins from non-native-like structures. Several cleverly designed decoy sets, CASP (Critical Assessment of Techniques for Protein Structure Prediction) structures and homology-based, internet-accessible three-dimensional model builders are now available for validating the scoring functions. We describe here an all-atom energy based empirical scoring function and examine its performance on a wide series of publicly available decoys. Barring two protein sequences where the native structure is ranked second and seventh, the native is identified as the lowest energy structure in 67 protein sequences from among 61,659 decoys belonging to 12 different decoy sets. We further illustrate a potential application of the scoring function in bracketing native-like structures of two small mixed alpha/beta globular proteins starting from sequence and secondary structural information. The scoring function has been web enabled at www.scfbio-iitd.res.in/utility/proteomics/energy.jsp.
Affiliation(s)
- Pooja Narang
- Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi - 110016, India.
15
Lee MC, Yang R, Duan Y. Comparison between Generalized-Born and Poisson-Boltzmann methods in physics-based scoring functions for protein structure prediction. J Mol Model 2005; 12:101-10. [PMID: 16096807 DOI: 10.1007/s00894-005-0013-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Received: 03/17/2005] [Accepted: 06/23/2005] [Indexed: 11/28/2022]
Abstract
Continuum solvent models such as the Generalized-Born and Poisson-Boltzmann methods hold the promise of treating solvation effects efficiently and enabling rapid scoring of protein structures when they are combined with physics-based energy functions. Yet, a direct comparison of these two approaches on large protein data sets is lacking. Building on our previous work with a scoring function based on a Generalized-Born (GB) solvation model and short molecular-dynamics simulations, we further extended the scoring function to compare with the MM-PBSA method for treating the solvent effect. We benchmarked this scoring function against seven publicly available decoy sets. We found that, somewhat surprisingly, the results of the MM-PBSA approach are comparable to those of the previous GB-based scoring function. We also discussed how the presence of large ligands and ions in some native structures of the decoy sets affects the accuracy of the scoring function.
Affiliation(s)
- Matthew C Lee
- Department of Chemistry and Biochemistry, University of Delaware, Newark, DE 19716, USA
16
Feig M, Brooks CL. Recent advances in the development and application of implicit solvent models in biomolecule simulations. Curr Opin Struct Biol 2004; 14:217-24. [PMID: 15093837 DOI: 10.1016/j.sbi.2004.03.009] [Citation(s) in RCA: 403] [Impact Index Per Article: 21.2] [Indexed: 10/26/2022]
Abstract
Advances have recently been made in the development of implicit solvent methodologies and their application to the modeling of biomolecules, particularly with regard to generalized Born approaches, dielectric screening function formulations and models based on solvent-accessible surface areas. Interesting new developments include more refined non-polar solvation energy estimators, and implicit methods for modeling low-dielectric and heterogeneous environments such as membrane systems. These have been successfully applied to molecular dynamics simulations, the scoring of protein conformations, and the calculation of binding affinities and folding free energy landscapes.
Affiliation(s)
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824-1319, USA
17
Buchete NV, Straub JE, Thirumalai D. Development of novel statistical potentials for protein fold recognition. Curr Opin Struct Biol 2004; 14:225-32. [PMID: 15093838 DOI: 10.1016/j.sbi.2004.03.002] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Indexed: 10/26/2022]
Abstract
The need to perform large-scale studies of protein fold recognition, structure prediction and protein-protein interactions has led to novel developments of residue-level minimal models of proteins. A minimum requirement for useful protein force-fields is that they be successful in the recognition of native conformations. The balance between the level of detail in describing the specific interactions within proteins and the accuracy obtained using minimal protein models is the focus of many current protein studies. Recent results suggest that the introduction of explicit orientation dependence in a coarse-grained, residue-level model improves the ability of inter-residue potentials to recognize the native state. New statistical and optimization computational algorithms can be used to obtain accurate residue-dependent potentials for use in protein fold recognition and, more importantly, structure prediction.
Affiliation(s)
- N-V Buchete
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
18
Abstract
Energy functions are crucial ingredients of protein tertiary structure prediction methods. Assessing the quality of energy functions is therefore of prime importance. It requires the elaboration of a standard evaluation scheme, whose key elements are: (i) sets that contain the native and several non-native structures of proteins (decoys) in order to test whether the energy functions display the expected quality features and (ii) measures to evaluate the reliability of energy functions. We present here a survey of the recent advances in these two related fields. In the first part, we analyze and review the large number of decoy sets that are available on the web, and we summarize the characteristics of a challenging decoy set. We then discuss how to define the quality of energy functions and review the measures related to it.
Affiliation(s)
- D Gilis
- Center of Applied Molecular Engineering, Institute of Chemistry and Biochemistry, University of Salzburg, Jakob Haringerstraße 3, A-5020 Salzburg, Austria.
19
Im W, Chen J, Brooks CL. Peptide and protein folding and conformational equilibria: theoretical treatment of electrostatics and hydrogen bonding with implicit solvent models. Adv Protein Chem 2005; 72:173-98. [PMID: 16581377 DOI: 10.1016/s0065-3233(05)72007-6] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Indexed: 01/26/2023]
Abstract
Since biomolecules exist in aqueous and membrane environments, the accurate modeling of solvation, and hydrogen bonding interactions in particular, is essential for the exploration of structure and function in theoretical and computational studies. In this chapter, we focus on alternatives to explicit solvent models and discuss recent advances in generalized Born (GB) implicit solvent theories. We present a brief review of the successes and shortcomings of the application of these theories to biomolecular problems that are strongly linked to backbone H-bonding and electrostatics. This discussion naturally leads us to explore existing areas for improvement in current GB theories and our approach towards addressing a number of the key issues that remain in the refinement of these models. Specifically, the critical importance of balancing solvation forces and intramolecular forces in GB models is illustrated by examining the influence of backbone hydrogen bond strength and backbone dihedral energetics on conformational equilibria of small peptides.
Affiliation(s)
- Wonpil Im
- Department of Molecular Biology and Center for Theoretical Biological Physics, The Scripps Research Institute, La Jolla, California 92037
|
20
|
Wang K, Fain B, Levitt M, Samudrala R. Improved protein structure selection using decoy-dependent discriminatory functions. BMC STRUCTURAL BIOLOGY 2004; 4:8. [PMID: 15207004 PMCID: PMC449718 DOI: 10.1186/1472-6807-4-8] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Received: 04/17/2004] [Accepted: 06/18/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, there is poor correlation between the scores and the accuracy of the generated conformations. RESULTS We present a simple and nonparametric formula to estimate the accuracy of predicted conformations (or decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Calpha RMSD (Root Mean Square Deviation) calculation in a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores have correlations as high as 0.9 with the Calpha RMSDs of the decoy conformations, measured relative to the experimental conformation for each decoy. We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function called self-RAPDF, where we compiled the atom-atom contact probabilities from all the conformations in a decoy set instead of using an ensemble of native conformations, with a weighting scheme based on the density scores. The self-RAPDF has a higher correlation with Calpha RMSD than RAPDF for 76/83 decoy sets, and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement. 
CONCLUSIONS Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.
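The all-against-all Calpha RMSD calculation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes the density score of a decoy is its mean RMSD to every other decoy in the set (lower meaning a more densely populated region). The function names `kabsch_rmsd`, `density_scores` and `rank_decoys` are hypothetical and are not taken from the paper.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """C-alpha RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    # The singular values of the covariance matrix give the best achievable overlap.
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the optimal transform would be a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    sq_dev = (a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(sq_dev, 0.0) / len(a)))


def density_scores(decoys):
    """Mean RMSD of each decoy to every other decoy (ASSUMED form of the score)."""
    m = len(decoys)  # needs at least two decoys
    rmsd = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            rmsd[i, j] = rmsd[j, i] = kabsch_rmsd(decoys[i], decoys[j])
    return rmsd.sum(axis=1) / (m - 1)


def rank_decoys(decoys):
    """Decoy indices ordered from densest (lowest score) to most isolated."""
    return [int(i) for i in np.argsort(density_scores(decoys))]
```

Here `decoys` is expected to be a sequence of equally sized (N, 3) arrays of Calpha coordinates for the same protein. The cost grows quadratically with the number of decoys, since every pair is superimposed once.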
Affiliation(s)
- Kai Wang
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
- Boris Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Michael Levitt
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ram Samudrala
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
|