651
Zhang J, Chen R, Liang J. Empirical potential function for simplified protein models: combining contact and local sequence-structure descriptors. Proteins 2006; 63:949-60. [PMID: 16477624 DOI: 10.1002/prot.20809] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
An effective potential function is critical for protein structure prediction and folding simulation. Simplified protein models such as those requiring only Cα or backbone atoms are attractive because they enable efficient search of the conformational space. We show that residue-specific reduced discrete-state models can represent the backbone conformations of proteins with small RMSD values. However, no potential functions exist that are designed for such simplified protein models. In this study, we develop optimal potential functions by combining contact interaction descriptors and local sequence-structure descriptors. The form of the potential function is a weighted linear sum of all descriptors, and the optimal weight coefficients are obtained through optimization using both native and decoy structures. The performance of the potential function in a test of discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. Our potential function, requiring only backbone atoms or Cα atoms, has comparable or better performance than several residue-based potential functions that require additional coordinates of side-chain centers or coordinates of all side-chain atoms. By reducing the residue alphabets down to size 10 for contact descriptors, the performance of the potential function can be further improved. Our results also suggest that local sequence-structure correlation may play an important role in reducing the entropic cost of protein folding.
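As a rough illustration of the idea in this abstract, the sketch below scores a structure as a weighted linear sum of descriptor values and checks whether the native structure gets a lower energy than every decoy. The function names, the three-descriptor toy vectors, and the weights are hypothetical and are not taken from the paper; the convention that lower energy is better is also an assumption, and the paper's actual descriptors and weight optimization are not reproduced.

```python
from typing import Sequence


def linear_potential(weights: Sequence[float], descriptors: Sequence[float]) -> float:
    """Energy as a weighted linear sum of descriptor values."""
    if len(weights) != len(descriptors):
        raise ValueError("weights and descriptors must have the same length")
    return sum(w * d for w, d in zip(weights, descriptors))


def native_is_ranked_first(
    weights: Sequence[float],
    native: Sequence[float],
    decoys: Sequence[Sequence[float]],
) -> bool:
    """True if the native structure has strictly lower energy than all decoys.

    Assumes lower energy means a more native-like structure.
    """
    native_energy = linear_potential(weights, native)
    return all(native_energy < linear_potential(weights, d) for d in decoys)


# Hypothetical toy data: three descriptors per structure.
weights = [1.0, -0.5, 2.0]
native = [-3.0, 4.0, -1.0]
decoys = [[-1.0, 2.0, 0.5], [0.0, 1.0, 1.0]]

print(linear_potential(weights, native))
print(native_is_ranked_first(weights, native, decoys))
```

In a real setting the weights would be fitted so that this ranking condition holds for as many native/decoy sets as possible.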
Affiliation(s)
- Jinfeng Zhang
- Department of Bioengineering, University of Illinois, Chicago, Illinois, USA
652
de Sancho D, Rey A. Assessment of protein folding potentials with an evolutionary method. J Chem Phys 2006; 125:014904. [PMID: 16863330 DOI: 10.1063/1.2210931] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
Many different protein folding potentials have been developed in the last decades, based upon knowledge of experimentally determined protein structures. Decoy-based techniques are frequently used to assess these force fields, but other methods can explore different features in the performance of the interaction schemes, thus helping in their evaluation. Here, we propose an evolutionary strategy to efficiently assess folding potentials. We apply it to three potentials with different characteristics, taken from the bibliography. A search for minimum energy protein topologies, treated as arrangements of rigid protein fragments, is performed. The method, applied to a set of helix bundle proteins, shows the different behavior of the studied potentials, providing a reasonably fast tool to evaluate their advantages and limitations.
Affiliation(s)
- David de Sancho
- Departamento de Química Física I, Facultad de Ciencias Químicas, Universidad Complutense, E-28040 Madrid, Spain
653
de Vries SJ, Bonvin AMJJ. Intramolecular surface contacts contain information about protein-protein interface regions. Bioinformatics 2006; 22:2094-8. [PMID: 16766554 DOI: 10.1093/bioinformatics/btl275] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Some amino acids clearly show preferences over others in protein-protein interfaces. These preferences, or so-called interface propensities, can be used for a priori interface prediction. We investigated whether the prediction accuracy could be improved by considering not single residues but pairs of residues in an interface. Here we present the first systematic analysis of intramolecular surface contacts in interface prediction. RESULTS: We show that preferences do exist for contacts within and around an interface region within one molecule: specific pairs of amino acids occur more often than others. Using intramolecular contact propensities in a blind test, higher average scores were assigned to interface residues than to non-interface residues. This effect persisted as small but significant when the contact propensities were corrected to eliminate the influence of single amino acid interface propensity. This indicates that intramolecular contact propensities may replace interface propensities in protein-protein interface prediction. AVAILABILITY: The source code is available on request from the authors.
Affiliation(s)
- Sjoerd J de Vries
- Faculty of Sciences, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584CH, Utrecht, The Netherlands
654
Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006; 15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 10/24/2022]
Abstract
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
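The ΔRMSD measure defined in this abstract (the RMSD of the model a score picks as best, minus the lowest RMSD in the set) can be sketched as follows. The function name, the toy scores, and the toy RMSD values are hypothetical; the `lower_is_better` flag is an assumption, since the abstract does not state the sign convention of each score.

```python
from typing import Sequence


def delta_rmsd(
    scores: Sequence[float],
    rmsds: Sequence[float],
    lower_is_better: bool = True,
) -> float:
    """RMSD of the model selected by the score minus the lowest RMSD in the set."""
    if not scores or len(scores) != len(rmsds):
        raise ValueError("scores and rmsds must be non-empty and of equal length")
    indices = range(len(scores))
    if lower_is_better:
        picked = min(indices, key=lambda i: scores[i])
    else:
        picked = max(indices, key=lambda i: scores[i])
    return rmsds[picked] - min(rmsds)


# Hypothetical toy set of three models for one target.
toy_scores = [-120.0, -150.0, -90.0]  # the score prefers the second model
toy_rmsds = [2.0, 3.0, 4.5]           # the first model is actually closest to native

print(delta_rmsd(toy_scores, toy_rmsds))
```

A perfect selector gives a ΔRMSD of zero for every target; the abstract reports this quantity averaged over its 20 targets.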
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA
655
Qiu J, Elber R. Atomically detailed potentials to recognize native and approximate protein structures. Proteins 2006; 61:44-55. [PMID: 16080157 DOI: 10.1002/prot.20585] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
Abstract
Atomically detailed potentials for recognition of protein folds are presented. The potentials consist of pair interactions between atoms. One or three distance steps are used to describe the range of interactions between a pair. Training is carried out with the mathematical programming approach on the decoy sets of Baker, Levitt, and some of our own design. Recognition is required not only for decoy-native structural pairs but also for pairs of decoy and homologous structures. Performance is tested on the targets of CASP5 using templates from the Protein Data Bank, on two test ab initio decoy sets from Skolnick's laboratory, and on decoy sets from Moult's laboratory. We conclude that the newly derived potentials have significant recognition capacity, comparable to the best models derived from other techniques. The new potentials require a significantly smaller number of parameters. The enhanced recognition capacity extends primarily to the identification of structures generated by ab initio simulation and less to the recognition of approximate shapes created by homology.
Affiliation(s)
- Jian Qiu
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
656
Zhang C, Liu S, Zhou Y. Docking prediction using biological information, ZDOCK sampling technique, and clustering guided by the DFIRE statistical energy function. Proteins 2006; 60:314-8. [PMID: 15981255 DOI: 10.1002/prot.20576] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]
Abstract
We entered the CAPRI experiment during the middle of Round 4 and have submitted predictions for all 6 targets released since then. We used the following procedures for docking prediction: (1) the identification of possible binding region(s) of a target based on known biological information, (2) rigid-body sampling around the binding region(s) by using the docking program ZDOCK, (3) ranking of the sampled complex conformations by employing the DFIRE-based statistical energy function, (4) clustering based on pairwise root-mean-square distance and the DFIRE energy, and (5) manual inspection and relaxation of the side-chain conformations of the top-ranked structures by geometric constraint. Reasonable predictions were made for 4 of the 6 targets. The best fractions of native contacts within the top 10 models are 89.1% for Target 12, 54.3% for Target 13, 29.3% for Target 14, and 94.1% for Target 18. The origin of successes and failures is discussed.
Affiliation(s)
- Chi Zhang
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, 14214, USA
657
Fang Q, Shortle D. Protein refolding in silico with atom-based statistical potentials and conformational search using a simple genetic algorithm. J Mol Biol 2006; 359:1456-67. [PMID: 16678202 DOI: 10.1016/j.jmb.2006.04.033] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 02/08/2006] [Revised: 04/10/2006] [Accepted: 04/12/2006] [Indexed: 11/23/2022]
Abstract
A distance-dependent atom-pair potential that treats long range and local interactions separately has been developed and optimized to distinguish native protein structures from sets of incorrect or decoy structures. Atoms are divided into 30 types based on chemical properties and relative position in the amino acid side-chains. Several parameters affecting the calculation and evaluation of this statistical potential, such as the reference state, the bin width, cutoff distances between pairs, and the number of residues separating the atom pairs, are adjusted to achieve the best discrimination. The native structure has the lowest energy for 39 of the 40 sets of original ROSETTA decoys (1000 structures per set) and 23 of the 25 improved decoys (approximately 1900 structures per set). Combined with the orientation-dependent backbone hydrogen bonding potential used by ROSETTA and a statistical solvation potential based on the solvent exclusion model of Lazaridis & Karplus, this potential is used as a scoring function for conformational search based on a genetic algorithm method. After unfolding the native structure by changing every phi and psi angle by either ±3, ±5 or ±7 degrees, five small proteins can be efficiently refolded, in some cases to within 0.5 Å Cα distance matrix error (DME) to the native state. Although no significant correlation is found between the total energy and structural similarity to the native state, a surprisingly strong correlation exists between the radius of gyration and the DME for low energy structures.
Affiliation(s)
- Qiaojun Fang
- Department of Biological Chemistry, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
658
Yesylevskyy SO, Kharkyanen VN, Demchenko AP. Dynamic protein domains: identification, interdependence, and stability. Biophys J 2006; 91:670-85. [PMID: 16632509 PMCID: PMC1483087 DOI: 10.1529/biophysj.105.078584] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
Existing methods of domain identification in proteins usually provide no information about the degree of domain independence and stability. However, this information is vital for many areas of protein research. The recently developed hierarchical clustering of correlation patterns (HCCP) technique provides machine-based domain identification in a computationally simple and physically consistent way. Here we present the modification of this technique, which not only allows determination of the most plausible number of dynamic domains but also makes it possible to estimate the degree of their independence (the extent of correlated motion) and stability (the range of environmental conditions, where domains remain intact). With this technique we provided domain assignments and calculated intra- and interdomain correlations and interdomain energies for >2500 test proteins. It is shown that mean intradomain correlation of motions can serve as a quantitative criterion of domain independence, and the HCCP stability gap is a measure of their stability. Our data show that the motions of domains with high stability are usually independent. In contrast, the domains with moderate stability usually exhibit a substantial degree of correlated motions. It is shown that in multidomain proteins the domains are most stable if they are of similar size, and this correlates with the observed abundance of such proteins.
Affiliation(s)
- Semen O Yesylevskyy
- Department of Physics of Biological Systems, Institute of Physics, National Academy of Sciences of Ukraine, Kiev, Ukraine.
659
Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins 2006; 62:1125-32. [PMID: 16372356 DOI: 10.1002/prot.20810] [Citation(s) in RCA: 647] [Impact Index Per Article: 35.9] [Indexed: 12/19/2022]
Abstract
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.
Affiliation(s)
- Jianlin Cheng
- Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, California 92697-3425, USA
660
Abstract
Scoring functions are widely used in the final step of model selection in protein structure prediction. This is of interest both for comparative modeling targets, where it is important to select the best model among a set of many good, "correct" ones, as well as for other (fold recognition or novel fold) targets, where the set may contain many incorrect models. A novel combination of four knowledge-based potentials recognizing different features of native protein structures is introduced and tested. The pairwise, solvation, hydrogen bond, and torsion angle potentials contain largely orthogonal information. Of these, the torsion angle potential is found to show the strongest correlation with model quality. Combining these features with a linear weighting function, it was possible to construct a robust energy function capable of discriminating native-like structures on several benchmarking sets. In a recent blind test (CAFASP-4 MQAP), the scoring function ranked consistently well and was able to reliably distinguish the correct template from an ensemble of high quality decoys in 52 of 70 cases (33 of 34 for comparative modeling). An executable version of the Victor/FRST function for Linux PCs is available for download from the URL http://protein.cribi.unipd.it/frst/.
661
Mayewski S. A multibody, whole-residue potential for protein structures, with testing by Monte Carlo simulated annealing. Proteins 2006; 59:152-69. [PMID: 15723360 DOI: 10.1002/prot.20397] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
A new multibody, whole-residue potential for protein tertiary structure is described. The potential is based on the local environment surrounding each main-chain alpha carbon (CA), defined as the set of all residues whose CA coordinates lie within a spherical volume of set radius in 3-dimensional (3D) space surrounding that position. It is shown that the relative positions of the CAs in these local environments belong to a set of preferred templates. The templates are derived by cluster analysis of the presently available database of over 3000 protein chains (750,000 residues) having not more than 30% sequence similarity. For each template, a set of residue propensities is also derived for each topological position in the template. Using lookup tables of these derived templates, it is then possible to calculate an energy for any conformation of a given protein sequence. The application of the potential to ab initio protein tertiary structure prediction is evaluated by performing Monte Carlo simulated annealing on test protein sequences.
Affiliation(s)
- Stefan Mayewski
- Max-Planck-Institut für Biochemie, 82152 Martinsried, Germany.
662
Abstract
We propose a novel and flexible derivation scheme for statistical, database-derived potentials, which allows one to simultaneously take into account specific correlations between several sequence and structure descriptors. This scheme leads to the decomposition of the total folding free energy of a protein into a sum of lower order terms, thereby making it possible to analyze each contribution independently and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addition, this derivation scheme appears to be quite general, since many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.
Affiliation(s)
- Y Dehouck
- Unité de Bioinformatique génomique et structurale, Université Libre de Bruxelles, 1050 Brussels, Belgium.
663
Skolnick J. In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol 2006; 16:166-71. [PMID: 16524716 DOI: 10.1016/j.sbi.2006.02.004] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Received: 12/01/2005] [Revised: 02/10/2006] [Accepted: 02/23/2006] [Indexed: 11/19/2022]
Abstract
Key to successful protein structure prediction is a potential that recognizes the native state from misfolded structures. Recent advances in empirical potentials based on known protein structures include improved reference states for assessing random interactions, sidechain-orientation-dependent pair potentials, potentials for describing secondary or supersecondary structural preferences and, most importantly, optimization protocols that sculpt the energy landscape to enhance the correlation between native-like features and the energy. Improved clustering algorithms that select native-like structures on the basis of cluster density also resulted in greater prediction accuracy. For template-based modeling, these advances allowed improvement in predicted structures relative to their initial template alignments over a wide range of target-template homology. This represents significant progress and suggests applications to proteome-scale structure prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington Street, Buffalo, NY 14203, USA.
664
Xu Z, Zhang C, Liu S, Zhou Y. QBES: Predicting real values of solvent accessibility from sequences by efficient, constrained energy optimization. Proteins 2006; 63:961-6. [PMID: 16514609 DOI: 10.1002/prot.20934] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Solvent accessibility, one of the key properties of amino acid residues in proteins, can be used to assist protein structure prediction. Various approaches such as neural network, support vector machines, probability profiles, information theory, Bayesian theory, logistic function, and multiple linear regression have been developed for solvent accessibility prediction. In this article, a much simpler quadratic programming method based on the buriability parameter set of amino acid residues is developed. The new method, called QBES (Quadratic programming and Buriability Energy function for Solvent accessibility prediction), is reasonably accurate for predicting the real value of solvent accessibility. By using a dataset of 30 proteins to optimize three parameters, the average correlation coefficients between the predicted and actual solvent accessibility are about 0.5 for all four independent test sets ranging from 126 to 513 proteins. The method is efficient. It takes only 20 min for a regular PC to obtain results of 30 proteins with an average length of 263 amino acids. Although the proposed method is less accurate than a few more sophisticated methods based on neural network or support vector machines, this is the first attempt to predict solvent accessibility by energy optimization with constraints. Possible improvements and other applications of the method are discussed.
Affiliation(s)
- Zhigang Xu
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York, Buffalo, New York 14214, USA
665
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
666
Fogolari F, Tosatto SCE, Colombo G. A decoy set for the thermostable subdomain from chicken villin headpiece, comparison of different free energy estimators. BMC Bioinformatics 2005; 6:301. [PMID: 16354298 PMCID: PMC1351271 DOI: 10.1186/1471-2105-6-301] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 06/16/2005] [Accepted: 12/14/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Estimators of free energies are routinely used to judge the quality of protein structural models. As these estimators still present inaccuracies, they are frequently evaluated by discriminating native or native-like conformations from large ensembles of so-called decoy structures. RESULTS: A decoy set is obtained from snapshots taken from 5 long (100 ns) molecular dynamics (MD) simulations of the thermostable subdomain from chicken villin headpiece. An evaluation of the energy of the decoys is given using: i) a residue based contact potential supplemented by a term for the quality of dihedral angles; ii) a recently introduced combination of four statistical scoring functions for model quality estimation (FRST); iii) molecular mechanics with solvation energy estimated either according to the generalized Born surface area (GBSA) or iv) the Poisson-Boltzmann surface area (PBSA) method. CONCLUSION: The decoy set presented here has the following features which make it attractive for testing energy scoring functions: 1) it covers a broad range of RMSD values (from less than 2.0 Å to more than 12 Å); 2) it has been obtained from molecular dynamics trajectories, starting from different non-native-like conformations which have diverse behaviour, with secondary structure elements correctly or incorrectly formed, and in one case folding to a native-like structure. This allows not only for scoring of static structures, but also for studying, using free energy estimators, the kinetics of folding; 3) all structures have been obtained from accurate MD simulations in explicit solvent and after molecular mechanics (MM) energy minimization using an implicit solvent method. The quality of the covalent structure therefore does not suffer from steric or covalent problems. The statistical and physical effective energy functions tested on the set behave differently when native simulation snapshots are included or not in the set and when averaging over the trajectory is performed.
Affiliation(s)
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- Silvio CE Tosatto
- Dipartimento di Biologia and CRIBI Biotech Centre, Università di Padova, Viale G. Colombo 3, 35131 Padova, Italy
- Giorgio Colombo
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131 Milano, Italy
667
González-Díaz H, Uriarte E. Proteins QSAR with Markov average electrostatic potentials. Bioorg Med Chem Lett 2005; 15:5088-94. [PMID: 16169216 DOI: 10.1016/j.bmcl.2005.07.056] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 03/10/2005] [Revised: 06/28/2005] [Accepted: 07/05/2005] [Indexed: 11/30/2022]
Abstract
Classic physicochemical and topological indices have been widely used in small-molecule QSAR but less so in protein QSAR. In this study, a Markov model is used to calculate, for the first time, average electrostatic potentials xik for an indirect interaction between amino acids placed at topological distances k within a given protein backbone. The short-term average stochastic potential xi1 for 53 Arc repressor mutants was used to model the effect of alanine scanning on thermal stability. The Arc repressor is a model protein of relevance for biochemical studies in bioorganic and medicinal chemistry. A linear discriminant analysis model correctly classified 43 out of 53 proteins (81.1%) according to their thermal stability. More specifically, the model classified 20/28 (71.4%) of proteins with near wild-type stability and 23/25 (92.0%) of proteins with reduced stability. Moreover, predictability in cross-validation procedures was 81.0%. Expansion of the electrostatic potential in the series xi0, xi1, xi2, and xi3 justified the use of the abrupt truncation approach, the overall accuracy being >70.0% for xi0 but equal for xi1, xi2, and xi3. The xi1 model compared favorably with others based on D-Fire potential, surface area, volume, partition coefficient, and molar refractivity, with less than 77.0% accuracy [Ramos de Armas, R.; González-Díaz, H.; Molina, R.; Uriarte, E. Protein Struct. Func. Bioinf. 2004, 56, 715]. The xi1 model also has a more tractable interpretation than others based on Markovian negentropies and stochastic moments. Finally, the model is notably simpler than the two models based on quadratic and linear indices. Both models, reported by Marrero-Ponce et al., use four to five times more descriptors. Introduction of average stochastic potentials may be useful for QSAR applications, as xik is amenable to physical interpretation and very effective.
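The classification percentages quoted in this abstract follow directly from the reported counts, which a short check makes explicit. The helper name `percent` and the variable names are hypothetical; the counts themselves (20 of 28, 23 of 25, 43 of 53) are the ones stated in the abstract.

```python
def percent(correct: int, total: int) -> float:
    """Percentage of correctly classified cases, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 1)


# Counts as reported in the abstract.
near_wild_type = (20, 28)  # correctly classified, total
reduced = (23, 25)

overall_correct = near_wild_type[0] + reduced[0]
overall_total = near_wild_type[1] + reduced[1]

print(percent(*near_wild_type))                 # near wild-type stability class
print(percent(*reduced))                        # reduced stability class
print(percent(overall_correct, overall_total))  # overall accuracy
```

The two per-class counts sum to the overall 43 of 53, so the three figures are mutually consistent.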
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain.
668
Summa CM, Levitt M, Degrado WF. An atomic environment potential for use in protein structure prediction. J Mol Biol 2005; 352:986-1001. [PMID: 16126228 DOI: 10.1016/j.jmb.2005.07.054] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Received: 10/27/2004] [Revised: 06/20/2005] [Accepted: 07/20/2005] [Indexed: 11/25/2022]
Abstract
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.
Affiliation(s)
- Christopher M Summa
- Department of Biochemistry and Biophysics, The University of Pennsylvania Medical School, Philadelphia, PA 19104-6059, USA
669
Odintsov SG, Sabała I, Bourenkov G, Rybin V, Bochtler M. Substrate access to the active sites in aminopeptidase T, a representative of a new metallopeptidase clan. J Mol Biol 2005; 354:403-12. [PMID: 16242715 DOI: 10.1016/j.jmb.2005.09.042] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 07/10/2005] [Revised: 09/10/2005] [Accepted: 09/14/2005] [Indexed: 11/26/2022]
Abstract
Aminopeptidase T (AmpT) from Thermus thermophilus is a metalloexopeptidase with no similarity to prototypical metallopeptidases with an HExxH or HxxEH motif. The crystal structure of the Staphylococcus aureus homologue of AmpT, which is known as aminopeptidase S (AmpS), has been reported recently. This structure revealed a dimeric protein with a very unusual, elongated shape and a large internal cavity. The active sites were found on the inner walls of the cavity and were entirely shielded from the environment, which suggested either that the dimer in the crystals was not physiologically relevant, or that an inactive conformation had been crystallized. Here, we show by gel-filtration and analytical ultracentrifugation that AmpT, like AmpS, forms dimers in solution, and we present the structure of AmpT in a crystal form with five protomers in the asymmetric unit. The five protomers take conformations that range from fully closed, as in the AmpS structure, to nearly open, so that the active site is almost directly accessible. The different conformations indicate flexibility between the AmpT N and C-domains, and explain how AmpT can be active, although the unusual AmpS dimerization mode applies to AmpT as well.
Affiliation(s)
- Sergey G Odintsov
- International Institute of Molecular and Cell Biology, ul. Trojdena 4, 02-109 Warsaw, Poland
670
Chen WW, Shakhnovich EI. Lessons from the design of a novel atomic potential for protein folding. Protein Sci 2005; 14:1741-52. [PMID: 15987903 PMCID: PMC2253347 DOI: 10.1110/ps.051440705] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 10/25/2022]
Abstract
We investigate all-atom potentials of mean force for estimating free energies in protein folding and fold recognition. We search through the space of potentials and design novel atomic potentials with a random mixing approximation and a contact-correlated Gaussian approximation of decoy states. We show that the two derived potentials are highly correlated, supporting the use of the random energy model as an accurate statistical description of protein conformational states. The novel atomic potentials perform well in a Z-score and fold decoy recognition test. Furthermore, the designed atomic potential performs slightly but significantly better than atomic potentials derived under a quasi-chemical assumption. While accounting for connectivity correlations between atom types does not improve the performance of the designed potential, we show that these correlations lead to ambiguities in the distribution of energetic contributions for atoms on the same residue. Within the confines of the model, then, many potentials may exist that stabilize all native folds in subtly different ways. Comparison of different protein conformations under the various atomic potentials reveals a remarkable degree of correspondence both in the estimated free energies and in the identity of the contact types that make the dominant contributions to the estimated free energies. This consistency may be interpreted as a sign that the design procedure is extracting physically meaningful quantities.
Affiliation(s)
- William W Chen
- Department of Biophysics, Harvard University, Boston, MA 02115, USA
671
Hoppe C, Schomburg D. Prediction of protein thermostability with a direction- and distance-dependent knowledge-based potential. Protein Sci 2005; 14:2682-92. [PMID: 16155198 PMCID: PMC2253293 DOI: 10.1110/ps.04940705] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 10/25/2022]
Abstract
The increasing use of enzymes in industrial processes and the importance of understanding protein folding and stability have led to several attempts to predict and quantify the effect of every possible amino acid exchange (mutation) on the thermostability of proteins. In this article we describe a knowledge-based discrimination function that acts as a fast and reliable guide in protein engineering and optimization. The function used consists of two parts, a pairwise energy function based on a distance- and direction-dependent atomic description of the amino acid environment, and a torsion angle energy function. In a first step a training set of 11 proteins including 646 mutant proteins with experimentally determined thermostability was used to optimize the knowledge-based energy functions. The resulting potential function was then tested using a test mutant database consisting of 918 various point mutations introduced in 27 proteins. The best correlation coefficient obtained for the experimental data and the predicted thermostability for the training set is r = 0.81 (561 data points). A total of 76% of the mutations could be predicted correctly as being either stabilizing or destabilizing. The results for the test set are r = 0.74 (747 data points) and 72%, respectively. The global correlation over the combined data (1308 mutants) obtained is 0.78.
Affiliation(s)
- Christian Hoppe
- Institut für Biochemie, Zülpicher Strasse 47, 50674 Köln, Germany
672
Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 2005; 33:W306-10. [PMID: 15980478 PMCID: PMC1160136 DOI: 10.1093/nar/gki375] [Citation(s) in RCA: 1252] [Impact Index Per Article: 65.9] [Indexed: 02/07/2023] Open
Abstract
I-Mutant2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes upon single point mutations. I-Mutant2.0 predictions are performed starting either from the protein structure or, more importantly, from the protein sequence. This latter task, to the best of our knowledge, is exploited for the first time. The method was trained and tested on a data set derived from ProTherm, which is presently the most comprehensive available database of thermodynamic experimental data of free energy changes of protein stability upon mutation under different conditions. I-Mutant2.0 can be used both as a classifier for predicting the sign of the protein stability change upon mutation and as a regression estimator for predicting the related ΔΔG values. Acting as a classifier, I-Mutant2.0 correctly predicts (with a cross-validation procedure) 80% or 77% of the data set, depending on the usage of structural or sequence information, respectively. When predicting ΔΔG values associated with mutations, the correlation of predicted with expected/experimental values is 0.71 (with a standard error of 1.30 kcal/mol) and 0.62 (with a standard error of 1.45 kcal/mol) when structural or sequence information are respectively adopted. Our web interface allows the selection of a predictive mode that depends on the availability of the protein structure and/or sequence. In this latter case, the web server requires only pasting of a protein sequence in a raw format. We therefore introduce I-Mutant2.0 as a unique and valuable helper for protein design, even when the protein structure is not yet known with atomic resolution. Availability: .
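The two evaluation modes described in this abstract (sign classification and ΔΔG regression) can be scored as in the minimal sketch below. It is not the I-Mutant2.0 code: the function names `sign_accuracy` and `pearson_r` are hypothetical, the sign convention (a value >= 0 counted as one class) is an assumption, and any numbers passed in are illustrative rather than taken from ProTherm.

```python
import math
from typing import Sequence


def sign_accuracy(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Fraction of mutations whose predicted ddG has the same sign as experiment.

    Assumption: values >= 0 form one class and values < 0 the other.
    """
    hits = sum((p >= 0) == (e >= 0) for p, e in zip(predicted, experimental))
    return hits / len(experimental)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and experimental values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)
```

Calling `sign_accuracy` on a predictor's output gives a figure comparable in kind to the 80% and 77% quoted above, and `pearson_r` gives one comparable in kind to the 0.71 and 0.62 correlations.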
Affiliation(s)
- Rita Casadio
- To whom correspondence should be addressed. Tel: +39 051 2094005; Fax: +39 051 242576
673
Fogolari F, Tosatto SCE. Application of MM/PBSA colony free energy to loop decoy discrimination: toward correlation between energy and root mean square deviation. Protein Sci 2005; 14:889-901. [PMID: 15772305 PMCID: PMC2253447 DOI: 10.1110/ps.041004105] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 10/25/2022]
Abstract
Accurate free energy estimation is needed in many predictive tasks. The molecular mechanics/Poisson-Boltzmann solvent accessible surface area (MM/PBSA) approach has proven to be accurate. However, the correlation between the estimated free energy and the distance (e.g., root mean square deviation [RMSD]) from the most stable conformation is hindered by the strong dependence of the free energy on minor conformational variations. In this paper, a protocol for MM/PBSA free energy estimation is designed and tested on several loop decoy sets. We show that further integration of the MM/PBSA free energy estimator with the colony energy approach makes the correlation between the free energy and the RMSD from the native structure apparent for the test sets on which it could be applied. Our results suggest that (1) the MM/PBSA free energy estimator is able to detect native-like structures for most decoy sets, and (2) application of the colony energy approach greatly reduces the strong dependence of the MM/PBSA free energy on minor conformational changes.
Affiliation(s)
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, Piazzale Kolbe 4, 33100 Udine, Italy.
674
Li H, Zhou Y. SCUD: fast structure clustering of decoys using reference state to remove overall rotation. J Comput Chem 2005; 26:1189-92. [PMID: 15954080 DOI: 10.1002/jcc.20251] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
We developed a method for fast decoy clustering by using reference root-mean-squared distance (rRMSD) rather than commonly used pairwise RMSD (pRMSD) values. For 41 proteins with 2000 decoys each, the computing efficiency increases nine times without a significant change in the accuracy of near-native selections. Tests on additional protein decoys based on different reference conformations confirmed this result. Further analysis indicates that the pRMSD and rRMSD values are highly correlated (with an average correlation coefficient of 0.82) and the clusters obtained from pRMSD and rRMSD values are highly similar (the representative structures of the top five largest clusters from the two methods are 74% identical). SCUD (Structure ClUstering of Decoys) with an automatic cutoff value is available at http://theory.med.buffalo.edu.
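One plausible reading of the rRMSD idea is sketched below, assuming that each decoy is superimposed once onto a common reference structure and that decoy-to-decoy distances are then taken without any further superposition. This is an illustration under that assumption, not the SCUD implementation. The function names `kabsch_superpose` and `reference_rmsd_matrix` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Return `mobile` (N x 3) optimally rotated and translated onto `reference`."""
    mobile_centered = mobile - mobile.mean(axis=0)
    reference_centered = reference - reference.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix yields the optimal rotation (Kabsch).
    u, _, vt = np.linalg.svd(mobile_centered.T @ reference_centered)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mobile_centered @ rotation + reference.mean(axis=0)


def reference_rmsd_matrix(decoys, reference: np.ndarray) -> np.ndarray:
    """All-against-all RMSD after one superposition per decoy onto `reference`."""
    aligned = np.stack([kabsch_superpose(d, reference) for d in decoys])  # (M, N, 3)
    diff = aligned[:, None, :, :] - aligned[None, :, :, :]  # (M, M, N, 3)
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))
```

Under this reading, M decoys need M superpositions rather than one per pair, which is in line with the speed-up the abstract reports. The full (M, M, N, 3) difference array is kept only for brevity and would need chunking for thousands of decoys.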
Affiliation(s)
- Hongzhi Li
- Department of Physiology & Biophysics, Howard Hughes Medical Institute Center for Single Molecule Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
675
González-Díaz H, Molina R, Uriarte E. Recognition of stable protein mutants with 3D stochastic average electrostatic potentials. FEBS Lett 2005; 579:4297-301. [PMID: 16081074 DOI: 10.1016/j.febslet.2005.06.065] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 11/03/2004] [Revised: 06/07/2005] [Accepted: 06/23/2005] [Indexed: 11/15/2022]
Abstract
As more and more proteins are applied to biochemical research there is increasing interest in studying their stability. In this study, a Markov model has been used to calculate molecular descriptors of the protein structure and these are called the average electrostatic potentials (xi(k)). These descriptors were intended to encode indirect electrostatic pair-wise interactions between amino acids located at Euclidean distance k within a given 3D protein backbone. The different xi(k) values could be calculated for the protein as a whole or for specific protein regions (orbits), which include amino acids that lie within a given range of distances from the center of charge of the protein. In this work we calculated the xi(k) values for 657 mutants of different proteins. A Linear Discriminant Analysis model correctly classified a subset of 435 out of 493 proteins according to their thermal stability - a level of predictability of 88.2%. This experiment was repeated with three additional subsets of proteins selected at random from the initial series of 657. More specifically, the model predicted 314/356 (88.2%) of mutants with higher stability than the corresponding wild-type protein and 264/301 (86.7%) of proteins with near wild-type stability. These results illustrate the possibilities for the average stochastic potentials xi(k) in the study of 3D-structure/property relationships for biochemically relevant proteins.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain.
676
Zhou H, Zhang C, Liu S, Zhou Y. Web-based toolkits for topology prediction of transmembrane helical proteins, fold recognition, structure and binding scoring, folding-kinetics analysis and comparative analysis of domain combinations. Nucleic Acids Res 2005; 33:W193-7. [PMID: 15980453 PMCID: PMC1160121 DOI: 10.1093/nar/gki360] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 01/17/2005] [Revised: 02/11/2005] [Accepted: 02/22/2005] [Indexed: 11/13/2022] Open
Abstract
We have developed the following web servers for protein structural modeling and analysis at http://theory.med.buffalo.edu: THUMBUP, UMDHMM(TMHP) and TUPS, predictors of transmembrane helical protein topology based on a mean-burial-propensity scale of amino acid residues (THUMBUP), a hidden Markov model (UMDHMM(TMHP)) and their combination (TUPS); SPARKS 2.0 and SP3, two profile-profile alignment methods that match input query sequence(s) to structural templates by integrating the sequence profile with a knowledge-based structural score (SPARKS 2.0) or a structure-derived profile (SP3); DFIRE, a knowledge-based potential for scoring the free energy of monomers (DMONOMER), loop conformations (DLOOP), mutant stability (DMUTANT) and the binding affinity of protein-protein/peptide/DNA complexes (DCOMPLEX & DDNA); TCD, a program for protein-folding rate and transition-state analysis of small globular proteins; and DOGMA, a web server that allows comparative analysis of domain combinations between plants and 55 other organisms. These servers provide tools for prediction and/or analysis of proteins on the secondary structure, tertiary structure and interaction levels, respectively.
Affiliation(s)
- Hongyi Zhou
- Department of Physiology & Biophysics, Howard Hughes Medical Institute Center for Single Molecule Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
- Chi Zhang
- Department of Physiology & Biophysics, Howard Hughes Medical Institute Center for Single Molecule Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
- Song Liu
- Department of Physiology & Biophysics, Howard Hughes Medical Institute Center for Single Molecule Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
- Yaoqi Zhou
- Department of Physiology & Biophysics, Howard Hughes Medical Institute Center for Single Molecule Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
- Department of Macromolecular Science, The Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, China
677
Abstract
We studied slower global coupled motions of the ribosome with half a microsecond of coarse-grained molecular dynamics. A low-resolution anharmonic network model that allows for the evolution of tertiary structure and long-scale sampling was developed and parameterized. Most importantly, we find that functionally important movements of L7/L12 and L1 lateral stalks are anticorrelated. Other principal directions of motions include widening of the tRNA cleft and the rotation of the small subunit which occurs as one block and is in phase with the movement of L1 stalk. The effect of the dynamical correlation pattern on the elongation process is discussed. Small fluctuations of the 3' tRNA termini and anticodon nucleotides show tight alignment of substrates for the reaction. Our model provides an efficient and reliable way to study the dynamics of large biomolecular systems composed of both proteins and nucleic acids.
Affiliation(s)
- Joanna Trylska
- Department of Chemistry and Biochemistry and Center for Theoretical Biological Physics, University of California at San Diego, La Jolla, California, USA.
678
Buchete NV, Straub JE, Thirumalai D. Development of novel statistical potentials for protein fold recognition. Curr Opin Struct Biol 2005; 14:225-32. [PMID: 15093838 DOI: 10.1016/j.sbi.2004.03.002] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Indexed: 10/26/2022]
Abstract
The need to perform large-scale studies of protein fold recognition, structure prediction and protein-protein interactions has led to novel developments of residue-level minimal models of proteins. A minimum requirement for useful protein force-fields is that they be successful in the recognition of native conformations. The balance between the level of detail in describing the specific interactions within proteins and the accuracy obtained using minimal protein models is the focus of many current protein studies. Recent results suggest that the introduction of explicit orientation dependence in a coarse-grained, residue-level model improves the ability of inter-residue potentials to recognize the native state. New statistical and optimization computational algorithms can be used to obtain accurate residue-dependent potentials for use in protein fold recognition and, more importantly, structure prediction.
Affiliation(s)
- N-V Buchete
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
679
680
Li X, Liang J. Geometric cooperativity and anticooperativity of three-body interactions in native proteins. Proteins 2005; 60:46-65. [PMID: 15849756 DOI: 10.1002/prot.20438] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
Characterizing multibody interactions of hydrophobic, polar, and ionizable residues in proteins is important for understanding the stability of protein structures. We introduce a geometric model for quantifying 3-body interactions in native proteins. With this model, empirical propensity values for many types of 3-body interactions can be reliably estimated from a database of native protein structures, despite the overwhelming presence of pairwise contacts. In addition, we define a nonadditive coefficient that characterizes cooperativity and anticooperativity of residue interactions in native proteins by measuring the deviation of 3-body interactions from 3 independent pairwise interactions. It compares the 3-body propensity value with what would be expected if only pairwise interactions were considered, and highlights the distinction between propensity and cooperativity of 3-body interactions. Based on the geometric model, and what can be inferred from statistical analysis of such a model, we find that hydrophobic interactions and hydrogen-bonding interactions make nonadditive contributions to protein stability, but the nonadditive nature depends on whether such interactions are located in the protein interior or on the protein surface. When located in the interior, many hydrophobic interactions, such as those involving alkyl residues, are anticooperative. Salt-bridge and regular hydrogen-bonding interactions, such as those involving ionizable residues and polar residues, are cooperative. When located on the protein surface, these salt-bridge and regular hydrogen-bonding interactions are anticooperative, and hydrophobic interactions involving alkyl residues become cooperative. We show with examples that incorporating 3-body interactions improves discrimination of protein native structures against decoy conformations. In addition, analysis of cooperative 3-body interactions may reveal spatial motifs that can suggest specific protein functions.
Affiliation(s)
- Xiang Li
- Department of Bioengineering, SEO, MC-063, University of Illinois at Chicago, Chicago, Illinois 60607-7052, USA
681
Liu Z, Mao F, Guo JT, Yan B, Wang P, Qu Y, Xu Y. Quantitative evaluation of protein-DNA interactions using an optimized knowledge-based potential. Nucleic Acids Res 2005; 33:546-58. [PMID: 15673715 PMCID: PMC548349 DOI: 10.1093/nar/gki204] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 12/02/2022] Open
Abstract
Computational evaluation of protein–DNA interactions is important for the identification of DNA-binding sites and for genome annotation. It can validate binding motifs predicted by sequence-based approaches by calculating the binding affinity between a protein and DNA. Such an evaluation should take structural information into account to deal with the complicated effects of DNA structural deformation, distance-dependent multi-body interactions and solvation contributions. In this paper, we present a knowledge-based potential built on interactions between protein residues and DNA tri-nucleotides. The potential, which explicitly considers the distance-dependent two-body, three-body and four-body interactions between protein residues and DNA nucleotides, has been optimized in terms of a Z-score. We have applied this knowledge-based potential to evaluate the binding affinities of zinc-finger protein–DNA complexes. The predicted binding affinities are in good agreement with the experimental data (with a correlation coefficient of 0.950). On a larger test set containing 48 protein–DNA complexes with known experimental binding free energies, our potential achieved a high correlation coefficient of 0.800 with the experimental data. We have also used this potential to identify binding motifs in DNA sequences of transcription factors (TFs). The TFs in 79.4% of the known TF–DNA complexes accurately found their native binding sequences in a large pool of DNA sequences. When tested in a genome-scale search for TF-binding motifs of the cyclic AMP regulatory protein (CRP) of Escherichia coli, this potential ranks all known binding motifs of CRP in the top 15% of all candidate sequences.
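The abstract does not define the Z-score used for optimization. A common convention in decoy discrimination, assumed here, measures how many standard deviations the native energy lies below the mean decoy energy. The sketch below follows that convention; the function name `z_score` and the energies in the usage note are hypothetical.

```python
import statistics
from typing import Sequence


def z_score(native_energy: float, decoy_energies: Sequence[float]) -> float:
    """Z-score of the native energy relative to a decoy energy distribution.

    Assumed convention: (E_native - mean(E_decoy)) / std(E_decoy), using the
    population standard deviation. More negative means better separation.
    """
    mean = statistics.fmean(decoy_energies)
    std = statistics.pstdev(decoy_energies)
    if std == 0.0:
        raise ValueError("decoy energies have zero spread; Z-score is undefined")
    return (native_energy - mean) / std
```

For example, with illustrative decoy energies `[-10, -12, -8, -11, -9]` and a native energy of `-15`, the native state lies about 3.5 standard deviations below the decoy mean.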
Affiliation(s)
- Zhijie Liu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Fenglou Mao
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Jun-tao Guo
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Bo Yan
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Peng Wang
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Youxing Qu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Computational Biology Institute, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- To whom correspondence should be addressed. Tel: +1 706 542 9779; Fax: +1 706 542 9751
682
Zhang C, Liu S, Zhou H, Zhou Y. The dependence of all-atom statistical potentials on structural training database. Biophys J 2005; 86:3349-58. [PMID: 15189839 PMCID: PMC1304244 DOI: 10.1529/biophysj.103.035998] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
An accurate statistical energy function that is suitable for the prediction of protein structures of all classes should be independent of the structural database used for energy extraction. Here, two high-resolution, low-sequence-identity structural databases of 333 alpha-proteins and 271 beta-proteins were built for examining the database dependence of three all-atom statistical energy functions. They are RAPDF (residue-specific all-atom conditional probability discriminatory function), atomic KBP (atomic knowledge-based potential), and DFIRE (statistical potential based on distance-scaled finite ideal-gas reference state). These energy functions differ in the reference states used for energy derivation. The energy functions extracted from the different structural databases are used to select native structures from multiple decoys of 64 alpha-proteins and 28 beta-proteins. The performance in native structure selections indicates that the DFIRE-based energy function is mostly independent of the structural database whereas RAPDF and KBP have a significant dependence. The construction of two additional structural databases of alpha/beta and alpha + beta-proteins further confirmed the weak dependence of DFIRE on the structural databases of various structural classes. The possible source for the difference between the three all-atom statistical energy functions is that the physical reference state of ideal gas used in the DFIRE-based energy function is least dependent on the structural database.
Affiliation(s)
- Chi Zhang
- Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
683
González-Díaz H, Uriarte E, Ramos de Armas R. Predicting stability of Arc repressor mutants with protein stochastic moments. Bioorg Med Chem 2005; 13:323-31. [PMID: 15598555 DOI: 10.1016/j.bmc.2004.10.024] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Received: 09/15/2004] [Revised: 10/08/2004] [Accepted: 10/09/2004] [Indexed: 11/18/2022]
Abstract
As more and more protein structures are determined and applied to drug manufacture, there is increasing interest in studying their stability. In this study, the stochastic moments ((SR)pi(k)) of 53 Arc repressor mutants were introduced as molecular descriptors for modeling protein stability. The Linear Discriminant Analysis model developed correctly classified 43 out of 53 (81.13%) proteins according to their thermal stability. More specifically, the model classified 20/28 (71.4%) proteins with near wild-type stability and 23/25 (92%) proteins with reduced stability. Moreover, validation of the model was carried out by re-substitution procedures (81.0%). In addition, the model based on stochastic moments compared favorably with others based on physicochemical and geometric parameters such as D-Fire potential, surface area, volume, partition coefficient, and molar refractivity, which achieved less than 77% accuracy. This result illustrates the potential of the stochastic moments method for the study of proteins relevant to bioorganic and medicinal chemistry.
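The classification figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable and function names are illustrative.

```python
# Per-class and overall accuracy from the counts reported in the abstract.
counts = {
    "near wild-type stability": (20, 28),  # (correctly classified, total)
    "reduced stability": (23, 25),
}


def accuracy(correct: int, total: int) -> float:
    """Percentage of correctly classified proteins."""
    return 100.0 * correct / total


per_class = {label: accuracy(c, t) for label, (c, t) in counts.items()}
overall_correct = sum(c for c, _ in counts.values())
overall_total = sum(t for _, t in counts.values())
overall = accuracy(overall_correct, overall_total)

for label, value in per_class.items():
    print(f"{label}: {value:.1f}%")
print(f"overall: {overall_correct}/{overall_total} = {overall:.2f}%")
```

The per-class counts sum to 43 of 53, which reproduces the overall 81.13% reported for the model.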
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15706, Spain.
684
Wiederstein M, Sippl MJ. Protein sequence randomization: efficient estimation of protein stability using knowledge-based potentials. J Mol Biol 2004; 345:1199-212. [PMID: 15644215 DOI: 10.1016/j.jmb.2004.11.012] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.5] [Received: 09/21/2004] [Revised: 11/05/2004] [Accepted: 11/07/2004] [Indexed: 11/27/2022]
Abstract
Modifications of the amino acid sequence generally affect protein stability. Here, we use knowledge-based potentials to estimate the stability of protein structures under sequence variation. Calculations on a variety of protein scaffolds result in a clear distinction of known mutable regions from arbitrarily chosen control patches. For example, randomly changing the sequence of an antibody paratope yields a significantly lower number of destabilized mutants as compared to the randomization of comparable regions on the protein surface. The technique is computationally efficient and can be used to screen protein structures for regions that are amenable to molecular tinkering by preserving the stability of the mutated proteins.
Affiliation(s)
- Markus Wiederstein
- Center of Applied Molecular Engineering, University of Salzburg, Jakob Haringerstrasse 5, 5020 Salzburg, Austria
685
Devos D, Dokudovskaya S, Alber F, Williams R, Chait BT, Sali A, Rout MP. Components of coated vesicles and nuclear pore complexes share a common molecular architecture. PLoS Biol 2004; 2:e380. [PMID: 15523559 PMCID: PMC524472 DOI: 10.1371/journal.pbio.0020380] [Citation(s) in RCA: 309] [Impact Index Per Article: 15.5] [Received: 07/13/2004] [Accepted: 08/07/2004] [Indexed: 11/23/2022] Open
Abstract
Numerous features distinguish prokaryotes from eukaryotes, chief among which are the distinctive internal membrane systems of eukaryotic cells. These membrane systems form elaborate compartments and vesicular trafficking pathways, and sequester the chromatin within the nuclear envelope. The nuclear pore complex is the portal that specifically mediates macromolecular trafficking across the nuclear envelope. Although it is generally understood that these internal membrane systems evolved from specialized invaginations of the prokaryotic plasma membrane, it is not clear how the nuclear pore complex could have evolved from organisms with no analogous transport system. Here we use computational and biochemical methods to perform a structural analysis of the seven proteins comprising the yNup84/vNup107–160 subcomplex, a core building block of the nuclear pore complex. Our analysis indicates that all seven proteins contain either a β-propeller fold, an α-solenoid fold, or a distinctive arrangement of both, revealing close similarities between the structures comprising the yNup84/vNup107–160 subcomplex and those comprising the major types of vesicle coating complexes that maintain vesicular trafficking pathways. These similarities suggest a common evolutionary origin for nuclear pore complexes and coated vesicles in an early membrane-curving module that led to the formation of the internal membrane systems in modern eukaryotes. Structural similarities between the proteins of a nuclear pore subcomplex and proteins comprising vesicle coating complexes indicate a common origin for nuclear pore complexes and coated vesicles.
Affiliation(s)
- Damien Devos
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biomedical Research, University of California, San Francisco, California, United States of America
- Svetlana Dokudovskaya
- Laboratory of Cellular and Structural Biology, Rockefeller University, New York, New York, United States of America
- Frank Alber
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biomedical Research, University of California, San Francisco, California, United States of America
- Rosemary Williams
- Laboratory of Cellular and Structural Biology, Rockefeller University, New York, New York, United States of America
- Brian T Chait
- Laboratory of Mass Spectrometry and Gaseous Ion Chemistry, Rockefeller University, New York, New York, United States of America
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biomedical Research, University of California, San Francisco, California, United States of America
- Michael P Rout
- Laboratory of Cellular and Structural Biology, Rockefeller University, New York, New York, United States of America
|
686
|
Zhang C, Liu S, Zhou H, Zhou Y. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci 2004; 13:400-11. [PMID: 14739325 PMCID: PMC2286718 DOI: 10.1110/ps.03348304] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.8] [Indexed: 10/26/2022]
Abstract
Structure prediction on a genomic scale requires a simplified energy function that can efficiently sample the conformational space of polypeptide chains. A good energy function should, at a minimum, discriminate native structures from decoys. Here, we show that a recently developed, residue-specific, all-atom knowledge-based potential (167 atomic types) based on the distance-scaled, finite ideal-gas reference state (DFIRE-all-atom) can be substantially simplified to 20 residue types located at the side-chain center of mass (DFIRE-SCM) without a significant change in its capability of structure discrimination. Using 96 standard multiple decoy sets, we show that there is only a small reduction (from 80% to 78%) in the success rate of ranking the native structure first. The success rate is higher than that of two previously developed, all-atom distance-dependent statistical pair potentials. Applied to structure selections of 21 docking decoys without modification, the DFIRE-SCM potential is 29% more successful in recognizing native complex structures than an all-atom statistical potential trained by a database of dimeric interfaces. The potential also achieves 92% accuracy in distinguishing true dimeric interfaces from artificial crystal interfaces. In addition, the DFIRE potential with the C(alpha) positions as the interaction centers recognizes 123 native structures out of a comprehensive 125-protein TOUCHSTONE decoy set in which each protein has 24,000 decoys with only C(alpha) positions. Furthermore, the performance of DFIRE-SCM on newly established Rosetta-decoy sets (25 monomeric and 31 docking) is comparable to (or, in the case of monomeric decoy sets, better than) that of a recently developed, all-atom Rosetta energy function enhanced with an orientation-dependent hydrogen bonding potential.
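As a rough illustration of what a distance-scaled ideal-gas reference state means in practice, the sketch below computes one pair-energy term from observed pair counts. It is a minimal sketch under stated assumptions, not the authors' code: the function name, the argument names, and the toy numbers are hypothetical; the exponent default of 1.61 is the value usually quoted for DFIRE and is treated here as an assumed default; and the prefactor `eta_rt` is left as a free scale.

```python
import math

def dfire_pair_energy(n_obs, n_obs_cut, r, r_cut, dr, dr_cut,
                      alpha=1.61, eta_rt=1.0):
    """Pair energy for one residue-type pair in one distance bin.

    n_obs      observed pair count in the bin centred at r (must be > 0)
    n_obs_cut  observed pair count in the bin at the cutoff distance r_cut
    dr, dr_cut widths of the two bins
    alpha      distance-scaling exponent (assumed default, see lead-in)
    eta_rt     energy scale (hypothetical free parameter)
    """
    if n_obs <= 0 or n_obs_cut <= 0:
        raise ValueError("counts must be positive to take a logarithm")
    # Expected count if the pair behaved like a finite ideal gas:
    # the count at the cutoff, scaled down by (r / r_cut)^alpha.
    expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_obs_cut
    return -eta_rt * math.log(n_obs / expected)

# Toy numbers (hypothetical): more pairs observed than expected -> favourable.
print(dfire_pair_energy(n_obs=300, n_obs_cut=1000, r=5.0, r_cut=14.5,
                        dr=0.5, dr_cut=0.5))
```

Negative values mark distances at which a pair occurs more often than the reference state predicts; the same term can be evaluated at side-chain centers of mass or at C(alpha) positions by changing only how the counts are collected.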
Affiliation(s)
- Chi Zhang
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, SUNY Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
|
687
|
Zhou H, Zhou Y. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 2004; 55:1005-13. [PMID: 15146497 DOI: 10.1002/prot.20007] [Citation(s) in RCA: 163] [Impact Index Per Article: 8.2] [Indexed: 11/11/2022]
Abstract
An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential, so that a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in the ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting family, superfamily, and fold similarities, respectively, in the Lindahl benchmark. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is established for academic users on http://theory.med.buffalo.edu.
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, New York 14214, USA
|
688
|
Bordner AJ, Abagyan RA. Large-scale prediction of protein geometry and stability changes for arbitrary single point mutations. Proteins 2004; 57:400-13. [PMID: 15340927 DOI: 10.1002/prot.20185] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.7] [Indexed: 11/12/2022]
Abstract
We have developed a method to predict both the geometry and the relative stability of point mutants that may be used for arbitrary mutations. The geometry optimization procedure was first tested on a new benchmark of 2141 ordered pairs of X-ray crystal structures of proteins that differ by a single point mutation, the largest data set to date. An empirical energy function, which includes terms representing the energy contributions of the folded and denatured proteins and uses the predicted mutant side chain conformation, was fit to a training set consisting of half of a diverse set of 1816 experimental stability values for single point mutations in 81 different proteins. The data included a substantial number of small to large residue mutations not considered by previous prediction studies. After removing 22 (approximately 2%) outliers, the stability calculation gave a standard deviation of 1.08 kcal/mol with a correlation coefficient of 0.82. The prediction method was then tested on the remaining half of the experimental data, giving a standard deviation of 1.10 kcal/mol and covariance of 0.66 for 97% of the test set. A regression fit of the energy function to a subset of 137 mutants, for which both native and mutant structures were available, gave a prediction error comparable to that for the complete training set with predicted side chain conformations. We found that about half of the variation is due to conformation-independent residue contributions. Finally, a fit to the experimental stability data using these residue parameters exclusively suggests guidelines for improving protein stability in the absence of detailed structure information.
Affiliation(s)
- A J Bordner
- The Scripps Research Institute, 10550 North Torrey Pines Rd., Mail TPC-28, San Diego, California, USA.
|
689
|
Ramos de Armas R, González Díaz H, Molina R, Uriarte E. Markovian Backbone Negentropies: Molecular descriptors for protein research. I. Predicting protein stability in Arc repressor mutants. Proteins 2004; 56:715-23. [PMID: 15281125 DOI: 10.1002/prot.20159] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.2] [Indexed: 11/08/2022]
Abstract
As more and more protein structures are determined and applied to drug manufacture, there is increasing interest in studying their stability. In this sense, developing novel computational methods to predict and study protein stability in relation to their amino acid sequences has become a significant goal in applied proteomics. In the study described here, Markovian Backbone Negentropies (MBN) have been introduced in order to model the effect on protein stability of a complete set of alanine substitutions in the Arc repressor. A total of 53 proteins were studied by means of Linear Discriminant Analysis using MBN as molecular descriptors. MBN are molecular descriptors based on a Markov chain model of electron delocalization throughout the protein backbone. The model correctly classified 43 out of 53 (81.13%) proteins according to their thermal stability. More specifically, the model classified 20/28 (71.4%) proteins with near wild-type stability and 23/25 (92%) proteins with reduced stability. Moreover, the model presented a good Matthews correlation coefficient of 0.643. Validation of the model was carried out by several jackknife procedures. The method compares favorably with surface-dependent and thermodynamic parameter stability scoring functions. For instance, the D-FIRE potential classification function shows a level of good classification of 76.9%. On the other hand, surface, volume, logP, and molar refractivity show accuracies of 70.7, 62.3, 59.0, and 60.0%, respectively.
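The classification figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes the overall accuracy and the 0.643 coefficient from the per-class counts, using the standard two-class Matthews correlation formula. Treating the reduced-stability class as the "positive" class is an assumption made only for labelling; the coefficient is unchanged if the classes are swapped.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Two-class Matthews correlation coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denom

# Counts taken from the abstract: 23/25 reduced-stability proteins and
# 20/28 near wild-type proteins were classified correctly.
tp, fn = 23, 2   # reduced stability (assumed "positive" class)
tn, fp = 20, 8   # near wild-type stability

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {accuracy:.2%}, MCC = {mcc:.3f}")
# prints: accuracy = 81.13%, MCC = 0.643
```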
|
690
|
Liu S, Zhang C, Zhou H, Zhou Y. A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins 2004; 56:93-101. [PMID: 15162489 DOI: 10.1002/prot.20019] [Citation(s) in RCA: 159] [Impact Index Per Article: 8.0] [Indexed: 11/10/2022]
Abstract
Extracting knowledge-based statistical potentials from known structures of proteins has proved to be a simple, effective method to obtain an approximate free-energy function. However, the different compositions of amino acid residues at the core, the surface, and the binding interface of proteins prohibited the establishment of a unified statistical potential for folding and binding despite the fact that the physical basis of the interaction (water-mediated interaction between amino acids) is the same. Recently, a physical state of ideal gas, rather than a statistically averaged state, has been used as the reference state for extracting the net interaction energy between amino acid residues of monomeric proteins. Here, we find that this monomer-based potential is more accurate than an existing all-atom knowledge-based potential trained with interfacial structures of dimers in distinguishing native complex structures from docking decoys (100% success rate vs. 52% in 21 dimer/trimer decoy sets). It is also more accurate than a recently developed semiphysical empirical free-energy functional enhanced by an orientation-dependent hydrogen-bonding potential in distinguishing native state from Rosetta docking decoys (94% success rate vs. 74% in 31 antibody-antigen and other complexes based on Z score). In addition, the monomer potential achieved a 93% success rate in distinguishing true dimeric interfaces from artificial crystal interfaces. More importantly, without additional parameters, the potential provides an accurate prediction of binding free energy of protein-peptide and protein-protein complexes (a correlation coefficient of 0.87 and a root-mean-square deviation of 1.76 kcal/mol with 69 experimental data points). This work marks a significant step toward a unified knowledge-based potential that quantitatively captures the common physical principle underlying folding and binding. A Web server for academic users, established for the prediction of binding free energy and the energy evaluation of the protein-protein complexes, may be found at http://theory.med.buffalo.edu.
Affiliation(s)
- Song Liu
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
691
|
Zhang C, Liu S, Zhou Y. Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential. Protein Sci 2004; 13:391-9. [PMID: 14739324 PMCID: PMC2286705 DOI: 10.1110/ps.03411904] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Received: 09/03/2003] [Revised: 10/17/2003] [Accepted: 10/17/2003] [Indexed: 10/26/2022]
Abstract
The conformations of loops are determined by the water-mediated interactions between amino acid residues. Energy functions that describe the interactions can be derived either from physical principles (physical-based energy function) or statistical analysis of known protein structures (knowledge-based statistical potentials). It is commonly believed that statistical potentials are appropriate for coarse-grained representation of proteins but are not as accurate as physical-based potentials when atomic resolution is required. Several recent applications of physical-based energy functions to loop selections appear to support this view. In this article, we apply a recently developed DFIRE-based statistical potential to three different loop decoy sets (RAPPER, Jacobson, and Forrest-Woolf sets). Together with a rotamer library for side-chain optimization, the performance of the DFIRE-based potential in the RAPPER decoy set (385 loop targets) is comparable to that of AMBER/GBSA for short loops (two to eight residues). DFIRE is more accurate for longer loops (9 to 12 residues). A similar trend is observed when comparing DFIRE with another physical-based energy function, OPLS/SGB-NP, in the large Jacobson decoy set (788 loop targets). In the Forrest-Woolf decoy set for the loops of membrane proteins, the DFIRE potential performs substantially better than the combination of the CHARMM force field with several solvation models. The results suggest that a single-term DFIRE-statistical energy function can provide an accurate loop prediction at a fraction of the computing cost required for more complicated physical-based energy functions. A Web server for academic users is established for loop selection at the softwares/services section of the Web site http://theory.med.buffalo.edu/.
Affiliation(s)
- Chi Zhang
- Howard Hughes Medical Institute Center for Single Molecule Biophysics and Department of Physiology and Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
|
692
|
Abstract
A protein model that is simple enough to be used in protein-folding simulations but accurate enough to identify a protein native fold is described. Its geometry consists of describing the residues by one, two, or three pseudoatoms, depending on the residue size. Its energy is given by a pairwise, knowledge-based potential obtained for all the pseudoatoms as a function of their relative distance. The pseudoatomic potential is also a function of the primary chain separation and residue order. The model is tested by gapless threading on a large, representative set of known protein and decoy structures obtained from the "Decoys 'R' Us" database. It is also tested by threading on gapped decoys generated for proteins with many homologs. The gapless threading tests show near 98% native-structure recognition as the lowest energy structure and almost 100% as one of the three lowest energy structures for over 2200 test proteins. In decoy threading tests, the model recognized the majority of the native structures. It is also able to recognize native structures among gapped decoys, in spite of close structural similarities. The results indicate that the pseudoatomic model has native recognition ability similar to comparable atomic-based models but much better than equivalent residue-based models.
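The gapless threading test mentioned above can be sketched in a few lines: the query sequence is mounted, without insertions or deletions, on every contiguous window of a longer template backbone, and each placement is scored. The example below is a simplified illustration only. It swaps the paper's distance-dependent pseudoatom potential for a plain contact potential on one point per residue, and the function names, the 6.5 Å cutoff, the minimum sequence separation, and the two-letter H/P energy table are all hypothetical.

```python
import math
from itertools import combinations

def contact_energy(seq, coords, pair_energy, cutoff=6.5, min_sep=3):
    """Sum contact energies over residue pairs closer than `cutoff`.

    Pairs separated by fewer than `min_sep` positions along the chain
    are skipped, since they are always close in space.
    """
    total = 0.0
    for i, j in combinations(range(len(seq)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff:
            total += pair_energy[frozenset((seq[i], seq[j]))]
    return total

def gapless_threading(seq, template_coords, pair_energy):
    """Score `seq` on every contiguous window of the template (no gaps)."""
    n = len(seq)
    return [contact_energy(seq, template_coords[s:s + n], pair_energy)
            for s in range(len(template_coords) - n + 1)]

# Toy data (hypothetical): a two-letter alphabet where H-H contacts are favourable.
pair_energy = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.0}
template = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (-10, 3.8, 0)]
scores = gapless_threading("HPPH", template, pair_energy)
print(scores)  # one energy per window; the lowest marks the preferred placement
```

In a recognition test of this kind, the native structure counts as recognized when its own window has the lowest energy among all placements.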
Affiliation(s)
- Marcos R Betancourt
- University at Buffalo Center of Excellence in Bioinformatics, Buffalo, New York 14203, USA
|
693
|
De Sancho D, Prieto L, Rubio AM, Rey A. Evolutionary method for the assembly of rigid protein fragments. J Comput Chem 2004; 26:131-41. [PMID: 15584079 DOI: 10.1002/jcc.20150] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
Genetic algorithms constitute a powerful optimization method that has already been used in the study of the protein folding problem. However, they often suffer from a lack of convergence in a reasonably short time for complex fitness functions. Here, we propose an evolutionary strategy that can reproducibly find structures close to the minimum of a potential function for a simplified protein model in an efficient way. The model reduces the number of degrees of freedom of the system by treating the protein structure as composed of rigid fragments. The search incorporates a double encoding procedure and a merging operation from subpopulations that evolve independently of one another, both contributing to the good performance of the full algorithm. We have tested it with protein structures of different degrees of complexity, and present our conclusions related to its possible application as an efficient tool for the analysis of folding potentials.
Affiliation(s)
- David De Sancho
- Departamento de Química Física, Facultad de Ciencias Químicas, Universidad Complutense, E-28040 Madrid, Spain
|
694
|
Abstract
The average contribution of an individual residue to folding stability and its dependence on buried accessible surface area (ASA) are obtained by two different approaches. One is based on experimental mutation data, and the other uses a new knowledge-based atom-atom potential of mean force. We show that the contribution of a residue correlates significantly with buried ASA and that the regression slopes of the 20 amino acid residues (called the buriability) are all positive (pro-burial). The buriability parameter provides a quantitative measure of the driving force for the burial of a residue. The large buriability gap observed between hydrophobic and hydrophilic residues is responsible for the burial of hydrophobic residues in soluble proteins. Possible factors that contribute to the buriability gap are discussed.
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, 14214, USA
|
695
|
Zhu J, Zhu Q, Shi Y, Liu H. How well can we predict native contacts in proteins based on decoy structures and their energies? Proteins 2003; 52:598-608. [PMID: 12910459 DOI: 10.1002/prot.10444] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 01/31/2023]
Abstract
One strategy for ab initio protein structure prediction is to generate a large number of possible structures (decoys) and select the most fitting ones based on a scoring or free energy function. The conformational space of a protein is huge, and chances are rare that any heuristically generated structure will directly fall in the neighborhood of the native structure. It is desirable that, instead of being thrown away, the unfitting decoy structures can provide insights into native structures so prediction can be made progressively. First, we demonstrate that a recently parameterized physics-based effective free energy function based on the GROMOS96 force field and a generalized Born/surface area solvent model is, like several other physics-based and knowledge-based models, capable of distinguishing native structures from decoy structures for a number of widely used decoy databases. Second, we observe a substantial increase in correlations of the effective free energies with the degree of similarity between the decoys and the native structure, if the similarity is measured by the content of native inter-residue contacts in a decoy structure rather than its root-mean-square deviation from the native structure. Finally, we investigate the possibility of predicting native contacts based on the frequency of occurrence of contacts in decoy structures. For most proteins contained in the decoy databases, a meaningful number of native contacts can be predicted based on plain frequencies of occurrence at a relatively high level of accuracy. Relative to using plain frequencies, overwhelming improvements in sensitivity of the predictions are observed for the 4_state_reduced decoy sets by applying energy-dependent weighting of decoy structures in determining the frequency. There, approximately 80% of native contacts can be predicted at an accuracy of approximately 80% using energy-weighted frequencies. The sensitivity of the plain frequency approach is much lower (20% to 40%). Such improvements are, however, not observed for the other decoy databases. The rationalization and implications of the results are discussed.
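The difference between plain and energy-weighted contact frequencies can be illustrated with a small sketch. The abstract does not state the weighting scheme, so the Boltzmann-style weight `exp(-(E - E_min) / kT)` used below is an assumption, as are the function name, the `kT` parameter, and the toy decoys.

```python
import math
from collections import defaultdict

def contact_frequencies(decoys, energies=None, kT=1.0):
    """Frequency of each contact over a decoy set.

    decoys   list of sets of (i, j) residue-index pairs, one set per decoy
    energies optional list of decoy energies; when given, each decoy is
             weighted by exp(-(E - E_min) / kT) (assumed weighting scheme)
    """
    if energies is None:
        weights = [1.0] * len(decoys)          # plain frequencies
    else:
        e_min = min(energies)                  # shift for numerical stability
        weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    freq = defaultdict(float)
    for contacts, w in zip(decoys, weights):
        for contact in contacts:
            freq[contact] += w / total
    return dict(freq)

# Toy decoy set (hypothetical): two low-energy decoys share contact (1, 5).
decoys = [{(1, 5), (2, 8)}, {(1, 5)}, {(3, 9)}]
energies = [-10.0, -9.0, 0.0]

plain = contact_frequencies(decoys)
weighted = contact_frequencies(decoys, energies)
print(plain[(1, 5)], weighted[(1, 5)])
```

Contacts whose frequency exceeds a chosen threshold would then be predicted as native; with the weighting, contacts found only in high-energy decoys contribute almost nothing.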
Affiliation(s)
- Jiang Zhu
- Key Laboratory of Structural Biology, University of Science and Technology of China, Chinese Academy of Sciences, School of Life Sciences, Hefei, Anhui, 230026, China.
|