Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Vendruscolo M, Najmanovich R, Domany E. Can a pairwise contact potential stabilize native protein folds against decoys obtained by threading? Proteins 2000;38:134-48. [PMID: 10656261 DOI: 10.1002/(sici)1097-0134(20000201)38:2<134::aid-prot3>3.0.co;2-a] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

For:	Vendruscolo M, Najmanovich R, Domany E. Can a pairwise contact potential stabilize native protein folds against decoys obtained by threading? Proteins 2000;38:134-48. [PMID: 10656261 DOI: 10.1002/(sici)1097-0134(20000201)38:2<134::aid-prot3>3.0.co;2-a] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Number

Cited by Other Article(s)

Wang D, Frechette LB, Best RB. On the role of native contact cooperativity in protein folding. Proc Natl Acad Sci U S A 2024;121:e2319249121. [PMID: 38776371 PMCID: PMC11145220 DOI: 10.1073/pnas.2319249121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2023] [Accepted: 04/11/2024] [Indexed: 05/25/2024] Open

Blaszczyk M, Gront D, Kmiecik S, Kurcinski M, Kolinski M, Ciemny MP, Ziolkowska K, Panek M, Kolinski A. Protein Structure Prediction Using Coarse-Grained Models. SPRINGER SERIES ON BIO- AND NEUROSYSTEMS 2019. [DOI: 10.1007/978-3-319-95843-9_2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Masso M. All-atom four-body knowledge-based statistical potential to distinguish native tertiary RNA structures from nonnative folds. J Theor Biol 2018;453:58-67. [PMID: 29782930 DOI: 10.1016/j.jtbi.2018.05.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2017] [Revised: 04/05/2018] [Accepted: 05/17/2018] [Indexed: 11/16/2022]

Saravanan KM, Suvaithenamudhan S, Parthasarathy S, Selvaraj S. Pairwise contact energy statistical potentials can help to find probability of point mutations. Proteins 2016;85:54-64. [PMID: 27761949 DOI: 10.1002/prot.25191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2016] [Revised: 06/16/2016] [Accepted: 10/13/2016] [Indexed: 11/10/2022]

Chen L, He J. A distance- and orientation-dependent energy function of amino acid key blocks. Biopolymers 2016;101:681-92. [PMID: 24222511 DOI: 10.1002/bip.22440] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2013] [Revised: 10/31/2013] [Accepted: 11/01/2013] [Indexed: 01/03/2023]

Elhefnawy W, Chen L, Han Y, Li Y. ICOSA: A Distance-Dependent, Orientation-Specific Coarse-Grained Contact Potential for Protein Structure Modeling. J Mol Biol 2015;427:2562-2576. [DOI: 10.1016/j.jmb.2015.05.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2015] [Accepted: 05/21/2015] [Indexed: 11/16/2022]

Chae MH, Krull F, Knapp EW. Optimized distance-dependent atom-pair-based potential DOOP for protein structure prediction. Proteins 2015;83:881-90. [PMID: 25693513 DOI: 10.1002/prot.24782] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2014] [Revised: 02/06/2015] [Accepted: 02/10/2015] [Indexed: 12/20/2022]

Abstract

The DOcking decoy-based Optimized Potential (DOOP) energy function for protein structure prediction is based on empirical distance-dependent atom-pair interactions. To optimize the atom-pair interactions, native protein structures are decomposed into polypeptide chain segments that correspond to structural motives involving complete secondary structure elements. They constitute near native ligand-receptor systems (or just pairs). Thus, a total of 8609 ligand-receptor systems were prepared from 954 selected proteins. For each of these hypothetical ligand-receptor systems, 1000 evenly sampled docking decoys with 0-10 Å interface root-mean-square-deviation (iRMSD) were generated with a method used before for protein-protein docking. A neural network-based optimization method was applied to derive the optimized energy parameters using these decoys so that the energy function mimics the funnel-like energy landscape for the interaction between these hypothetical ligand-receptor systems. Thus, our method hierarchically models the overall funnel-like energy landscape of native protein structures. The resulting energy function was tested on several commonly used decoy sets for native protein structure recognition and compared with other statistical potentials. In combination with a torsion potential term which describes the local conformational preference, the atom-pair-based potential outperforms other reported statistical energy functions in correct ranking of native protein structures for a variety of decoy sets. This is especially the case for the most challenging ROSETTA decoy set, although it does not take into account side chain orientation-dependence explicitly. The DOOP energy function for protein structure prediction, the underlying database of protein structures with hypothetical ligand-receptor systems and their decoys are freely available at http://agknapp.chemie.fu-berlin.de/doop/.

Collapse

On simplified global nonlinear function for fitness landscape: a case study of inverse protein folding. PLoS One 2014;9:e104403. [PMID: 25110986 PMCID: PMC4128808 DOI: 10.1371/journal.pone.0104403] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2013] [Accepted: 07/14/2014] [Indexed: 11/19/2022] Open

How good are simplified models for protein structure prediction? Adv Bioinformatics 2014;2014:867179. [PMID: 24876837 PMCID: PMC4022063 DOI: 10.1155/2014/867179] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2013] [Revised: 01/22/2014] [Accepted: 01/23/2014] [Indexed: 11/18/2022] Open

Custódio FL, Barbosa HJ, Dardenne LE. A multiple minima genetic algorithm for protein structure prediction. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2013.10.029] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Chhajer M, Crippen GM. Toward correct protein folding potentials. J Biol Phys 2013;30:171-85. [PMID: 23345867 DOI: 10.1023/b:jobp.0000035854.68334.dd] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Pandey RB, Farmer BL. Random coil to globular thermal response of a protein (H3.1) with three knowledge-based coarse-grained potentials. PLoS One 2012;7:e49352. [PMID: 23166645 PMCID: PMC3498164 DOI: 10.1371/journal.pone.0049352] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2012] [Accepted: 10/10/2012] [Indexed: 11/19/2022] Open

Romero PA, Arnold FH. Random field model reveals structure of the protein recombinational landscape. PLoS Comput Biol 2012;8:e1002713. [PMID: 23055915 PMCID: PMC3464211 DOI: 10.1371/journal.pcbi.1002713] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2012] [Accepted: 08/03/2012] [Indexed: 11/28/2022] Open

Abstract

We are interested in how intragenic recombination contributes to the evolution of proteins and how this mechanism complements and enhances the diversity generated by random mutation. Experiments have revealed that proteins are highly tolerant to recombination with homologous sequences (mutation by recombination is conservative); more surprisingly, they have also shown that homologous sequence fragments make largely additive contributions to biophysical properties such as stability. Here, we develop a random field model to describe the statistical features of the subset of protein space accessible by recombination, which we refer to as the recombinational landscape. This model shows quantitative agreement with experimental results compiled from eight libraries of proteins that were generated by recombining gene fragments from homologous proteins. The model reveals a recombinational landscape that is highly enriched in functional sequences, with properties dominated by a large-scale additive structure. It also quantifies the relative contributions of parent sequence identity, crossover locations, and protein fold to the tolerance of proteins to recombination. Intragenic recombination explores a unique subset of sequence space that promotes rapid molecular diversification and functional adaptation.

Mutation and recombination are the primary sources of genetic variation in evolving populations. The relative benefit of these two diversification mechanisms and how they complement each other has been a long-standing question in evolutionary biology. While it is clear what types of genetic diversity these two mechanisms can create, a significant challenge is relating these sequence changes to changes in fitness. The fitness landscape, which describes this mapping from genotype to phenotype, is extraordinarily complex and defined over an incomprehensibly large space of sequences. Here, we develop a model of the landscape that relies not on the details of this mapping, but rather on the statistical relationships between sequences. By studying the expected values of landscape properties, we can gain insights into the structure of the landscape that are independent of the details of how genotype dictates phenotype. We use this random field model to understand how recombination explores a functionally enriched and diverse subset of protein sequence space.

Collapse

Zhao F, Xu J. A position-specific distance-dependent statistical potential for protein structure and functional study. Structure 2012;20:1118-26. [PMID: 22608968 PMCID: PMC3372698 DOI: 10.1016/j.str.2012.04.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2012] [Revised: 04/09/2012] [Accepted: 04/10/2012] [Indexed: 10/28/2022]

Zimmermann MT, Leelananda SP, Kloczkowski A, Jernigan RL. Combining statistical potentials with dynamics-based entropies improves selection from protein decoys and docking poses. J Phys Chem B 2012;116:6725-31. [PMID: 22490366 DOI: 10.1021/jp2120143] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Ravikant DVS, Elber R. Energy design for protein-protein interactions. J Chem Phys 2012;135:065102. [PMID: 21842951 DOI: 10.1063/1.3615722] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011;51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]

Abstract

Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).

Collapse

Dong Q, Zhou S. Novel nonlinear knowledge-based mean force potentials based on machine learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:476-486. [PMID: 20820079 DOI: 10.1109/tcbb.2010.86] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]

Abstract

The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials are almost in the form of weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures. Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.

Collapse

Procacci P. Thermodynamics of stacking interactions in proteins. ACTA ACUST UNITED AC 2011. [DOI: 10.1039/c1pc90009a] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Duarte JM, Sathyapriya R, Stehr H, Filippis I, Lappe M. Optimal contact definition for reconstruction of contact maps. BMC Bioinformatics 2010;11:283. [PMID: 20507547 PMCID: PMC3583236 DOI: 10.1186/1471-2105-11-283] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2009] [Accepted: 05/27/2010] [Indexed: 11/23/2022] Open

Abstract

Background

Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure.

Results

We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11Å around the C_βatoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity.

Conclusions

Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.

Collapse

Ravikant DVS, Elber R. PIE-efficient filters and coarse grained potentials for unbound protein-protein docking. Proteins 2010;78:400-19. [PMID: 19768784 DOI: 10.1002/prot.22550] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Wang Z, Tegge AN, Cheng J. Evaluating the absolute quality of a single protein model using structural features and support vector machines. Proteins 2009;75:638-47. [PMID: 19004001 DOI: 10.1002/prot.22275] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Reconstruction and stability of secondary structure elements in the context of protein structure prediction. Biophys J 2009;96:4399-408. [PMID: 19486664 DOI: 10.1016/j.bpj.2009.02.057] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2008] [Revised: 01/28/2009] [Accepted: 02/19/2009] [Indexed: 11/20/2022] Open

Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009;76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Abstract

Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduce the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.

Collapse

Li Q, Zhou C, Liu H. Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities. Proteins 2009;74:820-36. [PMID: 18704928 DOI: 10.1002/prot.22191] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Gu J, Li H, Jiang H, Wang X. Optimizing energy potential for protein fold recognition with parametric evaluation function. J Comput Biol 2009;16:427-42. [PMID: 19254182 DOI: 10.1089/cmb.2008.0128] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Data Mining in Organic Crystallography. DATA MINING IN CRYSTALLOGRAPHY 2009. [DOI: 10.1007/978-3-642-04759-6_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Randall A, Baldi P. SELECTpro: effective protein model selection using a structure-based energy function resistant to BLUNDERs. BMC STRUCTURAL BIOLOGY 2008;8:52. [PMID: 19055744 PMCID: PMC2667183 DOI: 10.1186/1472-6807-8-52] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/26/2008] [Accepted: 12/03/2008] [Indexed: 11/10/2022]

Abstract

Background

Protein tertiary structure prediction is a fundamental problem in computational biology and identifying the most native-like model from a set of predicted models is a key sub-problem. Consensus methods work well when the redundant models in the set are the most native-like, but fail when the most native-like model is unique. In contrast, structure-based methods score models independently and can be applied to model sets of any size and redundancy level. Additionally, structure-based methods have a variety of important applications including analogous fold recognition, refinement of sequence-structure alignments, and de novo prediction. The purpose of this work was to develop a structure-based model selection method based on predicted structural features that could be applied successfully to any set of models.

Results

Here we introduce SELECTpro, a novel structure-based model selection method derived from an energy function comprising physical, statistical, and predicted structural terms. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, β-strand pairing, and side-chain hydrogen bonding.

SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitting predictions for all 95 targets and achieved top results. The average difference in GDT-TS between models ranked first by SELECTpro and the most native-like model was 5.07. This GDT-TS difference was less than 1% of the GDT-TS of the most native-like model for 18 targets, and less than 10% for 66 targets. SELECTpro also ranked the single most native-like first for 15 targets, in the top five for 39 targets, and in the top ten for 53 targets, more often than any other method. Because the ranking metric is skewed by model redundancy and ignores poor models with a better ranking than the most native-like model, the BLUNDER metric is introduced to overcome these limitations. SELECTpro is also evaluated on a recent benchmark set of 16 small proteins with large decoy sets of 12500 to 20000 models for each protein, where it outperforms the benchmarked method (I-TASSER).

Conclusion

SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set. SELECTpro is available for download as a stand alone application at: . SELECTpro is also available as a public server at the same site.

Collapse

Hoang TX, Seno F, Trovato A, Banavar JR, Maritan A. Inference of the solvation energy parameters of amino acids using maximum entropy approach. J Chem Phys 2008;129:035102. [PMID: 18647046 DOI: 10.1063/1.2953691] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Wendel C, Gohlke H. Predicting transmembrane helix pair configurations with knowledge-based distance-dependent pair potentials. Proteins 2008;70:984-99. [PMID: 17847096 DOI: 10.1002/prot.21574] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Abstract

As a first step toward a novel de novo structure prediction approach for alpha-helical membrane proteins, we developed coarse-grained knowledge-based potentials to score the mutual configuration of transmembrane (TM) helices. Using a comprehensive database of 71 known membrane protein structures, pairwise potentials depending solely on amino acid types and distances between C(alpha)-atoms were derived. To evaluate the potentials, they were used as an objective function for the rigid docking of 442 TM helix pairs. This is by far the largest test data set reported to date for that purpose. After clustering 500 docking runs for each pair and considering the largest cluster, we found solutions with a root mean squared (RMS) deviation <2 A for about 30% of all helix pairs. Encouragingly, if only clusters that contain at least 20% of all decoys are considered, a success rate >71% (with a RMS deviation <2 A) is obtained. The cluster size thus serves as a measure of significance to identify good docking solutions. In a leave-one-protein-family-out cross-validation study, more than 2/3 of the helix pairs were still predicted with an RMS deviation <2.5 A (if only clusters that contain at least 20% of all decoys are considered). This demonstrates the predictive power of the potentials in general, although it is advisable to further extend the knowledge base to derive more robust potentials in the future. When compared to the scoring function of Fleishman and Ben-Tal, a comparable performance is found by our cross-validated potentials. Finally, well-predicted "anchor helix pairs" can be reliably identified for most of the proteins of the test data set. This is important for an extension of the approach towards TM helix bundles because these anchor pairs will act as "nucleation sites" to which more helices will be added subsequently, which alleviates the sampling problem.

Collapse

Wallner B, Elofsson A. Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins 2008;69 Suppl 8:184-93. [PMID: 17894353 DOI: 10.1002/prot.21774] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

On the optimal contact potential of proteins. Chem Phys Lett 2008. [DOI: 10.1016/j.cplett.2007.12.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Feng Y, Kloczkowski A, Jernigan RL. Four-body contact potentials derived from two protein datasets to discriminate native structures from decoys. Proteins 2007;68:57-66. [PMID: 17393455 DOI: 10.1002/prot.21362] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Ferrada E, Vergara IA, Melo F. A knowledge-based potential with an accurate description of local interactions improves discrimination between native and near-native protein conformations. Cell Biochem Biophys 2007;49:111-24. [PMID: 17906366 DOI: 10.1007/s12013-007-0050-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2007] [Revised: 11/30/1999] [Accepted: 07/16/2007] [Indexed: 10/22/2022]

Fogolari F, Pieri L, Dovier A, Bortolussi L, Giugliarelli G, Corazza A, Esposito G, Viglino P. Scoring predictive models using a reduced representation of proteins: model and energy definition. BMC STRUCTURAL BIOLOGY 2007;7:15. [PMID: 17378941 PMCID: PMC1854906 DOI: 10.1186/1472-6807-7-15] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/28/2006] [Accepted: 03/23/2007] [Indexed: 11/25/2022]

Abstract

Background

Reduced representations of proteins have been playing a keyrole in the study of protein folding. Many such models are available, with different representation detail. Although the usefulness of many such models for structural bioinformatics applications has been demonstrated in recent years, there are few intermediate resolution models endowed with an energy model capable, for instance, of detecting native or native-like structures among decoy sets. The aim of the present work is to provide a discrete empirical potential for a reduced protein model termed here PC2CA, because it employs a PseudoCovalent structure with only 2 Centers of interactions per Amino acid, suitable for protein model quality assessment.

Results

All protein structures in the set top500H have been converted in reduced form. The distribution of pseudobonds, pseudoangle, pseudodihedrals and distances between centers of interactions have been converted into potentials of mean force. A suitable reference distribution has been defined for non-bonded interactions which takes into account excluded volume effects and protein finite size. The correlation between adjacent main chain pseudodihedrals has been converted in an additional energetic term which is able to account for cooperative effects in secondary structure elements. Local energy surface exploration is performed in order to increase the robustness of the energy function.

Conclusion

The model and the energy definition proposed have been tested on all the multiple decoys' sets in the Decoys'R'us database. The energetic model is able to recognize, for almost all sets, native-like structures (RMSD less than 2.0 Å). These results and those obtained in the blind CASP7 quality assessment experiment suggest that the model compares well with scoring potentials with finer granularity and could be useful for fast exploration of conformational space. Parameters are available at the url: .

Collapse

Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2007;15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1778] [Impact Index Per Article: 104.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Xu YO, Hall RW, Goldstein RA, Pollock DD. Divergence, recombination and retention of functionality during protein evolution. Hum Genomics 2006;2:158-67. [PMID: 16197733 PMCID: PMC2943960 DOI: 10.1186/1479-7364-2-3-158] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

We have only a vague idea of precisely how protein sequences evolve in the context of protein structure and function. This is primarily because structural and functional contexts are not easily predictable from the primary sequence, and evaluating patterns of evolution at individual residue positions is also difficult. As a result of increasing biodiversity in genomics studies, progress is being made in detecting context-dependent variation in substitution processes, but it remains unclear exactly what context-dependent patterns we should be looking for. To address this, we have been simulating protein evolution in the context of structure and function using lattice models of proteins and ligands (or substrates). These simulations include thermodynamic features of protein stability and population dynamics. We refer to this approach as 'ab initio evolution' to emphasise the fact that the equilibrium details of fitness distributions arise from the physical principles of the system and not from any preconceived notions or arbitrary mathematical distributions. Here, we present results on the retention of functionality in homologous recombinants following population divergence. A central result is that protein structure characteristics can strongly influence recombinant functionality. Exceptional structures with many sequence options evolve quickly and tend to retain functionality--even in highly diverged recombinants. By contrast, the more common structures with fewer sequence options evolve more slowly, but the fitness of recombinants drops off rapidly as homologous proteins diverge. These results have implications for understanding viral evolution, speciation and directed evolutionary experiments. Our analysis of the divergence process can also guide improved methods for accurately approximating folding probabilities in more complex but realistic systems.

Collapse

Duan MJ, Zhou YH. A contact energy function considering residue hydrophobic environment and its application in protein fold recognition. GENOMICS PROTEOMICS & BIOINFORMATICS 2006;3:218-24. [PMID: 16689689 PMCID: PMC5172539 DOI: 10.1016/s1672-0229(05)03030-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]

Zhang J, Chen R, Liang J. Empirical potential function for simplified protein models: combining contact and local sequence-structure descriptors. Proteins 2006;63:949-60. [PMID: 16477624 DOI: 10.1002/prot.20809] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Kleinman CL, Rodrigue N, Bonnard C, Philippe H, Lartillot N. A maximum likelihood framework for protein design. BMC Bioinformatics 2006;7:326. [PMID: 16808841 PMCID: PMC1570151 DOI: 10.1186/1471-2105-7-326] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2006] [Accepted: 06/29/2006] [Indexed: 11/21/2022] Open

Abstract

Background

The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility.

Results

We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered.

Conclusion

Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for other factors than just the constraint of thermodynamic stability. Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.

Collapse

Rastogi S, Reuter N, Liberles DA. Evaluation of models for the evolution of protein sequences and functions under structural constraint. Biophys Chem 2006;124:134-44. [PMID: 16837122 DOI: 10.1016/j.bpc.2006.06.008] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2006] [Revised: 06/13/2006] [Accepted: 06/14/2006] [Indexed: 12/01/2022]

Tobi D, Bahar I. Optimal design of protein docking potentials: efficiency and limitations. Proteins 2006;62:970-81. [PMID: 16385562 DOI: 10.1002/prot.20859] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Mayewski S. A multibody, whole-residue potential for protein structures, with testing by Monte Carlo simulated annealing. Proteins 2006;59:152-69. [PMID: 15723360 DOI: 10.1002/prot.20397] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci 2006;15:900-13. [PMID: 16522791 PMCID: PMC2242478 DOI: 10.1110/ps.051799606] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Chen WW, Shakhnovich EI. Lessons from the design of a novel atomic potential for protein folding. Protein Sci 2005;14:1741-52. [PMID: 15987903 PMCID: PMC2253347 DOI: 10.1110/ps.051440705] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

Heo M, Kim S, Moon EJ, Cheon M, Chung K, Chang I. Perceptron learning of pairwise contact energies for proteins incorporating the amino acid environment. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005;72:011906. [PMID: 16090000 DOI: 10.1103/physreve.72.011906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/06/2004] [Revised: 05/10/2005] [Indexed: 05/03/2023]

Abstract

Although a coarse-grained description of proteins is a simple and convenient way to attack the protein folding problem, the construction of a global pairwise energy function which can simultaneously recognize the native folds of many proteins has resulted in partial success. We have sought the possibility of a systematic improvement of this pairwise-contact energy function as we extended the parameter space of amino acids, incorporating local environments of amino acids, beyond a 20 x 20 matrix. We have studied the pairwise contact energy functions of 20 x 20, 60 x 60, and 180 x 180 matrices depending on the extent of parameter space, and compared their effect on the learnability of energy parameters in the context of a gapless threading, bearing in mind that a 20 x 20 pairwise contact matrix has been shown to be too simple to recognize the native folds of many proteins. In this paper, we show that the construction of a global pairwise energy function was achieved using 1006 training proteins of a homology of less than 30%, which include all representatives of different protein classes. After parametrizing the local environments of the amino acids into nine categories depending on three secondary structures and three kinds of hydrophobicity (desolvation), the 16290 pairwise contact energies (scores) of the amino acids could be determined by perceptron learning and protein threading. These could simultaneously recognize all the native folds of the 1006 training proteins. When these energy parameters were tested on the 382 test proteins of a homology of less than 90%, 370 (96.9%) proteins could recognize their native folds. We set up a simple thermodynamic framework in the conformational space of decoys to calculate the unfolded fraction and the specific heat of real proteins. The different thermodynamic stabilities of E.coli ribonuclease H (RNase H) and its mutants were well described in our calculation, agreeing with the experiment.

Collapse

Combining a binary input encoding scheme with RBFNN for globulin protein inter-residue contact map prediction. Pattern Recognit Lett 2005. [DOI: 10.1016/j.patrec.2005.01.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Zhang GZ, Huang DS. Prediction of inter-residue contacts map based on genetic algorithm optimized radial basis function neural network and binary input encoding scheme. J Comput Aided Mol Des 2005;18:797-810. [PMID: 16075311 DOI: 10.1007/s10822-005-0578-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2004] [Accepted: 12/14/2004] [Indexed: 10/25/2022]

Threading with environment-specific score by artificial neural networks. Soft comput 2005. [DOI: 10.1007/s00500-005-0488-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]