1
Wang D, Frechette LB, Best RB. On the role of native contact cooperativity in protein folding. Proc Natl Acad Sci U S A 2024; 121:e2319249121. [PMID: 38776371 PMCID: PMC11145220 DOI: 10.1073/pnas.2319249121] [Received: 11/02/2023] [Accepted: 04/11/2024]
Abstract
The consistency of energy landscape theory predictions with available experimental data, as well as direct evidence from molecular simulations, have shown that protein folding mechanisms are largely determined by the contacts present in the native structure. As expected, native contacts are generally energetically favorable. However, there are usually at least as many energetically favorable nonnative pairs owing to the greater number of possible nonnative interactions. This apparent frustration must therefore be reduced by the greater cooperativity of native interactions. In this work, we analyze the statistics of contacts in the unbiased all-atom folding trajectories obtained by Shaw and coworkers, focusing on the unfolded state. By computing mutual cooperativities between contacts formed in the unfolded state, we show that native contacts form the most cooperative pairs, while cooperativities among nonnative or between native and nonnative contacts are typically much less favorable or even anticooperative. Furthermore, we show that the largest network of cooperative interactions observed in the unfolded state consists mainly of native contacts, suggesting that this set of mutually reinforcing interactions has evolved to stabilize the native state.
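The abstract does not define "mutual cooperativity". As a minimal sketch, assume each contact is recorded as a 0/1 indicator per trajectory frame. One common way to quantify how two contacts couple is the log odds ratio of their joint occupancies: positive values mean the contacts tend to form together, and negative values mean anticooperativity. The function name and the pseudocount are illustrative choices, not the authors' definition.

```python
import math


def contact_cooperativity(a, b, pseudocount=0.5):
    """Log odds ratio ln[P11*P00 / (P10*P01)] for two binary contact series.

    a, b: equal-length sequences of 0/1 (contact absent/present in each frame).
    Illustrative measure only; not necessarily the definition used by Wang et al.
    > 0: the contacts tend to co-occur (cooperative); < 0: anticooperative.
    The pseudocount keeps the ratio finite when a joint state is never observed.
    """
    if len(a) != len(b):
        raise ValueError("contact series must have the same length")
    n11 = n10 = n01 = n00 = 0
    for x, y in zip(a, b):
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    p = pseudocount
    return math.log(((n11 + p) * (n00 + p)) / ((n10 + p) * (n01 + p)))


frames = [1, 1, 0, 0, 1, 0, 1, 0]
print(contact_cooperativity(frames, frames))                   # positive
print(contact_cooperativity(frames, [1 - v for v in frames]))  # negative
```

Applied to every pair of contacts that form in unfolded-state frames, a measure of this kind yields a cooperativity matrix that can be compared across native-native, native-nonnative and nonnative-nonnative pairs.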
Affiliation(s)
- David Wang
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, MD 20892-0520
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218
- Layne B. Frechette
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, MD 20892-0520
- Martin A. Fisher School of Physics, Brandeis University, Waltham, MA 02453
- Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, MD 20892-0520
2
Blaszczyk M, Gront D, Kmiecik S, Kurcinski M, Kolinski M, Ciemny MP, Ziolkowska K, Panek M, Kolinski A. Protein Structure Prediction Using Coarse-Grained Models. Springer Series on Bio- and Neurosystems 2019. [DOI: 10.1007/978-3-319-95843-9_2]
3
Masso M. All-atom four-body knowledge-based statistical potential to distinguish native tertiary RNA structures from nonnative folds. J Theor Biol 2018; 453:58-67. [PMID: 29782930 DOI: 10.1016/j.jtbi.2018.05.022] [Received: 11/15/2017] [Revised: 04/05/2018] [Accepted: 05/17/2018]
Abstract
Scientific breakthroughs in recent decades have uncovered the capability of RNA molecules to fulfill a wide array of structural, functional, and regulatory roles in living cells, leading to a concomitantly significant increase in both the number and diversity of experimentally determined RNA three-dimensional (3D) structures. Atomic coordinates from a representative training set of solved RNA structures, displaying low sequence and structure similarity, facilitate derivation of knowledge-based energy functions. Here we develop an all-atom four-body statistical potential and evaluate its capacity to distinguish native RNA 3D structures from nonnative folds based on calculated free energy scores. Atomic four-body nearest-neighbors are objectively identified by their occurrence as tetrahedral vertices in the Delaunay tessellations of RNA structures, and rates of atomic quadruplet interactions expected by chance are obtained from a multinomial reference distribution. Our four-body energy function, referred to as RAMP (ribonucleic acids multibody potential), is subsequently derived by applying the inverted Boltzmann principle to the frequency data, yielding an energy score for each type of atomic quadruplet interaction. Several well-known benchmark datasets reveal that RAMP is comparable with, and often outperforms, existing knowledge- and physics-based energy functions. To the best of our knowledge, this is the first study detailing an RNA tertiary structure-based multibody statistical potential and its comparative evaluation.
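The derivation described above can be sketched as follows, assuming atom types are already assigned and each Delaunay tetrahedron is given as a 4-tuple of types (the tessellation itself is not computed here). Quadruplets are treated as unordered, the chance rate comes from a multinomial over atom-type frequencies, and the score is the negative log ratio of observed to expected frequency. The sign convention, log base and atom typing of the published RAMP potential may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def multinomial_expected(quad, p):
    """Chance probability of an unordered atom-type quadruplet.

    quad: tuple of 4 atom types; p: dict of atom-type frequencies summing to 1.
    Multinomial: 4! / (product of factorials of repeat counts) * product of p.
    """
    coeff = math.factorial(4)
    for count in Counter(quad).values():
        coeff //= math.factorial(count)
    prob = float(coeff)
    for atom_type in quad:
        prob *= p[atom_type]
    return prob


def four_body_energies(tetrahedra):
    """Inverse-Boltzmann score -ln(f_obs / f_exp), in kT units, per quadruplet type.

    tetrahedra: iterable of 4-tuples of atom types, one per Delaunay tetrahedron.
    Assumption: atom-type frequencies are estimated from the tetrahedron vertices.
    """
    quads = [tuple(sorted(t)) for t in tetrahedra]
    observed = Counter(quads)
    type_counts = Counter(atom for q in quads for atom in q)
    n_vertices = sum(type_counts.values())
    p = {t: c / n_vertices for t, c in type_counts.items()}
    total = len(quads)
    return {
        q: -math.log((n / total) / multinomial_expected(q, p))
        for q, n in observed.items()
    }


tets = [("C", "C", "C", "C")] * 3 + [("N", "O", "C", "C")]
for quad, e in sorted(four_body_energies(tets).items()):
    print(quad, round(e, 3))
```

Quadruplets seen more often than chance receive negative (favorable) scores; summing the scores over all tetrahedra of a model gives a total that can be compared between native and nonnative folds.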
Affiliation(s)
- Majid Masso
- School of Systems Biology, 10900 University Blvd. MS 5B3, George Mason University, Manassas, VA 20110 USA.
4
Saravanan KM, Suvaithenamudhan S, Parthasarathy S, Selvaraj S. Pairwise contact energy statistical potentials can help to find probability of point mutations. Proteins 2016; 85:54-64. [PMID: 27761949 DOI: 10.1002/prot.25191] [Received: 03/04/2016] [Revised: 06/16/2016] [Accepted: 10/13/2016]
Abstract
To adopt a particular fold, a protein requires several interactions between its amino acid residues. The energetic contribution of these residue-residue interactions can be approximated by extracting statistical potentials from known high-resolution structures. Several methods based on statistical potentials extracted from unrelated proteins have been reported to improve prediction of the probability of point mutations. We postulate that statistical potentials extracted from known structures of similar folds with varying sequence identity can be a powerful tool to examine the probability of point mutations. With this in mind, we derived pairwise residue and atomic contact energy potentials for the different functional families that adopt the (α/β)8 TIM-barrel fold. We carried out computational point mutations at various conserved residue positions in the yeast triose phosphate isomerase enzyme, for which experimental results have already been reported. We also performed molecular dynamics simulations on a subset of point mutants for comparison. The difference in pairwise residue and atomic contact energy between the wild type and various point mutants reveals the probability of mutation at a particular position. Interestingly, we found that our computational predictions agree with the experimental studies of Silverman et al. (Proc Natl Acad Sci 2001;98:3092-3097) and perform better than iMutant and the Cologne University Protein Stability Analysis Tool. The present work thus suggests that deriving pairwise contact energy potentials and performing molecular dynamics simulations of functionally important folds could help predict the probability of point mutations, which may ultimately reduce the time and cost of mutation experiments.
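The wild-type versus mutant comparison above amounts to re-scoring a fixed contact map with one residue substituted. A minimal residue-level sketch, assuming the contact map is unchanged by the mutation and using a hypothetical toy energy table (the paper's family-specific TIM-barrel potentials are not reproduced here):

```python
def contact_energy(seq, contacts, table):
    """Sum pairwise contact energies over residue contacts.

    seq: one-letter sequence; contacts: (i, j) index pairs taken from a structure;
    table: energies keyed by alphabetically sorted residue pairs.
    Pairs missing from the table contribute 0 (a simplifying assumption).
    """
    total = 0.0
    for i, j in contacts:
        total += table.get(tuple(sorted((seq[i], seq[j]))), 0.0)
    return total


def mutation_delta(seq, pos, new_aa, contacts, table):
    """Contact energy of the point mutant minus that of the wild type."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return contact_energy(mutant, contacts, table) - contact_energy(seq, contacts, table)


# Toy, made-up energies (arbitrary units); not values from the paper.
toy_table = {("L", "L"): -1.0, ("A", "L"): -0.5, ("E", "L"): 0.2}
contacts = [(0, 3), (1, 3)]
print(mutation_delta("LAEL", 1, "E", contacts, toy_table))  # about 0.7
```

With lower energy meaning more favorable contacts, a positive difference flags a substitution whose contacts are less favorable than the wild type's; an atom-level version would follow the same pattern with atom-pair contacts.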
Affiliation(s)
- K M Saravanan
- Centre of Advanced Study in Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai, Tamilnadu, 600 025, India
- S Suvaithenamudhan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamilnadu, 620 024, India
- S Parthasarathy
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamilnadu, 620 024, India
- S Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamilnadu, 620 024, India
5
Chen L, He J. A distance- and orientation-dependent energy function of amino acid key blocks. Biopolymers 2016; 101:681-92. [PMID: 24222511 DOI: 10.1002/bip.22440] [Received: 07/17/2013] [Revised: 10/31/2013] [Accepted: 11/01/2013]
Abstract
Blocks are selected portions of amino acids. They have been used effectively to represent amino acids in distinguishing the native conformation from decoys. Although many statistical energy functions exist, most of them rely on the distances between two or more amino acids. In this study, the authors have developed a pairwise energy function, DOKB, that is both distance- and orientation-dependent and is based on the key blocks that bias the distal ends of side chains. The results suggest that both distance and orientation are needed to distinguish the fine details of the packing geometry. DOKB appears to perform well in recognizing native conformations when compared with six other energy functions. Highly packed clusters play important roles in stabilizing the structure. Investigation of the highly packed clusters at the residue level suggests that certain residue pairs in a low-energy region are less likely to appear in highly packed clusters than in the protein as a whole. The cluster energy term appears to significantly improve the recognition of native conformations in the ig_structal decoy set, which contains more highly packed clusters than the other decoy sets.
Affiliation(s)
- Lin Chen
- Department of Computer Science, Old Dominion University, Norfolk, Virginia
6
Elhefnawy W, Chen L, Han Y, Li Y. ICOSA: A Distance-Dependent, Orientation-Specific Coarse-Grained Contact Potential for Protein Structure Modeling. J Mol Biol 2015; 427:2562-2576. [DOI: 10.1016/j.jmb.2015.05.022] [Received: 04/10/2015] [Accepted: 05/21/2015]
7
Chae MH, Krull F, Knapp EW. Optimized distance-dependent atom-pair-based potential DOOP for protein structure prediction. Proteins 2015; 83:881-90. [PMID: 25693513 DOI: 10.1002/prot.24782] [Received: 10/21/2014] [Revised: 02/06/2015] [Accepted: 02/10/2015]
Abstract
The DOcking decoy-based Optimized Potential (DOOP) energy function for protein structure prediction is based on empirical distance-dependent atom-pair interactions. To optimize the atom-pair interactions, native protein structures are decomposed into polypeptide chain segments that correspond to structural motifs involving complete secondary structure elements. They constitute near-native ligand-receptor systems (or just pairs). Thus, a total of 8609 ligand-receptor systems were prepared from 954 selected proteins. For each of these hypothetical ligand-receptor systems, 1000 evenly sampled docking decoys with 0-10 Å interface root-mean-square deviation (iRMSD) were generated with a method previously used for protein-protein docking. A neural network-based optimization method was applied to derive the optimized energy parameters using these decoys so that the energy function mimics the funnel-like energy landscape for the interaction between these hypothetical ligand-receptor systems. Thus, our method hierarchically models the overall funnel-like energy landscape of native protein structures. The resulting energy function was tested on several commonly used decoy sets for native protein structure recognition and compared with other statistical potentials. In combination with a torsion potential term that describes the local conformational preference, the atom-pair-based potential outperforms other reported statistical energy functions in correctly ranking native protein structures for a variety of decoy sets. This is especially the case for the most challenging ROSETTA decoy set, although it does not take side-chain orientation dependence into account explicitly. The DOOP energy function for protein structure prediction, the underlying database of protein structures with hypothetical ligand-receptor systems, and their decoys are freely available at http://agknapp.chemie.fu-berlin.de/doop/.
Affiliation(s)
- Myong-Ho Chae
- Department of Biology, University of Science, Unjong-District, Pyongyang, DPR Korea
8
On simplified global nonlinear function for fitness landscape: a case study of inverse protein folding. PLoS One 2014; 9:e104403. [PMID: 25110986 PMCID: PMC4128808 DOI: 10.1371/journal.pone.0104403] [Received: 12/26/2013] [Accepted: 07/14/2014]
Abstract
The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic states, and protein structures. We studied the problem of constructing the fitness landscape of inverse protein folding, or protein design, with the aim of generating amino acid sequences that would fold into an a priori determined structural fold, which would enable engineering of novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of correct sequences that would fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to discriminate simultaneously 428 native sequences not homologous to any training proteins from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, 95% correct rate), outperforming several other statistical linear scoring functions and an optimized linear function. Our results further suggested that for the task of global sequence design of 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of just about 3,680 proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.
9
How good are simplified models for protein structure prediction? Adv Bioinformatics 2014; 2014:867179. [PMID: 24876837 PMCID: PMC4022063 DOI: 10.1155/2014/867179] [Received: 10/31/2013] [Revised: 01/22/2014] [Accepted: 01/23/2014]
Abstract
Protein structure prediction (PSP) has been one of the most challenging problems in computational biology for several decades. The challenge is largely due to the complexity of all-atom details and the unknown nature of the energy function. Researchers have therefore used simplified energy models that consider interaction potentials only between amino acid monomers in contact on discrete lattices. The restricted nature of the lattices and the energy models raises a twofold concern regarding the assessment of these models. Can a native or a very close structure be obtained when structures are mapped to lattices? Can contact-based energy models on discrete lattices guide the search toward the native structures? In this paper, we use the protein chain lattice fitting (PCLF) problem to address the first concern; we developed a constraint-based local search algorithm for the PCLF problem on cubic and face-centered cubic lattices and found very close lattice fits for the native structures. For the second concern, we use a number of techniques to sample the conformation space and find correlations between energy functions and the root-mean-square deviation (RMSD) of the lattice-based structures from the native structures. Our analysis reveals weaknesses in several contact-based energy models that are popular in PSP.
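As a concrete example of a contact-based energy on a discrete lattice, the sketch below scores a conformation on the simple cubic lattice with the HP model (-1 per non-bonded H-H pair on adjacent sites). The abstract does not say that HP is among the models evaluated; it is used here only as the simplest instance of the class, and the function name is hypothetical.

```python
NEIGHBOR_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def hp_energy(sequence, coords):
    """HP-model energy on the simple cubic lattice.

    sequence: string of 'H'/'P'; coords: one (x, y, z) integer site per residue.
    Each pair of H residues on adjacent sites that are not chain neighbours
    contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one lattice site per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(u - v) for u, v in zip(a, b)) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    site_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in NEIGHBOR_STEPS:
            j = site_to_index.get((x + dx, y + dy, z + dz))
            # j > i + 1 counts each pair once and skips bonded neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HHHH", square))  # -1: residues 0 and 3 form a non-bonded H-H contact
print(hp_energy("HPHP", square))  # 0
```

Scoring many sampled lattice conformations this way and correlating the energies with their RMSD to the native structure is the kind of test the abstract describes for the second concern.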
10
Custódio FL, Barbosa HJ, Dardenne LE. A multiple minima genetic algorithm for protein structure prediction. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2013.10.029]
11
Abstract
Empirical protein folding potential functions should have a global minimum near the native conformation of globular proteins that fold stably, and they should give the correct free energy of folding. We demonstrate that otherwise very successful potentials fail to have even a local minimum anywhere near the native conformation, and a seemingly well validated method of estimating the thermodynamic stability of the native state is extremely sensitive to small perturbations in atomic coordinates. These are both indicative of fitting a great deal of irrelevant detail. Here we show how to devise a robust potential function that succeeds very well at both tasks, at least for a limited set of proteins, and this involves developing a novel representation of the denatured state. Predicted free energies of unfolding for 25 mutants of barnase are in close agreement with the experimental values, while for 17 mutants there are substantial discrepancies.
Affiliation(s)
- M Chhajer
- Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599 U.S.A
12
Pandey RB, Farmer BL. Random coil to globular thermal response of a protein (H3.1) with three knowledge-based coarse-grained potentials. PLoS One 2012; 7:e49352. [PMID: 23166645 PMCID: PMC3498164 DOI: 10.1371/journal.pone.0049352] [Received: 07/06/2012] [Accepted: 10/10/2012]
Abstract
The effect of temperature on the conformation of a histone (H3.1) is studied by a coarse-grained Monte Carlo simulation based on three knowledge-based contact potentials (MJ, BT, BFKV). Despite the unique energy and mobility profiles of its residues, histone H3.1 undergoes a systematic (possibly continuous) structural transition from a random coil to a globular conformation on reducing the temperature. The range over which this systematic variation of the radius of gyration (Rg) with temperature (T) occurs, however, depends on the potential, i.e. ΔT(MJ) ≈ 0.013-0.020, ΔT(BT) ≈ 0.018-0.026, and ΔT(BFKV) ≈ 0.006-0.013 (in reduced units). Unlike the MJ and BT potentials, the BFKV potential shows an anomaly in which Rg decreases on raising the temperature in a range ΔT(A) ≈ 0.015-0.018 before reaching its steady-state random coil configuration. Scaling of the structure factor, S(q) ∝ q^(-1/ν), with the wave vector q = 2π/λ, where λ is the wavelength, reveals a systematic change in the effective dimension (De ∼ 1/ν) of the histone with all potentials (MJ, BT, BFKV): De ∼ 3 in the globular structure and De ∼ 2 for the random coil. Reproducibility of the general yet unique (monotonic) structural transition of H3.1 with temperature (in contrast to the non-monotonic structural response of a similar but different protein, H2AX) with three interaction sets shows that the knowledge-based contact potential is a viable tool to investigate the structural response of proteins. Caution should be exercised in quantitative comparisons because of differences in the transition regimes with these interactions.
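Because S(q) ∝ q^(-1/ν), a straight-line fit of ln S against ln q has slope -1/ν, so the effective dimension De ∼ 1/ν is minus the fitted slope. A minimal least-squares sketch; choosing the q range over which the power law actually holds is left to the user, and the function name is hypothetical.

```python
import math


def effective_dimension(q_values, s_values):
    """Estimate De = 1/nu from S(q) ~ q**(-1/nu) by a log-log least-squares fit."""
    if len(q_values) != len(s_values) or len(q_values) < 2:
        raise ValueError("need at least two matching (q, S) points")
    xs = [math.log(q) for q in q_values]
    ys = [math.log(s) for s in s_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope  # slope = -1/nu, so De = -slope


q = [0.5, 1.0, 2.0, 4.0]
print(effective_dimension(q, [qi ** -3 for qi in q]))  # about 3 (globular-like scaling)
print(effective_dimension(q, [qi ** -2 for qi in q]))  # about 2 (random-coil-like scaling)
```

A constant prefactor in S(q) only shifts the intercept of the fit, so it does not affect the estimated dimension.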
Affiliation(s)
- Ras B Pandey
- Department of Physics and Astronomy, University of Southern Mississippi, Hattiesburg, Mississippi, USA.
13
Romero PA, Arnold FH. Random field model reveals structure of the protein recombinational landscape. PLoS Comput Biol 2012; 8:e1002713. [PMID: 23055915 PMCID: PMC3464211 DOI: 10.1371/journal.pcbi.1002713] [Received: 06/05/2012] [Accepted: 08/03/2012]
Abstract
We are interested in how intragenic recombination contributes to the evolution of proteins and how this mechanism complements and enhances the diversity generated by random mutation. Experiments have revealed that proteins are highly tolerant to recombination with homologous sequences (mutation by recombination is conservative); more surprisingly, they have also shown that homologous sequence fragments make largely additive contributions to biophysical properties such as stability. Here, we develop a random field model to describe the statistical features of the subset of protein space accessible by recombination, which we refer to as the recombinational landscape. This model shows quantitative agreement with experimental results compiled from eight libraries of proteins that were generated by recombining gene fragments from homologous proteins. The model reveals a recombinational landscape that is highly enriched in functional sequences, with properties dominated by a large-scale additive structure. It also quantifies the relative contributions of parent sequence identity, crossover locations, and protein fold to the tolerance of proteins to recombination. Intragenic recombination explores a unique subset of sequence space that promotes rapid molecular diversification and functional adaptation. Mutation and recombination are the primary sources of genetic variation in evolving populations. The relative benefit of these two diversification mechanisms and how they complement each other has been a long-standing question in evolutionary biology. While it is clear what types of genetic diversity these two mechanisms can create, a significant challenge is relating these sequence changes to changes in fitness. The fitness landscape, which describes this mapping from genotype to phenotype, is extraordinarily complex and defined over an incomprehensibly large space of sequences. 
Here, we develop a model of the landscape that relies not on the details of this mapping, but rather on the statistical relationships between sequences. By studying the expected values of landscape properties, we can gain insights into the structure of the landscape that are independent of the details of how genotype dictates phenotype. We use this random field model to understand how recombination explores a functionally enriched and diverse subset of protein sequence space.
Affiliation(s)
- Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
14
Zhao F, Xu J. A position-specific distance-dependent statistical potential for protein structure and functional study. Structure 2012; 20:1118-26. [PMID: 22608968 PMCID: PMC3372698 DOI: 10.1016/j.str.2012.04.003] [Received: 01/14/2012] [Revised: 04/09/2012] [Accepted: 04/10/2012]
Abstract
Although studied extensively, designing a highly accurate protein energy potential is still challenging. Many knowledge-based statistical potentials are derived from the inverse of the Boltzmann law and consist of two major components: the observed atomic interaction probability and the reference state. These potentials mainly distinguish themselves in the reference state and use a similar simple counting method to estimate the observed probability, which is usually assumed to depend only on atom types. This article takes a rather different view of the observed probability and parameterizes it by the protein sequence profile context of the atoms and the radius of gyration, in addition to atom types. Experiments confirm that our position-specific statistical potential outperforms currently popular ones in several decoy discrimination tests. Our results imply that, in addition to the reference state, the observed probability also differentiates energy potentials, and that evolutionary information greatly boosts their performance.
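For contrast with the position-specific approach, the conventional counting scheme described above can be sketched as a distance-binned inverse-Boltzmann pair potential, E(pair, r) = -ln[P_obs(r | pair) / P_ref(r)] in kT units. The reference state is assumed here to be the distance distribution pooled over all pairs, one simple choice among many; Zhao and Xu's profile- and radius-of-gyration-dependent parameterization is not reproduced, and the function name is hypothetical.

```python
import math
from collections import Counter


def pair_potential(observations, bin_width=1.0):
    """Distance-binned inverse-Boltzmann pair potential in kT units.

    observations: list of (type_a, type_b, distance) taken from native structures.
    E(pair, bin) = -ln[P_obs(bin | pair) / P_ref(bin)], where the reference
    P_ref(bin) is the distance distribution pooled over all pairs (an assumption).
    Only bins observed for a pair receive an energy.
    """
    if not observations:
        raise ValueError("no observations")
    pair_bin, pair_total, bin_total = Counter(), Counter(), Counter()
    for a, b, r in observations:
        pair = tuple(sorted((a, b)))
        k = int(r // bin_width)
        pair_bin[(pair, k)] += 1
        pair_total[pair] += 1
        bin_total[k] += 1
    n = len(observations)
    return {
        (pair, k): -math.log((c / pair_total[pair]) / (bin_total[k] / n))
        for (pair, k), c in pair_bin.items()
    }


obs = [("C", "N", 1.5)] * 3 + [("C", "N", 4.5)] + [("O", "O", 1.5)] + [("O", "O", 4.5)] * 3
for key, e in sorted(pair_potential(obs).items()):
    print(key, round(e, 3))
```

Every term in this sketch depends only on atom types and distance, which is the simplification the abstract argues against.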
Affiliation(s)
- Feng Zhao
- Toyota Technological Institute at Chicago, Chicago IL, USA 60637
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago IL, USA 60637
15
Zimmermann MT, Leelananda SP, Kloczkowski A, Jernigan RL. Combining statistical potentials with dynamics-based entropies improves selection from protein decoys and docking poses. J Phys Chem B 2012; 116:6725-31. [PMID: 22490366 DOI: 10.1021/jp2120143]
Abstract
Protein structure prediction and protein-protein docking are important and widely used tools, but methods to confidently evaluate the quality of a predicted structure or binding pose have had limited success. Typically, either knowledge-based or physics-based energy functions are employed to evaluate a set of predicted structures (termed "decoys" in structure prediction and "poses" in docking), with the lowest energy structure being assumed to be the one closest to the native state. While successful for many cases, failures are still common. Thus, improvements to structure evaluation methods are essential for future improvements. In this work, we combine multibody statistical potentials with dynamics models, evaluating fluctuation-based entropies that include contributions from the entire structure. This leads to enhanced selection of native-like structures for CASP9 decoys, refined ClusPro docking poses, as well as large sets of docking poses from the Benchmark 3.0 and Dockground data sets. The data used include both bound and unbound docking, and positive results are found for each type. Not only does this method yield improved average results, but for high quality docking poses, we often pick the best pose.
Affiliation(s)
- Michael T Zimmermann
- Bioinformatics and Computational Biology Interdepartmental Graduate Program, Iowa State University, Ames, Iowa 50011, USA
16
Abstract
Proteins bind to other proteins efficiently and specifically to carry out many cell functions such as signaling, activation, transport, enzymatic reactions, and more. To determine the geometry and strength of binding of a protein pair, an energy function is required. An algorithm to design an optimal energy function, based on empirical data of protein complexes, is proposed and applied. Emphasis is placed on negative design, in which incorrect geometries are presented to the algorithm, which learns to avoid them. For the docking problem, the search for plausible geometries can be performed exhaustively. The possible geometries of the complex are generated on a grid with the help of a fast Fourier transform algorithm. A novel formulation of negative design makes it possible to investigate iteratively hundreds of millions of negative examples while monotonically improving the quality of the potential. Experimental structures for 640 protein complexes are used to generate positive and negative examples for learning parameters. The algorithm designed in this work finds the correct binding structure as the lowest energy minimum in 318 of the 640 cases. Further benchmarks on independent sets confirm the significant capacity of the scoring function to recognize correct modes of interaction.
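The abstract does not spell out the learning rule. As a hedged illustration of negative design for a linear energy E = w·φ (φ being hypothetical feature counts for a complex geometry), a classic perceptron-style loop raises decoy energies relative to the native until every decoy sits at least a margin above its native. Unlike the method described, this toy version makes no claim of monotonic improvement, assumes the constraints are satisfiable, and takes its decoys as given rather than enumerating them on an FFT grid.

```python
def energy(w, phi):
    """Linear energy E = w . phi for a feature vector phi."""
    return sum(wk * fk for wk, fk in zip(w, phi))


def negative_design(pairs, n_features, margin=1.0, lr=0.1, max_epochs=1000):
    """Perceptron-style negative design for a linear energy.

    pairs: list of (native_features, decoy_features), each decoy generated for
    that native. Whenever E(decoy) - E(native) < margin, w moves along
    (phi_decoy - phi_native), which raises that gap by lr * |phi_decoy - phi_native|^2.
    Stops when no constraint is violated or after max_epochs.
    """
    w = [0.0] * n_features
    for _ in range(max_epochs):
        violated = 0
        for nat, dec in pairs:
            if energy(w, dec) - energy(w, nat) < margin:
                violated += 1
                w = [wk + lr * (fd - fn) for wk, fd, fn in zip(w, dec, nat)]
        if violated == 0:
            break
    return w


# Hypothetical 2-feature example: feature 0 is enriched in natives, feature 1 in decoys.
pairs = [([1, 0], [0, 1]), ([1, 0], [0, 2]), ([2, 0], [1, 1])]
w = negative_design(pairs, n_features=2)
print(w, all(energy(w, d) - energy(w, n) >= 1.0 for n, d in pairs))
```

Each violated decoy acts as a negative example that pushes the parameters away from it, which is the core idea of negative design.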
Affiliation(s)
- D V S Ravikant
- Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, New York 14853, USA
17
Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011; 51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u]
Abstract
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
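RankScore is evaluated above partly by early enrichment, EF(1). A common definition compares the hit rate among the top 1% of ranked compounds with the hit rate in the whole library; the sketch below computes it for any cutoff. The tie handling (stable sort), the rounding of the cutoff and the "lower score is better" convention are assumptions, logAUC is not covered, and the function name is hypothetical.

```python
import math


def enrichment_factor(scores, is_active, percent=1.0, lower_is_better=True):
    """Enrichment factor at the top `percent` of a ranked list.

    EF = (actives in top fraction / size of top fraction) / (all actives / all compounds).
    Assumptions: lower scores rank first (as for docking energies) and the
    top-fraction size is rounded up to at least one compound.
    """
    n = len(scores)
    n_active = sum(1 for a in is_active if a)
    if n == 0 or n_active == 0 or n != len(is_active):
        raise ValueError("need matching, non-empty inputs with at least one active")
    ranked = sorted(zip(scores, is_active), key=lambda t: t[0], reverse=not lower_is_better)
    n_top = max(1, math.ceil(n * percent / 100.0))
    hits = sum(1 for _, active in ranked[:n_top] if active)
    return (hits / n_top) / (n_active / n)


# 100 compounds scored 0..99; the two actives have scores 0 and 50.
scores = list(range(100))
actives = [s in (0, 50) for s in scores]
print(enrichment_factor(scores, actives, percent=1.0))   # about 50
print(enrichment_factor(scores, actives, percent=10.0))  # about 5
```

An EF of 1 means the ranking is no better than random at that cutoff; the maximum possible value is limited by the fraction of actives in the library.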
Affiliation(s)
- Hao Fan
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
18
Dong Q, Zhou S. Novel nonlinear knowledge-based mean force potentials based on machine learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:476-486. [PMID: 20820079 DOI: 10.1109/tcbb.2010.86]
Abstract
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Almost all current knowledge-based potentials take the form of a weighted linear sum over interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of from relative frequencies of interaction pairs against a reference state or from linear classifiers. A support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five Boltzmann-based or linear knowledge-based mean force potentials are introduced, and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures. 
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
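The energy Z score mentioned above is usually defined as Z = (E_native - mean(E_decoys)) / std(E_decoys), so a more negative Z means the native is better separated from its decoys. The sketch below assumes that definition with the population standard deviation; the helper names are hypothetical, and the paper's exact convention is not stated in the abstract.

```python
import statistics

def energy_z_score(native_energy, decoy_energies):
    """Z = (E_native - mean(E_decoy)) / std(E_decoy); more negative is better."""
    if len(decoy_energies) < 2:
        raise ValueError("need at least two decoys")
    mu = statistics.fmean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)  # population standard deviation
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mu) / sigma

def native_rank(native_energy, decoy_energies):
    """1-based rank of the native among native + decoys (rank 1 = lowest energy)."""
    return 1 + sum(1 for e in decoy_energies if e < native_energy)
```

Native discrimination in the sense used above then corresponds to `native_rank(...) == 1`.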
Affiliation(s)
- Qiwen Dong
- Shanghai Key Lab of Intelligent Information Processing and the School of Computer Science, Fudan University, Old Yifu Building, Room 202-5, 220 Handan Road, Shanghai 200433, China.
19
20
Duarte JM, Sathyapriya R, Stehr H, Filippis I, Lappe M. Optimal contact definition for reconstruction of contact maps. BMC Bioinformatics 2010; 11:283. [PMID: 20507547 PMCID: PMC3583236 DOI: 10.1186/1471-2105-11-283] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 12/08/2009] [Accepted: 05/27/2010] [Indexed: 11/23/2022]
Abstract
Background: Contact maps have been extensively used as a simplified representation of protein structures. They capture most of the important features of a protein's fold and are preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity, many groups have dedicated considerable effort to contact prediction as a proxy for protein structure prediction. However, the biological interest of a contact map depends on the availability of reliable methods for reconstructing the 3-dimensional structure from it.
Results: We use an implementation of the well-known distance geometry protocol to build realistic 3-dimensional protein models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) how accurately does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability, and c) what is the effect of partial or inaccurate contact information on 3D structure recovery. Our results suggest that contact maps derived by applying a distance cutoff of 9 to 11 Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2 Å RMSD of the crystal structure and that have lower values of the pairwise average ensemble RMSD. Interestingly, it is still possible to recover a structure with partial contact information, although wrong contacts can lead to a dramatic loss in reconstruction fidelity.
Conclusions: Contact maps therefore represent a valid approximation to the structures, with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps, such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
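The contact definition discussed above (a distance cutoff applied to Cβ atoms) can be sketched in a few lines. This assumes the Cβ coordinates have already been extracted into an N×3 array in ångströms; the 10 Å default is simply a value inside the 9 to 11 Å range reported as optimal, and the function and parameter names (including the minimum sequence separation) are hypothetical. Glycine handling (for example, a virtual Cβ) is left out.

```python
import numpy as np

def contact_map(cb_coords, cutoff=10.0, min_seq_sep=1):
    """Boolean contact map from C-beta coordinates (N x 3, in angstroms).

    Residues i and j are in contact when their C-beta distance is <= cutoff
    and |i - j| >= min_seq_sep (min_seq_sep=1 just excludes the diagonal).
    """
    coords = np.asarray(cb_coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting: (N, 1, 3) - (1, N, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    return (dist <= cutoff) & (sep >= min_seq_sep)
```

The resulting symmetric matrix is the kind of input a distance-geometry reconstruction would take as its set of upper-bound restraints.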
Affiliation(s)
- Jose M Duarte
- Max Planck Institute for Molecular Genetics, Ihnestr, Berlin, Germany.
21
Ravikant DVS, Elber R. PIE-efficient filters and coarse grained potentials for unbound protein-protein docking. Proteins 2010; 78:400-19. [PMID: 19768784 DOI: 10.1002/prot.22550] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/06/2022]
Abstract
Identifying correct binding modes in a large set of models is an important step in protein-protein docking. We identified a protein docking filter based on overlap area that significantly reduces the number of candidate structures requiring detailed examination. Using mathematical programming on a comprehensive learning set of 640 two-chain protein complexes, we also developed potentials based on residue contacts and overlap areas. In discriminating between native and nonnative binding modes on a large test set of 84 complexes independent of our training set, our potential showed substantially better recognition capacity than other publicly accessible protein docking potentials. We were able to rank a near-native model at the top in 43 cases and within the top 10 in 51 cases. We also report an atomic potential that ranks a near-native model at the top in 46 cases and within the top 10 in 58 cases. Our filter+potential is well suited for selecting a small set of models to be refined to atomic resolution.
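The abstract does not spell out the mathematical-programming formulation. A common idea behind this kind of training is to find weights w for a linear potential E = w·x such that every native complex scores lower than each of its decoys, i.e. w·(x_decoy - x_native) ≥ margin for all pairs. The sketch below solves that feasibility problem with a simple perceptron update instead of a linear-programming solver; it is a substituted technique for illustration only, and the feature vectors (standing in for contact or overlap-area counts) and function name are hypothetical.

```python
import numpy as np

def train_linear_potential(natives, decoys, epochs=1000, margin=1.0, lr=0.1):
    """Find w such that w @ (decoy - native) >= margin for every native/decoy pair.

    natives: list of feature vectors, one per complex.
    decoys:  list (parallel to natives) of 2-D arrays of decoy feature vectors.
    Returns (w, converged).
    """
    # One difference vector per pair; a positive projection means the decoy
    # has a higher (worse) energy than its native.
    diffs = np.vstack([np.asarray(d, float) - np.asarray(n, float)
                       for n, d in zip(natives, decoys)])
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        violated = False
        for d in diffs:
            if w @ d < margin:
                w += lr * d  # perceptron step toward satisfying this constraint
                violated = True
        if not violated:
            return w, True
    return w, False
```

An LP formulation would express the same inequalities explicitly and could add an objective (for example, maximizing the margin); the perceptron only finds some feasible w when one exists.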
Affiliation(s)
- D V S Ravikant
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
22
Wang Z, Tegge AN, Cheng J. Evaluating the absolute quality of a single protein model using structural features and support vector machines. Proteins 2009; 75:638-47. [PMID: 19004001 DOI: 10.1002/prot.22275] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.2] [Indexed: 11/12/2022]
Abstract
Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT-TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross-validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template-based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top-ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking.
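The per-target "loss" described above is the difference between the true GDT-TS of the best model in a set and that of the model ranked first by the predicted score, and the correlation is a Pearson correlation between predicted and true scores. A minimal sketch, assuming parallel lists of predicted and true GDT-TS values for one target (higher predicted score = better); the function names are hypothetical.

```python
import math

def ranking_loss(predicted, true_gdt):
    """True GDT-TS of the best model minus that of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true_gdt) - true_gdt[top]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Averaging `ranking_loss` over all targets gives a summary figure of the kind reported for CASP7 above.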
Affiliation(s)
- Zheng Wang
- Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
23
Reconstruction and stability of secondary structure elements in the context of protein structure prediction. Biophys J 2009; 96:4399-408. [PMID: 19486664 DOI: 10.1016/j.bpj.2009.02.057] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/09/2008] [Revised: 01/28/2009] [Accepted: 02/19/2009] [Indexed: 11/20/2022]
Abstract
Efficient and accurate reconstruction of secondary structure elements in the context of protein structure prediction is the major focus of this work. We present a novel approach capable of reconstructing alpha-helices and beta-sheets in atomic detail. The method is based on Metropolis Monte Carlo simulations in a force field of empirical potentials that are designed to stabilize secondary structure elements in room-temperature simulations. Particular attention is paid to lateral side-chain interactions in beta-sheets and between the turns of alpha-helices, as well as to backbone hydrogen bonding. The force constants are optimized from a data set of known structures using contrastive divergence, a novel machine learning technique. Using this approach, we demonstrate the applicability of the framework to the problem of reconstructing the overall protein fold for a number of commonly studied small proteins, based only on predicted secondary structure and contact map. For protein G and chymotrypsin inhibitor 2, we are able to reconstruct the secondary structure elements in atomic detail and the overall protein folds with a root-mean-square deviation of <10 Å. For cold-shock protein and the SH3 domain, we accurately reproduce the secondary structure elements and the topology of the 5-stranded beta-sheets, but not the barrel structure. The importance of high-quality secondary structure and contact map prediction is discussed.
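The reconstruction method above is built on Metropolis Monte Carlo sampling, whose core is the standard acceptance rule: accept a trial move with probability min(1, exp(-ΔE/kT)). A generic sketch follows; the energy function and move proposal are placeholders supplied by the caller (not the authors' force field or move set), and a symmetric proposal is assumed.

```python
import math
import random

def metropolis(energy, propose, x0, n_steps, kT=1.0, rng=None):
    """Generic Metropolis Monte Carlo.

    energy:  callable state -> float.
    propose: callable (state, rng) -> trial state (symmetric proposal assumed).
    Returns (final_state, best_state, best_energy).
    """
    rng = rng or random.Random()
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = propose(x, rng)
        e_trial = energy(trial)
        delta = e_trial - e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return x, best_x, best_e
```

For a one-dimensional toy energy (x - 3)² with uniform steps, the lowest-energy state visited ends up close to x = 3.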
24
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed either to satisfy the Boltzmann distribution at high or low temperatures or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high-temperature Boltzmann-distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann-distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. 
Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
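Under the quasi-chemical approximation with the ideal (random-mixing) reference state, contact energies are commonly estimated as e_ij = -kT ln(N_ij / N_ij^ref), where N_ij^ref is the number of i-j contacts expected if contact partners were paired at random. The sketch below assumes a symmetric matrix of unordered contact counts between residue types and the counting conventions stated in its comments; it illustrates the general estimator, not the paper's iterative correction, and the function name is hypothetical.

```python
import numpy as np

def quasi_chemical_energies(counts, kT=1.0, pseudocount=0.0):
    """Contact energies e_ij = -kT ln(N_ij / N_ij_ref) (ideal reference state).

    counts: symmetric K x K matrix of observed contact counts between residue
            types; counts[i, j] (i != j) is the number of unordered i-j contacts.
    """
    n = np.asarray(counts, dtype=float) + pseudocount
    total = np.triu(n).sum()  # number of unordered contacts
    # Contact "ends" contributed by each type: 2 per i-i contact, 1 per i-j contact.
    ends = n.sum(axis=1) + np.diag(n)
    f = ends / (2.0 * total)
    # Expected unordered counts under random mixing (factor 2 for i != j).
    ref = total * np.outer(f, f) * (2.0 - np.eye(len(f)))
    return -kT * np.log(n / ref)
```

When the observed counts match random mixing exactly, every energy is zero; enriched pairs get negative (favorable) energies and depleted pairs positive ones.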
Affiliation(s)
- Marcos R Betancourt
- Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
25
Li Q, Zhou C, Liu H. Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities. Proteins 2009; 74:820-36. [PMID: 18704928 DOI: 10.1002/prot.22191] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
General and transferable statistical potentials to quantify the compatibility between local structures and local sequences of peptide fragments in proteins were derived. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins 2000;41:271-287), secondary structure states, and solvent accessibilities. On the basis of the native sequences of the structurally clustered fragments, the probabilities of different amino acid sequences were estimated for each structure cluster. From the sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provided an integrated approach in which local conformations and local environments are treated jointly, structures are treated in units of fragments instead of individual residues so that coupling between the conformations of adjacent residues is included, and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudosequence design, and local structure predictions, the potentials performed at least comparably to and, in most cases, better than a number of existing models applicable to the same contexts, indicating the advantages of such an integrated approach for deriving local potentials and suggesting that the statistical potentials derived here are applicable in sequence design and structure prediction.
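The step from cluster-specific sequence probabilities to statistical energies can be sketched as a log-odds score, E(seq | cluster) = -Σ_k ln[P_c(a_k at position k) / P_bg(a_k)], summed over the fragment positions. This is a simplified illustration under stated assumptions: position-specific amino acid probabilities for each structure cluster are given as dictionaries, a background distribution is supplied, and positions are treated independently, which may not match the authors' estimator. All names are hypothetical.

```python
import math

def fragment_energy(sequence, cluster_profile, background):
    """Log-odds statistical energy of a fragment sequence in one structure cluster.

    sequence:        string of one-letter codes, e.g. a five-residue fragment.
    cluster_profile: list (one entry per position) of {amino_acid: probability}.
    background:      {amino_acid: probability} over all fragments.
    Lower energy = sequence more compatible with the cluster.
    """
    if len(sequence) != len(cluster_profile):
        raise ValueError("sequence length must match the profile length")
    return -sum(math.log(pos[aa] / background[aa])
                for aa, pos in zip(sequence, cluster_profile))
```

Evaluating the same sequence against every cluster and comparing the energies gives a crude version of the "energy as a function of local structure for a given sequence" direction described above.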
Affiliation(s)
- Quan Li
- School of Life Sciences, and Hefei National Laboratory for Physical Sciences at Microscale, University of Science and Technology of China, Hefei, Anhui 230027, China
26
Gu J, Li H, Jiang H, Wang X. Optimizing energy potential for protein fold recognition with parametric evaluation function. J Comput Biol 2009; 16:427-42. [PMID: 19254182 DOI: 10.1089/cmb.2008.0128] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
In this paper, a new optimization method is proposed to determine a simplified energy potential for protein fold recognition, which consists of the residue-residue contact, hydrophobicity, and pseudodihedral potentials. With a parametric evaluation function method, the Z-scores of all the proteins in a training set are optimized simultaneously to obtain the best parameter set of the potential. For this multi-objective and multi-constraint problem, the new optimization scheme is very effective. The derived potential is then tested on two high-quality decoy sets and compared with other classical fold recognition potentials. With the simplified energy potential, we achieve a high level of discrimination capability between correct and incorrect folds.
Affiliation(s)
- Junfeng Gu
- Department of Engineering Mechanics, State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian, China
27
28
Randall A, Baldi P. SELECTpro: effective protein model selection using a structure-based energy function resistant to BLUNDERs. BMC Structural Biology 2008; 8:52. [PMID: 19055744 PMCID: PMC2667183 DOI: 10.1186/1472-6807-8-52] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 06/26/2008] [Accepted: 12/03/2008] [Indexed: 11/10/2022]
Abstract
Background: Protein tertiary structure prediction is a fundamental problem in computational biology, and identifying the most native-like model from a set of predicted models is a key sub-problem. Consensus methods work well when the redundant models in the set are the most native-like but fail when the most native-like model is unique. In contrast, structure-based methods score models independently and can be applied to model sets of any size and redundancy level. Additionally, structure-based methods have a variety of important applications, including analogous fold recognition, refinement of sequence-structure alignments, and de novo prediction. The purpose of this work was to develop a structure-based model selection method based on predicted structural features that could be applied successfully to any set of models.
Results: Here we introduce SELECTpro, a novel structure-based model selection method derived from an energy function comprising physical, statistical, and predicted structural terms. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, β-strand pairing, and side-chain hydrogen bonding. SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitting predictions for all 95 targets, and achieved top results. The average difference in GDT-TS between the models ranked first by SELECTpro and the most native-like models was 5.07. This GDT-TS difference was less than 1% of the GDT-TS of the most native-like model for 18 targets and less than 10% for 66 targets. SELECTpro also ranked the single most native-like model first for 15 targets, in the top five for 39 targets, and in the top ten for 53 targets, more often than any other method. Because the ranking metric is skewed by model redundancy and ignores poor models ranked above the most native-like model, the BLUNDER metric is introduced to overcome these limitations. SELECTpro is also evaluated on a recent benchmark set of 16 small proteins with large decoy sets of 12,500 to 20,000 models for each protein, where it outperforms the benchmarked method (I-TASSER).
Conclusion: SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set. SELECTpro is available for download as a stand-alone application at: . SELECTpro is also available as a public server at the same site.
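The per-target counts quoted above (how often the most native-like model is ranked first, in the top five, or in the top ten) can be computed from predicted energies and true GDT-TS values. A minimal sketch, assuming lower predicted energy means a better model and that ties are broken arbitrarily; the function names are hypothetical. The BLUNDER metric itself is not defined in this abstract and is not reproduced here.

```python
def rank_of_best_model(energies, true_gdt):
    """1-based rank of the most native-like model (highest GDT-TS)
    when models are sorted by predicted energy (lowest first)."""
    best = max(range(len(true_gdt)), key=lambda i: true_gdt[i])
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    return order.index(best) + 1

def top_n_counts(targets, cutoffs=(1, 5, 10)):
    """targets: iterable of (energies, true_gdt) pairs, one per target.
    Returns {n: number of targets whose best model is ranked within the top n}."""
    ranks = [rank_of_best_model(e, g) for e, g in targets]
    return {n: sum(r <= n for r in ranks) for n in cutoffs}
```

As the abstract notes, these counts say nothing about poor models that are ranked above the most native-like one.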
Affiliation(s)
- Arlo Randall
- School of Information and Computer Sciences, University of California, Irvine, CA 92697, USA.
29
Hoang TX, Seno F, Trovato A, Banavar JR, Maritan A. Inference of the solvation energy parameters of amino acids using maximum entropy approach. J Chem Phys 2008; 129:035102. [PMID: 18647046 DOI: 10.1063/1.2953691] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
We present a novel technique, based on the principle of maximum entropy, for deriving the solvation energy parameters of amino acids from the knowledge of the solvent accessible areas in experimentally determined native state structures as well as high quality decoys of proteins. We present the results of detailed studies and analyze the correlations of the solvation energy parameters with the standard hydrophobic scale. We study the ability of the inferred parameters to discriminate between the native state structures of proteins and their decoy conformations.
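The general maximum-entropy principle behind this kind of inference can be illustrated as follows, under assumptions that are ours rather than the paper's. Suppose the solvation energy is linear in the accessible areas, E = Σ_a ε_a A_a, where A_a is the solvent-accessible area of residue type a in a conformation. The maximum-entropy distribution over the native and its decoys with constrained average areas is then a Boltzmann form p ∝ exp(-E), and choosing ε so that the expected areas match the native's is equivalent to maximizing the native's likelihood, whose gradient is ⟨A⟩_p - A_native. The sketch runs that gradient ascent for a single protein; the function name and the toy area values in the test are illustrative only.

```python
import numpy as np

def fit_solvation_parameters(native_areas, decoy_areas, lr=0.1, n_iter=500):
    """Gradient ascent on log p(native) with p(c) proportional to exp(-eps . A_c).

    native_areas: length-K vector of accessible area per residue type (native).
    decoy_areas:  M x K array, one row per decoy conformation.
    Returns eps (length K); a lower eps . A means a more favorable conformation.
    """
    native = np.asarray(native_areas, dtype=float)
    confs = np.vstack([native, np.asarray(decoy_areas, dtype=float)])
    eps = np.zeros(confs.shape[1])
    for _ in range(n_iter):
        energies = confs @ eps
        # Boltzmann weights; subtracting the minimum avoids overflow.
        w = np.exp(-(energies - energies.min()))
        p = w / w.sum()
        expected = p @ confs
        # Gradient of [-eps . A_native - log Z] with respect to eps.
        eps += lr * (expected - native)
    return eps
```

With real data, the same update would be summed over many proteins, and the fitted ε could then be compared with a hydrophobicity scale, as in the abstract.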
Affiliation(s)
- Trinh X Hoang
- Physics Department, Penn State University, 104 Davey Lab, University Park, Pennsylvania 16801, USA
30
Wendel C, Gohlke H. Predicting transmembrane helix pair configurations with knowledge-based distance-dependent pair potentials. Proteins 2008; 70:984-99. [PMID: 17847096 DOI: 10.1002/prot.21574] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
As a first step toward a novel de novo structure prediction approach for alpha-helical membrane proteins, we developed coarse-grained knowledge-based potentials to score the mutual configuration of transmembrane (TM) helices. Using a comprehensive database of 71 known membrane protein structures, we derived pairwise potentials depending solely on amino acid types and distances between C(alpha) atoms. To evaluate the potentials, they were used as an objective function for the rigid docking of 442 TM helix pairs. This is by far the largest test data set reported to date for that purpose. After clustering 500 docking runs for each pair and considering the largest cluster, we found solutions with a root mean squared (RMS) deviation <2 Å for about 30% of all helix pairs. Encouragingly, if only clusters that contain at least 20% of all decoys are considered, a success rate >71% (with an RMS deviation <2 Å) is obtained. The cluster size thus serves as a measure of significance to identify good docking solutions. In a leave-one-protein-family-out cross-validation study, more than 2/3 of the helix pairs were still predicted with an RMS deviation <2.5 Å (if only clusters that contain at least 20% of all decoys are considered). This demonstrates the predictive power of the potentials in general, although it is advisable to further extend the knowledge base to derive more robust potentials in the future. Our cross-validated potentials perform comparably to the scoring function of Fleishman and Ben-Tal. Finally, well-predicted "anchor helix pairs" can be reliably identified for most of the proteins of the test data set. This is important for extending the approach to TM helix bundles, because these anchor pairs will act as "nucleation sites" to which more helices will be added subsequently, which alleviates the sampling problem.
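The cluster-size criterion above (take the largest cluster of docking solutions and trust it when it holds at least 20% of all decoys) can be sketched with a greedy neighbour-count clustering over a precomputed pairwise RMSD matrix. The clustering algorithm and the 2 Å radius are assumptions for illustration; the abstract does not specify the authors' clustering method, and the function names are hypothetical.

```python
import numpy as np

def greedy_clusters(rmsd, radius=2.0):
    """Repeatedly pick the unassigned solution with the most unassigned
    neighbours within `radius` and make it a cluster with those neighbours.

    rmsd: symmetric N x N matrix of pairwise RMSDs between docking solutions.
    Returns a list of (center_index, member_indices); sizes are non-increasing,
    so the first cluster is the largest.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    unassigned = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while unassigned.any():
        neigh = (rmsd <= radius) & unassigned[None, :] & unassigned[:, None]
        center = int(np.argmax(neigh.sum(axis=1)))
        members = np.flatnonzero(neigh[center])
        clusters.append((center, members))
        unassigned[members] = False
    return clusters

def is_significant(members, n_total, min_fraction=0.2):
    """True when the cluster contains at least min_fraction of all solutions."""
    return len(members) >= min_fraction * n_total
```

With 500 docking runs per helix pair, a largest cluster of at least 100 members would pass the 20% threshold.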
Affiliation(s)
- Christina Wendel
- Department of Biological Sciences, Molecular Bioinformatics Group, J. W. Goethe-University, Frankfurt, Germany
31
Wallner B, Elofsson A. Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins 2008; 69 Suppl 8:184-93. [PMID: 17894353 DOI: 10.1002/prot.21774] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Indexed: 11/06/2022]
Abstract
The ability to rank and select the best model is important in protein structure prediction. Model Quality Assessment Programs (MQAPs) are programs developed to perform this task. They can be divided into three categories based on the information they use: consensus-based methods use the similarity to other models, structure-based methods use features calculated from the structure, and evolutionary-based methods use the sequence similarity between a model and a template. These methods can be trained to predict the overall global quality of a model, that is, how much a model is likely to differ from the native structure. The methods can also be trained to pinpoint which local regions in a model are likely to be incorrect. In CASP7, we participated with three predictors of global quality and four of local quality, using information from the three categories described above. The results show that the consensus MQAP, Pcons, was significantly better at predicting both global and local quality than MQAPs using only structure- or sequence-based information.
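Consensus-based MQAPs such as Pcons score each model by how similar it is to the other models in the set. A minimal sketch of that idea, assuming a precomputed symmetric matrix of pairwise similarity scores between models (for example, GDT-TS-like values); Pcons' actual similarity measure and weighting are not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def consensus_scores(similarity):
    """Average similarity of each model to all other models in the set.

    similarity: symmetric N x N matrix (N >= 2); the diagonal is ignored.
    Higher score = the model agrees more with the rest of the ensemble.
    """
    s = np.asarray(similarity, dtype=float)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two models")
    return (s.sum(axis=1) - np.diag(s)) / (n - 1)
```

This also shows the limitation noted in the SELECTpro abstract: a unique, most native-like model that resembles no other model in the set receives a low consensus score.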
Affiliation(s)
- Björn Wallner
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden.
32
33
Feng Y, Kloczkowski A, Jernigan RL. Four-body contact potentials derived from two protein datasets to discriminate native structures from decoys. Proteins 2007; 68:57-66. [PMID: 17393455 DOI: 10.1002/prot.21362] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 11/10/2022]
Abstract
Two-body inter-residue contact potentials for proteins have often been extracted and extensively used for threading. Here, we have developed a new scheme to derive four-body contact potentials as a way to consider protein interactions in a more cooperative model. We use several datasets of protein native structures to demonstrate that around 500 chains are sufficient to provide a good estimate of these four-body contact potentials, as shown by convergent threading results. We also deliberately chose two sets of protein native structures differing in resolution, one with all chains at a resolution better than 1.5 Å and the other with 94.2% of the structures at a resolution worse than 1.5 Å, to investigate whether potentials from well-refined protein datasets perform better in threading. However, potentials from well-refined proteins did not produce statistically significantly better threading results. Our four-body contact potentials can discriminate well between native structures and partially unfolded or deliberately misfolded structures. Compared with another set of four-body contact potentials derived using a Delaunay tessellation algorithm, our four-body contact potentials appear to offer a better characterization of the interactions between backbones and side chains and provide better threading results, somewhat complementary to those found using other potentials.
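A four-body contact can be taken as a set of four residues that are all in mutual contact, i.e. a 4-clique in the residue contact graph. The abstract does not give the authors' contact criterion, so the sketch below assumes a simple distance cutoff (a hypothetical 6.5 Å) on one representative point per residue and enumerates 4-cliques by brute force, which is adequate only for small proteins. Tallying the cliques by residue-type composition would be the first step toward a four-body statistical potential.

```python
from itertools import combinations

import numpy as np

def four_body_contacts(coords, cutoff=6.5):
    """All sets of 4 residues whose 6 pairwise distances are all <= cutoff.

    coords: N x 3 array with one representative point per residue (assumed).
    Returns a list of 4-tuples of residue indices in increasing order.
    """
    x = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    adj = dist <= cutoff
    cliques = []
    for quad in combinations(range(len(x)), 4):
        # Keep the quadruple only if every pair within it is in contact.
        if all(adj[i, j] for i, j in combinations(quad, 2)):
            cliques.append(quad)
    return cliques
```

The brute-force loop is O(N⁴); a Delaunay tessellation, as used in the comparison set mentioned above, is one way to obtain candidate quadruples far more cheaply.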
Affiliation(s)
- Yaping Feng
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-0320, USA
34
Ferrada E, Vergara IA, Melo F. A knowledge-based potential with an accurate description of local interactions improves discrimination between native and near-native protein conformations. Cell Biochem Biophys 2007; 49:111-24. [PMID: 17906366 DOI: 10.1007/s12013-007-0050-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 03/07/2007] [Revised: 11/30/1999] [Accepted: 07/16/2007] [Indexed: 10/22/2022]
Abstract
The correct discrimination between native and near-native protein conformations is essential for achieving accurate computer-based protein structure prediction. However, this has proven to be a difficult task, since currently available physical energy functions, empirical potentials, and statistical scoring functions are still limited in achieving this goal consistently. In this work, we assess and compare the ability of different full-atom knowledge-based potentials to discriminate between native protein structures and near-native protein conformations generated by comparative modeling. Using a benchmark of 152 near-native protein models and their corresponding native structures that encompass several different folds, we demonstrate that the incorporation of close non-bonded pairwise atom terms improves the discriminating power of the empirical potentials. Since the direct and unbiased derivation of close non-bonded terms from current experimental data is not possible, we obtained and used those terms from the corresponding pseudo-energy functions of a non-local knowledge-based potential. It is shown that this methodology significantly improves the discrimination between native and near-native protein conformations, suggesting that a proper description of close non-bonded terms is important to achieve a more complete and accurate description of native protein conformations. Some external knowledge-based energy functions that are widely used in model assessment performed poorly, indicating that the benchmark of models and the specific discrimination task tested in this work constitute a difficult challenge.
Affiliation(s)
- Evandro Ferrada
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
35
Fogolari F, Pieri L, Dovier A, Bortolussi L, Giugliarelli G, Corazza A, Esposito G, Viglino P. Scoring predictive models using a reduced representation of proteins: model and energy definition. BMC Structural Biology 2007; 7:15. [PMID: 17378941 PMCID: PMC1854906 DOI: 10.1186/1472-6807-7-15] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 09/28/2006] [Accepted: 03/23/2007] [Indexed: 11/25/2022]
Abstract
Background: Reduced representations of proteins have been playing a key role in the study of protein folding. Many such models are available, with different levels of representation detail. Although the usefulness of many such models for structural bioinformatics applications has been demonstrated in recent years, there are few intermediate-resolution models endowed with an energy model capable, for instance, of detecting native or native-like structures among decoy sets. The aim of the present work is to provide a discrete empirical potential, suitable for protein model quality assessment, for a reduced protein model termed here PC2CA because it employs a PseudoCovalent structure with only 2 Centers of interactions per Amino acid.
Results: All protein structures in the top500H set have been converted into reduced form. The distributions of pseudobonds, pseudoangles, pseudodihedrals, and distances between centers of interactions have been converted into potentials of mean force. A suitable reference distribution has been defined for non-bonded interactions that takes into account excluded volume effects and the finite size of proteins. The correlation between adjacent main-chain pseudodihedrals has been converted into an additional energetic term that is able to account for cooperative effects in secondary structure elements. Local energy surface exploration is performed in order to increase the robustness of the energy function.
Conclusion: The proposed model and energy definition have been tested on all the multiple decoy sets in the Decoys'R'us database. The energetic model is able to recognize, for almost all sets, native-like structures (RMSD less than 2.0 Å). These results and those obtained in the blind CASP7 quality assessment experiment suggest that the model compares well with scoring potentials of finer granularity and could be useful for fast exploration of conformational space. Parameters are available at the url: .
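The pseudodihedral terms above are functions of the torsion angle defined by four consecutive interaction centers. A standard way to compute such a signed dihedral from four points is shown below; which centers are used (for example, successive Cα atoms) is an assumption here, since PC2CA places two centers per residue, and the function name is hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (range [-180, 180]) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

Histogramming these angles over a structure set and applying an inverse-Boltzmann transform against a reference distribution is one way to obtain a pseudodihedral potential of mean force.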
Affiliation(s)
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- Lidia Pieri
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- INAF – Astronomical Observatory of Padova Vicolo dell'Osservatorio 5, I-35122 Padova, Italy
- Agostino Dovier
- Dipartimento di Matematica e Informatica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
- Luca Bortolussi
- Dipartimento di Matematica e Informatica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
- Gilberto Giugliarelli
- Dipartimento di Fisica, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
- Alessandra Corazza
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- Gennaro Esposito
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
- Paolo Viglino
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy
36
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2007; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1778] [Impact Index Per Article: 104.6] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere whose radius depends on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions on six multiple-target decoy sets using three criteria: detection of the native state, correlation between the score and model error, and identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best-performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
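DOPE's reference state is described above as noninteracting atoms in a homogeneous sphere. As an illustration of that idea only (the published reference state involves details not given here), the distance r between two points placed uniformly at random in a sphere of radius R has the closed-form density f(r) = (3 r^2 / R^3)(1 - 3r/(4R) + r^3/(16 R^3)) on 0 <= r <= 2R. The sketch evaluates it and checks numerically that it integrates to 1 with mean 36R/35; the radius used is hypothetical.

```python
def sphere_pair_distance_pdf(r, R):
    """Density of the distance r between two points drawn uniformly in a sphere of radius R.

    f(r) = (3 r^2 / R^3) * (1 - 3r/(4R) + r^3/(16 R^3)) for 0 <= r <= 2R, and 0 otherwise.
    """
    if r < 0 or r > 2 * R:
        return 0.0
    x = r / R
    return (3 * x**2 / R) * (1 - 0.75 * x + x**3 / 16)


def integrate(f, a, b, n=10000):
    """Midpoint-rule integral of f over [a, b] with n panels."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


R = 10.0  # hypothetical sphere radius, in Å
total = integrate(lambda r: sphere_pair_distance_pdf(r, R), 0.0, 2 * R)
mean = integrate(lambda r: r * sphere_pair_distance_pdf(r, R), 0.0, 2 * R)
print(round(total, 4), round(mean, 4))  # normalisation close to 1, mean close to 36R/35
```

A distance histogram built from such a density could then serve as the P_ref of an inverse-Boltzmann conversion, so that the finite size of the sphere is not mistaken for an attractive interaction.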
Affiliation(s)
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
37
Xu YO, Hall RW, Goldstein RA, Pollock DD. Divergence, recombination and retention of functionality during protein evolution. Hum Genomics 2006; 2:158-67. [PMID: 16197733 PMCID: PMC2943960 DOI: 10.1186/1479-7364-2-3-158] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
We have only a vague idea of precisely how protein sequences evolve in the context of protein structure and function. This is primarily because structural and functional contexts are not easily predictable from the primary sequence, and evaluating patterns of evolution at individual residue positions is also difficult. As a result of increasing biodiversity in genomics studies, progress is being made in detecting context-dependent variation in substitution processes, but it remains unclear exactly what context-dependent patterns we should be looking for. To address this, we have been simulating protein evolution in the context of structure and function using lattice models of proteins and ligands (or substrates). These simulations include thermodynamic features of protein stability and population dynamics. We refer to this approach as 'ab initio evolution' to emphasise the fact that the equilibrium details of fitness distributions arise from the physical principles of the system and not from any preconceived notions or arbitrary mathematical distributions. Here, we present results on the retention of functionality in homologous recombinants following population divergence. A central result is that protein structure characteristics can strongly influence recombinant functionality. Exceptional structures with many sequence options evolve quickly and tend to retain functionality--even in highly diverged recombinants. By contrast, the more common structures with fewer sequence options evolve more slowly, but the fitness of recombinants drops off rapidly as homologous proteins diverge. These results have implications for understanding viral evolution, speciation and directed evolutionary experiments. Our analysis of the divergence process can also guide improved methods for accurately approximating folding probabilities in more complex but realistic systems.
Affiliation(s)
- Yanlong O Xu
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Chemistry, Louisiana State University, Baton Rouge, LA 70803, USA
- Randall W Hall
- Department of Chemistry, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA 70803, USA
- Richard A Goldstein
- Division of Mathematical Biology, National Institute for Medical Research, Mill Hill, London NW7 1AA, UK
- David D Pollock
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA 70803, USA
38
Duan MJ, Zhou YH. A contact energy function considering residue hydrophobic environment and its application in protein fold recognition. GENOMICS PROTEOMICS & BIOINFORMATICS 2006; 3:218-24. [PMID: 16689689 PMCID: PMC5172539 DOI: 10.1016/s1672-0229(05)03030-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022]
Abstract
The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that better represent residue-residue and residue-solvent interactions is a crucial way to improve prediction accuracy. The widely used contact energy functions mostly consider only the contact frequency between different types of residues; however, we find that the contact frequency also relates to the residue hydrophobic environment. Accordingly, we present an improved contact energy function that integrates the two factors and can reflect the influence of hydrophobic interaction on the stabilization of protein 3D structure more effectively. Furthermore, a fold recognition (threading) approach based on this energy function is developed. Testing results obtained with 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function can improve the accuracy of fold template prediction from 20% to 50%, and can also improve the accuracy of sequence-template alignment from 35% to 65%.
39
Zhang J, Chen R, Liang J. Empirical potential function for simplified protein models: combining contact and local sequence-structure descriptors. Proteins 2006; 63:949-60. [PMID: 16477624 DOI: 10.1002/prot.20809] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
An effective potential function is critical for protein structure prediction and folding simulation. Simplified protein models such as those requiring only Calpha or backbone atoms are attractive because they enable efficient search of the conformational space. We show that residue-specific reduced discrete-state models can represent the backbone conformations of proteins with small RMSD values. However, no potential functions exist that are designed for such simplified protein models. In this study, we develop optimal potential functions by combining contact interaction descriptors and local sequence-structure descriptors. The potential function takes the form of a weighted linear sum of all descriptors, and the optimal weight coefficients are obtained through optimization using both native and decoy structures. The performance of the potential function in discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. Our potential functions, which require only backbone atoms or Calpha atoms, have comparable or better performance than several residue-based potential functions that require additional coordinates of side-chain centers or of all side-chain atoms. By reducing the residue alphabet to size 10 for contact descriptors, the performance of the potential function can be further improved. Our results also suggest that local sequence-structure correlation may play an important role in reducing the entropic cost of protein folding.
Affiliation(s)
- Jinfeng Zhang
- Department of Bioengineering, University of Illinois, Chicago, Illinois, USA
40
Kleinman CL, Rodrigue N, Bonnard C, Philippe H, Lartillot N. A maximum likelihood framework for protein design. BMC Bioinformatics 2006; 7:326. [PMID: 16808841 PMCID: PMC1570151 DOI: 10.1186/1471-2105-7-326] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 02/01/2006] [Accepted: 06/29/2006] [Indexed: 11/21/2022]
Abstract
Background The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility. Results We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered. Conclusion Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for other factors than just the constraint of thermodynamic stability. 
Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.
Affiliation(s)
- Claudia L Kleinman
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Cécile Bonnard
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
- Hervé Philippe
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Lartillot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
41
Rastogi S, Reuter N, Liberles DA. Evaluation of models for the evolution of protein sequences and functions under structural constraint. Biophys Chem 2006; 124:134-44. [PMID: 16837122 DOI: 10.1016/j.bpc.2006.06.008] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 05/18/2006] [Revised: 06/13/2006] [Accepted: 06/14/2006] [Indexed: 12/01/2022]
Abstract
In the field of evolutionary structural genomics, methods are needed to evaluate why genomes evolved to contain the fold distributions that are observed. To study the effects of population dynamics on the evolved genomes, we need fast and accurate evolutionary models that can analyze the effects of selection, drift and fixation of a protein sequence in a population, grounded in the physical parameters governing the folding and binding properties of the sequence. In this study, various knowledge-based, force-field, and statistical methods for protein folding have been evaluated on four different folds (SH2 domains, SH3 domains, Globin-like, and Flavodoxin-like) to assess the speed and accuracy of the energy functions. Similarly, knowledge-based and force-field methods have been used to predict ligand-binding specificity in an SH2 domain. To demonstrate the applicability of these methods, the dynamics of the evolution of new binding capabilities by an SH2 domain are demonstrated.
Affiliation(s)
- Shruti Rastogi
- Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
42
Abstract
Protein-protein docking is a challenging computational problem in functional genomics, particularly when one or both proteins undergo conformational change(s) upon binding. The major challenge is to define a scoring function soft enough to tolerate these changes and specific enough to distinguish between near-native and "misdocked" conformations. Using a linear programming technique, we derived protein docking potentials (PDPs) that comply with this requirement. To this end, we considered a set of 63 nonredundant complexes and, for each complex, generated 400,000 putative docked complexes (decoys) based on a shape complementarity criterion. The PDPs were required to yield a lower potential energy for the native (correctly docked) structure than for all the nonnative (misdocked) structures. The energy constraints applied to all complexes led to ca. 25 million inequalities, the simultaneous solution of which yielded an optimal set of PDPs that placed the correctly docked structure (up to 4.0 Å root-mean-square deviation from the known complex structure) among the 85 top-ranking (0.02%) decoys in 59/63 examined bound-bound cases. The high performance of the potentials was further verified in jackknife tests and by ranking putative docked conformations submitted to CAPRI. In addition to their utility in identifying correctly folded complexes, the PDPs reveal biologically meaningful features that distinguish docking potentials from folding potentials.
Affiliation(s)
- Dror Tobi
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
43
Mayewski S. A multibody, whole-residue potential for protein structures, with testing by Monte Carlo simulated annealing. Proteins 2006; 59:152-69. [PMID: 15723360 DOI: 10.1002/prot.20397] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
A new multibody, whole-residue potential for protein tertiary structure is described. The potential is based on the local environment surrounding each main-chain alpha carbon (CA), defined as the set of all residues whose CA coordinates lie within a spherical volume of set radius in 3-dimensional (3D) space surrounding that position. It is shown that the relative positions of the CAs in these local environments belong to a set of preferred templates. The templates are derived by cluster analysis of the presently available database of over 3000 protein chains (750,000 residues) with no more than 30% sequence similarity. For each template, a set of residue propensities is also derived for each topological position in the template. Using lookup tables of these derived templates, it is then possible to calculate an energy for any conformation of a given protein sequence. The application of the potential to ab initio protein tertiary structure prediction is evaluated by performing Monte Carlo simulated annealing on test protein sequences.
Affiliation(s)
- Stefan Mayewski
- Max-Planck-Institut für Biochemie, 82152 Martinsried, Germany.
44
Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci 2006; 15:900-13. [PMID: 16522791 PMCID: PMC2242478 DOI: 10.1110/ps.051799606] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.8] [Indexed: 10/24/2022]
Abstract
In this study, we present two methods to predict the local quality of a protein model: ProQres and ProQprof. ProQres is based on structural features that can be calculated from a model, while ProQprof uses alignment information and can only be used if the model is created from an alignment. In addition, we also propose a simple approach based on local consensus, Pcons-local. We show that all these methods perform better than state-of-the-art methodologies and that, when applicable, the consensus approach is by far the best approach to predict local structure quality. It was also found that ProQprof performed better than other methods for models based on distant relationships, while ProQres performed best for models based on closer relationships, i.e., a model has to be reasonably good for a structural evaluation to be useful. Finally, we show that a combination of ProQprof and ProQres (ProQlocal) performed better than any other nonconsensus method for both high- and low-quality models. Additional information and Web servers are available at: http://www.sbc.su.se/~bjorn/ProQ/.
Affiliation(s)
- Björn Wallner
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden.
45
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
46
Chen WW, Shakhnovich EI. Lessons from the design of a novel atomic potential for protein folding. Protein Sci 2005; 14:1741-52. [PMID: 15987903 PMCID: PMC2253347 DOI: 10.1110/ps.051440705] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 10/25/2022]
Abstract
We investigate all-atom potentials of mean force for estimating free energies in protein folding and fold recognition. We search through the space of potentials and design novel atomic potentials with a random mixing approximation and a contact-correlated Gaussian approximation of decoy states. We show that the two derived potentials are highly correlated, supporting the use of the random energy model as an accurate statistical description of protein conformational states. The novel atomic potentials perform well in a Z-score and fold decoy recognition test. Furthermore, the designed atomic potential performs slightly but significantly better than atomic potentials derived under a quasi-chemical assumption. While accounting for connectivity correlations between atom types does not improve the performance of the designed potential, we show that these correlations lead to ambiguities in the distribution of energetic contributions for atoms on the same residue. Within the confines of the model, then, many potentials may exist that stabilize all native folds in subtly different ways. Comparison of different protein conformations under the various atomic potentials reveals a remarkable degree of correspondence both in the estimated free energies and in the identity of the contact types that make the dominant contributions to the estimated free energies. This consistency may be interpreted as a sign that the design procedure is extracting physically meaningful quantities.
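The Z-score test mentioned above is commonly defined as the offset of the native energy from the mean decoy energy, in units of the decoy standard deviation, with more negative values indicating better discrimination. A minimal sketch assuming that definition (with the population standard deviation) and made-up energies; it is not taken from the paper.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """Z = (E_native - mean(E_decoys)) / std(E_decoys).

    More negative Z means the native state stands out more clearly below the decoys.
    Uses the population standard deviation of the decoy energies.
    """
    mu = statistics.mean(decoy_energies)
    sigma = statistics.pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mu) / sigma


decoys = [-10.0, -12.0, -8.0, -11.0, -9.0]  # hypothetical decoy energies
z = native_z_score(-20.0, decoys)
print(z)  # mean -10, standard deviation sqrt(2), so Z is about -7.07
```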
Affiliation(s)
- William W Chen
- Department of Biophysics, Harvard University, Boston, MA 02115, USA
47
Heo M, Kim S, Moon EJ, Cheon M, Chung K, Chang I. Perceptron learning of pairwise contact energies for proteins incorporating the amino acid environment. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005; 72:011906. [PMID: 16090000 DOI: 10.1103/physreve.72.011906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/06/2004] [Revised: 05/10/2005] [Indexed: 05/03/2023]
Abstract
Although a coarse-grained description of proteins is a simple and convenient way to attack the protein folding problem, the construction of a global pairwise energy function that can simultaneously recognize the native folds of many proteins has met with only partial success. We investigated whether this pairwise contact energy function could be improved systematically by extending the parameter space of amino acids beyond a 20 x 20 matrix to incorporate their local environments. We studied pairwise contact energy functions of 20 x 20, 60 x 60, and 180 x 180 matrices, depending on the extent of parameter space, and compared their effect on the learnability of energy parameters in the context of gapless threading, bearing in mind that a 20 x 20 pairwise contact matrix has been shown to be too simple to recognize the native folds of many proteins. In this paper, we show that a global pairwise energy function was constructed using 1006 training proteins with less than 30% homology, which include representatives of all the different protein classes. After the local environments of the amino acids were parametrized into nine categories according to three secondary structures and three kinds of hydrophobicity (desolvation), the 16290 pairwise contact energies (scores) of the amino acids could be determined by perceptron learning and protein threading. These energies simultaneously recognized all the native folds of the 1006 training proteins. When these energy parameters were tested on 382 test proteins with less than 90% homology, the native folds of 370 (96.9%) proteins were recognized. We set up a simple thermodynamic framework in the conformational space of decoys to calculate the unfolded fraction and the specific heat of real proteins. The different thermodynamic stabilities of E. coli ribonuclease H (RNase H) and its mutants were well described by our calculation, in agreement with experiment.
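When the energy is a weighted sum of contact-type counts, E = w · n, requiring every native fold to score below each of its decoys becomes a set of linear inequalities, and a perceptron can search for weights that satisfy them. The sketch below is a generic illustration of that idea, not the authors' implementation: the two contact types, the toy count vectors, the margin and the learning rate are all hypothetical. The loop converges only if some weight vector satisfies every inequality strictly; otherwise it stops after `max_epochs`.

```python
def energy(w, counts):
    """Linear contact energy: sum over contact types of weight * count."""
    return sum(wi * ci for wi, ci in zip(w, counts))


def perceptron_train(pairs, n_types, lr=0.1, margin=1.0, max_epochs=1000):
    """Seek weights w with energy(w, decoy) - energy(w, native) >= margin for every pair.

    pairs: list of (native_counts, decoy_counts), each a vector of contact-type counts.
    Returns (w, converged).
    """
    w = [0.0] * n_types
    for _ in range(max_epochs):
        violations = 0
        for native, decoy in pairs:
            gap = energy(w, decoy) - energy(w, native)
            if gap < margin:
                violations += 1
                # Perceptron step: raise the decoy energy relative to the native one.
                for k in range(n_types):
                    w[k] += lr * (decoy[k] - native[k])
        if violations == 0:
            return w, True
    return w, False


# Hypothetical counts for two contact types, e.g. (hydrophobic-hydrophobic, polar-hydrophobic).
pairs = [([5, 1], [2, 3]), ([4, 0], [3, 2]), ([6, 2], [1, 4])]
w, converged = perceptron_train(pairs, n_types=2)
print(converged, [round(x, 3) for x in w])
```

Each update increases the violated gap by `lr` times the squared length of the difference vector, which is why repeated passes push all natives below their decoys when a solution exists.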
Affiliation(s)
- Muyoung Heo
- National Research Laboratory for Computational Proteomics and Biophysics, Department of Physics, Pusan National University, Busan, Korea
48
Combining a binary input encoding scheme with RBFNN for globulin protein inter-residue contact map prediction. Pattern Recognit Lett 2005. [DOI: 10.1016/j.patrec.2005.01.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
49
Zhang GZ, Huang DS. Prediction of inter-residue contacts map based on genetic algorithm optimized radial basis function neural network and binary input encoding scheme. J Comput Aided Mol Des 2005; 18:797-810. [PMID: 16075311 DOI: 10.1007/s10822-005-0578-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 04/29/2004] [Accepted: 12/14/2004] [Indexed: 10/25/2022]
Abstract
Inter-residue contact map prediction is one of the most important intermediate steps toward solving the protein folding problem. In this paper, we focus on protein inter-residue contact map prediction based on neural network techniques. First, we use a genetic algorithm (GA) to optimize the radial basis function widths and hidden centers of a radial basis function neural network (RBFNN); a novel binary encoding scheme is then employed to train the network to learn and predict the inter-residue contact patterns of protein sequences obtained from the Protein Data Bank (PDB). The experimental evidence indicates the utility of our proposed encoding strategy and GA-optimized RBFNN. Moreover, the simulation results show that the network performs better for proteins whose length falls in the range (100, 300), and that the prediction accuracy with a contact threshold of 7 Angstroms is higher than with thresholds of 5, 6, and 8 Angstroms.
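The thresholds compared above (5, 6, 7 and 8 Å) decide which residue pairs are labelled as contacts in the target map. A minimal sketch of that labelling step, assuming one representative atom per residue, a hypothetical straight Cα trace with 3.8 Å spacing, and a hypothetical minimum sequence separation used to skip near neighbours; it does not reproduce the paper's RBFNN or binary encoding.

```python
import math


def contact_map(coords, threshold, min_separation=3):
    """Symmetric 0/1 contact map: residues i, j are in contact if their distance < threshold.

    coords: one (x, y, z) tuple per residue (e.g. a Calpha atom).
    min_separation: hypothetical minimum |i - j| for a pair to be considered.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


chain = [(3.8 * i, 0.0, 0.0) for i in range(6)]  # hypothetical straight Calpha trace
for t in (5.0, 6.0, 7.0, 8.0):
    cmap = contact_map(chain, t, min_separation=2)
    n_contacts = sum(map(sum, cmap)) // 2  # each contact appears twice in the symmetric map
    print(t, n_contacts)
```

With these coordinates, only the 8 Å threshold captures the i, i+2 pairs (7.6 Å apart), giving 4 contacts; the lower thresholds give none, which shows how sensitive the contact labels are to the cutoff.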
Affiliation(s)
- Guang-Zheng Zhang
- Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences
50