301
Abstract
The SLoop database of supersecondary fragments, first described by Donate et al. (Protein Sci., 1996, 5, 2600-2616), contains protein loops classified according to structural similarity. The database has recently been updated and currently contains over 10 000 loops up to 20 residues in length, which cluster into over 560 well-populated classes. The database can be found at http://www-cryst.bioc.cam.ac.uk/~sloop. In this paper, we identify conserved structural features such as main chain conformation and hydrogen bonding. Using the original approach of Rufino and co-workers (1997), the correct structural class is predicted with the highest SLoop score for 35% of loops. This rises to 65% when the three highest-scoring class predictions are considered and to 75% for the top five. Inclusion of residues from the neighbouring secondary structures and use of substitution tables derived using a reduced definition of secondary structure increase these prediction accuracies to 58, 78 and 85%, respectively. This suggests that capping residues can stabilize the loop conformation as well as that of the secondary structure. Further increases are achieved if only well-populated classes are considered in the prediction. These results correspond to an average loop root mean square deviation of between 0.4 and 2.6 Å for loops up to five residues in length.
Affiliation(s)
- D F Burke
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB1 2GA, UK.
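The top-one, top-three and top-five accuracies quoted in the abstract above are instances of a top-k accuracy. A minimal sketch of that metric follows; the function name, the class labels and the toy scores are illustrative assumptions and are not taken from SLoop.

```python
from typing import Dict, List


def top_k_accuracy(scores: List[Dict[str, float]], true_classes: List[str], k: int) -> float:
    """Fraction of loops whose true class is among the k highest-scoring classes."""
    if len(scores) != len(true_classes):
        raise ValueError("scores and true_classes must have the same length")
    if not scores:
        return 0.0
    hits = 0
    for class_scores, truth in zip(scores, true_classes):
        # Rank class labels from highest to lowest score and keep the first k.
        ranked = sorted(class_scores, key=class_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data: hypothetical class labels and scores, not SLoop values.
toy_scores = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.2, "B": 0.7, "C": 0.6},
    {"A": 0.4, "B": 0.3, "C": 0.1},
    {"A": 0.8, "B": 0.1, "C": 0.3},
]
toy_truth = ["A", "C", "C", "B"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(toy_scores, toy_truth, k))
```

Widening k can only keep or raise the accuracy, which matches the monotonic rise reported in the abstract.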
302
Abstract
The cooperative folding of proteins implies a description by multibody potentials. Such multibody potentials can be generalized from common two-body statistical potentials through a relation to probability distributions of residue clusters via the Boltzmann condition. In this exploratory study, we compare a four-body statistical potential, defined by the Delaunay tessellation of protein structures, to the Miyazawa-Jernigan (MJ) potential for protein structure prediction, using a lattice chain growth algorithm. We use the four-body potential as a discriminatory function for conformational ensembles generated with the MJ potential and examine performance on a set of 22 proteins of 30-76 residues in length. We find that the four-body potential yields comparable results to the two-body MJ potential, namely, an average coordinate root-mean-square deviation (cRMSD) value of 8 Å for the lowest energy configurations of all-alpha proteins, and somewhat poorer cRMSD values for other protein classes. For both two- and four-body potentials, superpositions of some predicted and native structures show a rough overall agreement. Formulating the four-body potential using larger data sets and direct, but costly, generation of conformational ensembles with multibody potentials may offer further improvements. Proteins 2001;43:161-174.
Affiliation(s)
- H H Gan
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University and the Howard Hughes Medical Institute, 251 Mercer Street, New York, NY 10012, USA
303
Deane CM, Blundell TL. CODA: a combined algorithm for predicting the structurally variable regions of protein models. Protein Sci 2001; 10:599-612. [PMID: 11344328 PMCID: PMC2374131 DOI: 10.1110/ps.37601] [Citation(s) in RCA: 103] [Impact Index Per Article: 4.5] [Indexed: 10/14/2022]
Abstract
CODA, an algorithm for predicting the variable regions in proteins, combines FREAD, a knowledge-based approach, and PETRA, which constructs the region ab initio. FREAD selects from a database of protein structure fragments with environmentally constrained substitution tables and other rule-based filters. FREAD was parameterized and tested on over 3000 loops. The average root mean square deviation ranged from 0.78 Å for three-residue loops to 3.5 Å for eight-residue loops on a nonhomologous test set. CODA clusters the predictions from the two independent programs and makes a consensus prediction that must pass a set of rule-based filters. CODA was parameterized and tested on two separate sets of structures that were nonhomologous to one another and to those found in the FREAD database. The average root mean square deviation in the test set ranged from 0.76 Å for three-residue loops to 3.09 Å for eight-residue loops. CODA shows a general improvement in loop prediction over PETRA and FREAD individually. The improvement is far more marked for lengths six and upward, probably as the predictive power of PETRA becomes more important. CODA was further tested on several model structures to determine its applicability to the modeling situation. A web server of CODA is available at http://www-cryst.bioc.cam.ac.uk/~charlotte/Coda/search_coda.html.
Affiliation(s)
- C M Deane
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
304
Abstract
The prediction of protein structure, based primarily on sequence and structure homology, has become an increasingly important activity. Homology models have become more accurate and their range of applicability has increased. Progress has come, in part, from the flood of sequence and structure information that has appeared over the past few years, and also from improvements in analysis tools. These include profile methods for sequence searches, the use of three-dimensional structure information in sequence alignment and new homology modeling tools, specifically in the prediction of loop and side-chain conformations. There have also been important advances in understanding the physical-chemical basis of protein stability and the corresponding use of physical-chemical potential functions to distinguish correctly folded from incorrectly folded protein conformations.
Affiliation(s)
- B Al-Lazikani
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, 630 West 168th Street, New York, NY 10032, USA
305
Abstract
We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible three-dimensional arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case, our method finds significant numbers of conformations close to the native structure. In addition, we assign coordinates to all atoms for four of the 25 proteins and show that this has a small effect on the number of near-native conformations. In the context of database-driven exhaustive enumeration our method performs extremely well, yielding significant percentages of conformations (between 0.02% and 82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable tool for predicting protein structure.
Affiliation(s)
- B Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
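The percentage of library conformations within a distance cutoff of the native structure, as quoted in the abstract above, can be illustrated with a short sketch. It assumes the coordinates are already optimally superimposed (no fitting step is performed here), and the function names and toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def coordinate_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equally long coordinate lists, assumed already superimposed."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def near_native_fraction(
    library: Sequence[Sequence[Coord]], native: Sequence[Coord], cutoff: float = 6.0
) -> float:
    """Fraction of library conformations whose RMSD to the native is below the cutoff."""
    if not library:
        return 0.0
    close = sum(1 for conf in library if coordinate_rmsd(conf, native) < cutoff)
    return close / len(library)


# Toy two-atom "structures": one decoy displaced by 3 units, one by 10 units.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoys = [
    [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],
    [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)],
]
print(near_native_fraction(decoys, native, cutoff=6.0))
```

In practice an optimal superposition would normally be applied before measuring the deviation; it is left out here to keep the sketch dependency-free.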
306
Xia Y, Levitt M. Extracting knowledge-based energy functions from protein structures by error rate minimization: Comparison of methods using lattice model. J Chem Phys 2000. [DOI: 10.1063/1.1320823] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
307
Abstract
Rigid-body methods, particularly Fourier correlation techniques, are very efficient for docking bound (co-crystallized) protein conformations using measures of surface complementarity as the target function. However, when docking unbound (separately crystallized) conformations, the method generally yields hundreds of false positive structures with good scores but high root mean square deviations (RMSDs). This paper describes a two-step scoring algorithm that can discriminate near-native conformations (with less than 5 Å RMSD) from other structures. The first step includes two rigid-body filters that use the desolvation free energy and the electrostatic energy to select a manageable number of conformations for further processing, but are unable to eliminate all false positives. Complete discrimination is achieved in the second step that minimizes the molecular mechanics energy of the retained structures, and re-ranks them with a combined free-energy function which includes electrostatic, solvation, and van der Waals energy terms. After minimization, the improved fit in near-native complex conformations provides the free-energy gap required for discrimination. The algorithm has been developed and tested using docking decoys, i.e., docked conformations generated by Fourier correlation techniques. The decoy sets are available on the web for testing other discrimination procedures. Proteins 2000;40:525-537.
Affiliation(s)
- C J Camacho
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02115, USA
308
DeGrado WF, Summa CM, Pavone V, Nastri F, Lombardi A. De novo design and structural characterization of proteins and metalloproteins. Annu Rev Biochem 2000; 68:779-819. [PMID: 10872466 DOI: 10.1146/annurev.biochem.68.1.779] [Citation(s) in RCA: 500] [Impact Index Per Article: 20.8] [Indexed: 11/09/2022]
Abstract
De novo protein design has recently emerged as an attractive approach for studying the structure and function of proteins. This approach critically tests our understanding of the principles of protein folding; only in de novo design must one truly confront the issue of how to specify a protein's fold and function. If we truly understand proteins, it should be possible to design receptors, enzymes, and ion channels from scratch. Further, as this understanding evolves and is further refined, it should be possible to design proteins and biomimetic polymers with properties unprecedented in nature.
Affiliation(s)
- W F DeGrado
- Johnson Research Foundation, Pennsylvania, Philadelphia, USA.
309
Miyazawa S, Jernigan RL. Identifying sequence-structure pairs undetected by sequence alignments. Protein Eng 2000; 13:459-75. [PMID: 10906342 DOI: 10.1093/protein/13.7.459] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
We examine how effectively simple potential functions previously developed can identify compatibilities between sequences and structures of proteins for database searches. The potential function consists of pairwise contact energies, repulsive packing potentials of residues for overly dense arrangement and short-range potentials for secondary structures, all of which were estimated from statistical preferences observed in known protein structures. Each potential energy term was modified to represent compatibilities between sequences and structures for globular proteins. Pairwise contact interactions in a sequence-structure alignment are evaluated in a mean field approximation on the basis of probabilities of site pairs to be aligned. Gap penalties are assumed to be proportional to the number of contacts at each residue position, and as a result gaps will be more frequently placed on protein surfaces than in cores. In addition to minimum energy alignments, we use probability alignments made by successively aligning site pairs in order by pairwise alignment probabilities. The results show that the present energy function and alignment method can detect well both folds compatible with a given sequence and, inversely, sequences compatible with a given fold, and yield mostly similar alignments for these two types of sequence and structure pairs. Probability alignments consisting of most reliable site pairs only can yield extremely small root mean square deviations, and including less reliable pairs increases the deviations. Also, it is observed that secondary structure potentials are usefully complementary to yield improved alignments with this method. Remarkably, by this method some individual sequence-structure pairs are detected having only 5-20% sequence identity.
Affiliation(s)
- S Miyazawa
- Faculty of Technology, Gunma University, Kiryu, Gunma 376, Japan and Room B-116, Bldg 12B, MSC 5677, Laboratory of Experimental and Computational Biology, DBS, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892-5677, USA
310
Samudrala R, Levitt M. Decoys 'R' Us: a database of incorrect conformations to improve protein structure prediction. Protein Sci 2000; 9:1399-401. [PMID: 10933507 PMCID: PMC2144680 DOI: 10.1110/ps.9.7.1399] [Citation(s) in RCA: 162] [Impact Index Per Article: 6.8] [Indexed: 10/21/2022]
Abstract
The development of an energy or scoring function for protein structure prediction is greatly enhanced by testing the function on a set of computer-generated conformations (decoys) to determine whether it can readily distinguish native-like conformations from nonnative ones. We have created "Decoys 'R' Us," a database containing many such sets of conformations, to provide a resource that allows scoring functions to be improved.
Affiliation(s)
- R Samudrala
- Department of Structural Biology, Stanford University School of Medicine, California 94305, USA.
311
312
Samudrala R, Huang ES, Koehl P, Levitt M. Constructing side chains on near-native main chains for ab initio protein structure prediction. Protein Eng 2000; 13:453-7. [PMID: 10906341 DOI: 10.1093/protein/13.7.453] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
Is there value in constructing side chains while searching protein conformational space during an ab initio simulation? If so, what is the most computationally efficient method for constructing these side chains? To answer these questions, four published approaches were used to construct side chain conformations on a range of near-native main chains generated by ab initio protein structure prediction methods. The accuracy of these approaches was compared with a naive approach that selects the most frequently observed rotamer for a given amino acid to construct side chains. An all-atom conditional probability discriminatory function is useful for selecting conformations with overall low all-atom root mean square deviation (r.m.s.d.) and the discrimination improves on sets that are closer to the native conformation. In addition, the naive approach performs as well as more sophisticated methods in terms of the percentage of χ1 angles built accurately and the all-atom r.m.s.d. between the native and near-native conformations. The results suggest that the naive method would be extremely useful for fast and efficient side chain construction on vast numbers of conformations for ab initio prediction of protein structure.
Affiliation(s)
- R Samudrala
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA.
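The naive approach described in the abstract above, picking the most frequently observed rotamer for each amino acid type, can be sketched in a few lines. The rotamer labels, the observation format and the function names below are illustrative assumptions, not the data or code used in the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def most_frequent_rotamers(observations: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each residue type to its most frequently observed rotamer label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for residue, rotamer in observations:
        counts[residue][rotamer] += 1
    return {residue: c.most_common(1)[0][0] for residue, c in counts.items()}


def assign_rotamers(sequence: List[str], library: Dict[str, str]) -> List[Optional[str]]:
    """Assign the library rotamer to every position; None where no rotamer is known."""
    return [library.get(residue) for residue in sequence]


# Hypothetical observations: (residue type, rotamer label) pairs.
toy_observations = [
    ("LEU", "t"),
    ("LEU", "g-"),
    ("LEU", "t"),
    ("SER", "g+"),
]
toy_library = most_frequent_rotamers(toy_observations)
print(assign_rotamers(["SER", "LEU", "GLY"], toy_library))
```

Because the lookup ignores the local environment entirely, it costs one dictionary access per residue, which is the property that makes it attractive for very large conformation sets.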
313
Xia Y, Huang ES, Levitt M, Samudrala R. Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol 2000; 300:171-85. [PMID: 10864507 DOI: 10.1006/jmbi.2000.3835] [Citation(s) in RCA: 141] [Impact Index Per Article: 5.9] [Indexed: 11/22/2022]
Abstract
We present a hierarchical method to predict protein tertiary structure models from sequence. We start with complete enumeration of conformations using a simple tetrahedral lattice model. We then build conformations with increasing detail, and at each step select a subset of conformations using empirical energy functions with increasing complexity. After enumeration on the lattice, we select a subset of low energy conformations using a statistical residue-residue contact energy function, and generate all-atom models using predicted secondary structure. A combined knowledge-based atomic level energy function is then used to select subsets of the all-atom models. The final predictions are generated using a consensus distance geometry procedure. We test the feasibility of the procedure on a set of 12 small proteins covering a wide range of protein topologies. A rigorous double-blind test of our method was made under the auspices of the CASP3 experiment, where we did ab initio structure predictions for 12 proteins using this approach. The performance of our methodology at CASP3 is reasonably good and completely consistent with our initial tests.
Affiliation(s)
- Y Xia
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, 94305, USA
314
Structures of scrambled disulfide forms of the potato carboxypeptidase inhibitor predicted by molecular dynamics simulations with constraints. Proteins 2000. [DOI: 10.1002/1097-0134(20000815)40:3<482::aid-prot150>3.0.co;2-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
315
Gatchell DW, Dennis S, Vajda S. Discrimination of near-native protein structures from misfolded models by empirical free energy functions. Proteins 2000. [DOI: 10.1002/1097-0134(20001201)41:4<518::aid-prot90>3.0.co;2-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.4] [Indexed: 11/07/2022]
316
317
Takano K, Ota M, Ogasahara K, Yamagata Y, Nishikawa K, Yutani K. Experimental verification of the 'stability profile of mutant protein' (SPMP) data using mutant human lysozymes. Protein Eng 1999; 12:663-72. [PMID: 10469827 DOI: 10.1093/protein/12.8.663] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
The stability profile of mutant protein (SPMP) (Ota,M., Kanaya,S. and Nishikawa,K., 1995, J. Mol. Biol., 248, 733-738) estimates the changes in conformational stability due to single amino acid substitutions using a pseudo-energy potential developed for evaluating structure-sequence compatibility in the structure prediction method, the 3D-1D compatibility evaluation. Nine mutant human lysozymes expected to significantly increase in stability from SPMP were constructed, in order to experimentally verify the reliability of SPMP. The thermodynamic parameters for denaturation and crystal structures of these mutant proteins were determined. One mutant protein was stabilized as expected, compared with the wild-type protein. However, the others were not stabilized even though the structural changes were subtle, indicating that SPMP overestimates the increase in stability or underestimates negative effects due to substitution. The stability changes in the other mutant human lysozymes previously reported were also analyzed by SPMP. The correlation of the stability changes between the experiment and prediction depended on the types of substitution: there were some correlations for proline mutants and cavity-creating mutants, but no correlation for mutants related to side-chain hydrogen bonds. The present results may indicate some additional factors that should be considered in the calculation of SPMP, suggesting that SPMP can be refined further.
Affiliation(s)
- K Takano
- Institute for Protein Research, Osaka University, Yamadaoka, Suita, Osaka 565-0871, Japan
318
Huang ES, Samudrala R, Ponder JW. Ab initio fold prediction of small helical proteins using distance geometry and knowledge-based scoring functions. J Mol Biol 1999; 290:267-81. [PMID: 10388572 DOI: 10.1006/jmbi.1999.2861] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.7] [Indexed: 11/22/2022]
Abstract
The problem of protein tertiary structure prediction from primary sequence can be separated into two subproblems: generation of a library of possible folds and specification of a best fold given the library. A distance geometry procedure based on random pairwise metrization with good sampling properties was used to generate a library of 500 possible structures for each of 11 small helical proteins. The input to distance geometry consisted of sets of restraints to enforce predicted helical secondary structure and a generic range of 5 to 11 Å between predicted contact residues on all pairs of helices. For each of the 11 targets, the resulting library contained structures with low RMSD versus the native structure. Near-native sampling was enhanced by at least three orders of magnitude compared to a random sampling of compact folds. All library members were scored with a combination of an all-atom distance-dependent function, a residue pair-potential, and a hydrophobicity function. In six of the 11 cases, the best-ranking fold was considered to be near native. Each library was also reduced to a final ab initio prediction via consensus distance geometry performed over the 50 best-ranking structures from the full set of 500. The consensus results were of generally higher quality, yielding six predictions within 6.5 Å of the native fold. These favorable predictions corresponded to those for which the correlation between the RMSD and the scoring function was highest. The advantage of the reported methodology is its extreme simplicity and potential for including other types of structural restraints.
Affiliation(s)
- E S Huang
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, Saint Louis, MO, 63110, USA
319
Abstract
We discuss the derivation of atomic-level potentials of mean force from known protein structures and their applicability to structural evaluation. In the derivation process, rigorous density estimation methodology is used to estimate the probability density functions (PDFs) for the distributions of interatomic distances in the protein structures. Potentials of mean force are then derived from these density functions using the simple Boltzmann relation. We also test the potentials against pairs of current and superseded protein structures in the Protein Data Bank. Using PDF potentials to evaluate each structure pair, we are able to identify, with high accuracy, which of the two structures is of higher resolution or better quality. This result shows that the PDF potentials are sensitive to details in protein structures, as the current and superseded atomic coordinates generally do not differ by more than 1 Å in root-mean-square deviation, and that the PDF potentials could potentially be used for X-ray structure refinement and protein structure prediction.
Affiliation(s)
- A Rojnuckarin
- Department of Chemical Engineering, University of Wisconsin-Madison, USA
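The Boltzmann relation mentioned in the abstract above converts a ratio of probability densities into an energy, E(r) = -kT ln(p_obs(r) / p_ref(r)). The sketch below illustrates this with a plain histogram and a pseudocount per bin, which is a much cruder density estimate than the one the abstract describes. The reference distribution, bin layout, function names and toy distances are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

KT_KCAL_PER_MOL = 0.593  # approximate kT at 298 K, in kcal/mol


def histogram_density(distances: Sequence[float], r_max: float, n_bins: int) -> List[float]:
    """Normalized histogram over [0, r_max) with one pseudocount per bin."""
    counts = [1.0] * n_bins  # pseudocounts keep every bin non-zero
    width = r_max / n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int(d / width), n_bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def potential_of_mean_force(
    observed: Sequence[float], reference: Sequence[float], kt: float = KT_KCAL_PER_MOL
) -> List[float]:
    """Per-bin energy from the Boltzmann relation: -kT * ln(p_obs / p_ref)."""
    return [-kt * math.log(p / q) for p, q in zip(observed, reference)]


# Toy distances (hypothetical): the observed set favours the second bin.
p_obs = histogram_density([1.5, 1.5, 1.5, 0.5], r_max=2.0, n_bins=2)
p_ref = histogram_density([0.5, 0.5, 1.5, 1.5], r_max=2.0, n_bins=2)
pmf = potential_of_mean_force(p_obs, p_ref)
print(pmf)
```

Bins that are populated more often than the reference receive negative (favourable) energies, and under-populated bins receive positive ones.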
320
Lazaridis T, Karplus M. Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J Mol Biol 1999; 288:477-87. [PMID: 10329155 DOI: 10.1006/jmbi.1999.2685] [Citation(s) in RCA: 226] [Impact Index Per Article: 9.0] [Indexed: 11/22/2022]
Abstract
An essential requirement for theoretical protein structure prediction is an energy function that can discriminate the native from non-native protein conformations. To date most of the energy functions used for this purpose have been extracted from a statistical analysis of the protein structure database, without explicit reference to the physical interactions responsible for protein stability. The use of the statistical functions has been supported by the widespread belief that they are superior for such discrimination to physics-based energy functions. An effective energy function that combines the CHARMM vacuum potential with a Gaussian model for the solvation free energy is tested for its ability to discriminate the native structure of a protein from misfolded conformations; the results are compared with those obtained with the vacuum CHARMM potential. The test is performed on several sets of misfolded structures prepared by others, including sets of about 650 good decoys for six proteins, as well as on misfolded structures of chymotrypsin inhibitor 2. The vacuum CHARMM potential is successful in most cases when energy minimized conformations are considered, but fails when applied to structures relaxed by molecular dynamics. With the effective energy function the native state is always more stable than grossly misfolded conformations both in energy minimized and molecular dynamics-relaxed structures. The present results suggest that molecular mechanics (physics-based) energy functions, complemented by a simple model for the solvation free energy, should be tested for use in the inverse folding problem, and support their use in studies of the effective energy surface of proteins in solution. Moreover, the study suggests that the belief in the superiority of statistical functions for these purposes may be ill founded.
Affiliation(s)
- T Lazaridis
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford St, Cambridge, MA, 02138, USA
321
322
Orengo C, Bray J, Hubbard T, LoConte L, Sillitoe I. Analysis and assessment of ab initio three-dimensional prediction, secondary structure, and contacts prediction. Proteins 1999. [DOI: 10.1002/(sici)1097-0134(1999)37:3+<149::aid-prot20>3.0.co;2-h] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.4] [Indexed: 11/07/2022]
323
Simons KT, Ruczinski I, Kooperberg C, Fox BA, Bystroff C, Baker D. Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins 1999; 34:82-95. [PMID: 10336385 DOI: 10.1002/(sici)1097-0134(19990101)34:1<82::aid-prot7>3.0.co;2-a] [Citation(s) in RCA: 350] [Impact Index Per Article: 14.0] [Indexed: 11/08/2022]
Abstract
We describe the development of a scoring function based on the decomposition P(structure | sequence) ∝ P(sequence | structure) × P(structure), which outperforms previous scoring functions in correctly identifying native-like protein structures in large ensembles of compact decoys. The first term captures sequence-dependent features of protein structures, such as the burial of hydrophobic residues in the core; the second term captures universal sequence-independent features, such as the assembly of beta-strands into beta-sheets. The efficacies of a wide variety of sequence-dependent and sequence-independent features of protein structures for recognizing native-like structures were systematically evaluated using ensembles of approximately 30,000 compact conformations with fixed secondary structure for each of 17 small protein domains. The best results were obtained using a core scoring function with P(sequence | structure) parameterized similarly to our previous work (Simons et al., J Mol Biol 1997;268:209-225) and P(structure) focused on secondary structure packing preferences; while several additional features had some discriminatory power on their own, they did not provide any additional discriminatory power when combined with the core scoring function. Our results, on both the training set and the independent decoy set of Park and Levitt (J Mol Biol 1996;258:367-392), suggest that this scoring function should contribute to the prediction of tertiary structure from knowledge of sequence and secondary structure.
Affiliation(s)
- K T Simons
- Department of Biochemistry, University of Washington, Seattle 98195, USA
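The decomposition stated in the abstract above follows from Bayes' theorem: for a fixed sequence the denominator P(sequence) is the same for every candidate structure, so it can be dropped when ranking decoys. Taking negative logarithms turns the product into a sum of two terms. The symbol S for the resulting score is introduced here only for illustration and does not come from the paper.

```latex
\begin{align}
P(\text{structure} \mid \text{sequence})
  &= \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}{P(\text{sequence})} \\
  &\propto P(\text{sequence} \mid \text{structure})\, P(\text{structure}), \\
S(\text{structure})
  &= -\ln P(\text{sequence} \mid \text{structure}) - \ln P(\text{structure}).
\end{align}
```

A lower S then corresponds to a more probable structure for the given sequence, with the first term carrying the sequence-dependent features and the second the sequence-independent ones.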
324
Huang ES, Koehl P, Levitt M, Pappu RV, Ponder JW. Accuracy of side-chain prediction upon near-native protein backbones generated by Ab initio folding methods. Proteins 1998; 33:204-17. [PMID: 9779788 DOI: 10.1002/(sici)1097-0134(19981101)33:2<204::aid-prot5>3.0.co;2-i] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
The ab initio folding problem can be divided into two sequential tasks of approximately equal computational complexity: the generation of native-like backbone folds and the positioning of side chains upon these backbones. The prediction of side-chain conformation in this context is challenging, because at best only the near-native global fold of the protein is known. To test the effect of displacements in the protein backbones on side-chain prediction for folds generated ab initio, sets of near-native backbones (≤ 4 Å Cα RMS error) for four small proteins were generated by two methods. The steric environment surrounding each residue was probed by placing the side chains in the native conformation on each of these decoys, followed by torsion-space optimization to remove steric clashes on a rigid backbone. We observe that on average 40% of the χ1 angles were displaced by 40° or more, effectively setting the limits in accuracy for side-chain modeling under these conditions. Three different algorithms were subsequently used for prediction of side-chain conformation. The average prediction accuracy for the three methods was remarkably similar: 49% to 51% of the χ1 angles were predicted correctly overall (33% to 36% of the χ1+2 angles). Interestingly, when the inter-side-chain interactions were disregarded, the mean accuracy increased. A consensus approach is described, in which side-chain conformations are defined based on the most frequently predicted χ angles for a given method upon each set of near-native backbones. We find that consensus modeling, which de facto includes backbone flexibility, improves side-chain prediction: χ1 accuracy improved to 51-54% (36-42% of χ1+2). Implications of a consensus method for ab initio protein structure prediction are discussed.
Affiliation(s)
- E S Huang
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
325
Huang ES, Samudrala R, Ponder JW. Distance geometry generates native-like folds for small helical proteins using the consensus distances of predicted protein structures. Protein Sci 1998; 7:1998-2003. [PMID: 9761481 PMCID: PMC2144160 DOI: 10.1002/pro.5560070916] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
For successful ab initio protein structure prediction, a method is needed to identify native-like structures from a set containing both native and non-native protein-like conformations. In this regard, the use of distance geometry has shown promise when accurate inter-residue distances are available. We describe a method by which distance geometry restraints are culled from sets of 500 protein-like conformations for four small helical proteins generated by the method of Simons et al. (1997). A consensus-based approach was applied in which every inter-Cα distance was measured, and the most frequently occurring distances were used as input restraints for distance geometry. For each protein, a structure with lower coordinate root-mean-square (RMS) error than the mean of the original set was constructed; in three cases the topology of the fold resembled that of the native protein. When the fold sets were filtered for the best scoring conformations with respect to an all-atom knowledge-based scoring function, the remaining subset of 50 structures yielded restraints of higher accuracy. A second round of distance geometry using these restraints resulted in an average coordinate RMS error of 4.38 Å.
Affiliation(s)
- E S Huang
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, Saint Louis, Missouri 63110, USA
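The consensus step in the abstract above, measuring every inter-residue distance across a set of conformations and keeping the most frequently occurring value, can be sketched as a binned mode. The abstract does not say how "most frequent" was determined, so the fixed bin width, the use of bin centres and the function name below are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def consensus_distances(
    conformations: Sequence[Sequence[Coord]], bin_width: float = 1.0
) -> Dict[Tuple[int, int], float]:
    """For each residue pair, return the centre of the most populated distance bin."""
    if not conformations:
        return {}
    n_residues = len(conformations[0])
    consensus: Dict[Tuple[int, int], float] = {}
    for i in range(n_residues):
        for j in range(i + 1, n_residues):
            # Histogram the i-j distance over all conformations.
            bins = Counter(
                int(math.dist(conf[i], conf[j]) / bin_width) for conf in conformations
            )
            best_bin = bins.most_common(1)[0][0]
            consensus[(i, j)] = (best_bin + 0.5) * bin_width
    return consensus


# Toy set of three two-residue "conformations" with distances 3.2, 3.7 and 8.1.
toy_conformations = [
    [(0.0, 0.0, 0.0), (3.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.7, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (8.1, 0.0, 0.0)],
]
print(consensus_distances(toy_conformations))
```

The resulting pair-to-distance map is the kind of input a distance geometry program would take as restraints; the outlier at 8.1 is ignored because it falls in a sparsely populated bin.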
326
Abstract
The interconnected nature of interactions in protein structures appears to be the major hurdle in preventing the construction of accurate comparative models. We present an algorithm that uses graph theory to handle this problem. Each possible conformation of a residue in an amino acid sequence is represented using the notion of a node in a graph. Each node is given a weight based on the degree of the interaction between its side-chain atoms and the local main-chain atoms. Edges are then drawn between pairs of residue conformations/nodes that are consistent with each other (i.e. clash-free and satisfying geometrical constraints). The edges are weighted based on the interactions between the atoms of the two nodes. Once the entire graph is constructed, all the maximal sets of completely connected nodes (cliques) are found using a clique-finding algorithm. The cliques with the best weights represent the optimal combinations of the various main-chain and side-chain possibilities, taking the respective environments into account. The algorithm is used in a comparative modeling scenario to build side-chains, regions of main chain, and mix and match between different homologs in a context-sensitive manner. The predictive power of this method is assessed by applying it to cases where the experimental structure is not known in advance.
Affiliation(s)
- R Samudrala
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville 20850, USA
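The graph formulation in the abstract above (weighted nodes for residue conformations, weighted edges between mutually consistent conformations, then a search over maximal cliques) can be illustrated with a small sketch. The abstract does not name the clique-finding algorithm, so the basic Bron-Kerbosch enumeration used here is a stand-in. The toy graph, the weights and the convention that a higher total weight is better are also assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Set, Tuple

Node = Tuple[int, str]  # (residue index, conformation label), both hypothetical


def maximal_cliques(adj: Dict[Node, Set[Node]]) -> Iterator[FrozenSet[Node]]:
    """Yield every maximal clique using basic Bron-Kerbosch (no pivoting)."""

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> Iterator[FrozenSet[Node]]:
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def clique_weight(clique, node_w, edge_w) -> float:
    """Sum of node weights plus weights of all edges inside the clique."""
    total = sum(node_w[n] for n in clique)
    total += sum(edge_w[frozenset(pair)] for pair in combinations(clique, 2))
    return total


def best_clique(adj, node_w, edge_w) -> FrozenSet[Node]:
    """Return the maximal clique with the highest total weight."""
    return max(maximal_cliques(adj), key=lambda c: clique_weight(c, node_w, edge_w))


# Toy graph: residue 1 has two candidate conformations, residues 2 and 3 one each.
a1, b1, a2, a3 = (1, "a"), (1, "b"), (2, "a"), (3, "a")
toy_edges = [(a1, a2), (a1, a3), (a2, a3), (b1, a2)]
toy_adj: Dict[Node, Set[Node]] = {n: set() for n in (a1, b1, a2, a3)}
for u, v in toy_edges:
    toy_adj[u].add(v)
    toy_adj[v].add(u)
toy_node_w = {a1: 1.0, b1: 2.0, a2: 1.0, a3: 1.0}
toy_edge_w = {frozenset(e): 1.0 for e in toy_edges}
print(sorted(best_clique(toy_adj, toy_node_w, toy_edge_w)))
```

Two conformations of the same residue are never joined by an edge, so no clique can contain both, and each clique represents one mutually consistent combination of choices.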