251
Ekwa-Ekoka C, Diaz GA, Carlson C, Hasegawa T, Samudrala R, Lim KC, Yabu JM, Levy B, Schnapp LM. Genomic organization and sequence variation of the human integrin subunit alpha8 gene (ITGA8). Matrix Biol 2005; 23:487-96. [PMID: 15579315 DOI: 10.1016/j.matbio.2004.08.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/17/2004] [Revised: 07/31/2004] [Accepted: 08/09/2004] [Indexed: 11/20/2022]
Abstract
The integrin alpha8 is highly expressed during kidney and lung development. alpha8-deficient mice display abnormal renal development, suggesting that alpha8 plays a critical role in organogenesis. Therefore, it would be of considerable interest to understand the genomic structure, localization and sequence variation of the alpha8 gene. Using FISH and genomic database analysis, we show that the alpha8 gene maps to chromosome 10p13 and consists of >200 kbp organized into 30 exons. Examination of 47 individuals from two different ethnic groups (European and African descent) identified 286 varying sites. The diversity of alpha8 is comparable to that of other regions within the human genome. Eight of the varying sites were located in the coding regions: six resulted in nonsynonymous substitutions, of which two lead to non-conservative changes in the protein. None of the sites showed significant deviation from Hardy-Weinberg equilibrium. We mapped the coding region single nucleotide polymorphisms (SNPs) onto a model of the predicted alpha8 structure and found that all the SNPs were located in the "calf" of the extracellular domain. In the European population, the linkage disequilibrium statistic D' showed three blocks of relatively non-recombinant regions in the alpha8 gene, while the African population showed more evidence of recombination. The observed patterns of the linkage disequilibrium statistic r² suggest that a large number of sites will need to be genotyped to ensure coverage of the entire gene for genetic association studies. Identification of the sequence variation will allow genetic association studies of alpha8 in kidney and lung disease.
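The two linkage disequilibrium statistics named in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of D' and r² for a pair of biallelic sites; the function name `ld_stats` and the toy frequencies are illustrative assumptions, not values or code from the paper.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic sites.

    p_ab : frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a  : frequency of allele A at site 1
    p_b  : frequency of allele B at site 2
    Both sites are assumed polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two sites.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Toy example (hypothetical frequencies): allele A only ever occurs with B.
d, d_prime, r2 = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.6)
print(f"D={d:.3f}  D'={d_prime:.3f}  r2={r2:.3f}")
```

In this toy case D' is at its maximum while r² stays well below 1, which is one way a region can look non-recombinant by D' and still require many genotyped sites for good coverage by r².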
MESH Headings
- Base Sequence
- Chromosome Mapping
- Chromosomes, Human, Pair 10
- DNA/metabolism
- DNA, Complementary/metabolism
- Databases, Genetic
- Exons
- Genetic Variation
- Genome
- Genotype
- Humans
- Immunohistochemistry
- In Situ Hybridization, Fluorescence
- Integrin alpha Chains/genetics
- Integrin alpha Chains/metabolism
- Kidney/metabolism
- Linkage Disequilibrium
- Lung/metabolism
- Models, Genetic
- Models, Molecular
- Molecular Sequence Data
- Polymorphism, Genetic
- Polymorphism, Single Nucleotide
- Protein Conformation
- Protein Structure, Secondary
- Sequence Homology, Amino Acid
252
253
Liu HL, Hsu JP. Recent developments in structural proteomics for protein structure determination. Proteomics 2005; 5:2056-68. [PMID: 15846841 DOI: 10.1002/pmic.200401104] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022]
Abstract
The major challenges in structural proteomics include identifying all the proteins on the genome-wide scale, determining their structure-function relationships, and outlining the precise three-dimensional structures of the proteins. Protein structures are typically determined by experimental approaches such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. However, the three-dimensional structural knowledge obtained by these techniques is still limited. Thus, computational methods such as comparative and de novo approaches and molecular dynamics simulations are intensively used as alternative tools to predict the three-dimensional structures and dynamic behavior of proteins. This review summarizes recent developments in structural proteomics for protein structure determination, including instrumental methods such as X-ray crystallography and NMR spectroscopy, and computational methods such as comparative and de novo structure prediction and molecular dynamics simulations.
Affiliation(s)
- Hsuan-Liang Liu
- Department of Chemical Engineering, National Taipei University of Technology, Taiwan.
254
Abstract
Cluster distance geometry is a recent generalization of distance geometry whereby protein structures can be described at even lower levels of detail than one point per residue. With improvements in the clustering technique, protein conformations can be summarized in terms of alternative contact patterns between clusters, where each cluster contains four sequentially adjacent amino acid residues. A very simple potential function involving 210 adjustable parameters can be determined that favors the native contacts of 31 small, monomeric proteins over their respective sets of nonnative contacts. This potential then favors the native contacts for 174 small, monomeric proteins that have low sequence identity with any of the training set. A broader search finds 698 small protein chains from the Protein Data Bank where the native contacts are preferred over all alternatives, even though they have low sequence identity with the training set. This amounts to a highly predictive method for ab initio protein folding at low spatial resolution.
Affiliation(s)
- Gordon M Crippen
- College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109-1065, USA.
255
Li X, Liang J. Geometric cooperativity and anticooperativity of three-body interactions in native proteins. Proteins 2005; 60:46-65. [PMID: 15849756 DOI: 10.1002/prot.20438] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
Characterizing multibody interactions of hydrophobic, polar, and ionizable residues in protein is important for understanding the stability of protein structures. We introduce a geometric model for quantifying 3-body interactions in native proteins. With this model, empirical propensity values for many types of 3-body interactions can be reliably estimated from a database of native protein structures, despite the overwhelming presence of pairwise contacts. In addition, we define a nonadditive coefficient that characterizes cooperativity and anticooperativity of residue interactions in native proteins by measuring the deviation of 3-body interactions from 3 independent pairwise interactions. It compares the 3-body propensity value from what would be expected if only pairwise interactions were considered, and highlights the distinction of propensity and cooperativity of 3-body interaction. Based on the geometric model, and what can be inferred from statistical analysis of such a model, we find that hydrophobic interactions and hydrogen-bonding interactions make nonadditive contributions to protein stability, but the nonadditive nature depends on whether such interactions are located in the protein interior or on the protein surface. When located in the interior, many hydrophobic interactions such as those involving alkyl residues are anticooperative. Salt-bridge and regular hydrogen-bonding interactions, such as those involving ionizable residues and polar residues, are cooperative. When located on the protein surface, these salt-bridge and regular hydrogen-bonding interactions are anticooperative, and hydrophobic interactions involving alkyl residues become cooperative. We show with examples that incorporating 3-body interactions improves discrimination of protein native structures against decoy conformations. In addition, analysis of cooperative 3-body interaction may reveal spatial motifs that can suggest specific protein functions.
Affiliation(s)
- Xiang Li
- Department of Bioengineering, SEO, MC-063, University of Illinois at Chicago, Chicago, Illinois 60607-7052, USA
256
Abstract
Energy functions are crucial ingredients of protein tertiary structure prediction methods. Assessing the quality of energy functions is therefore of prime importance. It requires the elaboration of a standard evaluation scheme, whose key elements are (i) sets that contain the native and several non-native structures of proteins (decoys), in order to test whether the energy functions display the expected quality features, and (ii) measures to evaluate the reliability of energy functions. We present here a survey of the recent advances in these two related fields. In the first part, we analyze and review the large number of decoy sets that are available on the web, and we summarize the characteristics of a challenging decoy set. We then discuss how to define the quality of energy functions and review the measures related to it.
Affiliation(s)
- D Gilis
- Center of Applied Molecular Engineering, Institute of Chemistry and Biochemistry, University of Salzburg, Jakob Haringerstraße 3, A-5020 Salzburg, Austria.
257
Trebbi B, Fanti M, Rossi I, Zerbetto F. Intraresidue Distribution of Energy in Proteins. J Phys Chem B 2005; 109:3586-93. [PMID: 16851397 DOI: 10.1021/jp0471756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Boltzmann-like distributions appear in many properties and energy-related quantities of proteins. A few examples are hydrophobicity, various types of side-chain/side-chain interactions, proline isomerization, hydrogen bonds, internal cavities, interactions at the level of specific atom types, and the propensity of the phi/'phi' ratio. Here, we conjecture that the Boltzmann hypothesis also holds for the intra-residue energy distribution. We confirm the conjecture by calculating the energies of 41,672 residues of the structures of highly resolved proteins, where at least 12 out of 20 naturally occurring amino acids follow Boltzmann's law. We further examine the entire set of all residue energies and find that the convolution of the individual distributions gives a Poisson function, which is followed by approximately 50% of individual proteins' structures.
Affiliation(s)
- Bruno Trebbi
- Dipartimento di Chimica "G. Ciamician" and Dipartimento di Biologia, Università di Bologna, via F. Selmi 2, 40126 Bologna, Italy
258
Liu Z, Mao F, Guo JT, Yan B, Wang P, Qu Y, Xu Y. Quantitative evaluation of protein-DNA interactions using an optimized knowledge-based potential. Nucleic Acids Res 2005; 33:546-58. [PMID: 15673715 PMCID: PMC548349 DOI: 10.1093/nar/gki204] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 12/02/2022]
Abstract
Computational evaluation of protein–DNA interaction is important for the identification of DNA-binding sites and genome annotation. It could validate the predicted binding motifs by sequence-based approaches through the calculation of the binding affinity between a protein and DNA. Such an evaluation should take into account structural information to deal with the complicated effects from DNA structural deformation, distance-dependent multi-body interactions and solvation contributions. In this paper, we present a knowledge-based potential built on interactions between protein residues and DNA tri-nucleotides. The potential, which explicitly considers the distance-dependent two-body, three-body and four-body interactions between protein residues and DNA nucleotides, has been optimized in terms of a Z-score. We have applied this knowledge-based potential to evaluate the binding affinities of zinc-finger protein–DNA complexes. The predicted binding affinities are in good agreement with the experimental data (with a correlation coefficient of 0.950). On a larger test set containing 48 protein–DNA complexes with known experimental binding free energies, our potential has achieved a high correlation coefficient of 0.800, when compared with the experimental data. We have also used this potential to identify binding motifs in DNA sequences of transcription factors (TF). The TFs in 79.4% of the known TF–DNA complexes have accurately found their native binding sequences from a large pool of DNA sequences. When tested in a genome-scale search for TF-binding motifs of the cyclic AMP regulatory protein (CRP) of Escherichia coli, this potential ranks all known binding motifs of CRP in the top 15% of all candidate sequences.
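The abstract states that the potential was optimized in terms of a Z-score but does not define it. A common convention in the decoy-discrimination literature, assumed here, measures how many standard deviations the native energy lies below the mean decoy energy. The function name `z_score` and the toy energies are hypothetical.

```python
from statistics import mean, pstdev


def z_score(native_energy, decoy_energies):
    """Z-score of the native energy against a decoy energy distribution.

    Assumed definition: (E_native - mean(E_decoys)) / std(E_decoys).
    A more negative value means the native state is better separated
    from the decoys.
    """
    mu = mean(decoy_energies)
    sigma = pstdev(decoy_energies)  # population standard deviation
    return (native_energy - mu) / sigma


# Toy energies in arbitrary units (not data from the paper).
decoys = [-4.0, -2.0, 0.0, 2.0, 4.0]
print(round(z_score(-10.0, decoys), 3))
```

Optimizing a potential "in terms of a Z-score" would then mean adjusting its parameters so that this quantity becomes as negative as possible over a training set; how the authors did this is not described in the abstract.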
Affiliation(s)
- Zhijie Liu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Fenglou Mao
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Jun-tao Guo
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Bo Yan
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Peng Wang
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Youxing Qu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Computational Biology Institute, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- To whom correspondence should be addressed. Tel: +1 706 542 9779; Fax: +1 706 542 9751;
259
Zhang C, Liu S, Zhou H, Zhou Y. The dependence of all-atom statistical potentials on structural training database. Biophys J 2005; 86:3349-58. [PMID: 15189839 PMCID: PMC1304244 DOI: 10.1529/biophysj.103.035998] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
An accurate statistical energy function that is suitable for the prediction of protein structures of all classes should be independent of the structural database used for energy extraction. Here, two high-resolution, low-sequence-identity structural databases of 333 alpha-proteins and 271 beta-proteins were built for examining the database dependence of three all-atom statistical energy functions. They are RAPDF (residue-specific all-atom conditional probability discriminatory function), atomic KBP (atomic knowledge-based potential), and DFIRE (statistical potential based on distance-scaled finite ideal-gas reference state). These energy functions differ in the reference states used for energy derivation. The energy functions extracted from the different structural databases are used to select native structures from multiple decoys of 64 alpha-proteins and 28 beta-proteins. The performance in native structure selections indicates that the DFIRE-based energy function is mostly independent of the structural database whereas RAPDF and KBP have a significant dependence. The construction of two additional structural databases of alpha/beta and alpha + beta-proteins further confirmed the weak dependence of DFIRE on the structural databases of various structural classes. The possible source for the difference between the three all-atom statistical energy functions is that the physical reference state of ideal gas used in the DFIRE-based energy function is least dependent on the structural database.
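The role of the reference state can be made concrete with a generic inverse-Boltzmann sketch. This is not the published formula of RAPDF, KBP, or DFIRE; it only shows the shared form in which an observed pair count is compared with a reference count, with `n_ref` standing in for whatever a given reference state predicts. The function name, the `RT` value (roughly 0.59 kcal/mol near room temperature), and the counts are illustrative assumptions.

```python
import math

RT = 0.59  # kcal/mol, approximate value near 298 K (assumption for illustration)


def pair_energy(n_obs, n_ref, rt=RT):
    """Generic inverse-Boltzmann pair energy: u = -RT * ln(N_obs / N_ref).

    n_obs : observed count of an atom (or residue) pair in a distance bin
    n_ref : count expected in the same bin under the chosen reference state
    """
    return -rt * math.log(n_obs / n_ref)


# Same observed count, two hypothetical reference states:
print(pair_energy(150, 100))  # favorable (negative) under reference A
print(pair_energy(150, 200))  # unfavorable (positive) under reference B
```

Because the sign and size of the energy depend directly on `n_ref`, a reference state that itself shifts with the training database carries that dependence into the potential, which is the effect the abstract examines.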
Affiliation(s)
- Chi Zhang
- Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
260
Pei J, Grishin NV. Combining evolutionary and structural information for local protein structure prediction. Proteins 2004; 56:782-94. [PMID: 15281130 DOI: 10.1002/prot.20158] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) The COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. A mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.
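Finding (4) mentions two ways of combining frequencies at a single position. The abstract gives neither formula, so the sketch below uses two common forms as assumptions: a fixed-weight linear mixture, and a pseudocount-style mixture in which the structural frequencies act as a prior whose influence shrinks as the effective number of sequences grows. The names `linear_mix`, `pseudocount_mix`, `w`, `beta`, and the three-letter toy alphabet are all hypothetical.

```python
def linear_mix(f_evo, f_struct, w=0.5):
    """Fixed-weight linear combination of two frequency vectors."""
    return [w * e + (1 - w) * s for e, s in zip(f_evo, f_struct)]


def pseudocount_mix(f_evo, f_struct, n_eff, beta=10.0):
    """Pseudocount-style mixture (assumed form).

    n_eff : effective number of sequences behind the evolutionary frequencies
    beta  : weight of the structural frequencies, acting as pseudocounts
    """
    return [(n_eff * e + beta * s) / (n_eff + beta)
            for e, s in zip(f_evo, f_struct)]


# Toy three-letter alphabet instead of 20 amino acids.
f_evo = [0.7, 0.2, 0.1]
f_struct = [0.2, 0.5, 0.3]
print(linear_mix(f_evo, f_struct))
print(pseudocount_mix(f_evo, f_struct, n_eff=30))
```

Both functions return a normalized frequency vector whenever both inputs are normalized, so the result can feed a profile comparison without rescaling.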
Affiliation(s)
- Jimin Pei
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
261
Zhang C, Liu S, Zhou H, Zhou Y. An accurate, residue-level, pair potential of mean force for folding and binding based on the distance-scaled, ideal-gas reference state. Protein Sci 2004; 13:400-11. [PMID: 14739325 PMCID: PMC2286718 DOI: 10.1110/ps.03348304] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.8] [Indexed: 10/26/2022]
Abstract
Structure prediction on a genomic scale requires a simplified energy function that can efficiently sample the conformational space of polypeptide chains. A good energy function at minimum should discriminate native structures against decoys. Here, we show that a recently developed, residue-specific, all-atom knowledge-based potential (167 atomic types) based on distance-scaled, finite ideal-gas reference state (DFIRE-all-atom) can be substantially simplified to 20 residue types located at side-chain center of mass (DFIRE-SCM) without a significant change in its capability of structure discrimination. Using 96 standard multiple decoy sets, we show that there is only a small reduction (from 80% to 78%) in success rate of ranking native structures as the top 1. The success rate is higher than two previously developed, all-atom distance-dependent statistical pair potentials. Applied to structure selections of 21 docking decoys without modification, the DFIRE-SCM potential is 29% more successful in recognizing native complex structures than an all-atom statistical potential trained by a database of dimeric interfaces. The potential also achieves 92% accuracy in distinguishing true dimeric interfaces from artificial crystal interfaces. In addition, the DFIRE potential with the C(alpha) positions as the interaction centers recognizes 123 native structures out of a comprehensive 125-protein TOUCHSTONE decoy set in which each protein has 24,000 decoys with only C(alpha) positions. Furthermore, the performance by DFIRE-SCM on newly established 25 monomeric and 31 docking Rosetta-decoy sets is comparable to (or better than in the case of monomeric decoy sets) that of a recently developed, all-atom Rosetta energy function enhanced with an orientation-dependent hydrogen bonding potential.
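The "success rate of ranking native structures as the top 1" can be read as the fraction of decoy sets in which the native structure has the lowest energy. A minimal sketch under that reading follows; the function name, the input layout, and the toy energies are assumptions made for illustration.

```python
def top1_success_rate(decoy_sets):
    """Fraction of decoy sets in which the native structure ranks first.

    decoy_sets : list of (native_energy, [decoy energies]) pairs.
    The native counts as top 1 only if its energy is strictly lower
    than that of every decoy in its set.
    """
    hits = sum(
        1 for native, decoys in decoy_sets
        if all(native < e for e in decoys)
    )
    return hits / len(decoy_sets)


# Three toy decoy sets; the native wins in the first and third.
sets = [
    (-10.0, [-5.0, -3.0]),
    (-2.0, [-4.0, -1.0]),
    (-7.0, [-6.9]),
]
print(top1_success_rate(sets))
```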
Affiliation(s)
- Chi Zhang
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, SUNY Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
262
Winther O, Krogh A. Teaching computers to fold proteins. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 70:030903. [PMID: 15524499 DOI: 10.1103/physreve.70.030903] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/26/2003] [Revised: 04/26/2004] [Indexed: 05/24/2023]
Abstract
A new general algorithm for optimization of potential functions for protein folding is introduced. It is based upon gradient optimization of the thermodynamic stability of native folds of a training set of proteins with known structure. The iterative update rule contains two thermodynamic averages which are estimated by (generalized ensemble) Monte Carlo. We test the learning algorithm on a Lennard-Jones (LJ) force field with torsional-angle degrees of freedom and a single-atom side chain. In a test with 24 peptides of known structure, none folded correctly with the initial potential functions, but two-thirds came within 3 Å of their native fold after optimizing the potential functions.
Affiliation(s)
- Ole Winther
- Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark.
263
Zhou H, Zhou Y. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 2004; 55:1005-13. [PMID: 15146497 DOI: 10.1002/prot.20007] [Citation(s) in RCA: 163] [Impact Index Per Article: 8.2] [Indexed: 11/11/2022]
Abstract
An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential, so that a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in the ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting the family, superfamily, and fold similarities in the Lindahl benchmark, respectively. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is established for academic users at http://theory.med.buffalo.edu.
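The reason a single-body score permits dynamic programming is that the score for placing query residue i at template position j depends only on (i, j), not on where other residues are placed. The sketch below is a generic global alignment (Needleman-Wunsch style) with a linear gap penalty, not the SPARKS implementation; the function name, the gap value, and the convention that higher scores are better are assumptions.

```python
def align_score(score, gap=-1.0):
    """Best global alignment score by dynamic programming.

    score[i][j] : score for aligning query residue i with template position j
                  (assumed precomputed, e.g. from single-body terms)
    gap         : penalty added for each unaligned residue (linear gap model)
    """
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                dp[i - 1][j] + gap,                      # gap in template
                dp[i][j - 1] + gap,                      # gap in query
            )
    return dp[n][m]


# Toy 2-residue query against a 3-position template.
toy = [[2.0, -1.0, -1.0],
       [-1.0, 2.0, -1.0]]
print(align_score(toy))
```

The table is filled in time proportional to the product of the two lengths, which is what makes scanning a whole fold library feasible; a pairwise (two-body) energy would break this because each cell would depend on the rest of the alignment.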
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, New York 14214, USA
264
Wang K, Fain B, Levitt M, Samudrala R. Improved protein structure selection using decoy-dependent discriminatory functions. BMC Struct Biol 2004; 4:8. [PMID: 15207004 PMCID: PMC449718 DOI: 10.1186/1472-6807-4-8] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Received: 04/17/2004] [Accepted: 06/18/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, there is poor correlation between the scores and the accuracy of the generated conformations. RESULTS We present a simple and nonparametric formula to estimate the accuracy of predicted conformations (or decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Calpha RMSD (Root Mean Square Deviation) calculation in a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores have correlations as high as 0.9 with the Calpha RMSDs of the decoy conformations, measured relative to the experimental conformation for each decoy. We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function called self-RAPDF, where we compiled the atom-atom contact probabilities from all the conformations in a decoy set instead of using an ensemble of native conformations, with a weighting scheme based on the density scores. The self-RAPDF has a higher correlation with Calpha RMSD than RAPDF for 76/83 decoy sets, and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement. 
CONCLUSIONS Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.
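The density score is described only as an all-against-all Calpha RMSD calculation within a decoy set; how the pairwise values are aggregated is not stated in the abstract. The sketch below assumes the simplest aggregation, the mean RMSD from each decoy to all others, and takes a precomputed RMSD matrix as input rather than computing superpositions. The function name and toy matrix are hypothetical.

```python
def density_scores(rmsd):
    """Mean pairwise RMSD of each decoy to all other decoys in the set.

    rmsd : symmetric matrix (list of lists) with rmsd[i][j] the Calpha RMSD
           between decoys i and j, and zeros on the diagonal.
    A lower score means the decoy sits in a denser region of the set
    (assumed aggregation; the paper's exact formula may differ).
    """
    n = len(rmsd)
    return [
        sum(rmsd[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Toy set of three decoys with pairwise RMSDs in angstroms.
toy = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 3.0],
    [4.0, 3.0, 0.0],
]
scores = density_scores(toy)
best = min(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

No native structure appears anywhere in the calculation, which is what makes the score decoy-dependent: it relies on near-native conformations being sampled more densely than misfolded ones.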
Affiliation(s)
- Kai Wang
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
- Boris Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Michael Levitt
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ram Samudrala
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
265
Liu S, Zhang C, Zhou H, Zhou Y. A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins 2004; 56:93-101. [PMID: 15162489 DOI: 10.1002/prot.20019] [Citation(s) in RCA: 158] [Impact Index Per Article: 7.9] [Indexed: 11/10/2022]
Abstract
Extracting knowledge-based statistical potential from known structures of proteins has proved to be a simple, effective method to obtain an approximate free-energy function. However, the different compositions of amino acid residues at the core, the surface, and the binding interface of proteins prohibited the establishment of a unified statistical potential for folding and binding despite the fact that the physical basis of the interaction (water-mediated interaction between amino acids) is the same. Recently, a physical state of ideal gas, rather than a statistically averaged state, has been used as the reference state for extracting the net interaction energy between amino acid residues of monomeric proteins. Here, we find that this monomer-based potential is more accurate than an existing all-atom knowledge-based potential trained with interfacial structures of dimers in distinguishing native complex structures from docking decoys (100% success rate vs. 52% in 21 dimer/trimer decoy sets). It is also more accurate than a recently developed semiphysical empirical free-energy functional enhanced by an orientation-dependent hydrogen-bonding potential in distinguishing native state from Rosetta docking decoys (94% success rate vs. 74% in 31 antibody-antigen and other complexes based on Z score). In addition, the monomer potential achieved a 93% success rate in distinguishing true dimeric interfaces from artificial crystal interfaces. More importantly, without additional parameters, the potential provides an accurate prediction of binding free energy of protein-peptide and protein-protein complexes (a correlation coefficient of 0.87 and a root-mean-square deviation of 1.76 kcal/mol with 69 experimental data points). This work marks a significant step toward a unified knowledge-based potential that quantitatively captures the common physical principle underlying folding and binding. A Web server for academic users, established for the prediction of binding free energy and the energy evaluation of the protein-protein complexes, may be found at http://theory.med.buffalo.edu.
Affiliation(s)
- Song Liu
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
266
Heuser P, Wohlfahrt G, Schomburg D. Efficient methods for filtering and ranking fragments for the prediction of structurally variable regions in proteins. Proteins 2004; 54:583-95. [PMID: 14748005 DOI: 10.1002/prot.10603] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 01/02/2023]
Abstract
The prediction of protein 3D structures close to insertions and deletions or, more generally, loop prediction, is still one of the major challenges in homology modeling projects. In this article, we developed ranking criteria and selection filters to improve knowledge-based loop predictions. These criteria were developed and optimized for a test data set containing 678 insertions and deletions. The examples are, in principle, predictable from the loop database used with an RMSD < 1 Å and represent realistic modeling situations. Four noncorrelated criteria for the selection of fragments are evaluated. A fast prefilter compares the distance between the anchor groups in the template protein with the stems of the fragments. The RMSD of the anchor groups is used for fitting and ranking of the selected loop candidates. After fitting, repulsive close contacts of loop candidates with the template protein are used for filtering, and fragments with backbone torsion angles that are unfavorable according to a knowledge-based potential are eliminated. By the combined application of these filter criteria to the test set, it was possible to increase the percentage of predictions with a global RMSD < 1 Å to over 50% among the first five ranks, with average global RMSD values for the first rank candidate that are between 1.3 and 2.2 Å for different loop lengths. Compared to other examples described in the literature, our large numbers of test cases are not self-predictions, where loops are placed in a protein after a peptide loop has been cut out, but are attempts to predict structural changes that occur in evolution when a protein is affected by insertions and deletions.
Affiliation(s)
- Philipp Heuser
- University of Cologne, Institute of Biochemistry, Köln, Germany
|
267
|
Zhang C, Liu S, Zhou Y. Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential. Protein Sci 2004; 13:391-9. [PMID: 14739324 PMCID: PMC2286705 DOI: 10.1110/ps.03411904] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2003] [Revised: 10/17/2003] [Accepted: 10/17/2003] [Indexed: 10/26/2022]
Abstract
The conformations of loops are determined by the water-mediated interactions between amino acid residues. Energy functions that describe the interactions can be derived either from physical principles (physical-based energy function) or statistical analysis of known protein structures (knowledge-based statistical potentials). It is commonly believed that statistical potentials are appropriate for coarse-grained representation of proteins but are not as accurate as physical-based potentials when atomic resolution is required. Several recent applications of physical-based energy functions to loop selections appear to support this view. In this article, we apply a recently developed DFIRE-based statistical potential to three different loop decoy sets (RAPPER, Jacobson, and Forrest-Woolf sets). Together with a rotamer library for side-chain optimization, the performance of the DFIRE-based potential in the RAPPER decoy set (385 loop targets) is comparable to that of AMBER/GBSA for short loops (two to eight residues). DFIRE is more accurate for longer loops (9 to 12 residues). A similar trend is observed when comparing DFIRE with another physical-based energy function, OPLS/SGB-NP, in the large Jacobson decoy set (788 loop targets). In the Forrest-Woolf decoy set for the loops of membrane proteins, the DFIRE potential performs substantially better than the combination of the CHARMM force field with several solvation models. The results suggest that a single-term DFIRE-statistical energy function can provide an accurate loop prediction at a fraction of the computing cost required for more complicated physical-based energy functions. A Web server for academic users is established for loop selection at the softwares/services section of the Web site http://theory.med.buffalo.edu/.
Affiliation(s)
- Chi Zhang
- Howard Hughes Medical Institute Center for Single Molecule Biophysics and Department of Physiology and Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
|
268
|
Abstract
A protein model that is simple enough to be used in protein-folding simulations but accurate enough to identify a protein native fold is described. Its geometry consists of describing the residues by one, two, or three pseudoatoms, depending on the residue size. Its energy is given by a pairwise, knowledge-based potential obtained for all the pseudoatoms as a function of their relative distance. The pseudoatomic potential is also a function of the primary chain separation and residue order. The model is tested by gapless threading on a large, representative set of known protein and decoy structures obtained from the "Decoys 'R' Us" database. It is also tested by threading on gapped decoys generated for proteins with many homologs. The gapless threading tests show near 98% native-structure recognition as the lowest energy structure and almost 100% as one of the three lowest energy structures for over 2200 test proteins. In decoy threading tests, the model recognized the majority of the native structures. It is also able to recognize native structures among gapped decoys, in spite of close structural similarities. The results indicate that the pseudoatomic model has native recognition ability similar to comparable atomic-based models but much better than equivalent residue-based models.
Affiliation(s)
- Marcos R Betancourt
- University at Buffalo Center of Excellence in Bioinformatics, Buffalo, New York 14203, USA
|
269
|
Orientation-dependent coarse-grained potentials derived by statistical analysis of molecular structural databases. POLYMER 2004. [DOI: 10.1016/j.polymer.2003.10.093] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
270
|
Rantanen VV, Gyllenberg M, Koski T, Johnson MS. A Bayesian molecular interaction library. J Comput Aided Mol Des 2003; 17:435-61. [PMID: 14677639 DOI: 10.1023/a:1027371810547] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
We describe a library of molecular fragments designed to model and predict non-bonded interactions between atoms. We apply the Bayesian approach, whereby prior knowledge and uncertainty of the mathematical model are incorporated into the estimated model and its parameters. The molecular interaction data are strengthened by narrowing the atom classification to 14 atom types, focusing on independent molecular contacts that lie within a short cutoff distance, and symmetrizing the interaction data for the molecular fragments. Furthermore, the location of atoms in contact with a molecular fragment are modeled by Gaussian mixture densities whose maximum a posteriori estimates are obtained by applying a version of the expectation-maximization algorithm that incorporates hyperparameters for the components of the Gaussian mixtures. A routine is introduced providing the hyperparameters and the initial values of the parameters of the Gaussian mixture densities. A model selection criterion, based on the concept of a 'minimum message length' is used to automatically select the optimal complexity of a mixture model and the most suitable orientation of a reference frame for a fragment in a coordinate system. The type of atom interacting with a molecular fragment is predicted by values of the posterior probability function and the accuracy of these predictions is evaluated by comparing the predicted atom type with the actual atom type seen in crystal structures. The fact that an atom will simultaneously interact with several molecular fragments forming a cohesive network of interactions is exploited by introducing two strategies that combine the predictions of atom types given by multiple fragments. The accuracy of these combined predictions is compared with those based on an individual fragment. 
Exhaustive validation analyses and qualitative examples (e.g., the ligand-binding domain of glutamate receptors) demonstrate that these improvements lead to effective modeling and prediction of molecular interactions.
|
271
|
Abstract
The average contribution of individual residues to folding stability and its dependence on buried accessible surface area (ASA) are obtained by two different approaches. One is based on experimental mutation data, and the other uses a new knowledge-based atom-atom potential of mean force. We show that the contribution of a residue has a significant correlation with buried ASA and that the regression slopes of the 20 amino acid residues (called the buriability) are all positive (pro-burial). The buriability parameter provides a quantitative measure of the driving force for the burial of a residue. The large buriability gap observed between hydrophobic and hydrophilic residues is responsible for the burial of hydrophobic residues in soluble proteins. Possible factors that contribute to the buriability gap are discussed.
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, 14214, USA
|
272
|
|
273
|
Li X, Hu C, Liang J. Simplicial edge representation of protein structures and alpha contact potential with confidence measure. Proteins 2003; 53:792-805. [PMID: 14635122 DOI: 10.1002/prot.10442] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Protein representation and potential function are two important ingredients for studying protein folding, equilibrium thermodynamics, and sequence design. We introduce a novel geometric representation of protein contact interactions using the edge simplices from the alpha shape of the protein structure. This representation can eliminate implausible neighbors that are not in physical contact, and can avoid spurious contact between two residues when a third residue is between them. We developed statistical alpha contact potential using an odds-ratio model. A studentized bootstrap method was then introduced to assess the 95% confidence intervals for each of the 210 propensity parameters. We found, with confidence, that there is significant long-range propensity (>30 residues apart) for hydrophobic interactions. We tested alpha contact potential for native structure discrimination using several sets of decoy structures, and found that it often performs comparably with atom-based potentials requiring many more parameters. We also show that accurate geometric representation is important, and that alpha contact potential has better performance than potential defined by cutoff distance between geometric centers of side chains. Hierarchical clustering of alpha contact potentials reveals natural grouping of residues. To explore the relationship between shape and physicochemical representations, we tested the minimum alphabet size necessary for native structure discrimination. We found that there is no significant difference in performance of discrimination when alphabet size varies from 7 to 20, if geometry is represented accurately by alpha simplicial edges. This result suggests that the geometry of packing plays an important role, but the specific residue types are often interchangeable.
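As a rough illustration of how an odds-ratio contact propensity can be turned into a score, the sketch below compares the observed fraction of each residue-pair contact with the fraction expected from random pairing by composition, and takes the negative logarithm. The reference state, the function name `contact_propensities`, and the toy contact list are assumptions made for illustration; the paper's alpha-shape contact definition and bootstrap confidence intervals are not reproduced here. The 210 parameters mentioned in the abstract correspond to the number of unordered pairs of 20 residue types, which the last line checks.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

def contact_propensities(contacts):
    """Return {(res_a, res_b): -ln(observed / expected)} for unordered pairs.

    `contacts` is a list of (res_a, res_b) residue-type pairs seen in contact.
    The expected fraction assumes random pairing by composition, which is an
    illustrative reference state, not necessarily the one used in the paper.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    res_counts = Counter(res for pair in contacts for res in pair)
    n_pairs = sum(pair_counts.values())
    n_res = sum(res_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / n_pairs
        fa, fb = res_counts[a] / n_res, res_counts[b] / n_res
        expected = fa * fb if a == b else 2 * fa * fb
        scores[(a, b)] = -math.log(observed / expected)
    return scores

# Toy contact list: negative scores mark pairs seen more often than expected.
scores = contact_propensities([("L", "L"), ("L", "L"), ("L", "K"), ("K", "K")])
print(scores)

# Unordered pairs of 20 residue types, matching the 210 parameters cited above.
n_parameters = len(list(combinations_with_replacement(range(20), 2)))
```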
Affiliation(s)
- Xiang Li
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607-7052, USA
|
274
|
Zhu J, Zhu Q, Shi Y, Liu H. How well can we predict native contacts in proteins based on decoy structures and their energies? Proteins 2003; 52:598-608. [PMID: 12910459 DOI: 10.1002/prot.10444] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
One strategy for ab initio protein structure prediction is to generate a large number of possible structures (decoys) and select the most fitting ones based on a scoring or free energy function. The conformational space of a protein is huge, and chances are rare that any heuristically generated structure will directly fall in the neighborhood of the native structure. It is desirable that, instead of being thrown away, the unfitting decoy structures can provide insights into native structures so prediction can be made progressively. First, we demonstrate that a recently parameterized physics-based effective free energy function based on the GROMOS96 force field and a generalized Born/surface area solvent model is, as several other physics-based and knowledge-based models, capable of distinguishing native structures from decoy structures for a number of widely used decoy databases. Second, we observe a substantial increase in correlations of the effective free energies with the degree of similarity between the decoys and the native structure, if the similarity is measured by the content of native inter-residue contacts in a decoy structure rather than its root-mean-square deviation from the native structure. Finally, we investigate the possibility of predicting native contacts based on the frequency of occurrence of contacts in decoy structures. For most proteins contained in the decoy databases, a meaningful amount of native contacts can be predicted based on plain frequencies of occurrence at a relatively high level of accuracy. Relative to using plain frequencies, overwhelming improvements in sensitivity of the predictions are observed for the 4_state_reduced decoy sets by applying energy-dependent weighting of decoy structures in determining the frequency. There, approximately 80% native contacts can be predicted at an accuracy of approximately 80% using energy-weighted frequencies. The sensitivity of the plain frequency approach is much lower (20% to 40%). 
Such improvements are, however, not observed for the other decoy databases. The rationalization and implications of the results are discussed.
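The difference between plain and energy-weighted contact frequencies can be sketched as follows. The abstract does not state the form of the weighting, so the Boltzmann-style weight `exp(-(E - E_min) / kT)`, the function name `contact_frequencies`, and the toy decoys are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def contact_frequencies(decoys, kT=None):
    """Return {contact: weighted frequency} over a set of decoys.

    `decoys` is a list of (energy, set_of_contacts) tuples, where a contact
    is a (residue_i, residue_j) pair. With kT=None every decoy counts equally
    (plain frequency); otherwise decoys are weighted by an assumed
    Boltzmann-style factor relative to the lowest energy.
    """
    if kT is None:
        weights = [1.0] * len(decoys)
    else:
        e_min = min(energy for energy, _ in decoys)
        weights = [math.exp(-(energy - e_min) / kT) for energy, _ in decoys]
    total = sum(weights)
    freq = defaultdict(float)
    for weight, (_, contacts) in zip(weights, decoys):
        for contact in contacts:
            freq[contact] += weight / total
    return dict(freq)

# Toy decoy set: the lowest-energy decoy carries contacts (1, 5) and (2, 8).
decoys = [
    (-10.0, {(1, 5), (2, 8)}),
    (-2.0, {(1, 5)}),
    (-1.0, {(3, 9)}),
]
plain = contact_frequencies(decoys)
weighted = contact_frequencies(decoys, kT=1.0)
print(plain)
print(weighted)
```

In this toy case the weighting raises the frequency of contacts found in the lowest-energy decoy and suppresses those found only in high-energy decoys, which is the kind of shift in sensitivity the abstract reports for one decoy set but not for others.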
Affiliation(s)
- Jiang Zhu
- Key Laboratory of Structural Biology, University of Science and Technology of China, Chinese Academy of Sciences, School of Life Sciences, Hefei, Anhui, 230026, China.
|
275
|
Abstract
The success of structural genomics initiatives requires the development and application of tools for structure analysis, prediction, and annotation. In this paper we review recent developments in these areas; specifically structure alignment, the detection of remote homologs and analogs, homology modeling and the use of structures to predict function. We also discuss various rationales for structural genomics initiatives. These include the structure-based clustering of sequence space and genome-wide function assignment. It is also argued that structural genomics can be integrated into more traditional biological research if specific biological questions are included in target selection strategies.
Affiliation(s)
- Sharon Goldsmith-Fischman
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
|
276
|
Hung LH, Samudrala R. PROTINFO: Secondary and tertiary protein structure prediction. Nucleic Acids Res 2003; 31:3296-9. [PMID: 12824311 PMCID: PMC168948 DOI: 10.1093/nar/gkg541] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2003] [Revised: 03/31/2003] [Accepted: 03/31/2003] [Indexed: 11/14/2022] Open
Abstract
Information about the secondary and tertiary structure of a protein sequence can greatly assist biologists in the generation and testing of hypotheses, as well as design of experiments. The PROTINFO server enables users to submit a protein sequence and request a prediction of the three-dimensional (tertiary) structure based on comparative modeling, fold generation and de novo methods developed by the authors. In addition, users can submit NMR chemical shift data and request protein secondary structure assignment that is based on using neural networks to combine the chemical shifts with secondary structure predictions. The server is available at http://protinfo.compbio.washington.edu.
Affiliation(s)
- Ling-Hong Hung
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
|
277
|
Abstract
The relative strengths of interactions involving polypeptide chains can be estimated with reasonable accuracy with statistical potentials, free-energy functions derived from the frequency of occurrence of structural arrangements of residues or atoms in collections of protein structures. Recent published work has shown that the energetics of side-chain/backbone interactions can be modeled by the phi/psi propensities of the 20 amino acids. In this report, the more commonly used phi/psi probabilities are demonstrated to fail in evaluating the free energies of protein conformations because of an overriding preference for all helical structures. Comparison of the hypothetical reactions implied by these two different statistics-propensities versus probabilities-leads to the conclusion that the Boltzmann hypothesis may only be applicable for the calculation of statistical potentials after the starting conformation has been specified. This conclusion supports a simple conjecture: The surprising success of the Boltzmann hypothesis in explaining the energetics of protein structures is a direct consequence of a real equilibrium, one extending over evolutionary time that has maintained the stability of each protein within a narrow range of values.
Affiliation(s)
- David Shortle
- Department of Biological Chemistry, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.
|
278
|
Keasar C, Levitt M. A novel approach to decoy set generation: designing a physical energy function having local minima with native structure characteristics. J Mol Biol 2003; 329:159-74. [PMID: 12742025 PMCID: PMC2693481 DOI: 10.1016/s0022-2836(03)00323-1] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
We suggest a new approach to the generation of candidate structures (decoys) for ab initio prediction of protein structures. Our method is based on random sampling of conformation space and subsequent local energy minimization. At the core of this approach lies the design of a novel type of energy function. This energy function has local minima with native structure characteristics and wide basins of attraction. The current work presents our motivation for deriving such an energy function and also tests the derived energy function. Our approach is novel in that it takes advantage of the inherently rough energy landscape of proteins, which is generally considered a major obstacle for protein structure prediction. When local minima have wide basins of attraction, the protein's conformation space can be greatly reduced by the convergence of large regions of the space into single points, namely the local minima corresponding to these funnels. We have implemented this concept by an iterative process. The potential is first used to generate decoy sets and then we study these sets of decoys to guide further development of the potential. A key feature of our potential is the use of cooperative multi-body interactions that mimic the role of the entropic and solvent contributions to the free energy. The validity and value of our approach is demonstrated by applying it to 14 diverse, small proteins. We show that, for these proteins, the size of conformation space is considerably reduced by the new energy function. In fact, the reduction is so substantial as to allow efficient conformational sampling. As a result we are able to find a significant number of near-native conformations in random searches performed with limited computational resources.
Affiliation(s)
- Chen Keasar
- Department of Structural Biology, Stanford School of Medicine, Stanford, CA 94305, USA.
|
279
|
Abstract
The ability to separate correct models of protein structures from less correct models is of the greatest importance for protein structure prediction methods. Several studies have examined the ability of different types of energy function to detect the native, or native-like, protein structure from a large set of decoys. In contrast to earlier studies, we examine here the ability to detect models that only show limited structural similarity to the native structure. These correct models are defined by the existence of a fragment that shows significant similarity between this model and the native structure. It has been shown that the existence of such fragments is useful for comparing the performance between different fold recognition methods and that this performance correlates well with performance in fold recognition. We have developed ProQ, a neural-network-based method to predict the quality of a protein model that extracts structural features, such as frequency of atom-atom contacts, and predicts the quality of a model, as measured either by LGscore or MaxSub. We show that ProQ performs at least as well as other measures when identifying the native structure and is better at the detection of correct models. This performance is maintained over several different test sets. ProQ can also be combined with the Pcons fold recognition predictor (Pmodeller) to increase its performance, with the main advantage being the elimination of a few high-scoring incorrect models. Pmodeller was successful in CASP5, and results from the latest LiveBench, LiveBench-6, indicate that Pmodeller has a higher specificity than Pcons alone.
Affiliation(s)
- Björn Wallner
- Stockholm Bioinformatics Center, SCFAB, Stockholm University, SE-106 91 Stockholm, Sweden
|
280
|
de Bakker PIW, DePristo MA, Burke DF, Blundell TL. Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins 2003; 51:21-40. [PMID: 12596261 DOI: 10.1002/prot.10235] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
The accuracy of model selection from decoy ensembles of protein loop conformations was explored by comparing the performance of the Samudrala-Moult all-atom statistical potential (RAPDF) and the AMBER molecular mechanics force field, including the Generalized Born/surface area solvation model. Large ensembles of consistent loop conformations, represented at atomic detail with idealized geometry, were generated for a large test set of protein loops 2 to 12 residues long by a novel ab initio method called RAPPER that relies on fine-grained residue-specific phi/psi propensity tables for conformational sampling. Ranking the conformers on the basis of RAPDF scores resulted in selected conformers that had an average global, non-superimposed RMSD for all heavy mainchain atoms ranging from 1.2 Å for 4-mers to 2.9 Å for 8-mers to 6.2 Å for 12-mers. After filtering on the basis of anchor geometry and RAPDF scores, ranking by energy minimization of the AMBER/GBSA potential energy function selected conformers that had global RMSD values of 0.5 Å for 4-mers, 2.3 Å for 8-mers, and 5.0 Å for 12-mers. Minimized fragments had, on average, consistently lower RMSD values (by 0.1 Å) than their initial conformations. The importance of the Generalized Born solvation energy term is reflected by the observation that the average RMSD accuracy for all loop lengths was worse when this term was omitted. There are, however, still many cases where the AMBER gas-phase minimization selected conformers of lower RMSD than the AMBER/GBSA minimization. The AMBER/GBSA energy function had better correlation with RMSD to native than the RAPDF. When the ensembles were supplemented with conformations extracted from experimental structures, a dramatic improvement in selection accuracy was observed at longer lengths (average RMSD of 1.3 Å for 8-mers) when scoring with the AMBER/GBSA force field.
This work provides the basis for a promising hybrid approach of ab initio and knowledge-based methods for loop modeling.
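The "global, non-superimposed RMSD" quoted in this abstract can be read as a root-mean-square deviation computed directly in the frame of the fixed protein framework, with no optimal superposition of the loop onto the native coordinates. A minimal sketch under that reading is given below; the function name `global_rmsd` and the toy coordinates are hypothetical.

```python
import math

def global_rmsd(model, native):
    """RMSD between two equally ordered coordinate lists, without fitting.

    Each argument is a list of (x, y, z) tuples for the same atoms in the
    same order, already expressed in the same reference frame.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))

# Toy example: three atoms, with the model drifting away along y.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 2.0, 0.0)]
value = global_rmsd(model, native)
print(round(value, 3))
```

Skipping the superposition step penalizes loops that have the right internal shape but sit in the wrong place relative to the rest of the protein, which is why global values are typically larger than locally fitted ones.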
Affiliation(s)
- Paul I W de Bakker
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.
|
281
|
DePristo MA, de Bakker PIW, Lovell SC, Blundell TL. Ab initio construction of polypeptide fragments: efficient generation of accurate, representative ensembles. Proteins 2003; 51:41-55. [PMID: 12596262 DOI: 10.1002/prot.10285] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
We describe a novel method to generate ensembles of conformations of the main-chain atoms [N, C(alpha), C, O, C(beta)] for a sequence of amino acids within the context of a fixed protein framework. Each conformation satisfies fundamental stereo-chemical restraints such as idealized geometry, favorable phi/psi angles, and excluded volume. The ensembles include conformations both near and far from the native structure. Algorithms for effective conformational sampling and constant time overlap detection permit the generation of thousands of distinct conformations in minutes. Unlike previous approaches, our method samples dihedral angles from fine-grained phi/psi state sets, which we demonstrate is superior to exhaustive enumeration from coarse phi/psi sets. Applied to a large set of loop structures, our method consistently samples near-native conformations, averaging 0.4, 1.1, and 2.2 Å main-chain root-mean-square deviations for four, eight, and twelve residue long loops, respectively. The ensembles make ideal decoy sets to assess the discriminatory power of a selection method. Using these decoy sets, we conclude that quality of anchor geometry cannot reliably identify near-native conformations, though the selection results are comparable to previous loop prediction methods. In a subsequent study (de Bakker et al.: Proteins 2003;51:21-40), we demonstrate that the AMBER forcefield with the Generalized Born solvation model identifies near-native conformations significantly better than previous methods.
Affiliation(s)
- Mark A DePristo
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.
|
282
|
McConkey BJ, Sobolev V, Edelman M. Discrimination of native protein structures using atom-atom contact scoring. Proc Natl Acad Sci U S A 2003; 100:3215-20. [PMID: 12631702 PMCID: PMC152272 DOI: 10.1073/pnas.0535768100] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
We introduce a method for discriminating correctly folded proteins from well-designed decoy structures using atom-atom and atom-solvent contact surfaces. The measure used to quantify contact surfaces integrates the solvent accessible surface and interatomic contacts into one quantity, allowing solvent to be treated as an atom contact. A scoring function was derived from statistical contact preferences within known protein structures and validated by using established protein decoy sets, including the "Rosetta" decoys and data from the CASP4 structure predictions. The scoring function effectively distinguished native structures from all corresponding decoys in >90% of the cases, using isolated protein subunits as target structures. If contacts between subunits within quaternary structures are included, the accuracy increases to 97%. Interactions beyond atom-atom contact range were not required to distinguish native structures from the decoys using this method. The contact scoring performed as well as or better than existing statistical and physicochemical potentials and may be applied as an independent means of evaluating putative structural models.
Affiliation(s)
- Brendan J McConkey
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel.
|
283
|
Berrera M, Molinari H, Fogolari F. Amino acid empirical contact energy definitions for fold recognition in the space of contact maps. BMC Bioinformatics 2003; 4:8. [PMID: 12689348 PMCID: PMC153506 DOI: 10.1186/1471-2105-4-8] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2003] [Accepted: 02/28/2003] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Contradicting evidence has been presented in the literature concerning the effectiveness of empirical contact energies for fold recognition. Empirical contact energies are calculated on the basis of information available from selected protein structures, with respect to a defined reference state, according to the quasi-chemical approximation. Protein-solvent interactions are estimated from residue solvent accessibility. RESULTS In the approach presented here, contact energies are derived from the potential of mean force theory, several definitions of contact are examined and their performance in fold recognition is evaluated on sets of decoy structures. The best definition of contact is tested, on a more realistic scenario, on all predictions including sidechains accepted in the CASP4 experiment. In 30 out of 35 cases the native structure is correctly recognized and best predictions are usually found among the 10 lowest energy predictions. CONCLUSION The definition of contact based on van der Waals radii of alpha carbon and side chain heavy atoms is seen to perform better than other definitions involving only alpha carbons, only beta carbons, all heavy atoms or only backbone atoms. An important prerequisite for the applicability of the approach is that the protein structure under study should not exhibit anomalous solvent accessibility, compared to soluble proteins whose structure is deposited in the Protein Data Bank. The combined evaluation of a solvent accessibility parameter and contact energy allows for an effective gross screening of predictive models.
Affiliation(s)
- Marco Berrera
- International School for Advanced Studies Via Beirut 4, 34014 Trieste, Italy
- Henriette Molinari
- Dipartimento Scientifico e Tecnologico, Università di Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Federico Fogolari
- Dipartimento Scientifico e Tecnologico, Università di Verona, Strada Le Grazie 15, 37134 Verona, Italy
|
284
|
Adcock SA. Peptide backbone reconstruction using dead-end elimination and a knowledge-based forcefield. J Comput Chem 2003; 25:16-27. [PMID: 14634990 DOI: 10.1002/jcc.10314] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
A novel, yet simple and automated, protocol for reconstruction of complete peptide backbones from C(alpha) coordinates only is described, validated, and benchmarked. The described method collates a set of possible backbone conformations for each set of residue triads from a structural library derived from the PDB. The optimal permutation of these three-residue segments of backbone conformations is determined using the dead-end elimination (DEE) algorithm. Putative conformations are evaluated using a pairwise-additive knowledge-based forcefield term and a fragment overlap term. The protocol described in this report is able to restore the full backbone coordinates to within 0.2-0.6 Å of the actual crystal structure from C(alpha) coordinates only. In addition, it is insensitive to errors in the input C(alpha) coordinates with RMSDs of 3.0 Å, and this is illustrated through application to deliberately distorted C(alpha) traces. The entire process, as described, is rapid, requiring of the order of a few minutes for a typical protein on a typical desktop PC. Approximations enable this to be reduced to a few seconds, although this is at the expense of prediction accuracy. This compares very favorably to previously published methods, being sufficiently fast for general use and being one of the most accurate methods. Because the method is not restricted to the reconstruction from only C(alpha) coordinates, reconstruction based on C(beta) coordinates is also demonstrated.
Affiliation(s)
- Stewart A Adcock
- Department of Chemistry and Biochemistry, University of California-San Diego, 4234 Urey Hall, 9500 Gilman Drive, La Jolla, California 92093-0365, USA.
|
285
|
Buchete NV, Straub JE, Thirumalai D. Anisotropic coarse-grained statistical potentials improve the ability to identify nativelike protein structures. J Chem Phys 2003. [DOI: 10.1063/1.1561616] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
|
286
|
Jenkins C, Samudrala R, Anderson I, Hedlund BP, Petroni G, Michailova N, Pinel N, Overbeek R, Rosati G, Staley JT. Genes for the cytoskeletal protein tubulin in the bacterial genus Prosthecobacter. Proc Natl Acad Sci U S A 2002; 99:17049-54. [PMID: 12486237 PMCID: PMC139267 DOI: 10.1073/pnas.012516899] [Citation(s) in RCA: 119] [Impact Index Per Article: 5.4] [Indexed: 11/18/2022] Open
Abstract
Tubulins, the protein constituents of the microtubule cytoskeleton, are present in all known eukaryotes but have never been found in the Bacteria or Archaea. Here we report the presence of two tubulin-like genes [bacterial tubulin a (btuba) and bacterial tubulin b (btubb)] in bacteria of the genus Prosthecobacter (Division Verrucomicrobia). In this study, we investigated the organization and expression of these genes and conducted a comparative analysis of the bacterial and eukaryotic protein sequences, focusing on their phylogeny and 3D structures. The btuba and btubb genes are arranged as adjacent loci within the genome along with a kinesin light chain gene homolog. RT-PCR experiments indicate that these three genes are cotranscribed, and a probable promoter was identified upstream of btuba. On the basis of comparative modeling data, we predict that the Prosthecobacter tubulins, unlike eukaryotic alpha and beta tubulins, which form dimers, are monomeric and are therefore unlikely to form microtubule-like structures. Phylogenetic analyses indicate that the Prosthecobacter tubulins are quite divergent and do not support recent horizontal transfer of the genes from a eukaryote. The discovery of genes for tubulin in a bacterial genus may offer new insights into the evolution of the cytoskeleton.
Affiliation(s)
- Cheryl Jenkins
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
|
287
|
Hunter CG, Subramaniam S. Natural coordinate representation for the protein backbone structure. Proteins 2002; 49:206-15. [PMID: 12211001 DOI: 10.1002/prot.10201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
A new model for describing the geometry of the C(alpha) backbone atoms in protein molecules is derived. This model uses one continuous variable per amino acid. This is half the number of degrees-of-freedom used in traditional backbone models. The new model was tested on 721 PDB structures and its average accuracy was determined to be 1.14 Å cRMSD. This model can be used as a description of local structure that provides higher resolution than the traditional secondary structure categories. Also, because this structure description is one-dimensional, it can be used to align structures with the same efficiency and convergence properties available in the popular sequence alignment tools. Furthermore, the 1:1 correspondence with the amino acid sequence has implications for combined sequence/structure alignment. Conventional secondary structure prediction was used to further reduce the number of degrees-of-freedom in 16 test proteins. In those cases, the average cRMSD degraded from 0.96 to 2.33 Å while the number of degrees-of-freedom improved (reduced) by more than 30%.
Affiliation(s)
- Cornelius G Hunter
- Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
|
288
|
Zhou H, Zhou Y. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 2002; 11:2714-26. [PMID: 12381853 PMCID: PMC2373736 DOI: 10.1110/ps.0217002] [Citation(s) in RCA: 684] [Impact Index Per Article: 31.1] [Indexed: 10/27/2022]
Abstract
The distance-dependent structure-derived potentials developed so far all employed a reference state that can be characterized as a residue (atom)-averaged state. Here, we establish a new reference state called the distance-scaled, finite ideal-gas reference (DFIRE) state. The reference state is used to construct a residue-specific all-atom potential of mean force from a database of 1011 nonhomologous (less than 30% homology) protein structures with resolution less than 2 Å. The new all-atom potential recognizes more native proteins from 32 multiple decoy sets, and raises an average Z-score by 1.4 units more than two previously developed, residue-specific, all-atom knowledge-based potentials. When only backbone and C(beta) atoms are used in scoring, the performance of the DFIRE-based potential, although worse than that of the all-atom version, is comparable to those of the previously developed potentials on the all-atom level. In addition, the DFIRE-based all-atom potential provides the most accurate prediction of the stabilities of 895 mutants among three knowledge-based all-atom potentials. Comparison with several physical-based potentials is made.
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
289
|
Chhajer M, Crippen GM. A protein folding potential that places the native states of a large number of proteins near a local minimum. BMC Structural Biology 2002; 2:4. [PMID: 12165098 PMCID: PMC126205 DOI: 10.1186/1472-6807-2-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 06/05/2002] [Accepted: 08/06/2002] [Indexed: 11/22/2022]
Abstract
BACKGROUND: We present a simple method to train a potential function for the protein folding problem which, even though trained using a small number of proteins, is able to place a significantly large number of native conformations near a local minimum. The training relies on generating decoys by energy minimization of the native conformations using the current potential and using a physically meaningful objective function (derivative of energy with respect to torsion angles at the native conformation) during the quadratic programming to place the native conformation near a local minimum. RESULTS: We also compare the performance of three different types of energy functions and find that while the pairwise energy function is trainable, a solvation energy function by itself is untrainable if decoys are generated by minimizing the current potential starting at the native conformation. The best results are obtained when a pairwise interaction energy function is used with solvation energy function. CONCLUSIONS: We are able to train a potential function using six proteins which places a total of 42 native conformations within approximately 4 Å rmsd and 71 native conformations within approximately 6 Å rmsd of a local minimum out of a total of 91 proteins. Furthermore, the threading test using the same 91 proteins ranks 89 native conformations to be first and the other two as second.
Affiliation(s)
- Mukesh Chhajer
- Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599, U.S.A
- Gordon M Crippen
- College of Pharmacy, University of Michigan, Ann Arbor, MI 48109-1065, U.S.A
|
290
|
Samudrala R, Levitt M. A comprehensive analysis of 40 blind protein structure predictions. BMC Structural Biology 2002; 2:3. [PMID: 12150712 PMCID: PMC122083 DOI: 10.1186/1472-6807-2-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Received: 04/09/2002] [Accepted: 08/01/2002] [Indexed: 11/21/2022]
Abstract
BACKGROUND: We thoroughly analyse the results of 40 blind predictions for which an experimental answer was made available at the fourth meeting on the critical assessment of protein structure methods (CASP4). Using our comparative modelling and fold recognition methodologies, we made 29 predictions for targets that had sequence identities ranging from 50% to 10% to the nearest related protein with known structure. Using our ab initio methodologies, we made eleven predictions for targets that had no detectable sequence relationships. RESULTS: For 23 of these proteins, we produced models ranging from 1.0 to 6.0 Å root mean square deviation (RMSD) for the C(alpha) atoms between the model and the corresponding experimental structure for all or large parts of the protein, with model accuracies scaling fairly linearly with respect to sequence identity (i.e., the higher the sequence identity, the better the prediction). We produced nine models with accuracies ranging from 4.0 to 6.0 Å C(alpha) RMSD for 60-100 residue proteins (or large fragments of a protein), with a prediction accuracy of 4.0 Å C(alpha) RMSD for residues 1-80 for T110/rbfa. CONCLUSIONS: The areas of protein structure prediction that work well, and areas that need improvement, are discernable by examining how our methods have performed over the past four CASP experiments. These results have implications for modelling the structure of all tractable proteins encoded by the genome of an organism.
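Several abstracts in this list, including the one above, report model accuracy as a root mean square deviation over alpha-carbon atoms. A minimal Python sketch of that quantity follows, assuming the two coordinate sets are already optimally superimposed and listed in the same residue order. The function name `ca_rmsd` and the toy coordinates are hypothetical, and the superposition step used in practice is deliberately left out.

```python
import math


def ca_rmsd(model, reference):
    """RMSD in the input units between two equally long lists of (x, y, z).

    Assumption: both structures are already superimposed; no fitting is done.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


# Hypothetical two-atom example: every atom is displaced by 3 units along z.
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(ca_rmsd(toy_model, toy_reference))
```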
Affiliation(s)
- Ram Samudrala
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Michael Levitt
- Department of Structural Biology, Stanford University, School of Medicine, Stanford, CA 94305, USA
|
291
|
Felts AK, Gallicchio E, Wallqvist A, Levy RM. Distinguishing native conformations of proteins from decoys with an effective free energy estimator based on the OPLS all-atom force field and the Surface Generalized Born solvent model. Proteins 2002; 48:404-22. [PMID: 12112706 DOI: 10.1002/prot.10171] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.1] [Indexed: 11/10/2022]
Abstract
Protein decoy data sets provide a benchmark for testing scoring functions designed for fold recognition and protein homology modeling problems. It is commonly believed that statistical potentials based on reduced atomic models are better able to discriminate native-like from misfolded decoys than scoring functions based on more detailed molecular mechanics models. Recent benchmark tests on small data sets, however, suggest otherwise. In this work, we report the results of extensive decoy detection tests using an effective free energy function based on the OPLS all-atom (OPLS-AA) force field and the Surface Generalized Born (SGB) model for the solvent electrostatic effects. The OPLS-AA/SGB effective free energy is used as a scoring function to detect native protein folds among a total of 48,832 decoys for 32 different proteins from Park and Levitt's 4-state-reduced, Levitt's local-minima, Baker's ROSETTA all-atom, and Skolnick's decoy sets. Solvent electrostatic effects are included through the Surface Generalized Born (SGB) model. All structures are locally minimized without restraints. From an analysis of the individual energy components of the OPLS-AA/SGB energy function for the native and the best-ranked decoy, it is determined that a balance of the terms of the potential is responsible for the minimized energies that most successfully distinguish the native from the misfolded conformations. Different combinations of individual energy terms provide less discrimination than the total energy. The results are consistent with observations that all-atom molecular potentials coupled with intermediate level solvent dielectric models are competitive with knowledge-based potentials for decoy detection and protein modeling problems such as fold recognition and homology modeling.
Affiliation(s)
- Anthony K Felts
- Department of Chemistry and Chemical Biology, Rutgers University, Wright-Rieman Laboratories, Piscataway, New Jersey 08854-8087, USA.
|
292
|
Fain B, Xia Y, Levitt M. Design of an optimal Chebyshev-expanded discrimination function for globular proteins. Protein Sci 2002; 11:2010-21. [PMID: 12142455 PMCID: PMC2373672 DOI: 10.1110/ps.0200702] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]
Abstract
We describe the construction of a scoring function designed to model the free energy of protein folding. An optimization technique is used to determine the best functional forms of the hydrophobic, residue-residue and hydrogen-bonding components of the potential. The scoring function is expanded by use of Chebyshev polynomials, the coefficients of which are determined by minimizing the score, in units of standard deviation, of native structures in the ensembles of alternate decoy conformations. The derived effective potential is then tested on decoy sets used conventionally in such studies. Using our scoring function, we achieve a high level of discrimination between correct and incorrect folds. In addition, our method is able to represent functions of arbitrary shape with fewer parameters than the usual histogram potentials of similar resolution. Finally, our representation can be combined easily with many optimization methods, because the total energy is a linear function of the parameters. Our results show that the techniques of Z-score optimization and Chebyshev expansion work well.
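The abstract above combines two ideas: a score that is linear in the coefficients of a Chebyshev expansion, and a Z-score that places the native score in units of the decoy standard deviation. The Python sketch below illustrates both on made-up numbers. The function names, the single-feature scoring form, and the use of the population standard deviation are assumptions made for illustration; the coefficient optimization described in the abstract is not attempted.

```python
from statistics import mean, pstdev


def chebyshev_basis(x, order):
    """Return [T_0(x), ..., T_order(x)] via T_{k+1} = 2x T_k - T_{k-1}."""
    values = [1.0, x]
    for _ in range(2, order + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: order + 1]


def expanded_score(coeffs, features):
    """Hypothetical score: sum over features of sum_k coeffs[k] * T_k(feature).

    Features are assumed to be pre-scaled to [-1, 1]. The result is a linear
    function of coeffs, the property the abstract highlights.
    """
    order = len(coeffs) - 1
    return sum(
        c * t
        for x in features
        for c, t in zip(coeffs, chebyshev_basis(x, order))
    )


def native_z_score(native_score, decoy_scores):
    """Native score relative to the decoy mean, in decoy standard deviations.

    Assumption: the population standard deviation (pstdev) is used.
    """
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)


# Hypothetical usage with invented coefficients and features.
coeffs = [0.5, -1.0, 0.25]
native = expanded_score(coeffs, [0.9, 0.8])
decoys = [expanded_score(coeffs, f) for f in ([0.1, -0.2], [0.0, 0.3], [-0.5, 0.4])]
print(native_z_score(native, decoys))
```

A more negative Z-score means the native structure scores further below the decoy ensemble, which is the direction an optimizer of this kind would push the coefficients.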
Affiliation(s)
- Boris Fain
- Department of Structural Biology, Stanford University, Stanford University School of Medicine, California 94305, USA.
|
293
|
Lomize AL, Reibarkh MY, Pogozheva ID. Interatomic potentials and solvation parameters from protein engineering data for buried residues. Protein Sci 2002; 11:1984-2000. [PMID: 12142453 PMCID: PMC2373680 DOI: 10.1110/ps.0307002] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
Abstract
Van der Waals (vdW) interaction energies between different atom types, energies of hydrogen bonds (H-bonds), and atomic solvation parameters (ASPs) have been derived from the published thermodynamic stabilities of 106 mutants with available crystal structures by use of an originally designed model for the calculation of free-energy differences. The set of mutants included substitutions of uncharged, inflexible, water-inaccessible residues in alpha-helices and beta-sheets of T4, human, and hen lysozymes and HI ribonuclease. The determined energies of vdW interactions and H-bonds were smaller than in molecular mechanics and followed the "like dissolves like" rule, as expected in condensed media but not in vacuum. The depths of modified Lennard-Jones potentials were -0.34, -0.12, and -0.06 kcal/mole for similar atom types (polar-polar, aromatic-aromatic, and aliphatic-aliphatic interactions, respectively) and -0.10, -0.08, -0.06, -0.02, and nearly 0 kcal/mole for different types (sulfur-polar, sulfur-aromatic, sulfur-aliphatic, aliphatic-aromatic, and carbon-polar, respectively), whereas the depths of H-bond potentials were -1.5 to -1.8 kcal/mole. The obtained solvation parameters, that is, transfer energies from water to the protein interior, were 19, 7, -1, -21, and -66 cal/(mole·Å²) for aliphatic carbon, aromatic carbon, sulfur, nitrogen, and oxygen, respectively, which is close to the cyclohexane scale for aliphatic and aromatic groups but intermediate between octanol and cyclohexane for others. An analysis of additional replacements at the water-protein interface indicates that vdW interactions between protein atoms are reduced when they occur across water.
Affiliation(s)
- Andrei L Lomize
- College of Pharmacy, University of Michigan, Ann Arbor 48109-1065, USA.
|
294
|
Van Loy CP, Sokurenko EV, Samudrala R, Moseley SL. Identification of amino acids in the Dr adhesin required for binding to decay-accelerating factor. Mol Microbiol 2002; 45:439-52. [PMID: 12123455 DOI: 10.1046/j.1365-2958.2002.03022.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022]
Abstract
Members of the Dr family of adhesins of Escherichia coli recognize as a receptor the Dr(a) blood-group antigen present on the complement regulatory and signalling molecule, decay-accelerating factor (DAF). One member of this family, the Dr haemagglutinin, also binds to a second receptor, type IV collagen. Structure/function information regarding these adhesins has been limited and domains directly involved in the interaction with DAF have not been determined. We devised a strategy to identify amino acids in the Dr haemagglutinin that are specifically involved in the interaction with DAF. The gene encoding the adhesive subunit, draE, was subjected to random mutagenesis and used to complement a strain defective for its expression. The resulting mutants were enriched and screened to obtain those that do not bind to DAF, but retain binding to type IV collagen. Individual amino acid changes at positions 10, 63, 65, 75, 77, 79 and 131 of the mature DraE sequence significantly reduced the ability of the DraE adhesin to bind DAF, but not collagen. Over half of the mutants obtained had substitutions within amino acids 63-81. Analysis of predicted structures of DraE suggests that these proximal residues may cluster to form a binding domain for DAF.
Affiliation(s)
- Cristina P Van Loy
- University of Washington, Department of Microbiology, Box 357242, Seattle, WA 98195-7242, USA
|
295
|
Tosatto SCE, Bindewald E, Hesser J, Männer R. A divide and conquer approach to fast loop modeling. Protein Eng Des Sel 2002; 15:279-86. [PMID: 11983928 DOI: 10.1093/protein/15.4.279] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022] Open
Abstract
We describe a fast ab initio method for modeling local segments in protein structures. The algorithm is based on a divide and conquer approach and uses a database of precalculated look-up tables, which represent a large set of possible conformations for loop segments of variable length. The target loop is recursively decomposed until the resulting conformations are small enough to be compiled analytically. The algorithm, which is not restricted to any specific loop length, generates a ranked set of loop conformations in 20-180 s on a desktop PC. The prediction quality is evaluated in terms of global RMSD. Depending on loop length the top prediction varies between 1.06 Å RMSD for three-residue loops and 3.72 Å RMSD for eight-residue loops. Due to its speed the method may also be useful to generate alternative starting conformations for complex simulations.
Affiliation(s)
- Silvio C E Tosatto
- Institute for Computational Medicine and Chair for Computer Science V, Universität Mannheim, B 6, 26, 68131 Mannheim, Germany
|
296
|
Abstract
A protein structure model generally needs to be evaluated to assess whether or not it has the correct fold. To improve fold assessment, four types of a residue-level statistical potential were optimized, including distance-dependent, contact, Phi/Psi dihedral angle, and accessible surface statistical potentials. Approximately 10,000 test models with the correct and incorrect folds were built by automated comparative modeling of protein sequences of known structure. The criterion used to discriminate between the correct and incorrect models was the Z-score of the model energy. The performance of a Z-score was determined as a function of many variables in the derivation and use of the corresponding statistical potential. The performance was measured by the fractions of the correctly and incorrectly assessed test models. The most discriminating combination of any one of the four tested potentials is the sum of the normalized distance-dependent and accessible surface potentials. The distance-dependent potential that is optimal for assessing models of all sizes uses both C(alpha) and C(beta) atoms as interaction centers, distinguishes between all 20 standard residue types, has the distance range of 30 Å, and is derived and used by taking into account the sequence separation of the interacting atom pairs. The terms for the sequentially local interactions are significantly less informative than those for the sequentially nonlocal interactions. The accessible surface potential that is optimal for assessing models of all sizes uses C(beta) atoms as interaction centers and distinguishes between all 20 standard residue types. The performance of the tested statistical potentials is not likely to improve significantly with an increase in the number of known protein structures used in their derivation. The parameters of fold assessment whose optimal values vary significantly with model size include the size of the known protein structures used to derive the potential and the distance range of the accessible surface potential. Fold assessment by statistical potentials is most difficult for the very small models. This difficulty presents a challenge to fold assessment in large-scale comparative modeling, which produces many small and incomplete models. The results described in this study provide a basis for an optimal use of statistical potentials in fold assessment.
Affiliation(s)
- Francisco Melo
- Laboratories of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA
|
297
|
Koehl P, Levitt M. Improved recognition of native-like protein structures using a family of designed sequences. Proc Natl Acad Sci U S A 2002; 99:691-6. [PMID: 11782533 PMCID: PMC117367 DOI: 10.1073/pnas.022408799] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 08/03/2001] [Indexed: 11/18/2022] Open
Abstract
The goal of the inverse protein folding problem is to identify amino acid sequences that stabilize a given target protein conformation. Methods that attempt to solve this problem have proven useful for protein sequence design. Here we show that the same methods can provide valuable information for protein fold recognition and for ab initio protein structure prediction. We present a measure of the compatibility of a test sequence with a target model structure, based on computational protein design. The model structure is used as input to design a family of low free energy sequences, and these sequences are compared with the test sequence by using a metric in sequence space based on nearest-neighbor connectivity. We find that this measure is able to recognize the native fold of a myoglobin sequence among different globin folds. It is also powerful enough to recognize near-native protein structures among non-native models.
Affiliation(s)
- Patrice Koehl
- Department of Structural Biology, Fairchild Building, Stanford University, Stanford, CA 94305, USA.
|
298
|
Lu H, Skolnick J. A distance-dependent atomic knowledge-based potential for improved protein structure selection. Proteins 2001; 44:223-32. [PMID: 11455595 DOI: 10.1002/prot.1087] [Citation(s) in RCA: 243] [Impact Index Per Article: 10.6] [Indexed: 11/07/2022]
Abstract
A heavy atom distance-dependent knowledge-based pairwise potential has been developed. This statistical potential is first evaluated and optimized with the native structure z-scores from gapless threading. The potential is then used to recognize the native and near-native structures from both published decoy test sets, as well as decoys obtained from our group's protein structure prediction program. In the gapless threading test, there is an average z-score improvement of 4 units in the optimized atomic potential over the residue-based quasichemical potential. Examination of the z-scores for individual pairwise distance shells indicates that the specificity for the native protein structure is greatest at pairwise distances of 3.5-6.5 Å, i.e., in the first solvation shell. On applying the current atomic potential to test sets obtained from the web, composed of native protein and decoy structures, the current generation of the potential performs better than residue-based potentials as well as the other published atomic potentials in the task of selecting native and near-native structures. This newly developed potential is also applied to structures of varying quality generated by our group's protein structure prediction program. The current atomic potential tends to pick lower RMSD structures than do residue-based contact potentials. In particular, this atomic pairwise interaction potential has better selectivity especially for near-native structures. As such, it can be used to select near-native folds generated by structure prediction algorithms as well as for protein structure refinement.
Affiliation(s)
- H Lu
- Laboratory of Computational Genomics, Donald Danforth Plant Science Center, St. Louis, Missouri 63141, USA
|
299
|
Ota M, Isogai Y, Nishikawa K. Knowledge-based potential defined for a rotamer library to design protein sequences. Protein Engineering 2001; 14:557-64. [PMID: 11579224 DOI: 10.1093/protein/14.8.557] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
A knowledge-based potential for a rotamer library was developed to design protein sequences. Protein side-chain conformations are represented by 56 templates. The fitness of each template to a given structural site-environment is evaluated by a combined function of three knowledge-based terms, i.e. two-body side-chain packing, one-body hydration and local conformation. The number of matches between the native sequence and the structural site-environment in the database and that of the virtually settled mismatches, counted in advance, were transformed into the energy scores. In the best-14 test (an assessment of the ability to reproduce the native rotamer at its structural site within the top quarter of the 56 fitness rank positions), in the structural stability analysis on mutants of human and T4 lysozymes, and in the inverse-folding search by a structure profile against the sequence database, this function performs better than the function deduced with the conventional normalization and our previously developed function. Targeting various structural motifs, de novo sequence design was conducted with the function. The sequences thus obtained exhibit reasonable molecular masses and hydrophobic/hydrophilic patterns similar to the native sequences of the target and act as if they were homologs of the target proteins in a BLASTP search. This significant improvement is discussed in terms of the reference state for normalization and the crucial role of short-range repulsion to prohibit residue bumps.
Affiliation(s)
- M Ota
- National Institute of Genetics, Mishima, Shizuoka 411-8540. The Institute of Physical and Chemical Research (RIKEN), Wako, Saitama 351-0198, Japan.
|
300
|
Abstract
The SLoop database of supersecondary fragments, first described by Donate et al. (Protein Sci., 1996, 5, 2600-2616), contains protein loops, classified according to structural similarity. The database has recently been updated and currently contains over 10 000 loops up to 20 residues in length, which cluster into over 560 well populated classes. The database can be found at http://www-cryst.bioc.cam.ac.uk/~sloop. In this paper, we identify conserved structural features such as main chain conformation and hydrogen bonding. Using the original approach of Rufino and co-workers (1997), the correct structural class is predicted with the highest SLoop score for 35% of loops. This rises to 65% by considering the three highest scoring class predictions and to 75% in the top five scoring class predictions. Inclusion of residues from the neighbouring secondary structures and use of substitution tables derived using a reduced definition of secondary structure increase these prediction accuracies to 58, 78 and 85%, respectively. This suggests that capping residues can stabilize the loop conformation as well as that of the secondary structure. Further increases are achieved if only well-populated classes are considered in the prediction. These results correspond to an average loop root mean square deviation of between 0.4 and 2.6 Å for loops up to five residues in length.
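The accuracies quoted above for the highest-scoring class, the top three, and the top five are top-k accuracies. As a small illustration of how such a figure is computed, the Python sketch below counts how often the true class appears among the first k ranked predictions. The function name `top_k_accuracy` and the class labels are hypothetical and unrelated to the actual SLoop data.

```python
def top_k_accuracy(ranked_classes, true_classes, k):
    """Fraction of cases whose true class is among the first k predictions.

    ranked_classes[i] lists predicted classes for case i, best score first.
    """
    if len(ranked_classes) != len(true_classes) or not true_classes:
        raise ValueError("inputs must be non-empty and equally long")
    hits = sum(
        1
        for ranked, truth in zip(ranked_classes, true_classes)
        if truth in ranked[:k]
    )
    return hits / len(true_classes)


# Hypothetical predictions for four loops over three invented classes.
ranked = [
    ["c1", "c2", "c3"],
    ["c2", "c1", "c3"],
    ["c3", "c2", "c1"],
    ["c1", "c3", "c2"],
]
truth = ["c1", "c1", "c1", "c2"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```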
Affiliation(s)
- D F Burke
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB1 2GA, UK.
|