101
|
Lwin TZ, Luo R. Overcoming entropic barrier with coupled sampling at dual resolutions. J Chem Phys 2007; 123:194904. [PMID: 16321110 DOI: 10.1063/1.2102871] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
An enhanced sampling method is proposed for ab initio protein folding simulations. The new method couples a high-resolution model for accuracy and a low-resolution model for efficiency. It aims to overcome the entropic barrier found in the exponentially large protein conformational space when a high-resolution model, such as an all-atom molecular mechanics force field, is used. The proposed method is designed to satisfy the detailed balance condition so that the Boltzmann distribution can be generated in all sampling trajectories in both high and low resolutions. The method was tested on model analytical energy functions and ab initio folding simulations of a beta-hairpin peptide. It was found to be more efficient than the replica-exchange method that is used as its building block. Analysis with the analytical energy functions shows that the number of energy calculations required to find global minima and to converge mean potential energies is much smaller with the new method. An ergodic measure shows that the new method explores the conformational space more rapidly. We also studied imperfect low-resolution energy models and found that the introduction of errors in low-resolution models does decrease sampling efficiency. However, a reasonable increase in efficiency is still observed when the global minima of the low-resolution models are in the vicinity of the global minimum basin of the high-resolution model. Finally, our ab initio folding simulation of the tested peptide shows that the new method is able to fold the peptide in a very short simulation time. The structural distribution generated by the new method at the equilibrium portion of the trajectory resembles that in the equilibrium simulation starting from the crystal structure.
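The abstract names replica exchange as the building block and detailed balance as the design constraint. As a minimal sketch, the block below shows only the standard Metropolis swap criterion for two replicas at different inverse temperatures; it is not the dual-resolution coupling of the paper, and the function and parameter names are assumptions chosen for illustration.

```python
import math
import random


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging the configurations of two replicas.

    beta_i, beta_j: inverse temperatures of replicas i and j.
    energy_i, energy_j: current potential energies of their configurations.
    Uses the standard rule min(1, exp[(beta_i - beta_j) * (E_i - E_j)]).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_acceptance(beta_i, beta_j, energy_i, energy_j)
```

Because the forward and reverse acceptance probabilities differ exactly by the ratio of the joint Boltzmann weights, each replica keeps sampling its own Boltzmann distribution after any number of accepted swaps.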
Affiliation(s)
- Thur Zar Lwin
- Chemical and Material Physics Graduate Program, University of California, Irvine, CA 92697-3900, USA
|
102
|
Carr JM, Wales DJ. Global optimization and folding pathways of selected alpha-helical proteins. J Chem Phys 2007; 123:234901. [PMID: 16392943 DOI: 10.1063/1.2135783] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Indexed: 11/14/2022]
Abstract
The results of basin-hopping global optimization simulations are presented for four small, alpha-helical proteins described by a coarse-grained potential. A step-taking scheme that incorporates the local conformational preferences extracted from a large number of high-resolution protein structures is compared with an unbiased scheme. In addition, the discrete path sampling method is used to investigate the folding of one of the proteins, namely, the villin headpiece subdomain. Folding times from kinetic Monte Carlo simulations and iterative calculations based on a Markovian first-step analysis for the resulting stationary-point database are in good mutual agreement, but differ significantly from the experimental values, probably because the native state is not the global free energy minimum for the potential employed.
Affiliation(s)
- Joanne M Carr
- University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|
103
|
Ruvinsky AM. Role of binding entropy in the refinement of protein-ligand docking predictions: analysis based on the use of 11 scoring functions. J Comput Chem 2007; 28:1364-72. [PMID: 17342720 DOI: 10.1002/jcc.20580] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022]
Abstract
We present results of testing the ability of eleven popular scoring functions to predict native docked positions using a recently developed method (Ruvinsky and Kozintsev, J Comput Chem 2005, 26, 1089) for estimating the entropy contributions of relative motions to protein-ligand binding affinity. The method is based on the integration of the configurational integral over clusters obtained from multiple docked positions. We use a test set of 100 PDB protein-ligand complexes and ensembles of 101 docked positions generated by Wang et al. (J Med Chem 2003, 46, 2287) for each ligand in the test set. To test the suggested method we compared the averaged root-mean-square deviations (RMSD) of the top-scored ligand docked positions, accounting and not accounting for entropy contributions, relative to the experimentally determined positions. We demonstrate that the method increases docking accuracy by 10-21% when used in conjunction with the AutoDock scoring function, by 2-25% with G-Score, by 7-41% with D-Score, by 0-8% with LigScore, by 1-6% with PLP, by 0-12% with LUDI, by 2-8% with F-Score, by 7-29% with ChemScore, by 0-9% with X-Score, by 2-19% with PMF, and by 1-7% with DrugScore. We also compared the performance of the suggested method with the method based on ranking by cluster occupancy only. We analyze how the choice of a clustering-RMSD and a lower bound of dense clusters affects the docking accuracy of the scoring methods. We derive optimal intervals of the clustering-RMSD for 11 scoring functions.
Affiliation(s)
- Anatoly M Ruvinsky
- Center for Bioinformatics, The University of Kansas, 2030 Becker Drive, Lawrence, Kansas 66047, USA.
|
104
|
Protein structure prediction by all-atom free-energy refinement. BMC Struct Biol 2007; 7:12. [PMID: 17371594 PMCID: PMC1832197 DOI: 10.1186/1472-6807-7-12] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 08/23/2006] [Accepted: 03/19/2007] [Indexed: 11/18/2022]
Abstract
Background: The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein forcefield (PFF01) that we could use to fold several small proteins from completely extended conformations. Because the computational cost of de-novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction purposes. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01. Results: We use PFF01 to rank and cluster the conformations for 32 proteins generated by ROSETTA. For 22/10 high-quality/low-quality decoy sets we select near-native conformations with an average Cα root mean square deviation of 3.03 Å/6.04 Å. The protocol incorporates an inherent reliability indicator that succeeds for 78% of the decoy sets. In over 90% of these cases near-native conformations are selected from the decoy set. This success rate is rationalized by the quality of the decoys and the selectivity of the PFF01 forcefield, which ranks near-native conformations an average of 3.06 standard deviations below the relaxed decoys (Z-score). Conclusion: All-atom free-energy relaxation with PFF01 emerges as a powerful low-cost approach toward generic de-novo protein structure prediction. The approach can be applied to large all-atom decoy sets of any origin and requires no preexisting structural information to identify the native conformation. The study provides evidence that a large class of proteins may be foldable by PFF01.
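The Z-score quoted in the abstract measures how many standard deviations a candidate's energy lies from the mean of the decoy energies. The sketch below is one common way to compute it; the use of the population standard deviation and the function name are assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def z_score(candidate_energy, decoy_energies):
    """Energy gap between a candidate and the decoy mean, in standard deviations.

    A negative value means the candidate lies below (is more favorable than)
    the average decoy. pstdev (population form) is an assumption of this sketch.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0.0:
        raise ValueError("decoy energies have zero spread; Z-score undefined")
    return (candidate_energy - mean(decoy_energies)) / sigma
```

With made-up energies, `z_score(-3.0, [-1.0, 0.0, 1.0])` is negative, which is the direction in which the abstract reports its Z-score.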
|
105
|
Johnson CP, Gaetani M, Ortiz V, Bhasin N, Harper S, Gallagher PG, Speicher DW, Discher DE. Pathogenic proline mutation in the linker between spectrin repeats: disease caused by spectrin unfolding. Blood 2006; 109:3538-43. [PMID: 17192394 PMCID: PMC1852230 DOI: 10.1182/blood-2006-07-038588] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]
Abstract
Pathogenic mutations in alpha and beta spectrin result in a variety of syndromes, including hereditary elliptocytosis (HE), hereditary pyropoikilocytosis (HPP), and hereditary spherocytosis (HS). Although some mutations clearly lie at sites of interaction, such as the sites of spectrin alpha-beta tetramer formation, a surprising number of HE-causing mutations have been identified within linker regions between distal spectrin repeats. Here we apply solution structural and single molecule methods to the folding and stability of recombinant proteins consisting of the first 5 spectrin repeats of alpha-spectrin, comparing normal spectrin with a pathogenic linker mutation, Q471P, between repeats R4 and R5. Results show that the linker mutation destabilizes a significant fraction of the 5-repeat construct at 37 °C, whereas the WT remains fully folded well above body temperature. In WT protein, helical linkers propagate stability from one repeat to the next, but the mutation disrupts the stabilizing influence of adjacent repeats. The results suggest a molecular mechanism for the high frequency of disease caused by proline mutations in spectrin linkers.
Affiliation(s)
- Colin P Johnson
- Molecular and Cell Biophysics Laboratory, University of Pennsylvania, 3699 Market Street, Philadelphia, PA 19104, USA
|
106
|
Alminaite A, Halttunen V, Kumar V, Vaheri A, Holm L, Plyusnin A. Oligomerization of hantavirus nucleocapsid protein: analysis of the N-terminal coiled-coil domain. J Virol 2006; 80:9073-81. [PMID: 16940519 PMCID: PMC1563903 DOI: 10.1128/jvi.00515-06] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
Hantaviruses constitute a genus in the family Bunyaviridae. They are enveloped negative-strand RNA viruses with a tripartite genome encoding the nucleocapsid (N) protein, the two surface glycoproteins Gn and Gc, and an RNA-dependent RNA polymerase. The N protein is the most abundant component of the virion; it encapsidates genomic RNA segments forming ribonucleoproteins and participates in genome transcription and replication as well as virus assembly. In the course of RNA encapsidation, N protein forms intermediate trimers via head-to-head and tail-to-tail interactions. We analyzed the amino-terminal trimerization domain (amino acid residues 1 to 77) of Tula hantavirus using computer modeling, mammalian two-hybrid assay, and immunofluorescence assay. The results obtained were consistent with the existence of an antiparallel coiled-coil stabilized by interactions between hydrophobic residues. Residues L44, V51, and L58 were important for the N-N interaction; other residues, e.g., L25 and V32, also made a contribution, albeit a modest one. Our alignments of the N-terminal domain of the hantaviral N proteins suggest the coiled-coil structure, and hence the mode of N-protein oligomerization, is conserved among hantaviruses.
Affiliation(s)
- Agne Alminaite
- Department of Virology, Haartman Institute, P.O. Box 21, FIN-00014 University of Helsinki, Helsinki, Finland
|
107
|
Pollastri G, Vullo A, Frasconi P, Baldi P. Modular DAG-RNN architectures for assembling coarse protein structures. J Comput Biol 2006; 13:631-50. [PMID: 16706716 DOI: 10.1089/cmb.2006.13.631] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
We develop and test machine learning methods for the prediction of coarse 3D protein structures, where a protein is represented by a set of rigid rods associated with its secondary structure elements (alpha-helices and beta-strands). First, we employ cascades of recursive neural networks derived from graphical models to predict the relative placements of segments. These are represented as discretized distance and angle maps, and the discretization levels are statistically inferred from a large and curated dataset. Coarse 3D folds of proteins are then assembled starting from topological information predicted in the first stage. Reconstruction is carried out by minimizing a cost function taking the form of a purely geometrical potential. We show that the proposed architecture outperforms simpler alternatives and can accurately predict binary and multiclass coarse maps. The reconstruction procedure proves to be fast and often leads to topologically correct coarse structures that could be exploited as a starting point for various protein modeling strategies. The fully integrated rod-shaped protein builder (predictor of contact maps + reconstruction algorithm) can be accessed at http://distill.ucd.ie/.
Affiliation(s)
- Gianluca Pollastri
- School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland.
|
108
|
Stumpff-Kane AW, Feig M. A correlation-based method for the enhancement of scoring functions on funnel-shaped energy landscapes. Proteins 2006; 63:155-64. [PMID: 16397892 DOI: 10.1002/prot.20853] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
A correlation-based approach is introduced for enhancing the ability of structure-scoring methods to identify and distinguish native-like conformations. The proposed method relies on a funnel-shaped scoring function that decreases steadily toward the native state. It takes advantage of the idea that the structure from a given ensemble that is closest to the native basin leads to the highest correlation coefficient between a given score and distance to that structure as an approximation of the native state for the entire ensemble. The method is applied successfully to a number of different test cases that demonstrate substantial improvements in the correlation of the score with the distance from the true native state but also result in the selection of more native-like structures compared to the original score.
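The selection idea in the abstract can be sketched as follows: treat each ensemble member in turn as a stand-in for the native state, correlate the scores of all members with their distances to that stand-in, and keep the member giving the highest correlation. This is a simplified reading of the abstract under stated assumptions; the `distance` callback, the function names, and the tie handling are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_reference(scores, distance):
    """Pick the ensemble member whose distances correlate best with the scores.

    scores: one score per structure (lower is assumed to be better).
    distance: hypothetical callback distance(i, j) -> float between members.
    Returns (index, correlation) of the best stand-in for the native state.
    """
    best_idx, best_r = None, -math.inf
    for ref in range(len(scores)):
        dists = [distance(ref, k) for k in range(len(scores))]
        r = pearson(scores, dists)
        if r > best_r:
            best_idx, best_r = ref, r
    return best_idx, best_r
```

On a funnel-shaped score, the member nearest the bottom of the funnel should give the strongest positive correlation, which is what the loop selects.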
Affiliation(s)
- Andrew W Stumpff-Kane
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824-1319, USA
|
109
|
Chivian D, Kim DE, Malmström L, Schonbrun J, Rohl CA, Baker D. Prediction of CASP6 structures using automated Robetta protocols. Proteins 2006; 61 Suppl 7:157-166. [PMID: 16187358 DOI: 10.1002/prot.20733] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Indexed: 11/10/2022]
Abstract
The Robetta server and revised automatic protocols were used to predict structures for CASP6 targets. Robetta is a publicly available protein structure prediction server (http://robetta.bakerlab.org/) that uses the Rosetta de novo and homology modeling structure prediction methods. We incorporated some of the lessons learned in the CASP5 experiment into the server prior to participating in CASP6. We additionally tested new ideas that were amenable to full automation with an eye toward improving the server. We find that the Robetta server shows the greatest promise for the more challenging targets. The most significant finding from CASP5, that automated protocols can be roughly comparable in ability with the better human-intervention predictors, is repeated here in CASP6.
Affiliation(s)
- Dylan Chivian
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
110
|
Fang Q, Shortle D. Enhanced sampling near the native conformation using statistical potentials for local side-chain and backbone interactions. Proteins 2006; 60:97-102. [PMID: 15852306 DOI: 10.1002/prot.20483] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
In the preceding article in this issue of Proteins, an empirical energy function consisting of 4 statistical potentials that quantify local side-chain-backbone and side-chain-side-chain interactions has been demonstrated to successfully identify the native conformations of short sequence fragments and the native structure within large sets of high-quality decoys. Because this energy function consists entirely of interactions between residues separated by fewer than 5 positions, it can be used at the earliest stage of ab initio structure prediction to enhance the efficiency of conformational search. In this article, protein fragments are generated de novo by recombining very short segments of protein structures (2, 4, or 6 residues), either selected at random or optimized with respect to this local energy function. When local energy is optimized in selected fragments, more efficient sampling of conformational space near the native conformation is consistently observed for 450 randomly selected single turn fragments, with turn lengths varying from 3 to 12 residues and all 4 combinations of flanking secondary structure. These results further demonstrate the energetic significance of local interactions in protein conformations. When used in combination with longer range energy functions, application of these potentials should lead to more accurate prediction of protein structure.
Affiliation(s)
- Qiaojun Fang
- Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
|
111
|
Fujitsuka Y, Chikenji G, Takada S. SimFold energy function for de novo protein structure prediction: consensus with Rosetta. Proteins 2006; 62:381-98. [PMID: 16294329 DOI: 10.1002/prot.20748] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
Abstract
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that of the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, the proteins well predicted by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping sampled structural space onto two dimensions. For proteins for which one of the two methods succeeded and the other failed in prediction, the former had a less scattered ensemble located around the native structure. For proteins for which both methods succeeded in prediction, the two ensembles were often intermixed.
Affiliation(s)
- Yoshimi Fujitsuka
- Graduate School of Natural Science and Technology, Kobe University, Kobe, Japan
|
112
|
Ruvinsky AM, Kozintsev AV. Novel statistical-thermodynamic methods to predict protein-ligand binding positions using probability distribution functions. Proteins 2006; 62:202-8. [PMID: 16287127 DOI: 10.1002/prot.20673] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
We present two novel methods to predict native protein-ligand binding positions. Both methods identify the native binding position as the most probable position corresponding to a maximum of a probability distribution function (PDF) of possible binding positions in a protein active site. Possible binding positions are the origins of clusters composed, on the basis of root-mean square deviations (RMSD), from the multiple ligand positions determined by a docking algorithm. The difference between the methods lies in the ways the PDF is derived. To validate the suggested methods, we compare the averaged RMSD of the predicted ligand docked positions relative to the experimentally determined positions for a set of 135 PDB protein-ligand complexes. We demonstrate that the suggested methods improve docking accuracy by as much as 21-24% in comparison with a method that simply identifies the binding position as the energy top-scored ligand position.
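To make the clustering step concrete, here is a rough sketch of grouping docked poses by RMSD around cluster origins and returning the most populated cluster. Cluster occupancy is used only as a crude stand-in for the maximum of the probability distribution function; the actual PDF derivations are not reproduced. The greedy scheme, the absence of any superposition, and all names are assumptions.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) atoms.

    No superposition is applied: docked poses share the receptor frame.
    """
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def cluster_poses(poses, cutoff):
    """Greedy clustering: each pose joins the first origin within `cutoff`.

    Poses are assumed to be ordered best score first, so each cluster
    origin is the best-scored member of its cluster.
    """
    clusters = []  # list of (origin_index, member_indices)
    for idx, pose in enumerate(poses):
        for origin, members in clusters:
            if rmsd(poses[origin], pose) <= cutoff:
                members.append(idx)
                break
        else:
            clusters.append((idx, [idx]))
    return clusters


def most_populated_origin(poses, cutoff):
    """Return (origin_index, occupancy) of the largest cluster."""
    origin, members = max(cluster_poses(poses, cutoff), key=lambda c: len(c[1]))
    return origin, len(members)
```

In this toy setup a top-scored but isolated pose loses to a lower-scored pose that sits in a dense cluster, which is the qualitative behavior the abstract contrasts with pure energy ranking.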
Affiliation(s)
- A M Ruvinsky
- Force Field Laboratory, Algodign, LLC, Moscow, Russia.
|
113
|
Skolnick J. In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol 2006; 16:166-71. [PMID: 16524716 DOI: 10.1016/j.sbi.2006.02.004] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Received: 12/01/2005] [Revised: 02/10/2006] [Accepted: 02/23/2006] [Indexed: 11/19/2022]
Abstract
Key to successful protein structure prediction is a potential that recognizes the native state from misfolded structures. Recent advances in empirical potentials based on known protein structures include improved reference states for assessing random interactions, sidechain-orientation-dependent pair potentials, potentials for describing secondary or supersecondary structural preferences and, most importantly, optimization protocols that sculpt the energy landscape to enhance the correlation between native-like features and the energy. Improved clustering algorithms that select native-like structures on the basis of cluster density also resulted in greater prediction accuracy. For template-based modeling, these advances allowed improvement in predicted structures relative to their initial template alignments over a wide range of target-template homology. This represents significant progress and suggests applications to proteome-scale structure prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington Street, Buffalo, NY 14203, USA.
|
114
|
Saraf MC, Moore GL, Goodey NM, Cao VY, Benkovic SJ, Maranas CD. IPRO: an iterative computational protein library redesign and optimization procedure. Biophys J 2006; 90:4167-80. [PMID: 16513775 PMCID: PMC1459523 DOI: 10.1529/biophysj.105.079277] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022]
Abstract
A number of computational approaches have been developed to reengineer promising chimeric proteins one at a time through targeted point mutations. In this article, we introduce the computational procedure IPRO (iterative protein redesign and optimization procedure) for the redesign of an entire combinatorial protein library in one step using energy-based scoring functions. IPRO relies on identifying mutations in the parental sequences, which when propagated downstream in the combinatorial library, improve the average quality of the library (e.g., stability, binding affinity, specific activity, etc.). Residue and rotamer design choices are driven by a globally convergent mixed-integer linear programming formulation. Unlike many of the available computational approaches, the procedure allows for backbone movement as well as redocking of the associated ligands after a prespecified number of design iterations. IPRO can also be used, as a limiting case, for the redesign of a single or handful of individual sequences. The application of IPRO is highlighted through the redesign of a 16-member library of Escherichia coli/Bacillus subtilis dihydrofolate reductase hybrids, both individually and through upstream parental sequence redesign, for improving the average binding energy. Computational results demonstrate that it is indeed feasible to improve the overall library quality as exemplified by binding energy scores through targeted mutations in the parental sequences.
Affiliation(s)
- Manish C Saraf
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
115
|
Verma A, Schug A, Lee KH, Wenzel W. Basin hopping simulations for all-atom protein folding. J Chem Phys 2006; 124:044515. [PMID: 16460193 DOI: 10.1063/1.2138030] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
We investigate different protocols of the basin hopping technique for de novo protein folding. Using the protein free-energy force field PFF01 we report the reproducible all-atom folding of the 20-amino-acid tryptophan-cage protein [Protein Data Bank (PDB) code: 1l2y] and of the recently discovered 26-amino-acid potassium channel blocker (PDB code: 1wqc), which exhibits an unusual fold. We find that simulations with increasing cycle length and random starting temperatures perform best in comparison with other parametrizations. The basin hopping technique emerges as a simple but very efficient and robust workhorse for all-atom protein folding.
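For readers unfamiliar with basin hopping, the sketch below shows the generic cycle on a one-dimensional toy function: perturb the current minimum, relax locally, then accept or reject the new minimum with a Metropolis test. It does not reproduce the paper's protocols (PFF01, cycle-length schedules, or random starting temperatures); the crude local minimizer and all parameter values are assumptions made for illustration.

```python
import math
import random


def local_minimize(f, x, step=0.1, tol=1e-6):
    """Crude derivative-free 1D descent: move while it helps, else halve the step."""
    fx = f(x)
    while step > tol:
        for candidate in (x - step, x + step):
            f_candidate = f(candidate)
            if f_candidate < fx:
                x, fx = candidate, f_candidate
                break
        else:
            step *= 0.5
    return x, fx


def basin_hopping(f, x0, n_cycles=200, jump=1.5, temperature=1.0, seed=0):
    """Generic basin hopping: random jump, local relaxation, Metropolis test."""
    rng = random.Random(seed)
    x, fx = local_minimize(f, x0)
    best_x, best_f = x, fx
    for _ in range(n_cycles):
        trial_x, trial_f = local_minimize(f, x + rng.uniform(-jump, jump))
        # Always accept downhill minima; accept uphill ones with Boltzmann probability.
        if trial_f <= fx or rng.random() < math.exp(-(trial_f - fx) / temperature):
            x, fx = trial_x, trial_f
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f
```

Applied to a tilted double well such as `(x**2 - 1)**2 + 0.3*x` and started in the shallower right-hand well, the jumps let the search cross the barrier that a purely local minimizer would never leave.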
Affiliation(s)
- A Verma
- Forschungszentrum Karlsruhe GmbH, Institut für Wissenschaftliches Rechnen, Postfach 3640, D-76021 Karlsruhe, Germany
|
116
|
Gong H, Fleming PJ, Rose GD. Building native protein conformation from highly approximate backbone torsion angles. Proc Natl Acad Sci U S A 2005; 102:16227-32. [PMID: 16251268 PMCID: PMC1283474 DOI: 10.1073/pnas.0508415102] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Received: 08/02/2005] [Indexed: 11/18/2022]
Abstract
Reconstructing a protein in three dimensions from its backbone torsion angles is an ongoing challenge because minor inaccuracies in these angles produce major errors in the structure. As a familiar example, a small change in an elbow angle causes a large displacement at the end of your arm; the longer the arm, the larger the displacement. Even accurate knowledge of the backbone torsions ϕ and ψ is insufficient, owing to the small, but cumulative, deviations from ideality in backbone planarity, which, if ignored, also lead to major errors in the structure. Against this background, we conducted a computational experiment to assess whether protein conformation can be determined from highly approximate backbone torsion angles, the kind of information that is now obtained readily from NMR. Specifically, backbone torsion angles were taken from proteins of known structure and mapped into 60° × 60° grid squares, called mesostates. Side-chain atoms beyond the β-carbon were discarded. A mesostate representation of the protein backbone was then used to extract likely candidates from a fragment library of mesostate pentamers, followed by Monte Carlo-based fragment-assembly simulations to identify stable conformations compatible with the given mesostate sequence. Only three simple energy terms were used to gauge stability: molecular compaction, soft-sphere repulsion, and hydrogen bonding. For the six representative proteins described here, stable conformers can be partitioned into a remarkably small number of topologically distinct clusters. Among these, the native topology is found with high frequency and can be identified as the cluster with the most favorable energy.
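Two ideas from this abstract lend themselves to a short illustration: binning torsion angles into 60° × 60° grid squares, and the elbow analogy for how an angular error grows with lever length. The sketch below is hypothetical throughout: the integer (row, column) labels, the grid origin at -180°, and the function names are assumptions, and the paper's own mesostate naming is not reproduced.

```python
import math


def mesostate(phi, psi, bin_width=60.0):
    """Map a (phi, psi) pair in degrees to a (row, col) grid cell.

    Angles are wrapped into [-180, 180); with 60 degree bins this gives a
    6 x 6 grid. Rows index psi and columns index phi (an arbitrary choice).
    """
    col = int(((phi + 180.0) % 360.0) // bin_width)
    row = int(((psi + 180.0) % 360.0) // bin_width)
    return row, col


def lever_arm_displacement(length, delta_degrees):
    """Chord moved by the end of a rigid arm when its hinge rotates by delta."""
    return 2.0 * length * math.sin(math.radians(delta_degrees) / 2.0)
```

For example, a 5° hinge error moves the end of a 10 Å arm by under 1 Å but the end of a 100 Å arm by almost 9 Å, which is why coarse bins alone cannot fix a structure without the later assembly and energy steps.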
Affiliation(s)
- Haipeng Gong
- T. C. Jenkins Department of Biophysics, The Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
|
117
|
Camproux AC, Tufféry P. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity. Biochim Biophys Acta Gen Subj 2005; 1724:394-403. [PMID: 16040198 DOI: 10.1016/j.bbagen.2005.05.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 03/01/2005] [Revised: 05/10/2005] [Accepted: 05/11/2005] [Indexed: 11/19/2022]
Abstract
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
Affiliation(s)
- A C Camproux
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France.
|
118
|
Sharp JS, Guo JT, Uchiki T, Xu Y, Dealwis C, Hettich RL. Photochemical surface mapping of C14S-Sml1p for constrained computational modeling of protein structure. Anal Biochem 2005; 340:201-12. [PMID: 15840492 DOI: 10.1016/j.ab.2005.02.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 09/14/2004] [Indexed: 11/29/2022]
Abstract
Photochemically generated hydroxyl radicals were used to map solvent-exposed regions in the C14S mutant of the protein Sml1p, a regulator of the ribonucleotide reductase enzyme Rnr1p in Saccharomyces cerevisiae. By using high-performance mass spectrometry to characterize the oxidized peptides created by the hydroxyl radical reactions, we obtained amino acid solvent-accessibility data for native and denatured C14S Sml1p that revealed a solvent-excluding tertiary structure in the native state. The data on solvent accessibilities of various amino acids within the protein were then used to evaluate the de novo computational models generated by the HMMSTR/Rosetta server. The top five models initially generated by the server all disagreed with both published nuclear magnetic resonance (NMR) data and the solvent-accessibility data obtained in this study. A structural model adjusted to fit the previously reported NMR data satisfied most of the solvent-accessibility constraints. Minor adjustment of the rotamers of two amino acid side chains in this latter structure generated a model that not only provided a lower energy conformation but also completely satisfied previously reported data from NMR and tryptophan fluorescence measurements, in addition to the solvent-accessibility data presented here.
Affiliation(s)
- Joshua S Sharp
- Graduate School of Genome Science and Technology, The University of Tennessee and Oak Ridge National Laboratory, 1060 Commerce Park, Oak Ridge, TN 37830-8026, USA
|
119
|
Abstract
The Yeast Resource Center Public Data Repository (YRC PDR) serves as a single point of access for the experimental data produced from many collaborations typically studying Saccharomyces cerevisiae (baker's yeast). The experimental data include large amounts of mass spectrometry results from protein co-purification experiments, yeast two-hybrid interaction experiments, fluorescence microscopy images and protein structure predictions. All of the data are accessible via searching by gene or protein name, and are available on the Web at http://www.yeastrc.org/pdr/.
Affiliation(s)
- Michael Riffle
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
|
120
|
Meiler J, Baker D. The fumarate sensor DcuS: progress in rapid protein fold elucidation by combining protein structure prediction methods with NMR spectroscopy. J Magn Reson 2005; 173:310-6. [PMID: 15780923 DOI: 10.1016/j.jmr.2004.11.031] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 09/29/2004] [Revised: 11/24/2004] [Indexed: 05/24/2023]
Abstract
We illustrate how moderate resolution protein structures can be rapidly obtained by interlinking computational prediction methodologies with un- or partially assigned NMR data. To facilitate the application of our recently described method of ranking and subsequent refining alternative structural models using unassigned NMR data [Proc. Natl. Acad. Sci. USA 100 (2003) 15404] for such "structural genomics"-type experiments it is combined with protein models from several prediction techniques, enhanced to utilize partial assignments, and applied on a protein with an unknown structure and fold. From the original NMR spectra obtained for the 140 residue fumarate sensor DcuS, 1100 ¹H, ¹³C, and ¹⁵N chemical shift signals, 3000 ¹H-¹H NOESY cross peak intensities, and 209 backbone residual dipolar couplings were extracted and used to rank models produced by de novo structure prediction and comparative modeling methods. The ranking proceeds in two steps: first, an optimal assignment of the NMR peaks to atoms is found for each model independently, and second, the models are ranked based on the consistency between the NMR data and the model assuming these optimal assignments. The low-resolution model selected using this ranking procedure had the correct overall fold and a global backbone RMSD of 6.0 Å, and was subsequently refined to 3.7 Å RMSD. With the incorporation of a small number of NOE and residual dipolar coupling constraints available very early in the traditional spectral assignment process, a model with an RMSD of 2.8 Å could rapidly be built. The ability to generate moderate resolution models within days of NMR data collection should facilitate large scale NMR structure determination efforts.
Affiliation(s)
- Jens Meiler
- Department of Biochemistry, University of Washington, BOX 357350, Seattle, WA 98195, USA.
|
121
|
Herges T, Wenzel W. In silico folding of a three helix protein and characterization of its free-energy landscape in an all-atom force field. Phys Rev Lett 2005; 94:018101. [PMID: 15698135 DOI: 10.1103/physrevlett.94.018101] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 10/29/2003] [Indexed: 05/24/2023]
Abstract
We report the reproducible first-principles folding of the 40 amino-acid, three-helix headpiece of the HIV accessory protein in a recently developed all-atom free-energy force field. Six of 20 simulations using an adapted basin-hopping method converged to better than 3 Å backbone rms deviation to the experimental structure. Using over 60 000 low-energy conformations of this protein, we constructed a decoy tree that completely characterizes its folding funnel.
Affiliation(s)
- T Herges
- Forschungszentrum Karlsruhe, Institut für Nanotechnologie, 76021 Karlsruhe, Germany
|
122
|
Holmes JB, Tsai J. Some fundamental aspects of building protein structures from fragment libraries. Protein Sci 2004; 13:1636-50. [PMID: 15152094 PMCID: PMC2279988 DOI: 10.1110/ps.03494504] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]
Abstract
We have investigated some of the basic principles that influence generation of protein structures using a fragment-based, random insertion method. We tested buildup methods and fragment library quality for accuracy in constructing a set of known structures. The parameters most influential in the construction procedure are bond and torsion angles with minor inaccuracies in bond angles alone causing >6 Å Cα RMSD for a 150-residue protein. Idealization to a standard set of values corrects this problem, but changes the torsion angles and does not work for every structure. Alternatively, we found using Cartesian coordinates instead of torsion angles did not reduce performance and can potentially increase speed and accuracy. Under conditions simulating ab initio structure prediction, fragment library quality can be suboptimal and still produce near-native structures. Using various clustering criteria, we created a number of libraries and used them to predict a set of native structures based on nonnative fragments. Local Cα RMSD fit of fragments, library size, and takeoff/landing angle criteria weakly influence the accuracy of the models. Based on a fragment's minimal perturbation upon insertion into a known structure, a seminative fragment library was created that produced more accurate structures with fragments that were less similar to native fragments than the other sets. These results suggest that fragments need only contain native-like subsections, which when correctly overlapped, can recreate a native-like model. For fragment-based, random insertion methods used in protein structure prediction and design, our findings help to define the parameters this method needs to generate near-native structures.
Affiliation(s)
- J Bradley Holmes
- Department of Biophysics and Biochemistry, Texas A&M University, College Station, TX 77843, USA
|
123
|
Lee J, Kim SY, Lee J. Protein structure prediction based on fragment assembly and parameter optimization. Biophys Chem 2005; 115:209-14. [PMID: 15752606 DOI: 10.1016/j.bpc.2004.12.046] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 06/11/2004] [Revised: 11/09/2004] [Accepted: 12/10/2004] [Indexed: 11/28/2022]
Abstract
We propose a novel method for ab-initio prediction of protein tertiary structures based on the fragment assembly and global optimization. Fifteen residue long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. Then in order to enhance the performance of the prediction method, we optimize the linear parameters of the energy function, so that the native-like conformations become energetically more favorable than the non-native ones for proteins with known structures. We test the feasibility of the parameter optimization procedure by applying it to the training set consisting of three proteins: the 10-55 residue fragment of staphylococcal protein A (PDB ID 1bdd), a designed protein betanova, and 1fsd.
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Computer Aided Molecular Design Research Center, Bioinformatics and Molecular Design Technology Innovation Center, Soongsil University, Seoul 156-743, South Korea.
|
124
|
Wen EZ, Hsieh MJ, Kollman PA, Luo R. Enhanced ab initio protein folding simulations in Poisson-Boltzmann molecular dynamics with self-guiding forces. J Mol Graph Model 2004; 22:415-24. [PMID: 15099837 DOI: 10.1016/j.jmgm.2003.12.008] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
We have investigated the sampling efficiency in molecular dynamics with the PB implicit solvent when self-guiding forces are added. Compared with a high-temperature dynamics simulation, the use of self-guiding forces in room-temperature dynamics is found to be rather efficient as measured by potential energy fluctuation, gyration radius fluctuation, backbone RMSD fluctuation, number of unique clusters, and distribution of low RMSD structures over simulation time. Based on the enhanced sampling method, we have performed ab initio folding simulations of two small proteins, ββα1 and villin headpiece. The preliminary data for the folding simulations is presented. It is found that ββα1 folding proceeds by initiation of the turn and the helix. The hydrophobic collapse seems to be lagging behind or at most concurrent with the formation of the helix. The hairpin stability is weaker than the helix in our simulations. Its role in the early folding events seems to be less important than the more stable helix. In contrast, villin headpiece folding proceeds first by hydrophobic collapse. The formation of helices is later than the collapse phase, different from the ββα1 folding.
Affiliation(s)
- Edward Z Wen
- Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697-3900, USA
|
125
|
Pei J, Grishin NV. Combining evolutionary and structural information for local protein structure prediction. Proteins 2004; 56:782-94. [PMID: 15281130 DOI: 10.1002/prot.20158] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. Mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.
Affiliation(s)
- Jimin Pei
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
|
126
|
Lee J, Kim SY, Joo K, Kim I, Lee J. Prediction of protein tertiary structure using PROFESY, a novel method based on fragment assembly and conformational space annealing. Proteins 2004; 56:704-14. [PMID: 15281124 DOI: 10.1002/prot.20150] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]
Abstract
A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and the fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. We apply PROFESY for benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us the best model-one structure for the target T0161.
Affiliation(s)
- Julian Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
|
127
|
Skolnick J, Kihara D, Zhang Y. Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm. Proteins 2004; 56:502-18. [PMID: 15229883 DOI: 10.1002/prot.20106] [Citation(s) in RCA: 118] [Impact Index Per Article: 5.9] [Indexed: 11/10/2022]
Abstract
This article describes the PROSPECTOR_3 threading algorithm, which combines various scoring functions designed to match structurally related target/template pairs. Each variant described was found to have a Z-score above which most identified templates have good structural (threading) alignments, Z(struct) (Z(good)). 'Easy' targets with accurate threading alignments are identified as single templates with Z > Z(good) or two templates, each with Z > Z(struct), having a good consensus structure in mutually aligned regions. 'Medium' targets have a pair of templates lacking a consensus structure, or a single template for which Z(struct) < Z < Z(good). PROSPECTOR_3 was applied to a comprehensive Protein Data Bank (PDB) benchmark composed of 1491 single domain proteins, 41-200 residues long and no more than 30% identical to any threading template. Of the proteins, 878 were found to be easy targets, with 761 having a root mean square deviation (RMSD) from native of less than 6.5 Å. The average contact prediction accuracy was 46%, and on average 17.6 residue continuous fragments were predicted with RMSD values of 2.0 Å. There were 606 medium targets identified, 87% (31%) of which had good structural (threading) alignments. On average, 9.1 residue, continuous fragments with RMSD of 2.5 Å were predicted. Combining easy and medium sets, 63% (91%) of the targets had good threading (structural) alignments compared to native; the average target/template sequence identity was 22%. Only nine targets lacked matched templates. Moreover, PROSPECTOR_3 consistently outperforms PSIBLAST. Similar results were predicted for open reading frames (ORFs) ≤200 residues in the M. genitalium, E. coli and S. cerevisiae genomes. Thus, progress has been made in identification of weakly homologous/analogous proteins, with very high alignment coverage, both in a comprehensive PDB benchmark as well as in genomes.
Affiliation(s)
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington St., Suite 300, Buffalo, NY 14203, USA.
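Illustrative note on the PROSPECTOR_3 abstract above: the easy/medium target rules it states can be written as a small decision function. This is a hedged sketch of the rules as worded in the abstract, not the authors' code. The function name, the threshold values, the "unassigned" label, and the choice to let the single-template Z > Z(good) rule take precedence are all assumptions made for illustration.

```python
def classify_target(z_scores, z_struct, z_good, has_consensus):
    """Label a target from its template Z-scores (hypothetical helper).

    z_scores      -- Z-scores of the candidate templates
    z_struct      -- threshold for good structural alignments
    z_good        -- stricter threshold for good threading alignments
    has_consensus -- whether the top templates agree in aligned regions
    """
    if z_struct >= z_good:
        raise ValueError("expected z_struct < z_good")
    # Rule 1: a single template above Z(good) makes the target easy.
    if any(z > z_good for z in z_scores):
        return "easy"
    above_struct = [z for z in z_scores if z > z_struct]
    # Rule 2: two templates above Z(struct) are easy only with a consensus,
    # otherwise the pair lacking a consensus is a medium target.
    if len(above_struct) >= 2:
        return "easy" if has_consensus else "medium"
    # Rule 3: one template with Z(struct) < Z < Z(good) is a medium target.
    if len(above_struct) == 1:
        return "medium"
    # The abstract does not name the remaining class; this label is ours.
    return "unassigned"

# Hypothetical thresholds and scores, for illustration only.
print(classify_target([12.0], z_struct=5.0, z_good=10.0, has_consensus=False))
print(classify_target([7.0, 6.0], z_struct=5.0, z_good=10.0, has_consensus=False))
```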
|
128
|
Chikenji G, Fujitsuka Y, Takada S. Protein folding mechanisms and energy landscape of src SH3 domain studied by a structure prediction toolbox. Chem Phys 2004. [DOI: 10.1016/j.chemphys.2004.06.015] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
|
129
|
Herges T, Wenzel W. An all-atom force field for tertiary structure prediction of helical proteins. Biophys J 2004; 87:3100-9. [PMID: 15507688 PMCID: PMC1304781 DOI: 10.1529/biophysj.104.040071] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.5] [Received: 01/15/2004] [Accepted: 06/28/2004] [Indexed: 11/18/2022]
Abstract
We have developed an all-atom free-energy force field (PFF01) for protein tertiary structure prediction. PFF01 is based on physical interactions and was parameterized using experimental structures of a family of proteins believed to span a wide variety of possible folds. It contains empirical, although sequence-independent terms for hydrogen bonding. Its solvent-accessible surface area solvent model was first fit to transfer energies of small peptides. The parameters of the solvent model were then further optimized to stabilize the native structure of a single protein, the autonomously folding villin headpiece, against competing low-energy decoys. Here we validate the force field for five nonhomologous helical proteins with 20-60 amino acids. For each protein, decoys with 2-3 Å backbone root mean-square deviation and correct experimental Cβ-Cβ distance constraints emerge as those with the lowest energy.
Affiliation(s)
- T Herges
- Forschungszentrum Karlsruhe, Institut für Nanotechnologie, Karlsruhe, Germany
|
130
|
Weston AD, Baliga NS, Bonneau R, Hood L. Systems approaches applied to the study of Saccharomyces cerevisiae and Halobacterium sp. Cold Spring Harb Symp Quant Biol 2004; 68:345-57. [PMID: 15338636 DOI: 10.1101/sqb.2003.68.345] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]
Affiliation(s)
- A D Weston
- Institute for Systems Biology, Seattle, Washington 98103-8904, USA
|
131
|
Colubri A. Prediction of protein structure by simulating coarse-grained folding pathways: a preliminary report. J Biomol Struct Dyn 2004; 21:625-38. [PMID: 14769055 DOI: 10.1080/07391102.2004.10506953] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
Abstract
A set of software tools designed to study protein structure and kinetics has been developed. The core of these tools is a program called Folding Machine (FM) which is able to generate low resolution folding pathways using modest computational resources. The FM is based on a coarse-grained kinetic ab initio Monte-Carlo sampler that can optionally use information extracted from secondary structure prediction servers or from fragment libraries of local structure. The model underpinning this algorithm contains two novel elements: (a) the conformational space is discretized using the Ramachandran basins defined in the local phi-psi energy maps; and (b) the solvent is treated implicitly by rescaling the pairwise terms of the non-bonded energy function according to the local solvent environments. The purpose of this hybrid ab initio/knowledge-based approach is threefold: to cover the long time scales of folding, to generate useful 3-dimensional models of protein structures, and to gain insight on the protein folding kinetics. Even though the algorithm is not yet fully developed, it has been used in a recent blind test of protein structure prediction (CASP5). The FM generated models within 6 Å backbone rmsd for fragments of about 60-70 residues of alpha-helical proteins. For a CASP5 target that turned out to be natively unfolded, the trajectory obtained for this sequence uniquely failed to converge. Also, a new measure to evaluate structure predictions is presented and used alongside the standard CASP assessment methods. Finally, recent improvements in the prediction of beta-sheet structures are briefly described.
Affiliation(s)
- Andrés Colubri
- Searle Chemistry Lab, University of Chicago, 5735 South Ellis Ave #126, Chicago, Illinois 60637, USA.
|
132
|
Winther O, Krogh A. Teaching computers to fold proteins. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 70:030903. [PMID: 15524499 DOI: 10.1103/physreve.70.030903] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/26/2003] [Revised: 04/26/2004] [Indexed: 05/24/2023]
Abstract
A new general algorithm for optimization of potential functions for protein folding is introduced. It is based upon gradient optimization of the thermodynamic stability of native folds of a training set of proteins with known structure. The iterative update rule contains two thermodynamic averages which are estimated by (generalized ensemble) Monte Carlo. We test the learning algorithm on a Lennard-Jones (LJ) force field with a torsional angle degrees-of-freedom and a single-atom side-chain. In a test with 24 peptides of known structure, none folded correctly with the initial potential functions, but two-thirds came within 3 Å of their native fold after optimizing the potential functions.
Affiliation(s)
- Ole Winther
- Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, DK-2800 Lyngby, Denmark.
|
133
|
Ginalski K, Kinch L, Rychlewski L, Grishin NV. BOF: a novel family of bacterial OB-fold proteins. FEBS Lett 2004; 567:297-301. [PMID: 15178340 DOI: 10.1016/j.febslet.2004.04.086] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 04/01/2004] [Accepted: 04/19/2004] [Indexed: 11/22/2022]
Abstract
Using top-of-the-line fold recognition methods, we assigned an oligonucleotide/oligosaccharide-binding (OB)-fold structure to a family of previously uncharacterized hypothetical proteins from several bacterial genomes. This novel family of bacterial OB-fold (BOF) proteins present in a number of pathogenic strains encompasses sequences of unknown function from DUF388 (in Pfam database) and COG3111. The BOF proteins can be linked evolutionarily to other members of the OB-fold nucleic acid-binding superfamily (anticodon-binding and single strand DNA-binding domains), although they probably lack nucleic acid-binding properties as implied by the analysis of the potential binding site. The presence of conserved N-terminal predicted signal peptide indicates that BOF family members localize in the periplasm where they may function to bind proteins, small molecules, or other typical OB-fold ligands. As hypothesized for the distantly related OB-fold containing bacterial enterotoxins, the loss of nucleotide-binding function and the rapid evolution of the BOF ligand-binding site may be associated with the presence of BOF proteins in mobile genetic elements and their potential role in bacterial pathogenicity.
Affiliation(s)
- Krzysztof Ginalski
- Department of Biochemistry, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9038, USA.
|
134
|
Bonneau R, Baliga NS, Deutsch EW, Shannon P, Hood L. Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1. Genome Biol 2004; 5:R52. [PMID: 15287974 PMCID: PMC507877 DOI: 10.1186/gb-2004-5-8-r52] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Received: 03/05/2004] [Revised: 03/07/2004] [Accepted: 06/01/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins of unknown function remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression, protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function. RESULTS We have used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. Predicted structures were searched against the Protein Data Bank to identify fold similarities and extrapolate putative functions. They were analyzed in the context of a predicted association network composed of several sources of functional associations such as: predicted protein interactions, predicted operons, phylogenetic profile similarity and domain fusion. To illustrate this approach, we highlight three cases where our combined procedure has provided novel insights into our understanding of chemotaxis, possible prophage remnants in Halobacterium NRC-1 and archaeal transcriptional regulators. CONCLUSIONS Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.
Affiliation(s)
- Nitin S Baliga
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
- Eric W Deutsch
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
- Paul Shannon
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
- Leroy Hood
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
|
135
|
Przytycka T. Significance of conformational biases in Monte Carlo simulations of protein folding: Lessons from Metropolis-Hastings approach. Proteins 2004; 57:338-44. [PMID: 15340921 DOI: 10.1002/prot.20210] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Despite significant effort, the problem of predicting a protein's three-dimensional fold from its amino-acid sequence remains unsolved. An important strategy involves treating folding as a statistical process, using the Markov chain formalism, implemented as a Metropolis Monte Carlo algorithm. A formal prerequisite of this approach is the condition of detailed balance, the plausible requirement that at equilibrium, the transition from state i to state j is traversed with the same probability as the reverse transition from state j to state i. Surprisingly, some relatively successful methods that use biased sampling fail to satisfy this requirement. Is this compromise merely a convenient heuristic that results in faster convergence? Or, is it instead a cryptic energy term that compensates for an incomplete potential function? I explore this question using Metropolis-Hastings Monte Carlo simulations. Results from these simulations suggest the latter answer is more likely.
Affiliation(s)
- Teresa Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, Maryland 20894, USA.
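Illustrative note on the Przytycka abstract above: the detailed balance condition it describes can be checked numerically for a small Metropolis scheme. The sketch below is not taken from the paper. It assumes a toy system with three discrete states, hypothetical energies, and a uniform symmetric proposal, and it verifies that pi_i * P(i->j) equals pi_j * P(j->i) for the Boltzmann distribution pi.

```python
import math

def metropolis_matrix(energies, beta=1.0):
    """Transition matrix for Metropolis sampling with uniform proposals."""
    n = len(energies)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Propose j with probability 1/(n-1), then accept with
                # the Metropolis probability min(1, exp(-beta * dE)).
                d_e = energies[j] - energies[i]
                P[i][j] = min(1.0, math.exp(-beta * d_e)) / (n - 1)
        # Rejected proposals leave the chain in state i.
        P[i][i] = 1.0 - sum(P[i])
    return P

def boltzmann(energies, beta=1.0):
    """Normalized Boltzmann weights for the given state energies."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def max_detailed_balance_violation(pi, P):
    """Largest |pi_i P_ij - pi_j P_ji| over all pairs of states."""
    n = len(pi)
    return max(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i])
        for i in range(n)
        for j in range(n)
    )

energies = [0.0, 1.0, 2.5]  # hypothetical state energies
pi = boltzmann(energies)
P = metropolis_matrix(energies)
print(max_detailed_balance_violation(pi, P))  # zero up to rounding error
```

With a biased (asymmetric) proposal and no Hastings correction in the acceptance step, the same check would generally report a nonzero violation, which is the situation the abstract discusses.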
|
136
|
|
137
|
Camproux AC, Gautier R, Tufféry P. A hidden Markov model derived structural alphabet for proteins. J Mol Biol 2004; 339:591-605. [PMID: 15147844 DOI: 10.1016/j.jmb.2004.04.005] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.2] [Received: 10/08/2003] [Revised: 03/30/2004] [Accepted: 04/05/2004] [Indexed: 10/26/2022]
Abstract
Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as series of overlapping fragments (states) of four residues length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On an average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.
Affiliation(s)
- A C Camproux
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM E0436, Université Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France.
|
138
|
Rohl CA, Strauss CEM, Chivian D, Baker D. Modeling structurally variable regions in homologous proteins with rosetta. Proteins 2004; 55:656-77. [PMID: 15103629 DOI: 10.1002/prot.10629] [Citation(s) in RCA: 242] [Impact Index Per Article: 12.1] [Indexed: 12/26/2022]
Abstract
A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm, Rosetta, for predicting conformations of structurally divergent regions in comparative models. Initial conformations for short segments are selected from the protein structure database, whereas longer segments are built up by using three- and nine-residue fragments drawn from the database and combined by using the Rosetta algorithm. A gap closure term in the potential, in combination with a modified Newton's method for gradient descent minimization, is used to ensure continuity of the peptide backbone. Conformations of variable regions are refined in the context of a fixed template structure using Monte Carlo minimization together with rapid repacking of side-chains to iteratively optimize backbone torsion angles and side-chain rotamers. For short loops, mean accuracies of 0.69, 1.45, and 3.62 Å are obtained for 4, 8, and 12 residue loops, respectively. In addition, the method can provide reasonable models of conformations of longer protein segments: predicted conformations of 3 Å root-mean-square deviation or better were obtained for 5 of 10 examples of segments ranging from 13 to 34 residues. In combination with a sequence alignment algorithm, this method generates complete, ungapped models of protein structures, including regions both similar to and divergent from a homologous structure.
This combined method was used to make predictions for 28 protein domains in the Critical Assessment of Protein Structure 4 (CASP 4) and 59 domains in CASP 5, where the method ranked highly among comparative modeling and fold recognition methods. Model accuracy in these blind predictions is dominated by alignment quality, but in the context of accurate alignments, long protein segments can be accurately modeled. Notably, the method correctly predicted the local structure of a 39-residue insertion into a TIM barrel in CASP 5 target T0186.
Affiliation(s)
- Carol A Rohl
- Department of Biomolecular Engineering, University of California, Santa Cruz 95064, USA.
|
139
|
Wang K, Fain B, Levitt M, Samudrala R. Improved protein structure selection using decoy-dependent discriminatory functions. BMC STRUCTURAL BIOLOGY 2004; 4:8. [PMID: 15207004 PMCID: PMC449718 DOI: 10.1186/1472-6807-4-8] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Received: 04/17/2004] [Accepted: 06/18/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND: A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, there is poor correlation between the scores and the accuracy of the generated conformations.
RESULTS: We present a simple and nonparametric formula to estimate the accuracy of predicted conformations (or decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Cα RMSD (Root Mean Square Deviation) calculation in a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores have correlations as high as 0.9 with the Cα RMSDs of the decoy conformations, measured relative to the experimental conformation for each decoy. We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function called self-RAPDF, where we compiled the atom-atom contact probabilities from all the conformations in a decoy set instead of using an ensemble of native conformations, with a weighting scheme based on the density scores. The self-RAPDF has a higher correlation with Cα RMSD than RAPDF for 76/83 decoy sets, and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement.
CONCLUSIONS: Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.
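The all-against-all Cα RMSD idea behind the density score can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes that a decoy's density score is its mean superposed RMSD to every other decoy in the set (lower meaning the decoy sits in a denser region). The function names `kabsch_rmsd` and `density_scores` are hypothetical, and each decoy is assumed to be an (N, 3) array of Cα coordinates with the same residue ordering.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove translation by centering both sets on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Singular values of the covariance matrix give the minimum RMSD
    # directly, without building the rotation matrix (Kabsch algorithm).
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best fit would be a reflection.
    d = -1.0 if np.linalg.det(u @ vt) < 0 else 1.0
    msd = (np.sum(a * a) + np.sum(b * b)
           - 2.0 * (s[0] + s[1] + d * s[2])) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def density_scores(decoys):
    """Mean pairwise RMSD of each decoy to all others (assumed definition)."""
    n = len(decoys)
    rmsd = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rmsd[i, j] = rmsd[j, i] = kabsch_rmsd(decoys[i], decoys[j])
    # The diagonal is zero, so dividing the row sum by (n - 1) averages
    # over the other decoys only.
    return rmsd.sum(axis=1) / (n - 1)
```

Under this assumed definition, a decoy that lies far from the bulk of the set receives the largest score, which is in line with the abstract's claim that ensemble information alone can flag likely near-native conformations. The pairwise loop is quadratic in the number of decoys, so large decoy sets would need a faster RMSD routine.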
Affiliation(s)
- Kai Wang
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
- Boris Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Michael Levitt
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ram Samudrala
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
|
140
|
Klepeis JL, Floudas CA. ASTRO-FOLD: a combinatorial and global optimization framework for Ab initio prediction of three-dimensional structures of proteins from the amino acid sequence. Biophys J 2004; 85:2119-46. [PMID: 14507680 PMCID: PMC1303441 DOI: 10.1016/s0006-3495(03)74640-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022] Open
Abstract
The field of computational biology has been revolutionized by recent advances in genomics. The completion of a number of genome projects, including that of the human genome, has paved the way toward a variety of challenges and opportunities in bioinformatics and biological systems engineering. One of the first challenges has been the determination of the structures of proteins encoded by the individual genes. This problem, which represents the progression from sequence to structure (genomics to structural genomics), has been widely known as the structure-prediction-in-protein-folding problem. We present the development and application of ASTRO-FOLD, a novel and complete approach for the ab initio prediction of protein structures given only the amino acid sequences of the proteins. The approach exhibits many novel components and the merits of its application are examined for a suite of protein systems, including a number of targets from several critical-assessment-of-structure-prediction experiments.
Affiliation(s)
- J L Klepeis
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 10036, USA.
|
141
|
Randall AZ, Baldi P, Villarreal LP. Structural proteomics of the poxvirus family. Artif Intell Med 2004; 31:105-15. [PMID: 15219289 DOI: 10.1016/j.artmed.2004.01.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/21/2003] [Revised: 07/22/2003] [Accepted: 01/16/2004] [Indexed: 11/20/2022]
Abstract
Recent concerns over the potential use of variola virus (commonly known as smallpox) and other orthopox viruses as weapons of bioterrorism have increased research efforts towards creating new antiviral drugs and safer, more effective vaccines. Here we introduce a new resource for structural information of poxvirus proteins: the poxvirus proteomics database (PPDB). In the PPDB, we leverage recently developed bioinformatics structure prediction tools on a genomic scale and provide results in a publicly accessible format. The current version of the system contains both experimentally determined and predicted information about protein structural features, such as secondary structure and relative solvent accessibility, as well as tertiary structure and homology information. The system is automated to read the primary sequences from the database, produce the new information for each sequence, and update the database monthly and as new tools are incorporated. The PPDB contains detailed information on the open reading frames (ORFs) in the Copenhagen strain of the vaccinia virus genome. The contents of the PPDB can be accessed through a simple web interface. Inclusion of additional poxvirus genomes in the PPDB is in progress. The PPDB has an upward scalable informatics infrastructure that can readily be applied to viral, bacterial, as well as eukaryotic genomes.
Affiliation(s)
- Arlo Z Randall
- Department of Information and Computer Science, University of California, Irvine, CA 92697-3425, USA.
|
142
|
Sadreyev RI, Baker D, Grishin NV. Profile-profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci 2004; 12:2262-72. [PMID: 14500884 PMCID: PMC2366929 DOI: 10.1110/ps.03197403] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
Recently we proposed a novel method of alignment–alignment comparison, COMPASS (the tool for COmparison of Multiple Protein Alignments with Assessment of Statistical Significance). Here we present several examples of the relations between PFAM protein families that were detected by COMPASS and that lead to predictions of presently unresolved protein structures. We discuss relatively straightforward COMPASS predictions that are new and interesting to us, and that would require substantial time and effort to justify even for a skilled PSI-BLAST user. All of the presented COMPASS hits are independently confirmed by other methods, including the ab initio structure-prediction method ROSETTA. The tertiary structure predictions made by ROSETTA proved to be useful for improving sequence-derived alignments, because they are based on a reasonable folding of the polypeptide chain rather than on the information from sequence databases. The ability of COMPASS to predict new relations within the PFAM database indicates the high sensitivity of COMPASS searches and substantiates its potential value for the discovery of previously unknown similarities between protein families.
Affiliation(s)
- Ruslan I Sadreyev
- Howard Hughes Medical Institute and Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
|
143
|
Baliga NS, Bjork SJ, Bonneau R, Pan M, Iloanusi C, Kottemann MCH, Hood L, DiRuggiero J. Systems level insights into the stress response to UV radiation in the halophilic archaeon Halobacterium NRC-1. Genome Res 2004; 14:1025-35. [PMID: 15140832 PMCID: PMC419780 DOI: 10.1101/gr.1993504] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.3] [Indexed: 12/28/2022]
Abstract
We report a remarkably high UV-radiation resistance in the extremely halophilic archaeon Halobacterium NRC-1, which withstands up to 110 J/m² with no loss of viability. Gene knockout analysis in two putative photolyase-like genes (phr1 and phr2) implicated only phr2 in photoreactivation. The UV-response was further characterized by analyzing simultaneously, along with gene function and protein interactions inferred through comparative genomics approaches, mRNA changes for all 2400 genes during light and dark repair. In addition to photoreactivation, three other putative repair mechanisms were identified including d(CTAG) methylation-directed mismatch repair, four oxidative damage repair enzymes, and two proteases for eliminating damaged proteins. Moreover, a UV-induced down-regulation of many important metabolic functions was observed during light repair and seems to be a phenomenon shared by all three domains of life. The systems analysis has facilitated the assignment of putative functions to 26 of 33 key proteins in the UV response through sequence-based methods and/or similarities of their predicted three-dimensional structures to known structures in the PDB. Finally, the systems analysis has raised, through the integration of experimentally determined and computationally inferred data, many experimentally testable hypotheses that describe the metabolic and regulatory networks of Halobacterium NRC-1.
Affiliation(s)
- Nitin S Baliga
- Institute for Systems Biology, Seattle, Washington 98103, USA.
|
144
|
Ginalski K, Rychlewski L, Baker D, Grishin NV. Protein structure prediction for the male-specific region of the human Y chromosome. Proc Natl Acad Sci U S A 2004; 101:2305-10. [PMID: 14983005 PMCID: PMC356946 DOI: 10.1073/pnas.0306306101] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
The complete sequence of the male-specific region of the human Y chromosome (MSY) has been determined recently; however, detailed characterization for many of its encoded proteins still remains to be done. We applied state-of-the-art protein structure prediction methods to all 27 distinct MSY-encoded proteins to provide better understanding of their biological functions and their mechanisms of action at the molecular level. The results of such large-scale structure-functional annotation provide a comprehensive view of the MSY proteome, shedding light on MSY-related processes. We found that, in total, at least 60 domains are encoded by 27 distinct MSY genes, of which 42 (70%) were reliably mapped to currently known structures. The most challenging predictions include the unexpected but confident 3D structure assignments for three domains identified here encoded by the USP9Y, UTY, and BPY2 genes. The domains with unknown 3D structures that are not predictable with currently available theoretical methods are established as primary targets for crystallographic or NMR studies. The data presented here set up the basis for additional scientific discoveries in human biology of the Y chromosome, which plays a fundamental role in sex determination.
Affiliation(s)
- Krzysztof Ginalski
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9038, USA.
|
145
|
Hazbun TR, Malmström L, Anderson S, Graczyk BJ, Fox B, Riffle M, Sundin BA, Aranda JD, McDonald WH, Chiu CH, Snydsman BE, Bradley P, Muller EGD, Fields S, Baker D, Yates JR, Davis TN. Assigning function to yeast proteins by integration of technologies. Mol Cell 2004; 12:1353-65. [PMID: 14690591 DOI: 10.1016/s1097-2765(03)00476-3] [Citation(s) in RCA: 216] [Impact Index Per Article: 10.8] [Indexed: 10/26/2022]
Abstract
Interpreting genome sequences requires the functional analysis of thousands of predicted proteins, many of which are uncharacterized and without obvious homologs. To assess whether the roles of large sets of uncharacterized genes can be assigned by targeted application of a suite of technologies, we used four complementary protein-based methods to analyze a set of 100 uncharacterized but essential open reading frames (ORFs) of the yeast Saccharomyces cerevisiae. These proteins were subjected to affinity purification and mass spectrometry analysis to identify copurifying proteins, two-hybrid analysis to identify interacting proteins, fluorescence microscopy to localize the proteins, and structure prediction methodology to predict structural domains or identify remote homologies. Integration of the data assigned function to 48 ORFs using at least two of the Gene Ontology (GO) categories of biological process, molecular function, and cellular component; 77 ORFs were annotated by at least one method. This combination of technologies, coupled with annotation using GO, is a powerful approach to classifying genes.
Affiliation(s)
- Tony R Hazbun
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
146
|
Abstract
A protein model that is simple enough to be used in protein-folding simulations but accurate enough to identify a protein native fold is described. Its geometry consists of describing the residues by one, two, or three pseudoatoms, depending on the residue size. Its energy is given by a pairwise, knowledge-based potential obtained for all the pseudoatoms as a function of their relative distance. The pseudoatomic potential is also a function of the primary chain separation and residue order. The model is tested by gapless threading on a large, representative set of known protein and decoy structures obtained from the "Decoys 'R' Us" database. It is also tested by threading on gapped decoys generated for proteins with many homologs. The gapless threading tests show near 98% native-structure recognition as the lowest energy structure and almost 100% as one of the three lowest energy structures for over 2200 test proteins. In decoy threading tests, the model recognized the majority of the native structures. It is also able to recognize native structures among gapped decoys, in spite of close structural similarities. The results indicate that the pseudoatomic model has native recognition ability similar to comparable atomic-based models but much better than equivalent residue-based models.
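A pairwise, distance-dependent knowledge-based potential of the kind mentioned in this abstract is commonly built by inverse-Boltzmann statistics. The form below is a generic sketch of that construction, not necessarily the one used in this paper. The symbols are assumptions: $a$ and $b$ label pseudoatom types, $k = |i - j|$ is the chain separation of the residues they belong to, $P^{\mathrm{obs}}$ is the distance distribution observed in known structures, and $P^{\mathrm{ref}}$ is a reference distribution.

```latex
% Generic inverse-Boltzmann pair potential (assumed form, not taken from the paper)
E_{ab}(r; k) = -k_{B} T \,\ln \frac{P^{\mathrm{obs}}_{ab}(r; k)}{P^{\mathrm{ref}}(r; k)},
\qquad
E_{\mathrm{total}} = \sum_{i<j} E_{a_i b_j}\bigl(r_{ij};\, |i-j|\bigr)
```

In a threading test, $E_{\mathrm{total}}$ would be evaluated for the native sequence on each candidate structure, and the native fold counts as recognized when it has the lowest energy.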
Affiliation(s)
- Marcos R Betancourt
- University at Buffalo Center of Excellence in Bioinformatics, Buffalo, New York 14203, USA
|
147
|
Fang Q, Shortle D. Prediction of protein structure by emphasizing local side-chain/backbone interactions in ensembles of turn fragments. Proteins 2004; 53 Suppl 6:486-90. [PMID: 14579337 DOI: 10.1002/prot.10541] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022]
Abstract
The prediction strategy used in the CASP5 experiment was premised on the assumption that local side-chain/backbone interactions are the principal determinants of protein structure at low resolution. Our implementation of this assumption made extensive use of a scoring function based on the propensities of the 20 amino acids for 137 different sub-regions of the Ramachandran plot, allowing estimation of the quality of fit between a sequence segment and a known conformation. New folds were predicted in three steps: prediction of secondary structure, threading to isolate fragments of protein structures corresponding to one turn plus flanking helices/strands, and recombination of overlapping fragments. The most important step in this fragment ensemble approach, the isolation of turn fragments, employed 2 to 6 sequence homologues when available, with clustering of the best scoring fragments to recover the most common turn arrangement. Recombinants formed from 3 to 8 turn fragments, with cross-overs confined to helix/strand segments, were selected for compactness plus low energy as estimated by empirical amino acid pair potentials, and the most common overall topology was identified by visual inspection. Because significant amounts of steric overlap were permitted during the recombination step, the final model was manually adjusted to reduce overlap and to enhance protein-like structural features. Even though only one or two models were submitted per target, for several targets the correct chain topology was predicted for fragment lengths up to 100 amino acids.
Affiliation(s)
- Qiaojun Fang
- Department of Biological Chemistry, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205-2185, USA
|
148
|
Chivian D, Kim DE, Malmström L, Bradley P, Robertson T, Murphy P, Strauss CEM, Bonneau R, Rohl CA, Baker D. Automated prediction of CASP-5 structures using the Robetta server. Proteins 2004; 53 Suppl 6:524-33. [PMID: 14579342 DOI: 10.1002/prot.10529] [Citation(s) in RCA: 221] [Impact Index Per Article: 11.1] [Indexed: 11/07/2022]
Abstract
Robetta is a fully automated protein structure prediction server that uses the Rosetta fragment-insertion method. It combines template-based and de novo structure prediction methods in an attempt to produce high quality models that cover every residue of a submitted sequence. The first step in the procedure is the automatic detection of the locations of domains and selection of the appropriate modeling protocol for each domain. For domains matched to a homolog with an experimentally characterized structure by PSI-BLAST or Pcons2, Robetta uses a new alignment method, called K*Sync, to align the query sequence onto the parent structure. It then models the variable regions by allowing them to explore conformational space with fragments in a fashion similar to the de novo protocol, but in the context of the template. When no structural homolog is available, domains are modeled with the Rosetta de novo protocol, which allows the full length of the domain to explore conformational space via fragment insertion, producing a large decoy ensemble from which the final models are selected. The Robetta server produced quite reasonable predictions for targets in the recent CASP-5 and CAFASP-3 experiments, some of which were at the level of the best human predictions.
|
149
|
Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R. Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins 2004; 53 Suppl 6:491-6. [PMID: 14579338 DOI: 10.1002/prot.10540] [Citation(s) in RCA: 192] [Impact Index Per Article: 9.6] [Indexed: 11/07/2022]
Abstract
This article presents an overview of the SAM-T02 method for protein fold recognition and the UNDERTAKER program for ab initio predictions. The SAM-T02 server is an automatic method that uses two-track hidden Markov models (HMMs) to find and align template proteins from PDB to the target protein. The two-track HMMs use an amino acid alphabet and one of several different local structure alphabets. The UNDERTAKER program is a new fragment-packing program that can use short or long fragments and alignments to create protein conformations. The HMMs and fold-recognition alignments from the SAM-T02 method were used to generate the fragment and alignment libraries used by UNDERTAKER. We present results on a few selected targets for which this combined method worked particularly well: T0129, T0181, T0135, T0130, and T0139.
Affiliation(s)
- Kevin Karplus
- Computer Engineering Department, University of California, Santa Cruz 95064, USA.
|
150
|
Bradley P, Chivian D, Meiler J, Misura KMS, Rohl CA, Schief WR, Wedemeyer WJ, Schueler-Furman O, Murphy P, Schonbrun J, Strauss CEM, Baker D. Rosetta predictions in CASP5: successes, failures, and prospects for complete automation. Proteins 2004; 53 Suppl 6:457-68. [PMID: 14579334 DOI: 10.1002/prot.10552] [Citation(s) in RCA: 140] [Impact Index Per Article: 7.0] [Indexed: 11/06/2022]
Abstract
We describe predictions of the structures of CASP5 targets using Rosetta. The Rosetta fragment insertion protocol was used to generate models for entire target domains without detectable sequence similarity to a protein of known structure and to build long loop insertions (and N- and C-terminal extensions) in cases where a structural template was available. Encouraging results were obtained both for the de novo predictions and for the long loop insertions; we describe here the successes as well as the failures in the context of current efforts to improve the Rosetta method. In particular, de novo predictions failed for large proteins that were incorrectly parsed into domains and for topologically complex (high contact order) proteins with swapping of segments between domains. However, for the remaining targets, at least one of the five submitted models had a long fragment with significant similarity to the native structure. A fully automated version of the CASP5 protocol produced results that were comparable to the human-assisted predictions for most of the targets, suggesting that automated genomic-scale, de novo protein structure prediction may soon be worthwhile. For the three targets where the human-assisted predictions were significantly closer to the native structure, we identify the steps that remain to be automated.
Affiliation(s)
- Philip Bradley
- Department of Biochemistry, University of Washington, Seattle 98195-7350, USA
|