101
Pitera JW. Expected distributions of root-mean-square positional deviations in proteins. J Phys Chem B 2014; 118:6526-30. [PMID: 24655018 DOI: 10.1021/jp412776d] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
Abstract
The atom positional root-mean-square deviation (RMSD) is a standard tool for comparing the similarity of two molecular structures. It is used to characterize the quality of biomolecular simulations, to cluster conformations, and as a reaction coordinate for conformational changes. This work presents an approximate analytic form for the expected distribution of RMSD values for a protein or polymer fluctuating about a stable native structure. The mean and maximum of the expected distribution are independent of chain length for long chains and linearly proportional to the average atom positional root-mean-square fluctuations (RMSF). To approximate the RMSD distribution for random-coil or unfolded ensembles, numerical distributions of RMSD were generated for ensembles of self-avoiding and non-self-avoiding random walks. In both cases, for all reference structures tested for chains more than three monomers long, the distributions have a maximum distant from the origin with a power-law dependence on chain length. The purely entropic nature of this result implies that care must be taken when interpreting stable high-RMSD regions of the free-energy landscape as "intermediates" or well-defined stable states.
Affiliation(s)
- Jed W Pitera
- IBM Research - Almaden, 650 Harry Road, San Jose, California 95120, United States
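The abstract's statement that the expected RMSD about a native structure tracks the RMSF rather than the chain length can be illustrated numerically. The sketch below is not the analytic form derived in the paper: it assumes independent, isotropic Gaussian displacements of every atom (standard deviation `sigma` per coordinate, so RMSF = sqrt(3) * sigma) and computes the RMSD without any superposition step. The function names and parameter values are illustrative assumptions.

```python
import math
import random


def rmsd(a, b):
    """Plain coordinate RMSD between two equal-length lists of (x, y, z) points.

    No superposition (fitting) is performed; this is a simplification.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))


def mean_rmsd(n_atoms, sigma, n_samples=200, seed=0):
    """Average RMSD from a reference over Gaussian-perturbed copies of it.

    Assumption: each coordinate of each atom fluctuates independently
    with standard deviation sigma about the reference position.
    """
    rng = random.Random(seed)
    reference = [
        tuple(rng.uniform(-10.0, 10.0) for _ in range(3)) for _ in range(n_atoms)
    ]
    total = 0.0
    for _ in range(n_samples):
        perturbed = [
            tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in reference
        ]
        total += rmsd(reference, perturbed)
    return total / n_samples


if __name__ == "__main__":
    sigma = 0.5
    print("RMSF under the assumed model:", math.sqrt(3) * sigma)
    for n in (50, 400):
        print(n, "atoms, mean RMSD:", mean_rmsd(n, sigma))
```

Under these assumptions, calling `mean_rmsd` with the same `sigma` for short and long chains should give values close to `sqrt(3) * sigma` in both cases, which mirrors the chain-length independence described in the abstract.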
102
Arabidopsis thaliana Tic110, involved in chloroplast protein translocation, contains at least fourteen highly divergent HEAT-like repeated motifs. Biologia (Bratisl) 2013. [DOI: 10.2478/s11756-013-0310-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
103
Consequences of domain insertion on sequence-structure divergence in a superfold. Proc Natl Acad Sci U S A 2013; 110:E3381-7. [PMID: 23959887 DOI: 10.1073/pnas.1305519110] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
Although the universe of protein structures is vast, these innumerable structures can be categorized into a finite number of folds. New functions commonly evolve by elaboration of existing scaffolds, for example, via domain insertions. Thus, understanding structural diversity of a protein fold evolving via domain insertions is a fundamental challenge. The haloalkanoic dehalogenase superfamily serves as an excellent model system wherein a variable cap domain accessorizes the ubiquitous Rossmann-fold core domain. Here, we determine the impact of the cap-domain insertion on the sequence and structure divergence of the core domain. Through quantitative analysis on a unique dataset of 154 core-domain-only and cap-domain-only structures, basic principles of their evolution have been uncovered. The relationship between sequence and structure divergence of the core domain is shown to be monotonic and independent of the corresponding type of domain insert, reflecting the robustness of the Rossmann fold to mutation. However, core domains with the same cap type share greater similarity at the sequence and structure levels, suggesting interplay between the cap and core domains. Notably, results reveal that the variance in structure maps to α-helices flanking the central β-sheet and not to the domain-domain interface. Collectively, these results hint at intramolecular coevolution where the fold diverges differentially in the context of an accessory domain, a feature that might also apply to other multidomain superfamilies.
104
Sequence and structure space model of protein divergence driven by point mutations. J Theor Biol 2013; 330:1-8. [DOI: 10.1016/j.jtbi.2013.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/27/2012] [Revised: 03/07/2013] [Accepted: 03/18/2013] [Indexed: 12/11/2022]
105
Kolodny R, Kosloff M. From Protein Structure to Function via Computational Tools and Approaches. Isr J Chem 2013. [DOI: 10.1002/ijch.201200078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
106
Li Y, Hu F, Wang X, Cao H, Liu D, Yao D. A rational design for trypsin-resistant improvement of Armillariella tabescens β-mannanase MAN47 based on molecular structure evaluation. J Biotechnol 2013; 163:401-7. [DOI: 10.1016/j.jbiotec.2012.12.018] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/17/2012] [Revised: 12/20/2012] [Accepted: 12/21/2012] [Indexed: 11/27/2022]
107
Shirvanyants D, Ding F, Tsao D, Ramachandran S, Dokholyan NV. Discrete molecular dynamics: an efficient and versatile simulation method for fine protein characterization. J Phys Chem B 2012; 116:8375-82. [PMID: 22280505 PMCID: PMC3406226 DOI: 10.1021/jp2114576] [Citation(s) in RCA: 166] [Impact Index Per Article: 13.8] [Indexed: 12/26/2022]
Abstract
Until now it has been impractical to observe protein folding in silico for proteins larger than 50 residues. Limitations of both force field accuracy and computational efficiency make the folding problem very challenging. Here we employ discrete molecular dynamics (DMD) simulations with an all-atom force field to fold fast-folding proteins. We extend the DMD force field by introducing long-range electrostatic interactions to model salt-bridges and a sequence-dependent semiempirical potential accounting for natural tendencies of certain amino acid sequences to form specific secondary structures. We enhance the computational performance by parallelizing the DMD algorithm. Using a small number of commodity computers, we achieve sampling quality and folding accuracy comparable to the explicit-solvent simulations performed on high-end hardware. We demonstrate that DMD can be used to observe equilibrium folding of villin headpiece and WW domain, study two-state folding kinetics, and sample near-native states in ab initio folding of proteins of ∼100 residues.
Affiliation(s)
- David Shirvanyants
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
- Feng Ding
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
- Douglas Tsao
- Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599, USA
- Srinivas Ramachandran
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
- Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
108
Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis P, Karaca E, Melquiond ASJ, Bonvin AMJJ. Clustering biomolecular complexes by residue contacts similarity. Proteins 2012; 80:1810-7. [PMID: 22489062 DOI: 10.1002/prot.24078] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 12/12/2011] [Revised: 03/14/2012] [Accepted: 03/30/2012] [Indexed: 01/01/2023]
Abstract
Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Albeit widely used, these measures suffer from several theoretical and technical limitations (e.g., choice of regions for fitting) that impair their application in multicomponent systems (N > 2), large-scale studies (e.g., interactomes), and other time-critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts--the fraction of common contacts--and compare it with the most used similarity measure of the protein docking community--interface backbone RMSD. We show that this method produces very compact clusters in remarkably short time when applied to a collection of binary and multicomponent protein-protein and protein-DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact-based metrics should be applicable to other structural biology clustering problems, in particular for time-critical or large-scale endeavors.
Affiliation(s)
- João P G L M Rodrigues
- Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, 3584 CH Utrecht, The Netherlands
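The contact-based similarity named in the abstract above, the fraction of common contacts, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the 5 Å cutoff, the dictionary input format, the function names, and the choice to normalise by the contacts of the first model are all assumptions made here.

```python
import math


def interface_contacts(chain_a, chain_b, cutoff=5.0):
    """Return the set of (residue_a, residue_b) pairs in contact across two chains.

    chain_a, chain_b: dict mapping a residue id to a list of (x, y, z) atoms.
    Two residues are taken to be in contact if any pair of their atoms lies
    within `cutoff` (an assumed value, in the same units as the coordinates).
    """
    contacts = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts.add((res_a, res_b))
    return contacts


def fraction_common_contacts(contacts_x, contacts_y):
    """Fraction of the contacts of model X that are also present in model Y."""
    if not contacts_x:
        return 0.0
    return len(contacts_x & contacts_y) / len(contacts_x)


if __name__ == "__main__":
    # Two toy docking models sharing chain A; chain B differs in one residue.
    chain_a = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)]}
    chain_b_model1 = {1: [(0.0, 3.0, 0.0)], 2: [(10.0, 3.0, 0.0)]}
    chain_b_model2 = {1: [(0.0, 3.0, 0.0)], 2: [(10.0, 30.0, 0.0)]}
    c1 = interface_contacts(chain_a, chain_b_model1)
    c2 = interface_contacts(chain_a, chain_b_model2)
    print(fraction_common_contacts(c1, c2), fraction_common_contacts(c2, c1))
```

With the normalisation assumed here the measure is asymmetric, so a clustering procedure built on it would need to decide how to combine the two directions. Because only set operations on contact lists are involved, no structural superposition is required.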
109
Gniewek P, Kolinski A, Jernigan RL, Kloczkowski A. How noise in force fields can affect the structural refinement of protein models? Proteins 2011; 80:335-41. [PMID: 22223184 DOI: 10.1002/prot.23240] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/20/2011] [Revised: 10/19/2011] [Accepted: 10/30/2011] [Indexed: 12/27/2022]
Abstract
Structural refinement of predicted models of biological macromolecules using atomistic or coarse-grained molecular force fields with various degrees of error is investigated. The goal of this analysis is to estimate the probability of designing an effective structural refinement that is based on computations of conformational energies using a force field, starts from a structure predicted from the sequence (using template-based or template-free modeling), and refines it to bring the structure into closer proximity to the native state. It is widely believed that it should be possible to develop such a successful structure refinement algorithm by applying an iterative procedure with stochastic sampling and an appropriate energy function, which assesses the quality (correctness) of protein decoys. Here, the effect of noise artificially introduced into a scoring function is investigated for a model of an ideal sampling scheme, where the underlying distribution of RMSDs is assumed to be Gaussian. Sampling of the conformational space is performed by random generation of RMSD values. We demonstrate that whenever the random noise in a force field exceeds some level, it is impossible to obtain reliable structural refinement. The magnitude of the noise above which structural refinement is, on average, impossible depends strongly on the quality of the sampling scheme and the size of the protein. Finally, possible strategies to overcome the intrinsic limitations of force fields in developing successful refinement algorithms are discussed.
Affiliation(s)
- Pawel Gniewek
- Faculty of Chemistry, Laboratory of Theory of Biopolymers, University of Warsaw, Warsaw, Poland; Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa
110
Zha X, Chen S, Yang L, Li B, Chen Y, Yan X, Li Y. Characterization of the CDR3 structure of the Vβ21 T cell clone in patients with P210BCR-ABL-positive chronic myeloid leukemia and B-cell acute lymphoblastic leukemia. Hum Immunol 2011; 72:798-804. [DOI: 10.1016/j.humimm.2011.06.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/30/2010] [Revised: 06/21/2011] [Accepted: 06/27/2011] [Indexed: 12/23/2022]
111
Cheon S, Liang F. Folding small proteins via annealing stochastic approximation Monte Carlo. Biosystems 2011; 105:243-9. [DOI: 10.1016/j.biosystems.2011.05.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/10/2010] [Revised: 05/22/2011] [Accepted: 05/26/2011] [Indexed: 11/26/2022]
112
Hollup SM, Sadowski MI, Jonassen I, Taylor WR. Exploring the limits of fold discrimination by structural alignment: a large scale benchmark using decoys of known fold. Comput Biol Chem 2011; 35:174-88. [PMID: 21704264 PMCID: PMC3145973 DOI: 10.1016/j.compbiolchem.2011.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/21/2011] [Accepted: 04/23/2011] [Indexed: 11/10/2022]
Abstract
Protein structure comparison by pairwise alignment is commonly used to identify highly similar substructures in pairs of proteins and provide a measure of structural similarity based on the size and geometric similarity of the match. These scores are routinely applied in analyses of protein fold space under the assumption that high statistical significance is equivalent to a meaningful relationship; however, the truth of this assumption has previously been difficult to test since there is a lack of automated methods which do not rely on the same underlying principles. As a resolution to this we present a method based on the use of topological descriptions of global protein structure, providing an independent means to assess the ability of structural alignment to maintain meaningful structural correspondences on a large scale. Using a large set of decoys of specified global fold we benchmark three widely used methods for structure comparison, SAP, TM-align and DALI, and test the degree to which this assumption is justified for these methods. Application of a topological edit distance measure to provide a scale of the degree of fold change shows that while there is a broad correlation between high structural alignment scores and low edit distances, there remain many pairs with highly significant scores which differ by core strand swaps and therefore are structurally different on a global level. Possible causes of this problem and its meaning for present assessments of protein fold space are discussed.
113
Gao M, Skolnick J. New benchmark metrics for protein-protein docking methods. Proteins 2011; 79:1623-34. [PMID: 21365685 DOI: 10.1002/prot.22987] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 11/08/2010] [Revised: 12/22/2010] [Accepted: 12/30/2010] [Indexed: 11/10/2022]
Abstract
With the development of many computational methods that predict the structural models of protein-protein complexes, there is a pressing need to benchmark their performance. As was the case for protein monomers, assessing the quality of models of protein complexes is not straightforward. An effective scoring scheme should be able to detect substructure similarity and estimate its statistical significance. Here, we focus on characterizing the similarity of the interfaces of the complex and introduce two scoring functions. The first, the interfacial Template Modeling score (iTM-score), measures the geometric distance between the interfaces, while the second, the Interface Similarity score (IS-score), evaluates their residue-residue contact similarity in addition to their geometric similarity. We first demonstrate that the IS-score is more suitable for assessing docking models than the iTM-score. The IS-score is then validated in a large-scale benchmark test on 1562 dimeric complexes. Finally, the scoring function is applied to evaluate docking models submitted to the Critical Assessment of Prediction of Interactions (CAPRI) experiments. While the results according to the new scoring scheme are generally consistent with the original CAPRI assessment, the IS-score identifies models whose significance was previously underestimated.
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
114
Habeck M. Statistical mechanics analysis of sparse data. J Struct Biol 2010; 173:541-8. [PMID: 20869444 DOI: 10.1016/j.jsb.2010.09.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/20/2010] [Revised: 09/10/2010] [Accepted: 09/16/2010] [Indexed: 10/19/2022]
Abstract
Inferential structure determination uses Bayesian theory to combine experimental data with prior structural knowledge into a posterior probability distribution over protein conformational space. The posterior distribution encodes everything one can say objectively about the native structure in the light of the available data and additional prior assumptions and can be searched for structural representatives. Here an analogy is drawn between the posterior distribution and the canonical ensemble of statistical physics. A statistical mechanics analysis assesses the complexity of a structure calculation globally in terms of ensemble properties. Analogs of the free energy and density of states are introduced; partition functions evaluate the consistency of prior assumptions with data. Critical behavior is observed with dwindling restraint density, which impairs structure determination with too sparse data. However, prior distributions with improved realism ameliorate the situation by lowering the critical number of observations. An in-depth analysis of various experimentally accessible structural parameters and force field terms will facilitate a statistical approach to protein structure determination with sparse data that avoids bias as much as possible.
Affiliation(s)
- Michael Habeck
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, 72076 Tübingen, Germany.
115
Hajdin CE, Ding F, Dokholyan NV, Weeks KM. On the significance of an RNA tertiary structure prediction. RNA (NEW YORK, N.Y.) 2010; 16:1340-9. [PMID: 20498460 PMCID: PMC2885683 DOI: 10.1261/rna.1837410] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Received: 07/21/2009] [Accepted: 03/21/2010] [Indexed: 05/20/2023]
Abstract
Tertiary structure prediction is important for understanding structure-function relationships for RNAs whose structures are unknown and for characterizing RNA states recalcitrant to direct analysis. However, it is unknown what root-mean-square deviation (RMSD) corresponds to a statistically significant RNA tertiary structure prediction. We use discrete molecular dynamics to generate RNA-like folds for structures up to 161 nucleotides (nt) that have complex tertiary interactions and then determine the RMSD distribution between these decoys. These distributions are Gaussian-like. The mean RMSD increases with RNA length and is smaller if secondary structure constraints are imposed while generating decoys. The compactness of RNA molecules with true tertiary folds is intermediate between closely packed spheres and a freely jointed chain. We use this scaling relationship to define an expression relating RMSD with the confidence that a structure prediction is better than that expected by chance. This is the prediction significance, and corresponds to a P-value. For a 100-nt RNA, the RMSD of predicted structures should be within 25 Å of the accepted structure to reach the P ≤ 0.01 level if the secondary structure is predicted de novo and within 14 Å if secondary structure information is used as a constraint. This significance approach should be useful for evaluating diverse RNA structure prediction and molecular modeling algorithms.
Affiliation(s)
- Christine E Hajdin
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina 27599-3290, USA
116
Faraggi E, Yang Y, Zhang S, Zhou Y. Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. Structure 2010; 17:1515-27. [PMID: 19913486 DOI: 10.1016/j.str.2009.09.006] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Received: 04/15/2009] [Revised: 09/01/2009] [Accepted: 09/03/2009] [Indexed: 11/30/2022]
Abstract
Local structures predicted from protein sequences are used extensively in every aspect of modeling and prediction of protein structure and function. For more than 50 years, they have been predicted at a low-resolution coarse-grained level (e.g., three-state secondary structure). Here, we combine a two-state classifier with real-value predictor to predict local structure in continuous representation by backbone torsion angles. The accuracy of the angles predicted by this approach is close to that derived from NMR chemical shifts. Their substitution for predicted secondary structure as restraints for ab initio structure prediction doubles the success rate. This result demonstrates the potential of predicted local structure for fragment-free tertiary-structure prediction. It further implies potentially significant benefits from using predicted real-valued torsion angles as a replacement for or supplement to the secondary-structure prediction tools used almost exclusively in many computational methods ranging from sequence alignment to function prediction.
Affiliation(s)
- Eshel Faraggi
- Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
117
Lobanov MY, Bogatyreva NS, Ivankov DN, Finkel’shtein AV. Analogy-based protein structure prediction: I. A new database of spatially similar and dissimilar structures of protein domains for testing and optimizing prediction methods. Mol Biol 2009. [DOI: 10.1134/s0026893309040190] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
118
Dukka BKC. Improving consensus structure by eliminating averaging artifacts. BMC STRUCTURAL BIOLOGY 2009; 9:12. [PMID: 19267905 PMCID: PMC2662860 DOI: 10.1186/1472-6807-9-12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/03/2008] [Accepted: 03/06/2009] [Indexed: 11/29/2022]
Abstract
Background: Common structural biology methods (i.e., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging of 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtain a consensus structure that is representative of the ensemble. However, when the structures are averaged, artifacts can result in unrealistic local geometries, including unphysical bond lengths and angles.
Results: Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo energy function. To assess the performance of the algorithm, we applied our approach to Cα models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined model from the native structure for the set becomes worse by a mere 0.08 Å compared to the average RMSD of the averaged structures from the native structure (3.28 Å for refined structures and 3.36 Å for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures resulted in lower RMSD to the native protein versus the averaged structure. Finally, compared to PULCHRA [1], our approach produces representative structure of similar RMSD quality, but with much fewer clashes.
Conclusion: The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where averaging of 3D coordinates is performed. Namely, structure averaging is also commonly performed in RNA secondary prediction [2], which could also benefit from our approach.
Affiliation(s)
- B K C Dukka
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA.
119
Yang YD, Park C, Kihara D. Threading without optimizing weighting factors for scoring function. Proteins 2008; 73:581-96. [DOI: 10.1002/prot.22082] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
120
Helles G. A comparative study of the reported performance of ab initio protein structure prediction algorithms. J R Soc Interface 2008; 5:387-96. [PMID: 18077243 DOI: 10.1098/rsif.2007.1278] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022]
Abstract
Protein structure prediction is one of the major challenges in bioinformatics today. Throughout the past five decades, many different algorithmic approaches have been attempted, and although progress has been made the problem remains unsolvable even for many small proteins. While the general objective is to predict the three-dimensional structure from primary sequence, our current knowledge and computational power are simply insufficient to solve a problem of such high complexity. Some prediction algorithms do, however, appear to perform better than others, although it is not always obvious which ones they are and it is perhaps even less obvious why that is. In this review, the reported performance results from 18 different recently published prediction algorithms are compared. Furthermore, the general algorithmic settings most likely responsible for the difference in the reported performance are identified, and the specific settings of each of the 18 prediction algorithms are also compared. The average normalized r.m.s.d. scores reported range from 11.17 to 3.48. With a performance measure including both r.m.s.d. scores and CPU time, the currently best-performing prediction algorithm is identified to be the I-TASSER algorithm. Two of the algorithmic settings--protein representation and fragment assembly--were found to have definite positive influence on the running time and the predicted structures, respectively. There thus appears to be a clear benefit from incorporating this knowledge in the design of new prediction algorithms.
Affiliation(s)
- Glennie Helles
- University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark.
121
Wrabl JO, Grishin NV. Statistics of Random Protein Superpositions: p-Values for Pairwise Structure Alignment. J Comput Biol 2008; 15:317-55. [DOI: 10.1089/cmb.2007.0161] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Affiliation(s)
- James O. Wrabl
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas
122
Abstract
In a cell, it has been estimated that each protein on average interacts with roughly 10 others, resulting in tens of thousands of proteins known or suspected to have interaction partners; of these, only a tiny fraction have solved protein structures. To partially address this problem, we have developed M-TASSER, a hierarchical method to predict protein quaternary structure from sequence that involves template identification by multimeric threading, followed by multimer model assembly and refinement. The final models are selected by structure clustering. M-TASSER has been tested on a benchmark set comprising 241 dimers having templates with weak sequence similarity and 246 without multimeric templates in the dimer library. Of the total of 207 targets predicted to interact as dimers, 165 (80%) were correctly assigned as interacting with a true positive rate of 68% and a false positive rate of 17%. The initial best template structures have an average root mean-square deviation to native of 5.3, 6.7, and 7.4 Å for the monomer, interface, and dimer structures. The final model shows on average a root mean-square deviation improvement of 1.3, 1.3, and 1.5 Å over the initial template structure for the monomer, interface, and dimer structures, with refinement evident for 87% of the cases. Thus, we have developed a promising approach to predict full-length quaternary structure for proteins that have weak sequence similarity to proteins of solved quaternary structure.
Affiliation(s)
- Jeffrey Skolnick
- Address reprint requests to Jeffrey Skolnick, Tel.: 404-407-8975; Fax: 404-385-7478.
123
Stumpff-Kane AW, Maksimiak K, Lee MS, Feig M. Sampling of near-native protein conformations during protein structure refinement using a coarse-grained model, normal modes, and molecular dynamics simulations. Proteins 2007; 70:1345-56. [PMID: 17876825 DOI: 10.1002/prot.21674] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 11/05/2022]
Abstract
Protein structure refinement from comparative models with the goal of predicting structures at near-experimental accuracy remains an unsolved problem. Structure refinement might be achieved with an iterative protocol where the most native-like structure from a set of decoys generated from an initial model in one cycle is used as the starting structure for the next cycle. Conformational sampling based on the coarse-grained SICHO model, atomic-level molecular dynamics simulations, and normal-mode analysis is compared in the context of such a protocol. All of the sampling methods can achieve significant refinement close to experimental structures, although the distribution of structures and the ability to reach native-like structures differ greatly. Implications for the practical application of such sampling methods and the requirements for scoring functions in an iterative refinement protocol are analyzed in the context of theoretical predictions for the distribution of protein-like conformations with a random sampling protocol.
Affiliation(s)
- Andrew W Stumpff-Kane
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824-1319, USA
|
124
|
Carr JM, Wales DJ. Global optimization and folding pathways of selected alpha-helical proteins. J Chem Phys 2007; 123:234901. [PMID: 16392943 DOI: 10.1063/1.2135783] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The results of basin-hopping global optimization simulations are presented for four small, alpha-helical proteins described by a coarse-grained potential. A step-taking scheme that incorporates the local conformational preferences extracted from a large number of high-resolution protein structures is compared with an unbiased scheme. In addition, the discrete path sampling method is used to investigate the folding of one of the proteins, namely, the villin headpiece subdomain. Folding times from kinetic Monte Carlo simulations and iterative calculations based on a Markovian first-step analysis for the resulting stationary-point database are in good mutual agreement, but differ significantly from the experimental values, probably because the native state is not the global free energy minimum for the potential employed.
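As an illustration of the basin-hopping idea named in this abstract, here is a minimal sketch on a toy one-dimensional landscape. The energy function, the derivative-free local minimizer, the uniform step-taking scheme, and all parameter values are assumptions chosen for brevity; the paper itself uses a coarse-grained protein potential and a step-taking scheme informed by known structures.

```python
import math
import random


def energy(x):
    # Hypothetical rugged 1D landscape standing in for a protein potential.
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x, step=0.1, tol=1e-6):
    # Derivative-free descent: move to the better neighbour,
    # halve the step when neither neighbour improves.
    while step > tol:
        best = min((x, x - step, x + step), key=energy)
        if best == x:
            step *= 0.5
        else:
            x = best
    return x


def basin_hopping(x0, n_steps=200, hop=1.5, temperature=1.0, seed=0):
    rng = random.Random(seed)
    current = local_minimize(x0)
    best = current
    for _ in range(n_steps):
        # Perturb, then relax to the bottom of the new basin.
        trial = local_minimize(current + rng.uniform(-hop, hop))
        delta = energy(trial) - energy(current)
        # Metropolis test on the energies of the minimized structures.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
        if energy(current) < energy(best):
            best = current
    return best


start = 4.0
found = basin_hopping(start)
print(f"start basin energy: {energy(local_minimize(start)):.3f}")
print(f"lowest energy found: {energy(found):.3f}")
```

The key design point is that acceptance is decided between local minima rather than between raw perturbed points, which effectively turns the landscape into a staircase of basin energies.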
Affiliation(s)
- Joanne M Carr
- University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|
125
|
Ruan J, Chen K, Tuszynski JA, Kurgan LA. Quantitative analysis of the conservation of the tertiary structure of protein segments. Protein J 2007; 25:301-15. [PMID: 16957991 DOI: 10.1007/s10930-006-9016-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
The publication of the crystallographic structure of calmodulin protein has offered an example leading us to believe that it is possible for many protein sequence segments to exhibit multiple 3D structures referred to as multi-structural segments. To this end, this paper presents statistical analysis of uniqueness of the 3D-structure of all possible protein sequence segments stored in the Protein Data Bank (PDB, Jan. of 2003, release 103) that occur at least twice and whose lengths are greater than 10 amino acids (AAs). We refined the set of segments by choosing only those that are not parts of longer segments, which resulted in 9297 segments called a sponge set. By adding 8197 signature segments, which occur uniquely in the PDB, into the sponge set we have generated a benchmark set. Statistical analysis of the sponge set demonstrates that rotating, missing and disarranging operations described in the text, result in the segments becoming multi-structural. It turns out that missing segments do not exhibit a change of shape in the 3D-structure of a multi-structural segment. We use the root mean square distance for unit vector sequence (URMSD) as an improved measure to describe the characteristics of hinge rotations, missing, and disarranging segments. We estimated the rate of occurrence for rotating and disarranging segments in the sponge set and divided it by the number of sequences in the benchmark set which is found to be less than 0.85%. Since two of the structure changing operations concern negligible number of segment and the third one is found not to have impact on the structure, we conclude that the 3D-structure of proteins is conserved statistically for more than 98% of the segments. At the same time, the remaining 2% of the sequences may pose problems for the sequence alignment based structure prediction methods.
Affiliation(s)
- Jishou Ruan
- Chern Institute of Mathematics, College of Mathematical Science & LPMC, Nankai University, Tianjin 300071, P. R. China
|
126
|
McAllister SR, Mickus BE, Klepeis JL, Floudas CA. Novel approach for alpha-helical topology prediction in globular proteins: generation of interhelical restraints. Proteins 2007; 65:930-52. [PMID: 17029234 DOI: 10.1002/prot.21095] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The protein folding problem represents one of the most challenging problems in computational biology. Distance constraints and topology predictions can be highly useful for the folding problem in reducing the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy. We present a novel optimization framework for predicting topological contacts and generating interhelical distance restraints between hydrophobic residues in alpha-helical globular proteins. It should be emphasized that since the model does not make assumptions about the form of the helices, it is applicable to all alpha-helical proteins, including helices with kinks and irregular helices. This model aims at enhancing the ASTRO-FOLD protein folding approach of Klepeis and Floudas (Journal of Computational Chemistry 2003;24:191-208), which finds the structure of global minimum conformational energy via a constrained nonlinear optimization problem. The proposed topology prediction model was evaluated on 26 alpha-helical proteins ranging from 2 to 8 helices and 35 to 159 residues, and the best identified average interhelical distances corresponding to the predicted contacts fell below 11 Å in all 26 of these systems. Given the positive results of applying the model to several protein systems, the importance of interhelical hydrophobic-to-hydrophobic contacts in determining the folding of alpha-helical globular proteins is highlighted.
Affiliation(s)
- S R McAllister
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
127
|
Zhang J, Lin M, Chen R, Liang J, Liu JS. Monte Carlo sampling of near-native structures of proteins with applications. Proteins 2006; 66:61-8. [PMID: 17039507 DOI: 10.1002/prot.21203] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
Since a protein's dynamic fluctuation inside cells affects the protein's biological properties, we present a novel method to study the ensemble of near-native structures (NNS) of proteins, namely, the conformations that are very similar to the experimentally determined native structure. We show that this method enables us to (i) quantify the difficulty of predicting a protein's structure, (ii) choose appropriate simplified representations of protein structures, and (iii) assess the effectiveness of knowledge-based potential functions. We found that well-designed simple representations of protein structures are likely as accurate as those more complex ones for certain potential functions. We also found that the widely used contact potential functions stabilize NNS poorly, whereas potential functions incorporating local structure information significantly increase the stability of NNS.
Affiliation(s)
- Jinfeng Zhang
- Department of Statistics, Harvard University, Cambridge, Massachusetts, USA
|
128
|
Hamelryck T, Kent JT, Krogh A. Sampling realistic protein conformations using local structural bias. PLoS Comput Biol 2006; 2:e131. [PMID: 17002495 PMCID: PMC1570370 DOI: 10.1371/journal.pcbi.0020131] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2006] [Accepted: 08/21/2006] [Indexed: 11/19/2022] Open
Abstract
The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design. Protein structure prediction is one of the main unsolved problems in computational biology today. A common way to tackle the problem is to generate plausible protein conformations using a fairly inaccurate but fast method, and to evaluate the conformations using an accurate but slow method. The main bottleneck lies in the first step, that is, efficiently exploring protein conformational space. Currently, the best way to do this is to construct plausible structures by stringing together fragments from experimentally determined protein structures, a method called fragment assembly. Hamelryck, Kent, and Krogh present a new method that can efficiently generate protein conformations that are compatible with a given protein sequence. Unlike for existing methods, the generated conformations cover a continuous range and come with an associated probability. The method shows great promise for use in protein structure prediction, determination, simulation, and design.
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Institute of Molecular Biology and Physiology, University of Copenhagen, Copenhagen, Denmark.
|
129
|
Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2006; 57:702-10. [PMID: 15476259 DOI: 10.1002/prot.20264] [Citation(s) in RCA: 1332] [Impact Index Per Article: 74.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We have developed a new scoring function, the template modeling score (TM-score), to assess the quality of protein structure templates and predicted full-length models by extending the approaches used in Global Distance Test (GDT) and MaxSub. First, a protein size-dependent scale is exploited to eliminate the inherent protein size dependence of the previous scores and appropriately account for random protein structure pairs. Second, rather than setting specific distance cutoffs and calculating only the fractions with errors below the cutoff, all residue pairs in alignment/modeling are evaluated in the proposed score. For comparison of various scoring functions, we have constructed a large-scale benchmark set of structure templates for 1489 small to medium size proteins using the threading program PROSPECTOR_3 and built the full-length models using MODELLER and TASSER. The TM-score of the initial threading alignments, compared to the GDT and MaxSub scoring functions, shows a much stronger correlation to the quality of the final full-length models. The TM-score is further exploited as an assessment of all 'new fold' targets in the recent CASP5 experiment and shows a close coincidence with the results of human-expert visual assessment. These data suggest that the TM-score is a useful complement to the fully automated assessment of protein structure predictions. The executable program of TM-score is freely downloadable at http://bioinformatics.buffalo.edu/TM-score.
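The size-dependent scale and the "all residue pairs" idea described above can be sketched as follows. The abstract does not state the formula; the code uses the commonly quoted form of the score, in which each aligned pair contributes 1/(1 + (d/d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8 for a target of length L. It evaluates a single, given superposition, whereas the published score takes the maximum over superpositions. The function names and the 0.5 Å floor on d0 for very short chains are assumptions of this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms)."""
    if target_length <= 15:
        return 0.5  # ASSUMED floor; the cube-root formula is not usable here
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """Score one fixed superposition.

    distances: per-residue distances (angstroms) between aligned pairs.
    target_length: number of residues in the target (native) structure.
    Unaligned residues contribute nothing, so partial coverage lowers the score.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Usage: a 100-residue target with 80 aligned residues, each 2 angstroms off.
print(round(tm_score([2.0] * 80, 100), 3))
```

Because every aligned pair contributes a smoothly decaying term, no distance cutoff has to be chosen, and normalizing by the target length penalizes models that cover only part of the chain.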
Affiliation(s)
- Yang Zhang
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York 14203, USA
|
130
|
Zhang Y, DeVries ME, Skolnick J. Structure modeling of all identified G protein-coupled receptors in the human genome. PLoS Comput Biol 2006; 2:e13. [PMID: 16485037 PMCID: PMC1364505 DOI: 10.1371/journal.pcbi.0020013] [Citation(s) in RCA: 151] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2005] [Accepted: 01/11/2005] [Indexed: 12/22/2022] Open
Abstract
G protein–coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation from native of 4.6 Å, with a root-mean-squared deviation in the transmembrane helix region of 2.1 Å. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. 
These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis. All predicted GPCR models are freely available for noncommercial users on our Web site (http://www.bioinformatics.buffalo.edu/GPCR). G protein–coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that transduce signals across the cell membrane. Because of the breadth and importance of the physiological roles undertaken by the GPCR family, many of its members are important pharmacological targets. Although the knowledge of a protein's native structure can provide important insight into understanding its function and for the design of new drugs, the experimental determination of the three-dimensional structure of GPCR membrane proteins has proved to be very difficult. This is demonstrated by the fact that there is only one solved GPCR structure (from bovine rhodopsin) deposited in the Protein Data Bank library. In contrast, there are no human GPCR structures in the Protein Data Bank. To address the need for the tertiary structures of human GPCRs, using just sequence information, the authors use a newly developed threading-assembly-refinement method to generate models for all 907 registered GPCRs in the human genome. About 820 GPCRs are anticipated to have correct topology and transmembrane helix arrangement. A subset of the resulting models is validated by comparison with mutagenesis experimental data, and consistent agreement is demonstrated.
Affiliation(s)
- Yang Zhang
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York, United States of America
- Mark E DeVries
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York, United States of America
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York, United States of America
- * To whom correspondence should be addressed. E-mail:
|
131
|
Hubner IA, Deeds EJ, Shakhnovich EI. High-resolution protein folding with a transferable potential. Proc Natl Acad Sci U S A 2005; 102:18914-9. [PMID: 16365306 PMCID: PMC1323145 DOI: 10.1073/pnas.0502181102] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
A generalized computational method for folding proteins with a fully transferable potential and geometrically realistic all-atom model is presented and tested on seven helix bundle proteins. The protocol, which includes graph-theoretical analysis of the ensemble of resulting folded conformations, was systematically applied and consistently produced structure predictions of approximately 3 Å without any knowledge of the native state. To measure and understand the significance of the results, extensive control simulations were conducted. Graph theoretic analysis provides a means for systematically identifying the native fold and provides physical insight, conceptually linking the results to modern theoretical views of protein folding. In addition to presenting a method for prediction of structure and folding mechanism, our model suggests that an accurate all-atom amino acid representation coupled with a physically reasonable atomic interaction potential and hydrogen bonding are essential features for a realistic protein model.
Affiliation(s)
- Isaac A Hubner
- Departments of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
|
132
|
Zhang Y, Skolnick J. Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins. Biophys J 2005; 87:2647-55. [PMID: 15454459 PMCID: PMC1304683 DOI: 10.1529/biophysj.104.045385] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
We evaluate tertiary structure predictions on medium to large size proteins by TASSER, a new algorithm that assembles protein structures through rearranging the rigid fragments from threading templates guided by a reduced Calpha and side-chain based potential consistent with threading based tertiary restraints. Predictions were generated for 745 proteins 201-300 residues in length that cover the Protein Data Bank (PDB) at the level of 35% sequence identity. With homologous proteins excluded, in 365 cases, the templates identified by our threading program, PROSPECTOR_3, have a root-mean-square deviation (RMSD) to native < 6.5 angstroms, with >70% alignment coverage. After TASSER assembly, in 408 cases the best of the top five full-length models has a RMSD < 6.5 angstroms. Among the 745 targets are 18 membrane proteins, with one-third having a predicted RMSD < 5.5 angstroms. For all representative proteins less than or equal to 300 residues that have corresponding multiple NMR structures in the Protein Data Bank, approximately 20% of the models generated by TASSER are closer to the NMR structure centroid than the farthest individual NMR model. These results suggest that reasonable structure predictions for nonhomologous large size proteins can be automatically generated on a proteomic scale, and that the application of this approach to structural as well as functional genomics represents a promising application of TASSER.
Affiliation(s)
- Yang Zhang
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York 14203, USA
|
133
|
Ding F, Buldyrev SV, Dokholyan NV. Folding Trp-cage to NMR resolution native structure using a coarse-grained protein model. Biophys J 2005; 88:147-55. [PMID: 15533926 PMCID: PMC1304993 DOI: 10.1529/biophysj.104.046375] [Citation(s) in RCA: 118] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2004] [Accepted: 10/20/2004] [Indexed: 11/18/2022] Open
Abstract
We develop a coarse-grained protein model with a simplified amino acid interaction potential. Using this model, we perform discrete molecular dynamics folding simulations of a small 20-residue protein--Trp-cage--from a fully extended conformation. We demonstrate the ability of the Trp-cage model to consistently reach conformations within 2 angstroms backbone root-mean-square distance from the corresponding NMR structures. The minimum root-mean-square distance of Trp-cage conformations in simulations can be <1 angstrom. Our findings suggest that, at least in the case of Trp-cage, a detailed all-atom protein model with a molecular mechanics force field is not necessary to reach the native state of a protein. Our results also suggest that the success of folding Trp-cage in our simulations and in the reported all-atom molecular mechanics simulation studies may be mainly due to the special stabilizing features specific to this miniprotein.
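The backbone root-mean-square distance used as the success criterion here can be sketched as below. This is a minimal illustration that assumes the two coordinate sets are already optimally superposed and listed in the same atom order; the superposition step (for example a Kabsch-style fit) is omitted, and the function name and sample coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally ordered coordinate lists.

    Each list holds (x, y, z) tuples in angstroms. No superposition is done
    here: the structures are ASSUMED to be aligned already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical three-atom fragment and a copy displaced by 2 angstroms along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 2.0, y, z) for x, y, z in reference]
print(rmsd(reference, model))
```

Without the superposition step a rigid shift like the one above counts fully toward the distance, which is why real comparisons minimize over rotations and translations first.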
Affiliation(s)
- Feng Ding
- Department of Biochemistry and Biophysics, The University of North Carolina at Chapel Hill, School of Medicine, Chapel Hill, North Carolina 27599; and Center for Polymer Studies, Boston University, Boston, Massachusetts 02215
- Sergey V. Buldyrev
- Department of Biochemistry and Biophysics, The University of North Carolina at Chapel Hill, School of Medicine, Chapel Hill, North Carolina 27599; and Center for Polymer Studies, Boston University, Boston, Massachusetts 02215
- Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, The University of North Carolina at Chapel Hill, School of Medicine, Chapel Hill, North Carolina 27599; and Center for Polymer Studies, Boston University, Boston, Massachusetts 02215
|
134
|
Colubri A. Prediction of protein structure by simulating coarse-grained folding pathways: a preliminary report. J Biomol Struct Dyn 2004; 21:625-38. [PMID: 14769055 DOI: 10.1080/07391102.2004.10506953] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
A set of software tools designed to study protein structure and kinetics has been developed. The core of these tools is a program called Folding Machine (FM) which is able to generate low resolution folding pathways using modest computational resources. The FM is based on a coarse-grained kinetic ab initio Monte-Carlo sampler that can optionally use information extracted from secondary structure prediction servers or from fragment libraries of local structure. The model underpinning this algorithm contains two novel elements: (a) the conformational space is discretized using the Ramachandran basins defined in the local phi-psi energy maps; and (b) the solvent is treated implicitly by rescaling the pairwise terms of the non-bonded energy function according to the local solvent environments. The purpose of this hybrid ab initio/knowledge-based approach is threefold: to cover the long time scales of folding, to generate useful 3-dimensional models of protein structures, and to gain insight on the protein folding kinetics. Even though the algorithm is not yet fully developed, it has been used in a recent blind test of protein structure prediction (CASP5). The FM generated models within 6 A backbone rmsd for fragments of about 60-70 residues of alpha-helical proteins. For a CASP5 target that turned out to be natively unfolded, the trajectory obtained for this sequence uniquely failed to converge. Also, a new measure to evaluate structure predictions is presented and used along the standard CASP assessment methods. Finally, recent improvements in the prediction of beta-sheet structures are briefly described.
Affiliation(s)
- Andrés Colubri
- Searle Chemistry Lab, University of Chicago, 5735 South Ellis Ave #126, Chicago, Illinois 60637, USA.
|
135
|
Sullivan DC, Kuntz ID. Distributions in protein conformation space: implications for structure prediction and entropy. Biophys J 2004; 87:113-20. [PMID: 15240450 PMCID: PMC1304334 DOI: 10.1529/biophysj.104.041723] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2004] [Accepted: 03/23/2004] [Indexed: 11/18/2022] Open
Abstract
By considering how polymer structures are distributed in conformation space, we show that it is possible to quantify the difficulty of structural prediction and to provide a measure of progress for prediction calculations. The critical issue is the probability that a conformation is found within a specified distance of another conformer. We address this question by constructing a cumulative distribution function (CDF) for the average probability from observations about its limiting behavior at small displacements and numerical simulations of polyalanine chains. We can use the CDF to estimate the likelihood that a structure prediction is better than random chance. For example, the chance of randomly predicting the native backbone structure of a 150-amino-acid protein to low resolution, say within 6 Å, is 10^-14. A high-resolution structural prediction, say to 2 Å, is immensely more difficult (10^-57). With additional assumptions, the CDF yields the conformational entropy of protein folding from native-state coordinate variance. Or, using values of the conformational entropy change on folding, we can estimate the native state's conformational span. For example, for a 150-mer protein, equilibrium alpha-carbon displacements in the native ensemble would be 0.3-0.5 Å based on T Delta S of 1.42 kcal/(mol residue).
Affiliation(s)
- David C Sullivan
- Department of Pharmaceutical Chemistry, University of California, San Francisco, 94143-2240, USA
|
136
|
Zhang Y, Skolnick J. Automated structure prediction of weakly homologous proteins on a genomic scale. Proc Natl Acad Sci U S A 2004; 101:7594-9. [PMID: 15126668 PMCID: PMC419651 DOI: 10.1073/pnas.0305695101] [Citation(s) in RCA: 246] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
We have developed TASSER, a hierarchical approach to protein structure prediction that consists of template identification by threading, followed by tertiary structure assembly via the rearrangement of continuous template fragments guided by an optimized C(alpha) and side-chain-based potential driven by threading-based, predicted tertiary restraints. TASSER was applied to a comprehensive benchmark set of 1,489 medium-sized proteins in the Protein Data Bank. With homologues excluded, in 927 cases, the templates identified by our threading algorithm PROSPECTOR_3 have a rms deviation from native <6.5 Å with approximately 80% alignment coverage. After template reassembly, this number increases to 1,172. This shows significant and systematic improvement of the final models with respect to the initial template alignments. Furthermore, significant improvements in loop modeling are demonstrated. We then apply TASSER to the 1,360 medium-sized ORFs in the Escherichia coli genome; approximately 920 can be predicted with high accuracy based on confidence criteria established in the Protein Data Bank benchmark. These results from our unprecedented comprehensive folding benchmark on all protein categories provide a reliable basis for the application of TASSER to structural genomics, especially to proteins of low sequence identity to solved protein structures.
Affiliation(s)
- Yang Zhang
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington Street, Buffalo, NY 14203, USA
|
137
|
Abstract
Structure comparisons of all representative proteins have been done. Employing the relative root mean square deviation (RMSD) from native enables the assessment of the statistical significance of structure alignments of different lengths in terms of a Z-score. Two conclusions emerge: first, proteins with their native fold can be distinguished by their Z-score. Second and somewhat surprising, all small proteins up to 100 residues in length have significant structure alignments to other proteins in a different secondary structure and fold class; i.e. 24.0% of them have 60% coverage by a template protein with a RMSD below 3.5 Å and 6.0% have 70% coverage. If the restriction that we align proteins only having different secondary structure types is removed, then in a representative benchmark set of proteins of 200 residues or smaller, 93% can be aligned to a single template structure (with average sequence identity of 9.8%), with a RMSD less than 4 Å, and 79% average coverage. In this sense, the current Protein Data Bank (PDB) is almost a covering set of small protein structures. The length of the aligned region (relative to the whole protein length) does not differ among the top hit proteins, indicating that protein structure space is highly dense. For larger proteins, non-related proteins can cover a significant portion of the structure. Moreover, these top hit proteins are aligned to different parts of the target protein, so that almost the entire molecule can be covered when combined. The number of proteins required to cover a target protein is very small, e.g. the top ten hit proteins can give 90% coverage below a RMSD of 3.5 Å for proteins up to 320 residues long. These results give a new view of the nature of protein structure space, and its implications for protein structure prediction are discussed.
Affiliation(s)
- Daisuke Kihara
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington St, Suite 300, Buffalo, NY 14203, USA
|
138
|
Binkowski TA, Adamian L, Liang J. Inferring functional relationships of proteins from local sequence and spatial surface patterns. J Mol Biol 2003; 332:505-26. [PMID: 12948498 DOI: 10.1016/s0022-2836(03)00882-9] [Citation(s) in RCA: 129] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
We describe a novel approach for inferring functional relationship of proteins by detecting sequence and spatial patterns of protein surfaces. Well-formed concave surface regions in the form of pockets and voids are examined to identify similarity relationship that might be directly related to protein function. We first exhaustively identify and measure analytically all 910,379 surface pockets and interior voids on 12,177 protein structures from the Protein Data Bank. The similarity of patterns of residues forming pockets and voids are then assessed in sequence, in spatial arrangement, and in orientational arrangement. Statistical significance in the form of E and p-values is then estimated for each of the three types of similarity measurements. Our method is fully automated without human intervention and can be used without input of query patterns. It does not assume any prior knowledge of functional residues of a protein, and can detect similarity based on surface patterns small and large. It also tolerates, to some extent, conformational flexibility of functional sites. We show with examples that this method can detect functional relationship with specificity for members of the same protein family and superfamily, as well as remotely related functional surfaces from proteins of different fold structures. We envision that this method can be used for discovering novel functional relationship of protein surfaces, for functional annotation of protein structures with unknown biological roles, and for further inquiries on evolutionary origins of structural elements important for protein function.
Affiliation(s)
- T Andrew Binkowski
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607-7052, USA
|
139
|
Stark A, Sunyaev S, Russell RB. A model for statistical significance of local similarities in structure. J Mol Biol 2003; 326:1307-16. [PMID: 12595245 DOI: 10.1016/s0022-2836(03)00045-7] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.3] [Indexed: 11/26/2022]
Abstract
Structural biology can provide three-dimensional structures for proteins of unknown function. When sequence or structure comparisons fail to suggest a function, insights can come from discovery of functionally important local structural patterns. Existing methods to detect such patterns lack rigorous statistics needed for widespread application. Here, we derive a formula to calculate statistical significance of the root-mean-square deviation between atoms in such patterns. When combined with a database search method, our statistics permit true functional or structural patterns in different folds to be discerned from noise. The approach is highly complementary to fold comparison for providing functional clues for new structures, and is key for the detection of recurrences of any new pattern.
Affiliation(s)
- Alexander Stark
- EMBL, Structural & Computational Biology Programme, Meyerhofstrasse 1, 69117, Heidelberg, Germany
|
140
|
Saunders JA, Scheraga HA. Ab initio structure prediction of two alpha-helical oligomers with a multiple-chain united-residue force field and global search. Biopolymers 2003; 68:300-17. [PMID: 12601791 DOI: 10.1002/bip.10226] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
Abstract
A hierarchical methodology for ab initio structure prediction is extended to treat oligomeric proteins. Modifications are made to a united-residue (UNRES) force field and a Conformational Space Annealing (CSA) global search method. The computational cost of including additional chains and the increase in speed from symmetry optimizations are evaluated. The native structures of two oligomeric proteins from the CASP3 exercise, the retro-GCN4 leucine zipper and the synthetic domain-swapped dimer, were identified as the lowest-energy families resulting from the search of the proteins when rotational symmetry was imposed. Additional searches in different symmetries and oligomerization states were carried out, and the results indicate some problems in the thoroughness of the search and in the search of packing arrangements if symmetry constraints are not imposed.
Affiliation(s)
- Jeffrey A Saunders
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
|
141
|
Kolodny R, Levitt M. Protein decoy assembly using short fragments under geometric constraints. Biopolymers 2003; 68:278-85. [PMID: 12601789 DOI: 10.1002/bip.10262] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022]
Abstract
A small set of protein fragments can represent adequately all known local protein structure. This set of fragments, along with a construction scheme that assembles these fragments into structures, defines a discrete (relatively small) conformation space, which approximates protein structures accurately. We generate protein decoys by sampling geometrically valid structures from this conformation space, biased by the secondary structure prediction for the protein. Unlike other methods, secondary structure prediction is the only protein-specific information used for generating the decoys. Nevertheless, these decoys are qualitatively similar to those found by others. The method works well for all-alpha proteins, and shows promising results for alpha and beta proteins.
Affiliation(s)
- R Kolodny
- Department of Computer Science, Stanford University, Stanford, CA 94305-5126, USA
|
142
|
Saunders JA, Scheraga HA. Challenges in structure prediction of oligomeric proteins at the united-residue level: searching the multiple-chain energy landscape with CSA and CFMC. Biopolymers 2003; 68:318-32. [PMID: 12601792 DOI: 10.1002/bip.10227] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
A revised version of the Conformational Space Annealing (CSA) global optimization method is developed, with three separate measures of structural similarity, in order to overcome the inability of a single distance measure to evaluate multiple-chain protein structures adequately. A second search method, Conformational Family Monte Carlo (CFMC), involving genetic-type moves, Monte Carlo-with-minimization perturbations, and explicit clustering of the population into conformational families, is adapted to treat multiple-chain proteins. These two methods are applied to two oligomeric proteins, the retro-GCN4 leucine zipper and the synthetic domain-swapped dimer. CFMC proves superior to CSA in its search for low-energy representatives of its conformational families, but both methods encounter difficulty in finding the native packing arrangements in the absence of native-like symmetry constraints, even when native monomers are present in the population.
Affiliation(s)
- Jeffrey A Saunders
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301
|
143
|
Fain B, Xia Y, Levitt M. Design of an optimal Chebyshev-expanded discrimination function for globular proteins. Protein Sci 2002; 11:2010-21. [PMID: 12142455 PMCID: PMC2373672 DOI: 10.1110/ps.0200702] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]
Abstract
We describe the construction of a scoring function designed to model the free energy of protein folding. An optimization technique is used to determine the best functional forms of the hydrophobic, residue-residue and hydrogen-bonding components of the potential. The scoring function is expanded by use of Chebyshev polynomials, the coefficients of which are determined by minimizing the score, in units of standard deviation, of native structures in the ensembles of alternate decoy conformations. The derived effective potential is then tested on decoy sets used conventionally in such studies. Using our scoring function, we achieve a high level of discrimination between correct and incorrect folds. In addition, our method is able to represent functions of arbitrary shape with fewer parameters than the usual histogram potentials of similar resolution. Finally, our representation can be combined easily with many optimization methods, because the total energy is a linear function of the parameters. Our results show that the techniques of Z-score optimization and Chebyshev expansion work well.
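The quantity minimized in this abstract, the score of the native structure expressed in units of the standard deviation of the decoy ensemble, can be illustrated with a minimal sketch. The function name `native_z_score`, the use of the population standard deviation, and the choice to compute the statistics over the decoys only are assumptions made for illustration; the abstract does not specify these details.

```python
from statistics import mean, pstdev


def native_z_score(native_score, decoy_scores):
    """Return the native score in units of the decoy standard deviation.

    Hypothetical helper: mean and spread are taken over the decoy scores
    only, using the population standard deviation (an assumption).
    """
    spread = pstdev(decoy_scores)
    if spread == 0:
        raise ValueError("decoy scores have zero spread; Z-score is undefined")
    return (native_score - mean(decoy_scores)) / spread


if __name__ == "__main__":
    # Toy numbers, not data from the paper: a native scoring well below
    # the decoy mean gives a strongly negative Z-score.
    print(native_z_score(-10.0, [-2.0, 0.0, 2.0]))
```

A more negative value means the native structure is better separated from the decoys, which is why a parameter fit would drive this number down.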
Affiliation(s)
- Boris Fain
- Department of Structural Biology, Stanford University, Stanford University School of Medicine, California 94305, USA.
|
144
|
Zhang C, Hou J, Kim SH. Fold prediction of helical proteins using torsion angle dynamics and predicted restraints. Proc Natl Acad Sci U S A 2002; 99:3581-5. [PMID: 11904420 PMCID: PMC122566 DOI: 10.1073/pnas.052003799] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
We describe a procedure for predicting the tertiary folds of alpha-helical proteins from their primary sequences. The central component of the procedure is a method for predicting interhelical contacts that is based on a helix-packing model. Instead of predicting the individual contacts, our method attempts to identify the entire patch of contacts that involve residues regularly spaced in the sequences. We use this component to glue together two powerful existing methods: a secondary structure prediction program, whose output serves as the input to the contact prediction algorithm, and the torsion angle dynamics program, which uses the predicted tertiary contacts and secondary structural states to assemble three-dimensional structures. In the final step, the procedure uses the initial set of simulated structures to refine the predicted contacts for a new round of structure calculation. When tested against 24 small to medium-sized proteins representing a wide range of helical folds, the completely automated procedure is able to generate native-like models within a limited number of trials consistently.
Affiliation(s)
- Chao Zhang
- Department of Chemistry and E. O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
145
|
Reva B, Kister A, Topiol S, Gelfand I. Determining the roles of different chain fragments in recognition of immunoglobulin fold. Protein Eng Des Sel 2002; 15:13-9. [PMID: 11842233 DOI: 10.1093/protein/15.1.13] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
We examine sequence-to-structure specificity of beta-structural fragments of immunoglobulin domains. The structure specificity of separate chain fragments is estimated by computing the Z-score values in recognition of the native structure in gapless threading tests. To improve the accuracy of our calculations we use energy averaging over diverse homologs of immunoglobulin domains. We show that the interactions between residues of beta-structure are more determinant in recognition of the native structure than the interactions within the whole chain molecule. This result distinguishes immunoglobulins from more typical proteins where the interactions between residues of the whole chain normally recognize the native fold more accurately than interactions between the residues of the secondary structure alone [Reva,B. and Topiol,S. (2000) BIOCOMPUTING: Proceedings of the Pacific Symposium. World Scientific Publishing Co., pp. 168-178]. We also find that the predominant contributions of the secondary structure are produced by the four central beta-strands that form the core of the molecule. The results of this study allow us through quantitative means to understand the architecture of immunoglobulin molecules. Comparing the fold recognition data for different chain fragments one can say that beta-strands form a rigid frame for immunoglobulin molecules, whereas loops, with no structural role, can develop a broad variety of binding specificities. It is well known that protein function is determined by specific portions of a protein chain. This study suggests that the whole protein structure can be predominantly determined by a few fragments of chain which form the structural framework of the molecule. This idea may help in better understanding the mechanisms of protein evolution: strengthening a protein structure in the key framework-forming regions allows mutations and flexibility in other chain regions.
Affiliation(s)
- B Reva
- CTA/CAMM, Novartis Institute for Biomedical Research, 556 Morris Avenue, Summit, NJ 07901, USA.
|
146
|
Feldman HJ, Hogue CW. Probabilistic sampling of protein conformations: New hope for brute force? Proteins 2001. [DOI: 10.1002/prot.1163] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.3] [Indexed: 11/10/2022]
|
147
|
Abstract
We introduce a new variant of the root mean square distance (RMSD) for comparing protein structures whose range of values is independent of protein size. This new dimensionless measure (relative RMSD, or RRMSD) is zero between identical structures and one between structures that are as globally dissimilar as an average pair of random polypeptides of respective sizes. The RRMSD probability distribution between random polypeptides converges to a universal curve as the chain length increases. The correlation coefficients between aligned random structures are computed as a function of polypeptide size showing two characteristic lengths of 4.7 and 37 residues. These lengths mark the separation between phases of different structural order between native protein fragments. The implications for threading are discussed.
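The normalization idea described in this abstract can be sketched as a plain RMSD divided by a size-dependent baseline, so that identical structures score zero and a pair as dissimilar as an average random pair scores one. This is only an illustration of that idea: the abstract does not give the formula, so `expected_random_rmsd` is a hypothetical caller-supplied value, and the coordinates are assumed to be already superimposed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the two structures are already optimally superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def relative_rmsd(coords_a, coords_b, expected_random_rmsd):
    """Dimensionless RMSD: 0 for identical structures, 1 at the random baseline.

    `expected_random_rmsd` is a hypothetical input standing in for the
    average RMSD between random polypeptides of the same sizes.
    """
    if expected_random_rmsd <= 0:
        raise ValueError("expected_random_rmsd must be positive")
    return rmsd(coords_a, coords_b) / expected_random_rmsd


if __name__ == "__main__":
    # Toy coordinates, not data from the paper.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    print(relative_rmsd(a, b, expected_random_rmsd=4.0))
```

Dividing by a baseline that grows with chain length is what makes values comparable across proteins of different sizes.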
Affiliation(s)
- M R Betancourt
- Laboratory of Computational Genomics, The Donald Danforth Plant Science Center, 893 N. Warson Rd., Creve Coeur, MO 63141, USA
|
148
|
Zhang Y, Skolnick J. Parallel-hat tempering: A Monte Carlo search scheme for the identification of low-energy structures. J Chem Phys 2001. [DOI: 10.1063/1.1396672] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
|
149
|
Pillardy J, Czaplewski C, Liwo A, Lee J, Ripoll DR, Kaźmierkiewicz R, Oldziej S, Wedemeyer WJ, Gibson KD, Arnautova YA, Saunders J, Ye YJ, Scheraga HA. Recent improvements in prediction of protein structure by global optimization of a potential energy function. Proc Natl Acad Sci U S A 2001; 98:2329-33. [PMID: 11226239 PMCID: PMC30138 DOI: 10.1073/pnas.041609598] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.0] [Indexed: 11/18/2022] Open
Abstract
Recent improvements of a hierarchical ab initio or de novo approach for predicting both alpha and beta structures of proteins are described. The united-residue energy function used in this procedure includes multibody interactions from a cumulant expansion of the free energy of polypeptide chains, with their relative weights determined by Z-score optimization. The critical initial stage of the hierarchical procedure involves a search of conformational space by the conformational space annealing (CSA) method, followed by optimization of an all-atom model. The procedure was assessed in a recent blind test of protein structure prediction (CASP4). The resulting lowest-energy structures of the target proteins (ranging in size from 70 to 244 residues) agreed with the experimental structures in many respects. The entire experimental structure of a cyclic alpha-helical protein of 70 residues was predicted to within 4.3 Å alpha-carbon (Cα) rms deviation (rmsd) whereas, for other alpha-helical proteins, fragments of roughly 60 residues were predicted to within 6.0 Å Cα rmsd. Whereas beta structures can now be predicted with the new procedure, the success rate for alpha/beta- and beta-proteins is lower than that for alpha-proteins at present. For the beta portions of alpha/beta structures, the Cα rmsd's are less than 6.0 Å for contiguous fragments of 30-40 residues; for one target, three fragments (of length 10, 23, and 28 residues, respectively) formed a compact part of the tertiary structure with a Cα rmsd less than 6.0 Å. Overall, these results constitute an important step toward the ab initio prediction of protein structure solely from the amino acid sequence.
Affiliation(s)
- J Pillardy
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
|
150
|
Abstract
We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible three-dimensional arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case, our method finds significant numbers of conformations close to the native structure. In addition, we assign coordinates to all atoms for four of the 25 proteins and show that this has a small effect on the number of near-native conformations. In the context of database driven exhaustive enumeration our method performs extremely well, yielding significant percentages of conformations (between 0.02% and 82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable tool for predicting protein structure.
Affiliation(s)
- B Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
|