1251
Pandit SB, Skolnick J. Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score. BMC Bioinformatics 2008; 9:531. [PMID: 19077267 PMCID: PMC2628391 DOI: 10.1186/1471-2105-9-531] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.6] [Received: 07/16/2008] [Accepted: 12/12/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed, which was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true. RESULTS To further improve structure alignment quality, the Fr-TM-align algorithm has been developed, in which aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. To assess the structural alignment quality of Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better than that of both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) than those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align. CONCLUSION Fr-TM-align, a new algorithm that employs fragment alignment and assembly, provides better structural alignments than TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/.
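As a rough illustration of the metric that the abstract above optimizes, the following Python sketch evaluates the commonly cited TM-score formula for one fixed superposition. It is not taken from the Fr-TM-align source: the function names are hypothetical, the 0.5 Å floor on d0 is an assumption added to keep short chains well defined, and the real programs additionally search over superpositions and alignments to maximize the score.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    # Commonly cited form: d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # Assumption: clamp at 0.5 so very short chains never give d0 <= 0.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score of one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: length of the structure used for normalization.
    """
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Because the sum is divided by the full target length rather than by the number of aligned pairs, a short but tight alignment scores lower than a long one of the same quality, which is how the score combines deviation and coverage.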
Affiliation(s)
- Shashi Bhushan Pandit
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, USA.
1252
Carrillo-Tripp M, Brooks CL, Reddy VS. A novel method to map and compare protein-protein interactions in spherical viral capsids. Proteins 2008; 73:644-55. [PMID: 18491385 DOI: 10.1002/prot.22088] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
Viral capsids are composed of multiple copies of one or a few chemically distinct capsid proteins and are mostly stabilized by intersubunit protein-protein interactions. There have been efforts to identify and analyze these protein-protein interactions, in terms of their extent and similarity, between the subunit interfaces related by quasi- and icosahedral symmetry. Here, we describe a new method to map quaternary interactions in spherical virus capsids onto polar angle space with respect to the icosahedral symmetry axes using azimuthal orthographic diagrams. This approach enables one to map the nonredundant interactions in a spherical virus capsid, irrespective of its size or triangulation number (T), onto the reference icosahedral asymmetric unit space. The resultant diagrams represent characteristic fingerprints of quaternary interactions of the respective capsids. Hence, they can be used as road maps of the protein-protein interactions to visualize the distribution and the density of the interactions. In addition, unlike in previous studies, the fingerprints of different capsids, when represented in matrix form, can be compared with one another to quantitatively evaluate the similarity (S-score) in the subunit environments and the associated protein-protein interactions. The S-score selectively distinguishes the similarity, or lack of it, in the locations of the quaternary interactions, as opposed to other well-known structural similarity metrics (e.g., RMSD, TM-score). Application of this method to a subset of T = 1 and T = 3 capsids suggests that S-score values range between 1 and 0.6 for capsids that belong to the same virus family/genus; 0.6-0.3 for capsids from different families with the same T-number and similar subunit fold; and <0.3 for comparisons of dissimilar capsids that display different quaternary architectures (T-numbers). Finally, the sequence-conserved interface residues within a virus family whose spatial locations were also conserved have been hypothesized to be the essential residues for self-assembly of the member virus capsids.
Affiliation(s)
- Mauricio Carrillo-Tripp
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
1253
Lu Y, Sze SH. Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues. Nucleic Acids Res 2008; 37:463-72. [PMID: 19056820 PMCID: PMC2632924 DOI: 10.1093/nar/gkn945] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/04/2022] Open
Abstract
While most of the recent improvements in multiple sequence alignment accuracy are due to better use of vertical information, which include the incorporation of consistency-based pairwise alignments and the use of profile alignments, we observe that it is possible to further improve accuracy by taking into account alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy is able to consistently improve over existing algorithms on a few sets of benchmark alignments that are commonly used to measure alignment accuracy, and the average improvements in accuracy can be as much as 1–3% on protein sequence alignment and 5–10% on DNA/RNA sequence alignment. Unlike previous algorithms, consistent average improvements can be obtained across all identity levels.
Affiliation(s)
- Yue Lu
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX 77843, USA
1254
Lee J, Joo K, Kim SY, Lee J. Re-examination of structure optimization of off-lattice protein AB models by conformational space annealing. J Comput Chem 2008; 29:2479-84. [PMID: 18470971 DOI: 10.1002/jcc.20995] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]
Abstract
The global structural optimization is carried out for off-lattice protein AB models in two and three dimensions by conformational space annealing. The models consist of hydrophobic and hydrophilic monomers in Fibonacci sequences. To accelerate the convergence, we have introduced a shift operator in the internal coordinate system, and effectively reduced the search space by forming a quotient space. With this, we significantly improve our previous results on AB models, and provide new low energy conformations. This work provides insights on exploring complicated energy landscapes by exploiting the advantages and limitations of CSA.
Affiliation(s)
- Jinwoo Lee
- Department of Mathematics, Kwangwoon University, 26 Kwangoon Street, Nowon-Gu, Seoul 139-701 Korea.
1255
Nadler W, Meinke JH, Hansmann UHE. Folding proteins by first-passage-times-optimized replica exchange. Phys Rev E Stat Nonlin Soft Matter Phys 2008; 78:061905. [PMID: 19256866 DOI: 10.1103/physreve.78.061905] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 06/06/2008] [Indexed: 05/27/2023]
Abstract
Replica exchange simulations have become the method of choice in computational protein science, but they still often do not allow an efficient sampling of low-energy protein configurations. Here, we reconstruct replica flow in the temperature ladder from first passage times and use it for temperature optimization, thereby maximizing sampling. The method is applied in simulations of folding thermodynamics for a number of proteins starting from the pentapeptide Met-enkephalin, through the 36-residue HP-36, up to the 67-residue protein GS-alpha3W.
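For readers unfamiliar with the underlying technique, the sketch below shows the standard Metropolis criterion for swapping two neighboring replicas in a temperature ladder. It does not reproduce the first-passage-time optimization described in the abstract; the function name and the reduced units (Boltzmann constant set to 1 by default) are assumptions made for illustration.

```python
import math


def swap_probability(energy_i: float, energy_j: float,
                     temp_i: float, temp_j: float,
                     k_b: float = 1.0) -> float:
    """Metropolis acceptance probability for exchanging replicas i and j.

    Standard replica exchange rule:
        p = min(1, exp((beta_i - beta_j) * (E_i - E_j)))
    with beta = 1 / (k_b * T).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Branch first so that a large positive delta never overflows exp().
    return 1.0 if delta >= 0.0 else math.exp(delta)
```

The acceptance probability falls off quickly when neighboring temperatures are far apart relative to the energy fluctuations, which is why the placement of temperatures in the ladder affects how efficiently replicas travel between the extremes.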
Affiliation(s)
- Walter Nadler
- John-von-Neumann Institute for Computing, Forschungszentrum Jülich, D-52425 Jülich, Germany.
1256
Nicosia G, Stracquadanio G. Generalized pattern search algorithm for Peptide structure prediction. Biophys J 2008; 95:4988-99. [PMID: 18487293 PMCID: PMC2576383 DOI: 10.1529/biophysj.107.124016] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 10/18/2007] [Accepted: 03/20/2008] [Indexed: 11/18/2022] Open
Abstract
Finding the near-native structure of a protein is one of the most important open problems in structural biology and biological physics. The problem becomes dramatically more difficult when a given protein has no regular secondary structure or it does not show a fold similar to structures already known. This situation occurs frequently when we need to predict the tertiary structure of small molecules, called peptides. In this research work, we propose a new ab initio algorithm, the generalized pattern search algorithm, based on the well-known class of Search-and-Poll algorithms. We performed an extensive set of simulations over a well-known set of 44 peptides to investigate the robustness and reliability of the proposed algorithm, and we compared the peptide conformation with a state-of-the-art algorithm for peptide structure prediction known as PEPstr. In particular, we tested the algorithm on the instances proposed by the originators of PEPstr, to validate the proposed algorithm; the experimental results confirm that the generalized pattern search algorithm outperforms PEPstr by 21.17% in terms of average root mean-square deviation, RMSD C(alpha).
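The poll step at the heart of pattern search methods can be sketched in a few lines of Python. The example below is a plain coordinate (compass) search on a generic objective, not the authors' algorithm: it omits the search step, uses a toy quadratic instead of a peptide energy function, and the function name and default parameters are hypothetical.

```python
def pattern_search(objective, start, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free minimization by polling along coordinate directions.

    objective: callable taking a list of floats and returning a float.
    start: initial point (sequence of floats).
    Returns (best_point, best_value).
    """
    point = list(start)
    value = objective(point)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        # Poll: try +step and -step along each coordinate axis.
        for i in range(len(point)):
            for sign in (1.0, -1.0):
                trial = point.copy()
                trial[i] += sign * step
                trial_value = objective(trial)
                if trial_value < value:
                    point, value = trial, trial_value
                    improved = True
                    break
            if improved:
                break
        # An unsuccessful poll shrinks the mesh; a successful one keeps it.
        if not improved:
            step *= 0.5
    return point, value


# Toy usage: minimize a two-dimensional quadratic bowl.
best_point, best_value = pattern_search(lambda x: x[0] ** 2 + x[1] ** 2,
                                        [2.7, -1.3])
```

For a peptide, the point would hold conformational variables such as backbone torsion angles and the objective would be an energy function; both are left abstract here.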
Affiliation(s)
- Giuseppe Nicosia
- Department of Mathematics and Computer Science, University of Catania, Catania, Italy
1257
Sacan A, Toroslu IH, Ferhatosmanoglu H. Integrated search and alignment of protein structures. Bioinformatics 2008; 24:2872-9. [PMID: 18945684 DOI: 10.1093/bioinformatics/btn545] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identification and comparison of similar three-dimensional (3D) protein structures has become an even greater challenge in the face of the rapidly growing structure databases. Here, we introduce Vorometric, a new method that provides efficient search and alignment of a query protein against a database of protein structures. Voronoi contacts of the protein residues are enriched with the secondary structure information and a metric substitution matrix is developed to allow efficient indexing. The contact hits obtained from a distance-based indexing method are extended to obtain high-scoring segment pairs, which are then used to generate structural alignments. RESULTS Vorometric is the first to address both search and alignment problems in the protein structure databases. The experimental results show that Vorometric is simultaneously effective in retrieving similar protein structures, producing high-quality structure alignments, and identifying cross-fold similarities. Vorometric outperforms current structure retrieval methods in search accuracy, while requiring comparable running times. Furthermore, the structural superpositions produced are shown to have better quality and coverage, when compared with those of the popular structure alignment tools. AVAILABILITY Vorometric is available as a web service at http://bio.cse.ohio-state.edu/Vorometric
Affiliation(s)
- Ahmet Sacan
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey.
1258
Wu S, Zhang Y. ANGLOR: a composite machine-learning algorithm for protein backbone torsion angle prediction. PLoS One 2008; 3:e3400. [PMID: 18923703 PMCID: PMC2559866 DOI: 10.1371/journal.pone.0003400] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Received: 08/01/2008] [Accepted: 09/18/2008] [Indexed: 11/20/2022] Open
Abstract
We developed a composite machine-learning based algorithm, called ANGLOR, to predict real-value protein backbone torsion angles from amino acid sequences. The input features of ANGLOR include sequence profiles, predicted secondary structure and solvent accessibility. In a large-scale benchmarking test, the mean absolute error (MAE) of the phi/psi prediction is 28°/46°, which is ∼10% lower than that generated by software in the literature. The prediction is statistically different from a random predictor (or a purely secondary-structure-based predictor) with p-value <1.0×10^−300 (or <1.0×10^−148) by Wilcoxon signed rank test. For some residues (ILE, LEU, PRO and VAL) and especially the residues in helix and buried regions, the MAE of phi angles is much smaller (10–20°) than that in other environments. Thus, although the average accuracy of the ANGLOR prediction is still low, the portion of the accurately predicted dihedral angles may be useful in assisting protein fold recognition and ab initio 3D structure modeling.
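Because torsion angles are periodic, a mean absolute error over them is usually computed on wrapped differences, so that 170° and −170° count as 20° apart rather than 340°. The Python sketch below illustrates that convention; whether ANGLOR's reported MAE uses exactly this wrapping is an assumption, and the function names are hypothetical.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    # Shift into [-180, 180) before taking the absolute value.
    wrapped = (a_deg - b_deg + 180.0) % 360.0 - 180.0
    return abs(wrapped)


def angular_mae(predicted, observed) -> float:
    """Mean absolute error between two equal-length lists of angles (degrees)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(angular_difference(p, o) for p, o in zip(predicted, observed))
    return total / len(predicted)
```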
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, Kansas, United States of America
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, Kansas, United States of America
1259
Eramian D, Eswar N, Shen MY, Sali A. How well can the accuracy of comparative protein structure models be predicted? Protein Sci 2008; 17:1881-93. [PMID: 18832340 DOI: 10.1110/ps.036061.108] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Indexed: 10/21/2022]
Abstract
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Cα root-mean-squared deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, University of California at San Francisco, California 94158, USA
1260
Wu S, Zhang Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 2008; 72:547-56. [PMID: 18247410 DOI: 10.1002/prot.21945] [Citation(s) in RCA: 276] [Impact Index Per Article: 17.3] [Indexed: 11/09/2022]
Abstract
We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER is able to identify correct folds in 137 cases with the first model of TM-score >0.5. Dependent on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 × 10^−13, which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, Kansas 66047, USA
1261
Vallat BK, Pillardy J, Elber R. A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins. Proteins 2008; 72:910-28. [PMID: 18300226 PMCID: PMC2907141 DOI: 10.1002/prot.21976] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank (PDB) a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is used to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50 and 100%) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with Modeller. The probability that a true match does not yield an acceptable structural model (within 6 Å RMSD from the native structure) decays linearly as a function of the TM structural-alignment score.
Affiliation(s)
- Brinda Kizhakke Vallat
- Department of Computer Science, Cornell University, Upson Hall 4130, Ithaca, New York 14853, USA
1262
Zhou H, Skolnick J. Protein model quality assessment prediction by combining fragment comparisons and a consensus Cα contact potential. Proteins 2008; 71:1211-8. [PMID: 18004783 DOI: 10.1002/prot.21813] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
In this work, we develop a fully automated method for the quality assessment prediction of protein structural models generated by structure prediction approaches such as fold recognition servers, or ab initio methods. The approach is based on fragment comparisons and a consensus Cα contact potential derived from the set of models to be assessed and was tested on CASP7 server models. The average Pearson linear correlation coefficient between predicted quality and model GDT-score per target is 0.83 for the 98 targets, which is better than those of other quality assessment methods that participated in CASP7. Our method also outperforms the other methods by about 3% as assessed by the total GDT-score of the selected top models.
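The evaluation statistic quoted above, a Pearson correlation computed per target and then averaged over targets, can be sketched as follows. The function names and the dictionary layout of the inputs are hypothetical, and no data from the study are reproduced.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_per_target_correlation(per_target) -> float:
    """Average the per-target correlations.

    per_target: dict mapping a target id to a pair
    (predicted_quality_list, gdt_score_list) over that target's models.
    """
    values = [pearson_r(pred, gdt) for pred, gdt in per_target.values()]
    return sum(values) / len(values)
```

Averaging per-target correlations, rather than pooling all models into one correlation, measures how well models are ranked within each target independently of target difficulty.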
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
1263
Bernsel A, Viklund H, Elofsson A. Remote homology detection of integral membrane proteins using conserved sequence features. Proteins 2008; 71:1387-99. [PMID: 18076048 DOI: 10.1002/prot.21825] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
Abstract
Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new profile-profile based alignment method for remote homology detection of transmembrane proteins in a hidden Markov model framework that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far to be recognized, the hydrophobicity pattern and the transmembrane topology are better conserved. By using this information in parallel with sequence information, we show that both sensitivity and specificity can be substantially improved for remote homology detection in two independent test sets. In addition, we show that alignment quality can be improved for the most distant homologs in a public dataset of membrane protein structures. Applying the method to the Pfam domain database, we are able to suggest new putative evolutionary relationships for a few relatively uncharacterized protein domain families, of which several are confirmed by other methods. The method is called Searcher for Homology Relationships of Integral Membrane Proteins (SHRIMP) and is available for download at http://www.sbc.su.se/shrimp/.
Affiliation(s)
- Andreas Bernsel
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
1264
Latek D, Kolinski A. Contact prediction in protein modeling: scoring, folding and refinement of coarse-grained models. BMC Struct Biol 2008; 8:36. [PMID: 18694501 PMCID: PMC2527566 DOI: 10.1186/1472-6807-8-36] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 02/16/2008] [Accepted: 08/11/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). The most relevant were non-local contact predictions for targets from the most difficult categories: fold recognition-analogy and new fold. Such contacts could provide valuable structural information in case a template structure cannot be found in the PDB. RESULTS We described comprehensive tests of the effectiveness of contact data in various aspects of de novo modeling with CABS, an algorithm which was used successfully in CASP6 by the Kolinski-Bujnicki group. We used the predicted contacts in a simple scoring function for the post-simulation ranking of protein models and as a soft bias in the folding simulations and in the fold-refinement procedure. The latter approach turned out to be the most successful. The CABS force field used in the Replica Exchange Monte Carlo simulations cooperated with the true contacts and discriminated the false ones, which resulted in an improvement of the majority of Kolinski-Bujnicki's protein models. In the modeling we tested different sets of predicted contact data submitted to the CASP6 server. According to our results, the best performing were the contacts with the accuracy balanced with the coverage, obtained either from the best two predictors only or by a consensus from as many predictors as possible. CONCLUSION Our tests have shown that theoretically predicted contacts can be very beneficial for protein structure prediction. Depending on the protein modeling method, a contact data set applied should be prepared with differently balanced coverage and accuracy of predicted contacts. Namely, high coverage of contact data is important for the model ranking and high accuracy for the folding simulations.
Affiliation(s)
- Dorota Latek
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
1265
Csaba G, Birzele F, Zimmer R. Protein structure alignment considering phenotypic plasticity. Bioinformatics 2008; 24:i98-104. [DOI: 10.1093/bioinformatics/btn271] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022] Open
1266
Alternating evolutionary pressure in a genetic algorithm facilitates protein model selection. BMC Struct Biol 2008; 8:34. [PMID: 18673557 PMCID: PMC2527322 DOI: 10.1186/1472-6807-8-34] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 06/04/2008] [Accepted: 08/01/2008] [Indexed: 11/12/2022]
Abstract
Background Automatic protein modelling pipelines are becoming ever more accurate; this has come hand in hand with an increasingly complicated interplay between all components involved. Nevertheless, there are still potential improvements to be made in template selection, refinement and protein model selection. Results In the context of an automatic modelling pipeline, we analysed each step separately, revealing several non-intuitive trends and explored a new strategy for protein conformation sampling using Genetic Algorithms (GA). We apply the concept of alternating evolutionary pressure (AEP), i.e. intermediate rounds within the GA runs where unrestrained, linear growth of the model populations is allowed. Conclusion This approach improves the overall performance of the GA by allowing models to overcome local energy barriers. AEP enabled the selection of the best models in 40% of all targets; compared to 25% for a normal GA.
1267
McGuffin LJ. Intrinsic disorder prediction from the analysis of multiple protein fold recognition models. Bioinformatics 2008; 24:1798-804. [DOI: 10.1093/bioinformatics/btn326] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Indexed: 11/13/2022] Open
1268
Protein model refinement using an optimized physics-based all-atom force field. Proc Natl Acad Sci U S A 2008; 105:8268-73. [PMID: 18550813 DOI: 10.1073/pnas.0800054105] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022] Open
Abstract
One of the greatest challenges in protein structure prediction is the refinement of low-resolution predicted models to high-resolution structures that are close to the native state. Although contemporary structure prediction methods can assemble the correct topology for a large fraction of protein domains, such approximate models are often not of the resolution required for many important applications, including studies of reaction mechanisms and virtual ligand screening. Thus, the development of a method that could bring those structures closer to the native state is of great importance. We recently optimized the relative weights of the components of the Amber ff03 potential on a large set of decoy structures to create a funnel-shaped energy landscape with the native structure at the global minimum. Such an energy function might be able to drive proteins toward their native structure. In this work, for a test set of 47 proteins, with 100 decoy structures per protein that have a range of structural similarities to the native state, we demonstrate that our optimized potential can drive protein models closer to their native structure. Comparing the lowest-energy structure from each trajectory with the starting decoy, structural improvement is seen for 70% of the models on average. The ability to do such systematic structural refinements by using a physics-based all-atom potential represents a promising approach to high-resolution structure prediction.
1269
Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints. Biophys J 2008; 95:1956-64. [PMID: 18487301 DOI: 10.1529/biophysj.108.129759] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022] Open
Abstract
To improve tertiary structure predictions of more difficult targets, the next generation of TASSER, TASSER_2.0, has been developed. TASSER_2.0 incorporates more accurate side-chain contact restraint predictions from a new approach, the composite-sequence method, based on consensus restraints generated by an improved threading algorithm, PROSPECTOR_3.5, which uses computationally evolved and wild-type template sequences as input. TASSER_2.0 was tested on a large-scale, benchmark set of 2591 nonhomologous, single domain proteins < or =200 residues that cover the Protein Data Bank at 35% pairwise sequence identity. Compared with the average fraction of accurately predicted side-chain contacts of 0.37 using PROSPECTOR_3.5 with wild-type template sequences, the average accuracy of the composite-sequence method increases to 0.60. The resulting TASSER_2.0 models are closer to their native structures, with an average root mean-square deviation of 4.99 A compared to the 5.31 A result of TASSER. Defining a successful prediction as a model with a root mean-square deviation to native <6.5 A, the success rate of TASSER_2.0 (TASSER) for Medium targets (targets with good templates/poor alignments) is 74.3% (64.7%) and 40.8% (35.5%) for the Hard targets (incorrect templates/alignments). For Easy targets (good templates/alignments), the success rate slightly increases from 86.3% to 88.4%.
|
1270
|
Pei J. Multiple protein sequence alignment. Curr Opin Struct Biol 2008; 18:382-6. [PMID: 18485694 DOI: 10.1016/j.sbi.2008.03.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Received: 01/13/2008] [Accepted: 03/18/2008] [Indexed: 11/16/2022]
Abstract
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequence datasets. Here, I review methodologies and recent advances in the multiple protein sequence alignment field, with emphasis on the use of additional sequence and structural information to improve alignment quality.
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Boulevard, Dallas, TX 75390, USA.
|
1271
|
Larsson P, Wallner B, Lindahl E, Elofsson A. Using multiple templates to improve quality of homology models in automated homology modeling. Protein Sci 2008; 17:990-1002. [PMID: 18441233 DOI: 10.1110/ps.073344908] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Indexed: 10/22/2022]
Abstract
When researchers build high-quality models of protein structure from sequence homology, it is today common to use several alternative target-template alignments. Several methods can, at least in theory, utilize information from multiple templates, and many examples of improved model quality have been reported. However, to our knowledge, thus far no study has shown that automatic inclusion of multiple alignments is guaranteed to improve models without artifacts. Here, we have carried out a systematic investigation of the potential of multiple templates for improving homology model quality. We have used test sets consisting of targets from both recent CASP experiments and a larger reference set. In addition to Modeller and Nest, a new method (Pfrag) for multiple template-based modeling is used, based on the segment-matching algorithm from Levitt's SegMod program. Our results show that all programs can produce multi-template models better than any of the single-template models, but a large part of the improvement is simply due to extension of the models. Most of the remaining improved cases were produced by Modeller. The most important factor is the existence of high-quality single-sequence input alignments. Because of the existence of models that are worse than any of the top single-template models, the average model quality does not improve significantly. However, by ranking models with a model quality assessment program such as ProQ, the average quality is improved by approximately 5% in the CASP7 test set.
Affiliation(s)
- Per Larsson
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
|
1272
|
Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol 2008; 18:342-8. [PMID: 18436442 DOI: 10.1016/j.sbi.2008.02.004] [Citation(s) in RCA: 304] [Impact Index Per Article: 19.0] [Received: 12/04/2007] [Accepted: 02/14/2008] [Indexed: 10/22/2022]
Abstract
Depending on whether similar structures are found in the PDB library, the protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool to detect the structural analogs, the advancements in methodology development have come to a steady state. Encouraging progress is observed in structure refinement which aims at drawing template structures closer to the native; this has been mainly driven by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been witnessed in folding small proteins to atomic resolutions. However, predicting structures for proteins larger than 150 residues still remains a challenge, with bottlenecks from both force field and conformational search.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, United States.
|
1273
|
Bennett-Lovsey RM, Herbert AD, Sternberg MJE, Kelley LA. Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Proteins 2008; 70:611-25. [PMID: 17876813 DOI: 10.1002/prot.21688] [Citation(s) in RCA: 348] [Impact Index Per Article: 21.8] [Indexed: 11/10/2022]
Abstract
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. However, there is a paucity of information as to why the ensemble methods are superior and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
Affiliation(s)
- Riccardo M Bennett-Lovsey
- Structural Bioinformatics Group, Division of Molecular Biosciences, Imperial College London, London SW7 2AY, United Kingdom
|
1274
|
Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008; 71:261-77. [PMID: 17932912 DOI: 10.1002/prot.21715] [Citation(s) in RCA: 737] [Impact Index Per Article: 46.1] [Indexed: 11/06/2022]
Abstract
In protein structure prediction, a considerable number of alternative models are usually produced, from which the final model subsequently has to be selected. Thus, a scoring function for the identification of the best model within an ensemble of alternative models is a key component of most protein structure prediction pipelines. QMEAN, which stands for Qualitative Model Energy ANalysis, is a composite scoring function describing the major geometrical aspects of protein structures. Five different structural descriptors are used. The local geometry is analyzed by a new kind of torsion angle potential over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential is used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, respectively, are also included. A variety of different implementations are investigated and several approaches to combine and optimize them are discussed. QMEAN was tested on several standard decoy sets including a molecular dynamics simulation decoy set as well as on a comprehensive data set of a total of 22,420 models from server predictions for the 95 targets of CASP7. In a comparison to five well-established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models. The three-residue torsion angle potential turned out to be very effective in recognizing the native fold.
Affiliation(s)
- Pascal Benkert
- Institute for Biochemistry, University of Cologne, 50674 Cologne, Germany
|
1275
|
Wrabl JO, Grishin NV. Statistics of Random Protein Superpositions: p-Values for Pairwise Structure Alignment. J Comput Biol 2008; 15:317-55. [DOI: 10.1089/cmb.2007.0161] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Affiliation(s)
- James O. Wrabl
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas
|
1276
|
Silva PJ. Assessing the reliability of sequence similarities detected through hydrophobic cluster analysis. Proteins 2008; 70:1588-94. [PMID: 17918727 DOI: 10.1002/prot.21803] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Hydrophobic cluster analysis (HCA) has long been used as a tool to detect distant homologies between protein sequences, and to classify them into different folds. However, it relies on expert human intervention, and is sensitive to subjective interpretations of pattern similarities. In this study, we describe a novel algorithm to assess the similarity of hydrophobic amino acid distributions between two sequences. Our algorithm correctly identifies as misattributions several HCA-based proposals of structural similarity between unrelated proteins present in the literature. We have also used this method to identify the proper fold of a large variety of sequences, and to automatically select the most appropriate structure for homology modeling of several proteins with low sequence identity to any other member of the protein data bank. Automatic modeling of the target proteins based on these templates yielded structures with TM-scores (vs. experimental structures) above 0.60, even without further refinement. Besides enabling a reliable identification of the correct fold of an unknown sequence and the choice of suitable templates, our algorithm also shows that whereas most structural classes of proteins are very homogeneous in hydrophobic cluster composition, a tenth of the described families are compatible with a large variety of hydrophobic patterns. We have built a browsable database of every major representative hydrophobic cluster pattern present in each structural class of proteins, freely available at http://www2.ufp.pt/~pedros/HCA_db/index.htm.
Affiliation(s)
- Pedro J Silva
- REQUIMTE, Fac. de Ciências da Saúde, Univ. Fernando Pessoa, Rua Carlos da Maia, 296, 4200-150 Porto-Portugal.
|
1277
|
Wu S, Zhang Y. A comprehensive assessment of sequence-based and template-based methods for protein contact prediction. Bioinformatics 2008; 24:924-31. [PMID: 18296462 DOI: 10.1093/bioinformatics/btn069] [Citation(s) in RCA: 151] [Impact Index Per Article: 9.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Pair-wise residue-residue contacts in proteins can be predicted from both threading templates and sequence-based machine learning. However, most structure modeling approaches only use the template-based contact predictions in guiding the simulations; this is partly because the sequence-based contact predictions are usually considered to be less accurate than that by threading. With the rapid progress in sequence databases and machine-learning techniques, it is necessary to have a detailed and comprehensive assessment of the contact-prediction methods in different template conditions. RESULTS We develop two methods for protein-contact predictions: SVM-SEQ is a sequence-based machine learning approach which trains a variety of sequence-derived features on contact maps; SVM-LOMETS collects consensus contact predictions from multiple threading templates. We test both methods on the same set of 554 proteins which are categorized into 'Easy', 'Medium', 'Hard' and 'Very Hard' targets based on the evolutionary and structural distance between templates and targets. For the Easy and Medium targets, SVM-LOMETS obviously outperforms SVM-SEQ; but for the Hard and Very Hard targets, the accuracy of the SVM-SEQ predictions is higher than that of SVM-LOMETS by 12-25%. If we combine the SVM-SEQ and SVM-LOMETS predictions together, the total number of correctly predicted contacts in the Hard proteins will increase by more than 60% (or 70% for the long-range contact with a sequence separation ≥24), compared with SVM-LOMETS alone. The advantage of SVM-SEQ is also shown in the CASP7 free modeling targets where the SVM-SEQ is around four times more accurate than SVM-LOMETS in the long-range contact prediction. These data demonstrate that the state-of-the-art sequence-based contact prediction has reached a level which may be helpful in assisting tertiary structure modeling for the targets which do not have close structure templates. The maximum yield should be obtained by the combination of both sequence- and template-based predictions.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
|
1278
|
Tan CW, Jones DT. Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction. BMC Bioinformatics 2008; 9:94. [PMID: 18267018 PMCID: PMC2267779 DOI: 10.1186/1471-2105-9-94] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/01/2007] [Accepted: 02/11/2008] [Indexed: 11/13/2022] Open
Abstract
Background We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks. Results Results have shown that the best performing neural network is the one that uses input information comprising of PSI-BLAST [1] profiles of residue pairs, pairwise distance and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested. Conclusion This method is demonstrated to be viable, and furthermore evolutionary information is successfully used in the neural networks to improve decoy discrimination.
Affiliation(s)
- Ching-Wai Tan
- Department of Computer Science, University College London, London, UK.
|
1279
|
Biegert A, Söding J. De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics 2008; 24:807-14. [DOI: 10.1093/bioinformatics/btn039] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.7] [Indexed: 11/12/2022] Open
|
1280
|
Identification of Quaternary Structure and Functional Domains of the CI Repressor from Bacteriophage TP901-1. J Mol Biol 2008; 376:983-96. [DOI: 10.1016/j.jmb.2007.12.022] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 09/07/2007] [Revised: 12/10/2007] [Accepted: 12/11/2007] [Indexed: 11/21/2022]
|
1281
|
Furnham N, de Bakker PI, Gore S, Burke DF, Blundell TL. Comparative modelling by restraint-based conformational sampling. BMC Struct Biol 2008; 8:7. [PMID: 18237407 PMCID: PMC2275734 DOI: 10.1186/1472-6807-8-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/10/2007] [Accepted: 01/31/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Although comparative modelling is routinely used to produce three-dimensional models of proteins, very few automated approaches are formulated in a way that allows inclusion of restraints derived from experimental data as well as those from the structures of homologues. Furthermore, proteins are usually described as a single conformer, rather than an ensemble that represents the heterogeneity and inaccuracy of experimentally determined protein structures. Here we address these issues by exploring the application of the restraint-based conformational space search engine, RAPPER, which has previously been developed for rebuilding experimentally defined protein structures and for fitting models to electron density derived from X-ray diffraction analyses. RESULTS A new application of RAPPER for comparative modelling uses positional restraints and knowledge-based sampling to generate models with accuracies comparable to other leading modelling tools. Knowledge-based predictions are based on geometrical features of the homologous templates and rules concerning main-chain and side-chain conformations. By directly changing the restraints derived from available templates we estimate the accuracy limits of the method in comparative modelling. CONCLUSION The application of RAPPER to comparative modelling provides an effective means of exploring the conformational space available to a target sequence. Enhanced methods for generating positional restraints can greatly improve structure prediction. Generation of an ensemble of solutions that are consistent with both target sequence and knowledge derived from the template structures provides a more appropriate representation of a structural prediction than a single model. By formulating homologous structural information as sets of restraints we can begin to consider how comparative models might be used to inform conformer generation from sparse experimental data.
Affiliation(s)
- Nicholas Furnham
- Department of Biochemistry, Sanger Building, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK.
|
1282
|
Mereghetti P, Ganadu ML, Papaleo E, Fantucci P, De Gioia L. Validation of protein models by a neural network approach. BMC Bioinformatics 2008; 9:66. [PMID: 18230168 PMCID: PMC2276493 DOI: 10.1186/1471-2105-9-66] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 04/24/2007] [Accepted: 01/29/2008] [Indexed: 11/30/2022] Open
Abstract
Background The development and improvement of reliable computational methods designed to evaluate the quality of protein models is relevant in the context of protein structure refinement, which has been recently identified as one of the bottlenecks limiting the quality and usefulness of protein structure prediction. Results In this contribution, we present a computational method (Artificial Intelligence Decoys Evaluator: AIDE) which is able to consistently discriminate between correct and incorrect protein models. In particular, the method is based on neural networks that use as input 15 structural parameters, which include energy, solvent accessible surface, hydrophobic contacts and secondary structure content. The results obtained with AIDE on a set of decoy structures were evaluated using statistical indicators such as Pearson correlation coefficients, Znat, fraction enrichment, as well as ROC plots. It turned out that AIDE performances are comparable and often complementary to available state-of-the-art learning-based methods. Conclusion In light of the results obtained with AIDE, as well as its comparison with available learning-based methods, it can be concluded that AIDE can be successfully used to evaluate the quality of protein structures. The use of AIDE in combination with other evaluation tools is expected to further enhance protein refinement efforts.
Affiliation(s)
- Paolo Mereghetti
- Department of Chemistry, University of Sassari, Via Vienna 2, 07100, Sassari, Italy.
|
1283
|
Wallner B, Elofsson A. Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins 2008; 69 Suppl 8:184-93. [PMID: 17894353 DOI: 10.1002/prot.21774] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Indexed: 11/06/2022]
Abstract
The ability to rank and select the best model is important in protein structure prediction. Model Quality Assessment Programs (MQAPs) are programs developed to perform this task. They can be divided into three categories based on the information they use. Consensus based methods use the similarity to other models, structure-based methods use features calculated from the structure and evolutionary based methods use the sequence similarity between a model and a template. These methods can be trained to predict the overall global quality of a model, that is, how much a model is likely to differ from the native structure. The methods can also be trained to pinpoint which local regions in a model are likely to be incorrect. In CASP7, we participated with three predictors of global and four of local quality using information from the three categories described above. The result shows that the MQAP using consensus, Pcons, was significantly better at predicting both global and local quality compared with MQAPs using only structure or sequence based information.
Affiliation(s)
- Björn Wallner
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden.
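The consensus idea described in this abstract (a model is scored by its similarity to the other models in the set) can be sketched in Python as follows. This is an illustrative simplification, not the Pcons implementation: `consensus_scores` and `toy_similarity` are hypothetical names, and the similarity function is supplied by the caller (in practice it would be a structural similarity measure between two models).

```python
def consensus_scores(models, similarity):
    """Score each model by its mean similarity to every other model.

    `similarity(a, b)` is a caller-supplied function returning a number
    (higher means more alike). Models that resemble many others score high.
    """
    if len(models) < 2:
        raise ValueError("consensus scoring needs at least two models")
    scores = []
    for i, model in enumerate(models):
        total = sum(
            similarity(model, other)
            for j, other in enumerate(models)
            if j != i
        )
        scores.append(total / (len(models) - 1))
    return scores

def toy_similarity(a, b):
    """Toy stand-in for a structural similarity: models are plain numbers."""
    return 1.0 / (1.0 + abs(a - b))
```

With the toy models `[0, 0, 3]`, the two agreeing models each score 0.625 and the outlier scores 0.25, so ranking by consensus score places the outlier last.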
|
1284
|
Tress M, Cheng J, Baldi P, Joo K, Lee J, Seo JH, Lee J, Baker D, Chivian D, Kim D, Ezkurdia I. Assessment of predictions submitted for the CASP7 domain prediction category. Proteins 2008; 69 Suppl 8:137-51. [PMID: 17680686 DOI: 10.1002/prot.21675] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/07/2022]
Abstract
This paper details the assessment process and evaluation results for the Critical Assessment of Protein Structure Prediction (CASP7) domain prediction category. Domain predictions were assessed using the Normalized Domain Overlap score introduced in CASP6 and the accuracy of prediction of domain break points. The results of the analysis clearly demonstrate that the best methods are able to make consistently reliable predictions when the target has a structural template, although they are less good when the domain break occurs in a region not covered by a template. The conditions of the experiment meant that it was impossible to draw any conclusions about domain prediction for free modeling targets and it was also difficult to draw many distinctions between the best groups. Two thirds of the targets submitted were single domains and hence regarded as easy to predict. Even those targets defined as having multiple domains always had at least one domain with a similar template structure.
Affiliation(s)
- Michael Tress
- Structural and Biological Computation Programme, Spanish National Cancer Research Centre, Madrid, Spain.
|
1285
|
Zhou H, Pandit SB, Lee SY, Borreguero J, Chen H, Wroblewska L, Skolnick J. Analysis of TASSER-based CASP7 protein structure prediction results. Proteins 2008; 69 Suppl 8:90-7. [PMID: 17705276 DOI: 10.1002/prot.21649] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022]
Abstract
An improved TASSER (Threading/ASSEmbly/Refinement) methodology is applied to predict the tertiary structure for all CASP7 targets. TASSER employs template identification by threading, followed by tertiary structure assembly by rearranging continuous template fragments, where conformational space is searched via Parallel Hyperbolic Monte Carlo sampling with an optimized force-field that includes knowledge-based statistical potentials and restraints derived from threading templates. The final models are selected by clustering structures from the low temperature replicas. Improvements in TASSER over CASP6 involve use of better templates from 3D-jury applied to three threading programs, PROSPECTOR_3, SP(3), and SPARKS, and a fragment comparison method for better model ranking. For targets with no reliable templates, a variant of TASSER (chunk-TASSER) is also applied with potentials and restraints extracted from ab initio folded supersecondary chunks of the target to build full-length models. For all 124 CASP targets/domains, the average root-mean-square-deviation (RMSD) from native and alignment coverage of the best initial threading models from 3D-jury are 6.2 Å and 93%, respectively. Following TASSER reassembly, the average RMSD of the best model in the template aligned region decreases to 4.9 Å and the average TM-score increases from 0.617 for the template to 0.678 for the best full-length model. Based on target difficulty, the average TM-scores of the final model to native are 0.904, 0.671, and 0.307 for high-accuracy template-based modeling, template-based modeling, and free modeling targets/domains, respectively. For the more difficult targets, TASSER with modest human intervention performed better in comparison to its server counterpart, MetaTASSER, which used a limited time simulation.
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
1286
|
Abstract
We developed and tested the I-TASSER protein structure prediction algorithm in the CASP7 experiment, where targets are first threaded through the PDB library and continuous fragments in the threading alignments are exploited to assemble the global structure. The final models are obtained from the progressive refinements started from the last round structure clusters. A majority of the targets in the template-based modeling (TBM) category have the templates drawn closer to the native structure by more than 1 Å within the aligned regions. For the free-modeling (FM) targets, I-TASSER builds correct topology for 7/19 cases with sequence up to 155 residues long. For the first time, the automated server prediction generates models as good as the human-expert does in all the categories, which shows the robustness of the method and the potential of the application to genome-wide structure prediction. Despite the success, the accuracy of I-TASSER modeling is still dominated by the similarity of the template and target structures with a strong correlation coefficient (approximately 0.9) between the root-mean-squared deviation (RMSD) to native of the templates and the final models. Especially, there is no high-resolution model below 2 Å for the FM targets. These problems highlight the issues that need to be addressed in the next generation of atomic-level I-TASSER development especially for the FM target modeling.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics, Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66047, USA.
|
1287
|
Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 2008; 9:40. [PMID: 18215316 PMCID: PMC2245901 DOI: 10.1186/1471-2105-9-40] [Citation(s) in RCA: 3801] [Impact Index Per Article: 237.6] [Received: 09/19/2007] [Accepted: 01/23/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. RESULTS An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 A for RMSD. CONCLUSION The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. 
The I-TASSER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/I-TASSER.
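The C-score cutoff described in the abstract can be illustrated with a minimal sketch. The cutoff of -1.5 is taken from the abstract; the TM-score threshold of 0.5 for "correct topology", the function name `error_rates`, and the toy data are assumptions made here for illustration and are not taken from the I-TASSER server.

```python
# Hypothetical sketch: classify models by a C-score cutoff and count error rates.
C_SCORE_CUTOFF = -1.5   # cutoff quoted in the abstract
TM_CORRECT = 0.5        # ASSUMPTION: TM-score above this counts as correct topology


def error_rates(models, cutoff=C_SCORE_CUTOFF, tm_correct=TM_CORRECT):
    """Return (false_positive_rate, false_negative_rate).

    `models` is a list of (c_score, tm_score) pairs, one per first model.
    """
    fp = fn = positives = negatives = 0
    for c_score, tm_score in models:
        predicted_correct = c_score > cutoff
        actually_correct = tm_score > tm_correct
        if actually_correct:
            positives += 1
            if not predicted_correct:
                fn += 1  # correct fold rejected by the cutoff
        else:
            negatives += 1
            if predicted_correct:
                fp += 1  # wrong fold accepted by the cutoff
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Toy (c_score, tm_score) pairs, invented for this example.
toy_models = [(1.2, 0.85), (-0.4, 0.62), (-1.0, 0.45),
              (-2.3, 0.31), (-3.1, 0.28), (-1.8, 0.55)]
fpr, fnr = error_rates(toy_models)
print(f"false positive rate = {fpr:.2f}, false negative rate = {fnr:.2f}")
```

On a real benchmark the pairs would come from first models with known native structures; the toy set is far too small to reproduce the rates quoted in the abstract.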
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA.
|
1288
|
McGuffin LJ. The ModFOLD server for the quality assessment of protein structural models. Bioinformatics 2008; 24:586-7. [DOI: 10.1093/bioinformatics/btn014] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.7] [Indexed: 11/14/2022] Open
|
1289
|
Taly JF, Marin A, Gibrat JF. Can molecular dynamics simulations help in discriminating correct from erroneous protein 3D models? BMC Bioinformatics 2008; 9:6. [PMID: 18179702 PMCID: PMC2245900 DOI: 10.1186/1471-2105-9-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 07/18/2007] [Accepted: 01/07/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Recent approaches for predicting the three-dimensional (3D) structure of proteins such as de novo or fold recognition methods mostly rely on simplified energy potential functions and a reduced representation of the polypeptide chain. These simplifications facilitate the exploration of the protein conformational space but do not fully capture the subtle relationship that exists between the amino acid sequence and its native structure. It has been proposed that physics-based energy functions together with techniques for sampling the conformational space, e.g., Monte Carlo or molecular dynamics (MD) simulations, are better suited to the task of modelling proteins at higher resolutions than those of models obtained with the former type of methods. In this study we monitor different protein structural properties along MD trajectories to discriminate correct from erroneous models. These models are based on the sequence-structure alignments provided by our fold recognition method, FROST. We define correct models as being built from alignments of sequences with structures similar to their native structures and erroneous models from alignments of sequences with structures unrelated to their native structures. RESULTS For three test sequences whose native structures belong to the all-alpha, all-beta and alpha/beta classes we built a set of models intended to cover the whole spectrum: from a perfect model, i.e., the native structure, to a very poor model, i.e., a random alignment of the test sequence with a structure belonging to another structural class, including several intermediate models based on fold recognition alignments. We submitted these models to 11 ns of MD simulations at three different temperatures.
We monitored along the corresponding trajectories the mean of the Root-Mean-Square deviations (RMSd) with respect to the initial conformation, the RMSd fluctuations, the number of conformation clusters, the evolution of secondary structures and the surface area of residues. None of these criteria alone is 100% efficient in discriminating correct from erroneous models. The mean RMSd, RMSd fluctuations, secondary structure and clustering of conformations show some false positives whereas the residue surface area criterion shows false negatives. However if we consider these criteria in combination it is straightforward to discriminate the two types of models. CONCLUSION The ability of discriminating correct from erroneous models allows us to improve the specificity and sensitivity of our fold recognition method for a number of ambiguous cases.
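Two of the monitored quantities, the mean RMSd with respect to the initial conformation and the RMSd fluctuations, can be sketched as follows. This is a minimal illustration under stated assumptions: frames are taken to be already superposed on the initial conformation, the fluctuation is taken to be the population standard deviation of the RMSd values (the abstract does not define it), and the function names and toy coordinates are invented here.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between two equally sized coordinate lists.

    ASSUMPTION: both frames are already superposed; no fitting is done here.
    """
    squared = sum((a - b) ** 2
                  for p, q in zip(frame, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


def rmsd_statistics(trajectory):
    """Return (mean RMSd, RMSd fluctuation) relative to the first frame."""
    reference = trajectory[0]
    values = [rmsd(frame, reference) for frame in trajectory[1:]]
    mean = sum(values) / len(values)
    # ASSUMPTION: fluctuation = population standard deviation of the RMSd values.
    fluctuation = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, fluctuation


# Toy two-atom trajectory (coordinates invented for this example).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # initial conformation
    [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, -3.0, 0.0)],
]
mean_rmsd, rmsd_fluctuation = rmsd_statistics(toy_trajectory)
print(mean_rmsd, rmsd_fluctuation)
```

Because the abstract reports that no single criterion separates correct from erroneous models, values like these would be combined with the other monitored properties rather than thresholded on their own.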
Affiliation(s)
- Jean-François Taly
- INRA, Unité Mathématique Informatique et Génome UR1077, F-78350 Jouy-en-Josas, France.
|
1290
|
Development of a physics-based force field for the scoring and refinement of protein models. Biophys J 2008; 94:3227-40. [PMID: 18178653 DOI: 10.1529/biophysj.107.121947] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/18/2022] Open
Abstract
The minimal requirements of a physics-based potential that can refine protein structures are a correlation of the energy with native similarity and the scoring of the native structure as the lowest in energy. To develop such a force field, the relative weights of the Amber ff03 all-atom potential supplemented by an explicit hydrogen-bond potential were adjusted by global optimization of energetic and structural criteria for a large set of protein decoys generated for a set of 58 nonhomologous proteins. The average correlation coefficient of the energy with TM-score significantly improved from 0.25 for the original ff03 potential to 0.65 for the optimized force field. The fraction of proteins for which the native structure had lowest energy increased from 0.22 to 0.90. Moreover, use of an explicit hydrogen-bond potential improves scoring performance of the force field. Promising preliminary results were obtained in applying the optimized potentials to refine protein decoys using only an energy criterion to choose the best decoy among sampled structures. For a set of seven proteins, 63% of the decoys improve, 18% get worse, and 19% are not changed.
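The two criteria named in the abstract, the correlation of energy with TM-score and whether the native structure has the lowest energy, can be sketched for a single protein as below. The function names and toy numbers are assumptions made for illustration; the abstract does not state the sign convention of its reported coefficients, so this sketch simply returns the raw Pearson coefficient, which is negative when lower energy accompanies higher TM-score.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def native_is_lowest(native_energy, decoy_energies):
    """True if the native structure scores strictly below every decoy."""
    return native_energy < min(decoy_energies)


# Toy decoy set for one protein (values invented for this example).
decoy_energies = [-90.0, -80.0, -70.0, -60.0]
decoy_tm_scores = [0.8, 0.6, 0.4, 0.2]
native_energy = -100.0

r = pearson(decoy_energies, decoy_tm_scores)
print(f"energy/TM-score correlation: {r:.2f}")
print("native lowest in energy:", native_is_lowest(native_energy, decoy_energies))
```

Averaging the coefficient over all proteins, and counting the fraction of proteins for which `native_is_lowest` holds, gives the kind of summary figures the abstract reports.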
|
1291
|
Lorenzen S, Zhang Y. Identification of near-native structures by clustering protein docking conformations. Proteins 2007; 68:187-94. [PMID: 17397057 DOI: 10.1002/prot.21442] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Indexed: 11/10/2022]
Abstract
Most state-of-the-art protein-protein docking algorithms use the Fast Fourier Transform (FFT) technique to sample the six-dimensional translational and rotational space. Scoring functions including shape complementarity, electrostatics, and desolvation are usually exploited in ranking the docking conformations. While these rigid-body docking methods provide good performance in bound docking, using unbound structures as input frequently leads to a high number of false positive hits. For the purpose of better selecting correct docking conformations, we structurally cluster the docking decoys generated by four widely-used FFT-based protein-protein docking methods. In all cases, the selection based on cluster size outperforms the ranking based on the inherent scoring function. If we cluster decoys from different servers together, only marginal improvement is obtained in comparison with clustering decoys from the best individual server. A collection of multiple decoy sets of comparable quality will be the key to improve the clustering result from meta-docking servers.
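Selecting docking conformations by cluster size can be sketched with a simple greedy scheme over a precomputed pairwise distance matrix. The abstract does not specify the clustering algorithm, so the greedy procedure, the function name `cluster_by_size`, the cutoff, and the toy distances below are all assumptions made for illustration.

```python
def cluster_by_size(dist, cutoff):
    """Greedy clustering of decoys, largest cluster first.

    `dist` is a symmetric matrix (list of lists) of pairwise distances,
    e.g. RMSD between decoys. Returns a list of (center, members) tuples
    in the order the clusters were extracted.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Pick the decoy with the most remaining neighbours within the cutoff;
        # ties go to the lowest index because max() keeps the first maximum.
        center = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if dist[i][j] <= cutoff),
        )
        members = sorted(j for j in remaining if dist[center][j] <= cutoff)
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


# Toy distance matrix for five decoys (values invented for this example).
toy_dist = [
    [0, 2, 2, 20, 20],
    [2, 0, 2, 20, 20],
    [2, 2, 0, 20, 20],
    [20, 20, 20, 0, 3],
    [20, 20, 20, 3, 0],
]
clusters = cluster_by_size(toy_dist, cutoff=5)
top_center, top_members = clusters[0]
print("largest cluster center:", top_center, "members:", top_members)
```

Ranking by extraction order puts the most populated cluster first, which is the selection criterion the abstract compares against the docking programs' own scoring functions.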
Affiliation(s)
- Stephan Lorenzen
- Center of Bioinformatics, Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66046, USA
|
1292
|
Abstract
A significant number of protein sequences in a given proteome have no obvious evolutionarily related protein in the database of solved protein structures, the PDB. Under these conditions, ab initio or template-free modeling methods are the sole means of predicting protein structure. To assess its expected performance on proteomes, the TASSER structure prediction algorithm is benchmarked in the ab initio limit on a representative set of 1129 nonhomologous sequences ranging from 40 to 200 residues that cover the PDB at 30% sequence identity and which adopt alpha, alpha + beta, and beta secondary structures. For sequences in the 40-100 (100-200) residue range, as assessed by their root mean square deviation from native, RMSD, the best of the top five ranked models of TASSER has a global fold that is significantly close to the native structure for 25% (16%) of the sequences, and with a correct identification of the structure of the protein core for 59% (36%). In the absence of a native structure, the structural similarity among the top five ranked models is a moderately reliable predictor of folding accuracy. If we classify the sequences according to their secondary structure content, then 64% (36%) of alpha, 43% (24%) of alpha + beta, and 20% (12%) of beta sequences in the 40-100 (100-200) residue range have a significant TM-score (TM-score ≥ 0.4). TASSER performs best on helical proteins because there are fewer secondary structural elements to arrange in a helical protein than in a beta protein of equal length, since the average length of a helix is longer than that of a strand. In addition, helical proteins have shorter loops and dangling tails. If we exclude these flexible fragments, then TASSER has similar accuracy for sequences containing the same number of secondary structural elements, irrespective of whether they are helices and/or strands.
Thus, it is the effective configurational entropy of the protein that dictates the average likelihood of correctly arranging the secondary structure elements.
Affiliation(s)
- Jose M Borreguero
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
1293
|
Sadowski MI, Jones DT. Benchmarking template selection and model quality assessment for high-resolution comparative modeling. Proteins 2007; 69:476-85. [PMID: 17623860 DOI: 10.1002/prot.21531] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
Abstract
Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown the selection of the correct template to be of paramount importance to the quality of the final model. We have derived a set of 732 targets for which a choice of ten or more templates exists with 30-80% sequence identity and used this set to compare a number of possible methods for template selection: BLAST, PSI-BLAST, profile-profile alignment, HHpred HMM-HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we have investigated the question of whether any structurally defined subset of the sequence could be used to predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases but that there are examples in which improvement (global RMSD 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence-based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined and the HHpred and MQAP methods were found to improve ranking when available templates had 35-40% maximum sequence identity. Structurally defined subsets in general are found to be less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and model quality is assessed in combination with the sequence-template sequence similarity, an extra 7% of "best" models can be found.
Affiliation(s)
- M I Sadowski
- Bioinformatics Unit, Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
|
1294
|
Lee M, Jeong CS, Kim D. Predicting and improving the protein sequence alignment quality by support vector regression. BMC Bioinformatics 2007; 8:471. [PMID: 18053160 PMCID: PMC2222655 DOI: 10.1186/1471-2105-8-471] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/25/2007] [Accepted: 12/03/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It is known that the alignment accuracy can vary significantly depending on the choice of alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of a query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment. RESULTS In this work, we develop a method to predict the quality of the alignment between a query and a template. We train support vector regression (SVR) models to predict the MaxSub scores as a measure of alignment quality. The alignment between a query protein and a template of length n is transformed into an (n + 1)-dimensional feature vector, which is then used as input to predict the alignment quality with the trained SVR model. Performance of our work is evaluated by various measures including the Pearson correlation coefficient between the observed and predicted MaxSub scores. The results show a high correlation coefficient of 0.945. For a pair of query and template, 48 alignments are generated by changing alignment options. Trained SVR models are then applied to predict the MaxSub scores of those alignments and to select the best alignment option, which is chosen specifically for the query-template pair. This adaptive selection procedure results in a 7.4% improvement of MaxSub scores, compared to those when the single best parameter option is used for all query-template pairs.
CONCLUSION The present work demonstrates that the alignment quality can be predicted with reasonable accuracy. Our method is useful not only for selecting the optimal alignment parameters for a chosen template based on predicted alignment quality, but also for filtering out problematic templates that are not suitable for structure prediction due to poor alignment accuracy. This is implemented as a part in FORECAST, the server for fold-recognition and is freely available on the web at http://pbil.kaist.ac.kr/forecast.
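The adaptive selection step, choosing the alignment option per query-template pair rather than one option for all pairs, can be sketched as below. The predicted scores stand in for the output of a trained SVR model; the option labels, pair names, scores, and function names are hypothetical and only three options are shown instead of the 48 mentioned in the abstract.

```python
# Hypothetical predicted MaxSub scores per query-template pair and alignment option.
predicted = {
    "pairA": {"gap_open=10": 0.40, "gap_open=12": 0.55, "gap_open=14": 0.50},
    "pairB": {"gap_open=10": 0.62, "gap_open=12": 0.48, "gap_open=14": 0.60},
}


def adaptive_choice(scores_by_option):
    """Pick the option with the highest predicted score for one pair."""
    return max(scores_by_option, key=scores_by_option.get)


def best_single_option(predicted_scores):
    """Pick the one option that maximizes the summed score over all pairs."""
    options = next(iter(predicted_scores.values())).keys()
    return max(options,
               key=lambda o: sum(p[o] for p in predicted_scores.values()))


adaptive = {pair: adaptive_choice(scores) for pair, scores in predicted.items()}
single = best_single_option(predicted)

adaptive_mean = sum(predicted[p][adaptive[p]] for p in predicted) / len(predicted)
single_mean = sum(predicted[p][single] for p in predicted) / len(predicted)
print("adaptive choices:", adaptive)
print("single best option:", single)
print(f"mean score: adaptive {adaptive_mean:.3f} vs single {single_mean:.3f}")
```

In this toy case the per-pair choice beats the single option on average, which mirrors the kind of comparison behind the improvement reported in the abstract; the real evaluation would use observed MaxSub scores for the selected alignments.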
Affiliation(s)
- Minho Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Chan-seok Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
|
1295
|
Abstract
Alternative splicing is thought to be one of the major sources for functional diversity in higher eukaryotes. Interestingly, when mapping splicing events onto protein structures, about half of the events affect structured and even highly conserved regions i.e. are non-trivial on the structure level. This has led to the controversial hypothesis that such splice variants result in nonsense-mediated mRNA decay or non-functional, unstructured proteins, which do not contribute to the functional diversity of an organism. Here we show in a comprehensive study on alternative splicing that proteins appear to be much more tolerant to structural deletions, insertions and replacements than previously thought. We find literature evidence that such non-trivial splicing isoforms exhibit different functional properties compared to their native counterparts and allow for interesting regulatory patterns on the protein network level. We provide examples that splicing events may represent transitions between different folds in the protein sequence–structure space and explain these links by a common genetic mechanism. Taken together, those findings hint to a more prominent role of splicing in protein structure evolution and to a different view of phenotypic plasticity of protein structures.
Affiliation(s)
- Fabian Birzele
- Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University, Amalienstrasse 17, D-80333 Munich, Germany.
|
1296
|
Qiu J, Sheffler W, Baker D, Noble WS. Ranking predicted protein structures with support vector regression. Proteins 2007; 71:1175-82. [PMID: 18004754 DOI: 10.1002/prot.21809] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Indexed: 11/05/2022]
Affiliation(s)
- Jian Qiu
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
|
1297
|
Reddy BVB, Kaznessis YN. Use of secondary structural information and C alpha-C alpha distance restraints to model protein structures with MODELLER. J Biosci 2007; 32:929-36. [PMID: 17914235 DOI: 10.1007/s12038-007-0093-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/22/2022]
Abstract
Protein secondary structure predictions and amino acid long range contact map predictions from primary sequence of proteins have been explored to aid in modelling protein tertiary structures. In order to evaluate the usefulness of secondary structure and 3D-residue contact prediction methods to model protein structures we have used the known Q3 (alpha-helix, beta-strands and irregular turns/loops) secondary structure information, along with residue-residue contact information as restraints for MODELLER. We present here results of our modelling studies on 30 best resolved single domain protein structures of varied lengths. The results show that it is very difficult to obtain useful models even with 100% accurate secondary structure predictions and accurate residue contact predictions for up to 30% of residues in a sequence. The best models that we obtained for proteins of lengths 37, 70, 118, 136 and 193 amino acid residues are of RMSDs 4.17, 5.27, 9.12, 7.89 and 9.69, respectively. The results show that one can obtain better models for the proteins that have a high percentage of alpha-helix content. This analysis further shows that the MODELLER restraint optimization program can be useful only if we have truly homologous structure(s) as a template, from which it derives numerous restraints almost identical to those of the templates used. This analysis also clearly indicates that even if we satisfy several true residue-residue contact distances, up to 30% of the sequence length, with fully known secondary structural information, we end up predicting model structures far from their corresponding native structures.
Affiliation(s)
- Boojala V B Reddy
- Laboratory of Bioinformatics and In Silico Drug Design, Department of Computer Science, Queens College, CUNY 65-30 Kissena Blvd, Flushing, NY 11367, USA.
|
1298
|
Abstract
In a cell, it has been estimated that each protein on average interacts with roughly 10 others, resulting in tens of thousands of proteins known or suspected to have interaction partners; of these, only a tiny fraction have solved protein structures. To partially address this problem, we have developed M-TASSER, a hierarchical method to predict protein quaternary structure from sequence that involves template identification by multimeric threading, followed by multimer model assembly and refinement. The final models are selected by structure clustering. M-TASSER has been tested on a benchmark set comprising 241 dimers having templates with weak sequence similarity and 246 without multimeric templates in the dimer library. Of the total of 207 targets predicted to interact as dimers, 165 (80%) were correctly assigned as interacting with a true positive rate of 68% and a false positive rate of 17%. The initial best template structures have an average root mean-square deviation to native of 5.3, 6.7, and 7.4 Å for the monomer, interface, and dimer structures. The final model shows on average a root mean-square deviation improvement of 1.3, 1.3, and 1.5 Å over the initial template structure for the monomer, interface, and dimer structures, with refinement evident for 87% of the cases. Thus, we have developed a promising approach to predict full-length quaternary structure for proteins that have weak sequence similarity to proteins of solved quaternary structure.
Affiliation(s)
- Jeffrey Skolnick
- Address reprint requests to Jeffrey Skolnick, Tel.: 404-407-8975; Fax: 404-385-7478.
|
1299
|
McGuffin LJ. Benchmarking consensus model quality assessment for protein fold recognition. BMC Bioinformatics 2007; 8:345. [PMID: 17877795 PMCID: PMC2048972 DOI: 10.1186/1471-2105-8-345] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 06/14/2007] [Accepted: 09/18/2007] [Indexed: 11/25/2022] Open
Abstract
Background Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can separate the highest accuracy models from the lowest consistently. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which is based on the alignment of predicted secondary structure elements, and ModFOLD, which combines several true MQAP methods using an artificial neural network. Results The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers; however, further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection, when applied as a post filter in order to re-rank models. Conclusion MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when limited models are available.
Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking few models from individual fold recognition servers and further improvements can be achieved using a consensus of these methods.
Affiliation(s)
- Liam J McGuffin
- The School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK.
|
1300
|
Wroblewska L, Skolnick J. Can a physics-based, all-atom potential find a protein's native structure among misfolded structures? I. Large scale AMBER benchmarking. J Comput Chem 2007; 28:2059-66. [PMID: 17407093 DOI: 10.1002/jcc.20720] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 11/09/2022]
Abstract
Recent work has shown that physics-based, all-atom energy functions (AMBER, CHARMM, OPLS-AA) and local minimization, when used in scoring, are able to discriminate between native and decoy structures. Yet, only a few instances have been reported of the successful use of physics-based potentials in the actual refinement of protein models from a starting conformation to structures that are closer to the native state. An energy function that has a global minimum energy in the protein's native state and a good correlation between energy and native-likeness should be able to drive model structures closer to their native structure during a conformational search. Here, the possible reasons for the discrepancy between the scoring and refinement results for the case of the AMBER potential are examined. When the conformational search via molecular dynamics is driven by the AMBER potential for a large set of 150 nonhomologous proteins and their associated decoys, often the native minimum does not appear to be the lowest free energy state. Ways of correcting the potential function in order to make it more suitable for protein model refinement are proposed.
Affiliation(s)
- Liliana Wroblewska
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|