Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2006;57:702-10. [PMID: 15476259 DOI: 10.1002/prot.20264] [Citation(s) in RCA: 1303] [Impact Index Per Article: 72.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Number

Cited by Other Article(s)

1201

Oh M, Joo K, Lee J. Protein-binding site prediction based on three-dimensional protein modeling. Proteins 2009;77 Suppl 9:152-6. [DOI: 10.1002/prot.22572] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

1202

Maupetit J, Tuffery P, Derreumaux P. A coarse-grained protein force field for folding and structure prediction. Proteins 2009;69:394-408. [PMID: 17600832 DOI: 10.1002/prot.21505] [Citation(s) in RCA: 164] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]

1203

Hsu YH, Burke JE, Li S, Woods VL, Dennis EA. Localizing the membrane binding region of Group VIA Ca2+-independent phospholipase A2 using peptide amide hydrogen/deuterium exchange mass spectrometry. J Biol Chem 2009;284:23652-61. [PMID: 19556238 DOI: 10.1074/jbc.m109.021857] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open

1204

McGuffin LJ. Prediction of global and local model quality in CASP8 using the ModFOLD server. Proteins 2009;77 Suppl 9:185-90. [DOI: 10.1002/prot.22491] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

1205

Zhou H, Skolnick J. Protein structure prediction by pro-Sp3-TASSER. Biophys J 2009;96:2119-27. [PMID: 19289038 DOI: 10.1016/j.bpj.2008.12.3898] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2008] [Revised: 11/12/2008] [Accepted: 12/03/2008] [Indexed: 12/29/2022] Open

Abstract

An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP(3). Top templates identified by each scoring function are combined to derive contact and distant restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contacts and distant restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those templates with good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models whose TM-score to native >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively. A server that implements the above algorithm is available at http://cssb.biology.gatech.edu/skolnick/webservice/pro-sp3-TASSER/. The source code is also available upon request.

Collapse

1206

Benkert P, Schwede T, Tosatto SC. QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information. BMC STRUCTURAL BIOLOGY 2009;9:35. [PMID: 19457232 PMCID: PMC2709111 DOI: 10.1186/1472-6807-9-35] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/21/2008] [Accepted: 05/20/2009] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus.

RESULTS

Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function called QMEANclust achieves a correlation coefficient of predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and perform significantly better in selecting good models from the ensemble of server models than any other groups participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins each with 300 alternatives models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach.

CONCLUSION

Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.

Collapse

1207

Mukherjee S, Zhang Y. MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming. Nucleic Acids Res 2009;37:e83. [PMID: 19443443 PMCID: PMC2699532 DOI: 10.1093/nar/gkp318] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

1208

Maupetit J, Derreumaux P, Tuffery P. PEP-FOLD: an online resource for de novo peptide structure prediction. Nucleic Acids Res 2009;37:W498-503. [PMID: 19433514 PMCID: PMC2703897 DOI: 10.1093/nar/gkp323] [Citation(s) in RCA: 282] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open

1209

Lobley A, Sadowski MI, Jones DT. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination. ACTA ACUST UNITED AC 2009;25:1761-7. [PMID: 19429599 DOI: 10.1093/bioinformatics/btp302] [Citation(s) in RCA: 213] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

1210

Darapaneni V, Prabhaker VK, Kukol A. Large-scale analysis of influenza A virus sequences reveals potential drug target sites of non-structural proteins. J Gen Virol 2009;90:2124-33. [PMID: 19420157 DOI: 10.1099/vir.0.011270-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open

1211

Gao X, Bu D, Xu J, Li M. Improving consensus contact prediction via server correlation reduction. BMC STRUCTURAL BIOLOGY 2009;9:28. [PMID: 19419562 PMCID: PMC2689239 DOI: 10.1186/1472-6807-9-28] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/25/2008] [Accepted: 05/06/2009] [Indexed: 11/10/2022]

Abstract

Background

Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find out that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them.

Results

In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method assuming that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively.

Conclusion

Reducing server correlation and optimally combining independent latent servers show a significant improvement over the traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction use.

Collapse

1212

Rocha J, Segura J, Wilson RC, Dasgupta S. Flexible structural protein alignment by a sequence of local transformations. ACTA ACUST UNITED AC 2009;25:1625-31. [PMID: 19417057 PMCID: PMC2940242 DOI: 10.1093/bioinformatics/btp296] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

1213

Fast Structural Alignment of Biomolecules Using a Hash Table, N-Grams and String Descriptors. ALGORITHMS 2009. [DOI: 10.3390/a2020692] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

1214

Csaba G, Birzele F, Zimmer R. Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis. BMC STRUCTURAL BIOLOGY 2009;9:23. [PMID: 19374763 PMCID: PMC2678134 DOI: 10.1186/1472-6807-9-23] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/13/2008] [Accepted: 04/17/2009] [Indexed: 11/23/2022]

1215

Zhang Y. Protein structure prediction: when is it useful? Curr Opin Struct Biol 2009;19:145-55. [PMID: 19327982 PMCID: PMC2673339 DOI: 10.1016/j.sbi.2009.02.005] [Citation(s) in RCA: 191] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2008] [Revised: 02/18/2009] [Accepted: 02/19/2009] [Indexed: 10/21/2022]

1216

Pascual-García A, Abia D, Ortiz ÁR, Bastolla U. Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures. PLoS Comput Biol 2009;5:e1000331. [PMID: 19325884 PMCID: PMC2654728 DOI: 10.1371/journal.pcbi.1000331] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2008] [Accepted: 02/11/2009] [Indexed: 11/19/2022] Open

Abstract

Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.php

Making order of the fast-growing information on proteins is essential for gaining evolutionary and functional knowledge. The most successful approaches to this task are based on classifications of protein structures, such as SCOP and CATH, which assume a discrete view of the protein structure space as a collection of separated equivalence classes (folds). However, several authors proposed that protein domains should be regarded as assemblies of polypeptide fragments, which implies that the protein–structure space is continuous. Here, we assess these views of domain space through the concept of transitivity; i.e., we test whether structure similarity of A with B and B with C implies that A and C are similar, as required for consistent classification. We find that the domain space is approximately transitive and discrete at high similarity and continuous at low similarity, where transitivity is severely violated. Comparing our classification at the cross-over similarity with CATH and SCOP, we find that they join proteins at low similarity where classification is inconsistent. Part of this discrepancy is due to structural divergence of homologous domains, which are forced to be in a single cluster in CATH and SCOP. Structural and evolutionary relationships between consistent clusters are represented as a network in our approach, going beyond current protein classification schemes. We conjecture that our results are related to a change of evolutionary regime, from uniparental divergent evolution for highly related domains to assembly of large fragments for which the classical tree representation is unsuitable.

Collapse

1217

Dukka BKC. Improving consensus structure by eliminating averaging artifacts. BMC STRUCTURAL BIOLOGY 2009;9:12. [PMID: 19267905 PMCID: PMC2662860 DOI: 10.1186/1472-6807-9-12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/03/2008] [Accepted: 03/06/2009] [Indexed: 11/29/2022]

Abstract

Background

Common structural biology methods (i.e., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging of 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtain a consensus structure that is representative of the ensemble. However, when the structures are averaged, artifacts can result in unrealistic local geometries, including unphysical bond lengths and angles.

Results

Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo energy function. To assess the performance of the algorithm, we applied our approach to Cα models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined model from the native structure for the set becomes worse by a mere 0.08 Å compared to the average RMSD of the averaged structures from the native structure (3.28 Å for refined structures and 3.36 A for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures resulted in lower RMSD to the native protein versus the averaged structure. Finally, compared to PULCHRA [1], our approach produces representative structure of similar RMSD quality, but with much fewer clashes.

Conclusion

The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where averaging of 3D coordinates is performed. Namely, structure averaging is also commonly performed in RNA secondary prediction [2], which could also benefit from our approach.

Collapse

1218

Martin AJM, Baù D, Vullo A, Walsh I, Pollastri G. Long-range information and physicality constraints improve predicted protein contact maps. J Bioinform Comput Biol 2009;6:1001-20. [PMID: 18942163 DOI: 10.1142/s0219720008003783] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2008] [Revised: 02/25/2008] [Accepted: 03/18/2008] [Indexed: 11/18/2022]

1219

Walsh I, Baù D, Martin AJM, Mooney C, Vullo A, Pollastri G. Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks. BMC STRUCTURAL BIOLOGY 2009;9:5. [PMID: 19183478 PMCID: PMC2654788 DOI: 10.1186/1472-6807-9-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/29/2008] [Accepted: 01/30/2009] [Indexed: 11/17/2022]

Abstract

Background

Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is not exploiting information about potential homologues of known structure.

Results

We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that C_αtrace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, or relying on the sequence and on evolutionary information; one template-based, or in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab-initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper we also report on reconstructions of C_αtraces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.

Conclusion

Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the url .

Collapse

1220

Using least median of squares for structural superposition of flexible proteins. BMC Bioinformatics 2009;10:29. [PMID: 19159484 PMCID: PMC2639377 DOI: 10.1186/1471-2105-10-29] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2008] [Accepted: 01/22/2009] [Indexed: 11/10/2022] Open

1221

Protein Structure Prediction. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

1222

An Atomistic View to the Gas Phase Proteome. Structure 2009;17:88-95. [DOI: 10.1016/j.str.2008.11.006] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2008] [Revised: 10/14/2008] [Accepted: 11/06/2008] [Indexed: 11/22/2022]

1223

Pandit SB, Skolnick J. Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score. BMC Bioinformatics 2008;9:531. [PMID: 19077267 PMCID: PMC2628391 DOI: 10.1186/1471-2105-9-531] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2008] [Accepted: 12/12/2008] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed that was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true.

RESULTS

To further improve structure alignment quality, the Fr-TM-align algorithm has been developed where aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. For the assessment of the structural alignment quality from Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better in comparison to both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) in comparison to those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align.

CONCLUSION

Fr-TM-align, a new algorithm that employs fragment alignment and assembly provides better structural alignments in comparison to TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/.

Collapse

1224

Carrillo-Tripp M, Brooks CL, Reddy VS. A novel method to map and compare protein-protein interactions in spherical viral capsids. Proteins 2008;73:644-55. [PMID: 18491385 DOI: 10.1002/prot.22088] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Abstract

Viral capsids are composed of multiple copies of one or a few chemically distinct capsid proteins and are mostly stabilized by inter subunit protein-protein interactions. There have been efforts to identify and analyze these protein-protein interactions, in terms of their extent and similarity, between the subunit interfaces related by quasi- and icosahedral symmetry. Here, we describe a new method to map quaternary interactions in spherical virus capsids onto polar angle space with respect to the icosahedral symmetry axes using azimuthal orthographic diagrams. This approach enables one to map the nonredundant interactions in a spherical virus capsid, irrespective of its size or triangulation number (T), onto the reference icosahedral asymmetric unit space. The resultant diagrams represent characteristic fingerprints of quaternary interactions of the respective capsids. Hence, they can be used as road maps of the protein-protein interactions to visualize the distribution and the density of the interactions. In addition, unlike the previous studies, the fingerprints of different capsids, when represented in a matrix form, can be compared with one another to quantitatively evaluate the similarity (S-score) in the subunit environments and the associated protein-protein interactions. The S-score selectively distinguishes the similarity, or lack of it, in the locations of the quaternary interactions as opposed to other well-known structural similarity metrics (e.g., RMSD, TM-score). Application of this method on a subset of T = 1 and T = 3 capsids suggests that S-score values range between 1 and 0.6 for capsids that belong to the same virus family/genus; 0.6-0.3 for capsids from different families with the same T-number and similar subunit fold; and <0.3 for comparisons of the dissimilar capsids that display different quaternary architectures (T-numbers). Finally, the sequence conserved interface residues within a virus family, whose spatial locations were also conserved have been hypothesized as the essential residues for self-assembly of the member virus capsids.

Collapse

1225

Lu Y, Sze SH. Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues. Nucleic Acids Res 2008;37:463-72. [PMID: 19056820 PMCID: PMC2632924 DOI: 10.1093/nar/gkn945] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open

1226

Lee J, Joo K, Kim SY, Lee J. Re-examination of structure optimization of off-lattice protein AB models by conformational space annealing. J Comput Chem 2008;29:2479-84. [PMID: 18470971 DOI: 10.1002/jcc.20995] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

1227

Nadler W, Meinke JH, Hansmann UHE. Folding proteins by first-passage-times-optimized replica exchange. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2008;78:061905. [PMID: 19256866 DOI: 10.1103/physreve.78.061905] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/06/2008] [Indexed: 05/27/2023]

1228

Nicosia G, Stracquadanio G. Generalized pattern search algorithm for Peptide structure prediction. Biophys J 2008;95:4988-99. [PMID: 18487293 PMCID: PMC2576383 DOI: 10.1529/biophysj.107.124016] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2007] [Accepted: 03/20/2008] [Indexed: 11/18/2022] Open

1229

Sacan A, Toroslu IH, Ferhatosmanoglu H. Integrated search and alignment of protein structures. Bioinformatics 2008;24:2872-9. [PMID: 18945684 DOI: 10.1093/bioinformatics/btn545] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

1230

Wu S, Zhang Y. ANGLOR: a composite machine-learning algorithm for protein backbone torsion angle prediction. PLoS One 2008;3:e3400. [PMID: 18923703 PMCID: PMC2559866 DOI: 10.1371/journal.pone.0003400] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2008] [Accepted: 09/18/2008] [Indexed: 11/20/2022] Open

1231

Eramian D, Eswar N, Shen MY, Sali A. How well can the accuracy of comparative protein structure models be predicted? Protein Sci 2008;17:1881-93. [PMID: 18832340 DOI: 10.1110/ps.036061.108] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

1232

Wu S, Zhang Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 2008;72:547-56. [PMID: 18247410 DOI: 10.1002/prot.21945] [Citation(s) in RCA: 310] [Impact Index Per Article: 19.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

1233

Vallat BK, Pillardy J, Elber R. A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins. Proteins 2008;72:910-28. [PMID: 18300226 PMCID: PMC2907141 DOI: 10.1002/prot.21976] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

1234

Zhou H, Skolnick J. Protein model quality assessment prediction by combining fragment comparisons and a consensus C(alpha) contact potential. Proteins 2008;71:1211-8. [PMID: 18004783 DOI: 10.1002/prot.21813] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

1235

Bernsel A, Viklund H, Elofsson A. Remote homology detection of integral membrane proteins using conserved sequence features. Proteins 2008;71:1387-99. [PMID: 18076048 DOI: 10.1002/prot.21825] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

1236

Latek D, Kolinski A. Contact prediction in protein modeling: scoring, folding and refinement of coarse-grained models. BMC STRUCTURAL BIOLOGY 2008;8:36. [PMID: 18694501 PMCID: PMC2527566 DOI: 10.1186/1472-6807-8-36] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/16/2008] [Accepted: 08/11/2008] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). The most relevant were non-local contact predictions for targets from the most difficult categories: fold recognition-analogy and new fold. Such contacts could provide valuable structural information in case a template structure cannot be found in the PDB.

RESULTS

We described comprehensive tests of the effectiveness of contact data in various aspects of de novo modeling with CABS, an algorithm which was used successfully in CASP6 by the Kolinski-Bujnicki group. We used the predicted contacts in a simple scoring function for the post-simulation ranking of protein models and as a soft bias in the folding simulations and in the fold-refinement procedure. The latter approach turned out to be the most successful. The CABS force field used in the Replica Exchange Monte Carlo simulations cooperated with the true contacts and discriminated the false ones, which resulted in an improvement of the majority of Kolinski-Bujnicki's protein models. In the modeling we tested different sets of predicted contact data submitted to the CASP6 server. According to our results, the best performing were the contacts with the accuracy balanced with the coverage, obtained either from the best two predictors only or by a consensus from as many predictors as possible.

CONCLUSION

Our tests have shown that theoretically predicted contacts can be very beneficial for protein structure prediction. Depending on the protein modeling method, a contact data set applied should be prepared with differently balanced coverage and accuracy of predicted contacts. Namely, high coverage of contact data is important for the model ranking and high accuracy for the folding simulations.

Collapse

1237

Csaba G, Birzele F, Zimmer R. Protein structure alignment considering phenotypic plasticity. Bioinformatics 2008;24:i98-104. [DOI: 10.1093/bioinformatics/btn271] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

1238

Alternating evolutionary pressure in a genetic algorithm facilitates protein model selection. BMC STRUCTURAL BIOLOGY 2008;8:34. [PMID: 18673557 PMCID: PMC2527322 DOI: 10.1186/1472-6807-8-34] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/04/2008] [Accepted: 08/01/2008] [Indexed: 11/12/2022]

1239

McGuffin LJ. Intrinsic disorder prediction from the analysis of multiple protein fold recognition models. Bioinformatics 2008;24:1798-804. [DOI: 10.1093/bioinformatics/btn326] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

1240

Protein model refinement using an optimized physics-based all-atom force field. Proc Natl Acad Sci U S A 2008;105:8268-73. [PMID: 18550813 DOI: 10.1073/pnas.0800054105] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

1241

Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints. Biophys J 2008;95:1956-64. [PMID: 18487301 DOI: 10.1529/biophysj.108.129759] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

1242

Pei J. Multiple protein sequence alignment. Curr Opin Struct Biol 2008;18:382-6. [PMID: 18485694 DOI: 10.1016/j.sbi.2008.03.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2008] [Accepted: 03/18/2008] [Indexed: 11/16/2022]

1243

Larsson P, Wallner B, Lindahl E, Elofsson A. Using multiple templates to improve quality of homology models in automated homology modeling. Protein Sci 2008;17:990-1002. [PMID: 18441233 DOI: 10.1110/ps.073344908] [Citation(s) in RCA: 109] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

1244

Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol 2008;18:342-8. [PMID: 18436442 DOI: 10.1016/j.sbi.2008.02.004] [Citation(s) in RCA: 304] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2007] [Accepted: 02/14/2008] [Indexed: 10/22/2022]

1245

Bennett-Lovsey RM, Herbert AD, Sternberg MJE, Kelley LA. Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Proteins 2008;70:611-25. [PMID: 17876813 DOI: 10.1002/prot.21688] [Citation(s) in RCA: 348] [Impact Index Per Article: 21.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Abstract

Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. However there is a paucity of information as to why the ensemble methods are superior and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.

Collapse

1246

Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008;71:261-77. [PMID: 17932912 DOI: 10.1002/prot.21715] [Citation(s) in RCA: 733] [Impact Index Per Article: 45.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

1247

Wrabl JO, Grishin NV. Statistics of Random Protein Superpositions: p-Values for Pairwise Structure Alignment. J Comput Biol 2008;15:317-55. [DOI: 10.1089/cmb.2007.0161] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

1248

Silva PJ. Assessing the reliability of sequence similarities detected through hydrophobic cluster analysis. Proteins 2008;70:1588-94. [PMID: 17918727 DOI: 10.1002/prot.21803] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

1249

Wu S, Zhang Y. A comprehensive assessment of sequence-based and template-based methods for protein contact prediction. ACTA ACUST UNITED AC 2008;24:924-31. [PMID: 18296462 DOI: 10.1093/bioinformatics/btn069] [Citation(s) in RCA: 151] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Abstract

MOTIVATION

Pair-wise residue-residue contacts in proteins can be predicted from both threading templates and sequence-based machine learning. However, most structure modeling approaches only use the template-based contact predictions in guiding the simulations; this is partly because the sequence-based contact predictions are usually considered to be less accurate than that by threading. With the rapid progress in sequence databases and machine-learning techniques, it is necessary to have a detailed and comprehensive assessment of the contact-prediction methods in different template conditions.

RESULTS

We develop two methods for protein-contact predictions: SVM-SEQ is a sequence-based machine learning approach which trains a variety of sequence-derived features on contact maps; SVM-LOMETS collects consensus contact predictions from multiple threading templates. We test both methods on the same set of 554 proteins which are categorized into 'Easy', 'Medium', 'Hard' and 'Very Hard' targets based on the evolutionary and structural distance between templates and targets. For the Easy and Medium targets, SVM-LOMETS obviously outperforms SVM-SEQ; but for the Hard and Very Hard targets, the accuracy of the SVM-SEQ predictions is higher than that of SVM-LOMETS by 12-25%. If we combine the SVM-SEQ and SVM-LOMETS predictions together, the total number of correctly predicted contacts in the Hard proteins will increase by more than 60% (or 70% for the long-range contact with a sequence separation > or =24), compared with SVM-LOMETS alone. The advantage of SVM-SEQ is also shown in the CASP7 free modeling targets where the SVM-SEQ is around four times more accurate than SVM-LOMETS in the long-range contact prediction. These data demonstrate that the state-of-the-art sequence-based contact prediction has reached a level which may be helpful in assisting tertiary structure modeling for the targets which do not have close structure templates. The maximum yield should be obtained by the combination of both sequence- and template-based predictions.

Collapse

1250

Tan CW, Jones DT. Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction. BMC Bioinformatics 2008;9:94. [PMID: 18267018 PMCID: PMC2267779 DOI: 10.1186/1471-2105-9-94] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2007] [Accepted: 02/11/2008] [Indexed: 11/13/2022] Open