1
Kinch LN, Schaeffer RD, Kryshtafovych A, Grishin NV. Target classification in the 14th round of the critical assessment of protein structure prediction (CASP14). Proteins 2021; 89:1618-1632. [PMID: 34350630 DOI: 10.1002/prot.26202] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/28/2021] [Revised: 06/21/2021] [Accepted: 07/11/2021] [Indexed: 12/14/2022]
Abstract
An evolutionary-based definition and classification of target evaluation units (EUs) is presented for the 14th round of the critical assessment of structure prediction (CASP14). CASP14 targets included 84 experimental models submitted by various structural groups (designated T1024-T1101). Targets were split into EUs based on the domain organization of available templates and performance of server groups. Several targets required splitting (19 out of 25 multidomain targets) due in part to observed conformation changes. All in all, 96 CASP14 EUs were defined and assigned to tertiary structure assessment categories (Topology-based FM or High Accuracy-based TBM-easy and TBM-hard) considering their evolutionary relationship to existing ECOD fold space: 24 family level, 50 distant homologs (H-group), 12 analogs (X-group), and 10 new folds. Principal component analysis and heatmap visualization of sequence and structure similarity to known templates as well as performance of servers highlighted trends in CASP14 target difficulty. The assigned evolutionary levels (i.e., H-groups) and assessment classes (i.e., FM) displayed overlapping clusters of EUs. Many viral targets diverged considerably from their template homologs and thus were more difficult for prediction than other homology-related targets. On the other hand, some targets did not have sequence-identifiable templates, but were predicted better than expected due to relatively simple arrangements of secondary structural elements. An apparent improvement in overall server performance in CASP14 further complicated traditional classification, which ultimately assigned EUs into high-accuracy modeling (27 TBM-easy and 31 TBM-hard), topology (23 FM), or both (15 FM/TBM).
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA; Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
2
Kinch LN, Kryshtafovych A, Monastyrskyy B, Grishin NV. CASP13 target classification into tertiary structure prediction categories. Proteins 2019; 87:1021-1036. [PMID: 31294862 DOI: 10.1002/prot.25775] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 04/15/2019] [Revised: 06/24/2019] [Accepted: 07/06/2019] [Indexed: 12/30/2022]
Abstract
Protein target structures for the Critical Assessment of Structure Prediction round 13 (CASP13) were split into evaluation units (EUs) based on their structural domains, the domain organization of available templates, and the performance of servers on whole targets compared to split target domains. Eighty targets were split into 112 EUs. The EUs were classified into categories suitable for assessment of high accuracy modeling (or template-based modeling [TBM]) and topology (or free modeling [FM]) based on target difficulty. Assignment into assessment categories considered the following criteria: (a) the evolutionary relationship of target domains to existing fold space as defined by the Evolutionary Classification of Protein Domains (ECOD) database; (b) the clustering of target domains using eight objective sequence, structure, and performance measures; and (c) the placement of target domains in a scatter plot of target difficulty against server performance used in the previous CASP. Generally, target domains with good server predictions had close template homologs and were classified as TBM. Alternately, targets with poor server predictions represent a mixture of fast evolving homologs, structure analogs, and new folds, and were classified as FM or FM/TBM overlap.
Affiliation(s)
- Lisa N Kinch
- Departments of Biophysics and Biochemistry, Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas
- Nick V Grishin
- Departments of Biophysics and Biochemistry, Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas
3
Addressing the Role of Conformational Diversity in Protein Structure Prediction. PLoS One 2016; 11:e0154923. [PMID: 27159429 PMCID: PMC4861349 DOI: 10.1371/journal.pone.0154923] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/11/2015] [Accepted: 04/21/2016] [Indexed: 11/19/2022] Open
Abstract
Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis.
4
Kinch L, Yong Shi S, Cong Q, Cheng H, Liao Y, Grishin NV. CASP9 assessment of free modeling target predictions. Proteins 2011; 79 Suppl 10:59-73. [PMID: 21997521 DOI: 10.1002/prot.23181] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 04/11/2011] [Revised: 08/26/2011] [Accepted: 09/04/2011] [Indexed: 11/06/2022]
Abstract
We present an overview of the ninth round of Critical Assessment of Protein Structure Prediction (CASP9) "Template free modeling" category (FM). Prediction models were evaluated using a combination of established structural and sequence comparison measures and a novel automated method designed to mimic manual inspection by capturing both global and local structural features. These scores were compared to those assigned manually over a diverse subset of target domains. Scores were combined to compare overall performance of participating groups and to estimate rank significance. Moreover, we discuss a few examples of free modeling targets to highlight the progress and bottlenecks of current prediction methods. Notably, a server prediction model for a single target (T0581) improved significantly over the closest structure template (44% GDT increase). This accomplishment represents the "winner" of the CASP9 FM category. A number of human expert groups submitted slight variations of this model, highlighting a trend for human experts to act as "meta predictors" by correctly selecting among models produced by the top-performing automated servers. The details of evaluation are available at http://prodata.swmed.edu/CASP9/ .
Affiliation(s)
- Lisa Kinch
- Howard Hughes Medical Institute, University of Texas, Southwestern Medical Center, Dallas, TX 75390-9050, USA.
5
Tress ML, Ezkurdia I, Richardson JS. Target domain definition and classification in CASP8. Proteins 2010; 77 Suppl 9:10-7. [PMID: 19603487 DOI: 10.1002/prot.22497] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 12/27/2022]
Abstract
In order to be successful CASP experiments require experimentally determined protein structures. These structures form the basis of the experiment. Structural genomics groups have provided the vast majority of these structures in recent editions of CASP. Before the structure prediction assessment can begin these target structures must be divided into structural domains for assessment purposes and each assessment unit must be assigned to one or more tertiary structure prediction categories. In CASP8 target domain boundaries were based on visual inspection of targets and their experimental data, and on superpositions of the target structures with related template structures. As in CASP7 target domains were broadly classified into two different categories: "template-based modeling" and "free modeling." Assessment categories were determined by structural similarity between the target domain and the nearest structural templates in the PDB and by whether or not related structural templates were used to build the models. The vast majority of the 164 assessment units in CASP8 were classified as template-based modeling. Just 10 target domains were defined as free modeling. In addition three targets were assessed in both the free modeling and template based categories and a subset of 50 template-based models was evaluated as part of the "high accuracy" subset. The targets submitted for CASP8 confirmed a trend that has been apparent since CASP5: targets submitted to the CASP experiments are becoming easier to predict.
Affiliation(s)
- Michael L Tress
- Structural and Computational Biology Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
6
Ben-David M, Noivirt-Brik O, Paz A, Prilusky J, Sussman JL, Levy Y. Assessment of CASP8 structure predictions for template free targets. Proteins 2010; 77 Suppl 9:50-65. [PMID: 19774550 DOI: 10.1002/prot.22591] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Indexed: 11/10/2022]
Abstract
The biennial CASP experiment is a crucial way to evaluate, in an unbiased way, the progress in predicting novel 3D protein structures. In this article, we assess the quality of prediction of template free models, that is, ab initio prediction of 3D structures of proteins based solely on the amino acid sequences, that is, proteins that did not have significant sequence identity to any protein in the Protein Data Bank. There were 13 targets in this category and 102 groups submitted predictions. Analysis was based on the GDT_TS analysis, which has been used in previous CASP experiments, together with a newly developed method, the OK_Rank, as well as by visual inspection. There is no doubt that in recent years many obstacles have been removed on the long and elusive way to deciphering the protein-folding problem. Out of the 13 targets, six were predicted well by a number of groups. On the other hand, it must be stressed that for four targets, none of the models were judged to be satisfactory. Thus, for template free model prediction, as evaluated in this CASP, successes have been achieved for most targets; however, a great deal of research is still required, both in improving the existing methods and in development of new approaches.
Affiliation(s)
- Moshe Ben-David
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
7
Bandyopadhyay D, Huan J, Prins J, Snoeyink J, Wang W, Tropsha A. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: II. Case studies and applications. J Comput Aided Mol Des 2009; 23:785-97. [PMID: 19548090 DOI: 10.1007/s10822-009-9277-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/21/2008] [Accepted: 04/22/2009] [Indexed: 11/25/2022]
Abstract
This paper describes several case studies concerning protein function inference from its structure using our novel approach described in the accompanying paper. This approach employs family-specific motifs, i.e. three-dimensional amino acid packing patterns that are statistically prevalent within a protein family. For our case studies we have selected families from the SCOP and EC classifications and analyzed the discriminating power of the motifs in depth. We have devised several benchmarks to compare motifs mined from unweighted topological graph representations of protein structures with those from distance-labeled (weighted) representations, demonstrating the superiority of the latter for function inference in most families. We have tested the robustness of our motif library by inferring the function of new members added to SCOP families, and discriminating between several families that are structurally similar but functionally divergent. Furthermore we have applied our method to predict function for several proteins characterized in structural genomics projects, including orphan structures, and we discuss several selected predictions in depth. Some of our predictions have been corroborated by other computational methods, and some have been validated by independent experimental studies, validating our approach for protein function inference from structure.
8
Pawlowski M, Gajda MJ, Matlak R, Bujnicki JM. MetaMQAP: a meta-server for the quality assessment of protein models. BMC Bioinformatics 2008; 9:403. [PMID: 18823532 PMCID: PMC2573893 DOI: 10.1186/1471-2105-9-403] [Citation(s) in RCA: 149] [Impact Index Per Article: 9.3] [Received: 03/18/2008] [Accepted: 09/29/2008] [Indexed: 12/31/2022] Open
Abstract
Background: Computational models of protein structure are usually inaccurate and exhibit significant deviations from the true structure. The utility of models depends on the degree of these deviations. A number of predictive methods have been developed to discriminate between the globally incorrect and approximately correct models. However, only a few methods predict correctness of different parts of computational models. Several Model Quality Assessment Programs (MQAPs) have been developed to detect local inaccuracies in unrefined crystallographic models, but it is not known if they are useful for computational models, which usually exhibit different and much more severe errors. Results: The ability to identify local errors in models was tested for eight MQAPs: VERIFY3D, PROSA, BALA, ANOLEA, PROVE, TUNE, REFINER, PROQRES on 8251 models from the CASP-5 and CASP-6 experiments, by calculating the Spearman's rank correlation coefficients between per-residue scores of these methods and local deviations between C-alpha atoms in the models vs. experimental structures. As a reference, we calculated the value of correlation between the local deviations and trivial features that can be calculated for each residue directly from the models, i.e. solvent accessibility, depth in the structure, and the number of local and non-local neighbours. We found that absolute correlations of scores returned by the MQAPs and local deviations were poor for all methods. In addition, scores of PROQRES and several other MQAPs strongly correlate with 'trivial' features. Therefore, we developed MetaMQAP, a meta-predictor based on a multivariate regression model, which uses scores of the above-mentioned methods, but in which trivial parameters are controlled. MetaMQAP predicts the absolute deviation (in Ångströms) of individual C-alpha atoms between the model and the unknown true structure as well as global deviations (expressed as root mean square deviation and GDT_TS scores). Local model accuracy predicted by MetaMQAP shows an impressive correlation coefficient of 0.7 with true deviations from native structures, a significant improvement over all constituent primary MQAP scores. The global MetaMQAP score is correlated with model GDT_TS on the level of 0.89. Conclusion: Finally, we compared our method with the MQAPs that scored best in the 7th edition of CASP, using CASP7 server models (not included in the MetaMQAP training set) as the test data. In our benchmark, MetaMQAP is outperformed only by PCONS6 and method QA_556 – methods that require comparison of multiple alternative models and score each of them depending on its similarity to other models. MetaMQAP is however the best among methods capable of evaluating just single models. We implemented the MetaMQAP as a web server available for free use by all academic users at the URL
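The per-residue evaluation described in this abstract rests on Spearman's rank correlation between a method's local scores and the observed C-alpha deviations. The following is a minimal sketch of that statistic only, not the MetaMQAP implementation; the function names (`average_ranks`, `pearson`, `spearman`) and the sample numbers are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-residue error estimates and true C-alpha deviations (in Å).
predicted_error = [0.5, 1.2, 3.0, 2.1, 0.9]
true_deviation = [0.4, 1.0, 4.5, 1.8, 1.1]
rho = spearman(predicted_error, true_deviation)
```

Because only ranks enter the calculation, the statistic is insensitive to the scale of each method's score, which is presumably why it suits a comparison across programs with different scoring units.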
Affiliation(s)
- Marcin Pawlowski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, PL-02-109 Warsaw, Poland.
9
Shi S, Zhong Y, Majumdar I, Sri Krishna S, Grishin NV. Searching for three-dimensional secondary structural patterns in proteins with ProSMoS. Bioinformatics 2007; 23:1331-8. [PMID: 17384423 DOI: 10.1093/bioinformatics/btm121] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Many evolutionarily distant, but functionally meaningful links between proteins come to light through comparison of spatial structures. Most programs that assess structural similarity compare two proteins to each other and find regions in common between them. Structural classification experts look for a particular structural motif instead. Programs base similarity scores on superposition or closeness of either Cartesian coordinates or inter-residue contacts. Experts pay more attention to the general orientation of the main chain and mutual spatial arrangement of secondary structural elements. There is a need for a computational tool to find proteins with the same secondary structures, topological connections and spatial architecture, regardless of subtle differences in 3D coordinates. RESULTS: We developed ProSMoS, a Protein Structure Motif Search program that emulates an expert. Starting from a spatial structure, the program uses previously delineated secondary structural elements. A meta-matrix of interactions between the elements (parallel or antiparallel) minding handedness of connections (left or right) and other features (e.g. element lengths and hydrogen bonds) is constructed prior to or during the searches. All structures are reduced to such meta-matrices that contain just enough information to define a protein fold, but this definition remains very general and deviations in 3D coordinates are tolerated. The user supplies a meta-matrix for a structural motif of interest, and ProSMoS finds all proteins in the protein data bank (PDB) that match the meta-matrix. ProSMoS performance is compared to other programs and is illustrated on a beta-Grasp motif. A brief analysis of all beta-Grasp-containing proteins is presented. Program availability: ProSMoS is freely available for non-commercial use from ftp://iole.swmed.edu/pub/ProSMoS.
Affiliation(s)
- Shuoyong Shi
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
10
Bandyopadhyay D, Huan J, Liu J, Prins J, Snoeyink J, Wang W, Tropsha A. Structure-based function inference using protein family-specific fingerprints. Protein Sci 2006; 15:1537-43. [PMID: 16731985 PMCID: PMC2265098 DOI: 10.1110/ps.062189906] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 10/24/2022]
Abstract
We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of PDB; their information is additional to sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining. For a new structure, all occurrences of these family-specific fingerprints may be found by a fast algorithm for subgraph isomorphism; the structure can then be assigned to a family with a confidence value derived from the number of fingerprints found and their distribution in background proteins. In validation experiments, we infer the function of new members added to SCOP families and we discriminate between structurally similar, but functionally divergent TIM barrel families. We then apply our method to predict function for several structural genomics proteins, including orphan structures. Some predictions have been corroborated by other computational methods and some validated by subsequent functional characterization.
Affiliation(s)
- Deepak Bandyopadhyay
- Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
11
Schwarzenbacher R, McMullan D, Krishna SS, Xu Q, Miller MD, Canaves JM, Elsliger MA, Floyd R, Grzechnik SK, Jaroszewski L, Klock HE, Koesema E, Kovarik JS, Kreusch A, Kuhn P, McPhillips TM, Morse AT, Quijano K, Spraggon G, Stevens RC, van den Bedem H, Wolf G, Hodgson KO, Wooley J, Deacon AM, Godzik A, Lesley SA, Wilson IA. Crystal structure of a glycerate kinase (TM1585) from Thermotoga maritima at 2.70 Å resolution reveals a new fold. Proteins 2006; 65:243-8. [PMID: 16865707 DOI: 10.1002/prot.21058] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/11/2022]
12
Tress M, Tai CH, Wang G, Ezkurdia I, López G, Valencia A, Lee B, Dunbrack RL. Domain definition and target classification for CASP6. Proteins 2006; 61 Suppl 7:8-18. [PMID: 16187342 DOI: 10.1002/prot.20717] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Abstract
Assessment of structure predictions in CASP6 was based on single domains isolated from experimentally determined structures, which were categorized into comparative modeling, fold recognition, and new fold targets. Domain definitions were defined upon visual examination of the structures with the aid of automated domain-parsing programs. Domain categorization was determined by comparison of the target structures with those in the Protein Data Bank at the time each target expired and a variety of sequence and structure-based methods to determine potential homologous relationships.
Affiliation(s)
- Michael Tress
- Protein Design Group, CNB-CSIC, Calle Darwin, 28049 Cantoblanco, Spain
13
Tress ML, Cozzetto D, Tramontano A, Valencia A. An analysis of the Sargasso Sea resource and the consequences for database composition. BMC Bioinformatics 2006; 7:213. [PMID: 16623953 PMCID: PMC1513258 DOI: 10.1186/1471-2105-7-213] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 12/22/2005] [Accepted: 04/19/2006] [Indexed: 01/20/2023] Open
Abstract
Background: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource. Results: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments. Conclusion: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.
Affiliation(s)
- Michael L Tress
- Protein Design Group, CNB-CSIC, Calle Darwin, Cantoblanco 28049 Madrid, Spain
- Domenico Cozzetto
- Department of Biochemical Sciences, University "La Sapienza" Rome, Italy
- Anna Tramontano
- Department of Biochemical Sciences, University "La Sapienza" Rome, Italy
- Alfonso Valencia
- Protein Design Group, CNB-CSIC, Calle Darwin, Cantoblanco 28049 Madrid, Spain
14
Qiu J, Elber R. SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs. Proteins 2006; 62:881-91. [PMID: 16385554 DOI: 10.1002/prot.20854] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/07/2022]
Abstract
In template-based modeling of protein structures, the generation of the alignment between the target and the template is a critical step that significantly affects the accuracy of the final model. This paper proposes an alignment algorithm SSALN that learns substitution matrices and position-specific gap penalties from a database of structurally aligned protein pairs. In addition to the amino acid sequence information, secondary structure and solvent accessibility information of a position are used to derive substitution scores and position-specific gap penalties. In a test set of CASP5 targets, SSALN outperforms sequence alignment methods such as a Smith-Waterman algorithm with BLOSUM50 and PSI_BLAST. SSALN also generates better alignments than PSI_BLAST in the CASP6 test set. LOOPP server prediction based on an SSALN alignment is ranked the best for target T0280_1 in CASP6. SSALN is also compared with several threading methods and sequence alignment methods on the ProSup benchmark. SSALN has the highest alignment accuracy among the methods compared. On the Fischer's benchmark, SSALN performs better than CLUSTALW and GenTHREADER, and generates more alignments with accuracy >50%, >60% or >70% than FUGUE, but fewer alignments with accuracy >80% than FUGUE. All the supplemental materials can be found at http://www.cs.cornell.edu/~jianq/research.htm.
Affiliation(s)
- Jian Qiu
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
15
Abstract
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multidomain proteins but also for the experimental structure determination. Since protein sequences of multiple domains may contain much information regarding evolutionary processes such as gene-exon shuffling, this information can be detected by analyzing the position-specific scoring matrix (PSSM) generated by PSI-BLAST. We have presented a method, PPRODO (Prediction of PROtein DOmain boundaries) that predicts domain boundaries of proteins from sequence information by a neural network. The network is trained and tested using the values obtained from the PSSM generated by PSI-BLAST. A 10-fold cross-validation technique is performed to obtain the parameters of neural networks using a nonredundant set of 522 proteins containing 2 contiguous domains. PPRODO provides good and consistent results for the prediction of domain boundaries, with accuracy of about 66% using the +/-20 residue criterion. The PPRODO source code, as well as all data sets used in this work, are available from http://gene.kias.re.kr/~jlee/pprodo/.
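The "+/-20 residue criterion" quoted above can be read as counting a prediction as correct when the predicted boundary lies within 20 residues of the annotated one. The sketch below illustrates that reading for two-domain proteins with a single boundary each; it is an assumption about the criterion, not the PPRODO code, and the names `boundary_is_correct` and `boundary_accuracy`, the inclusive window, and the sample positions are all hypothetical.

```python
def boundary_is_correct(predicted, actual, window=20):
    """True if the predicted boundary is within `window` residues (inclusive, by assumption)."""
    return abs(predicted - actual) <= window


def boundary_accuracy(pairs, window=20):
    """Fraction of (predicted, actual) boundary pairs that satisfy the window criterion."""
    if not pairs:
        raise ValueError("at least one (predicted, actual) pair is required")
    hits = sum(boundary_is_correct(p, a, window) for p, a in pairs)
    return hits / len(pairs)


# Hypothetical boundary positions (residue indices) for three two-domain proteins.
example_pairs = [(100, 110), (50, 90), (200, 180)]
accuracy = boundary_accuracy(example_pairs)  # two of the three fall within 20 residues
```

Widening or narrowing `window` shows how strongly a reported accuracy depends on the tolerance chosen, which is worth keeping in mind when comparing figures across domain-prediction methods.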
Affiliation(s)
- Jaehyun Sim
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
16
Levefelt C, Lundh D. A fold-recognition approach to loop modeling. J Mol Model 2005; 12:125-39. [PMID: 16096805 DOI: 10.1007/s00894-005-0003-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 12/29/2004] [Accepted: 06/22/2005] [Indexed: 11/26/2022]
Abstract
A novel approach is proposed for modeling loop regions in proteins. In this approach, a prerequisite sequence-structure alignment is examined for regions where the target sequence is not covered by the structural template. These regions, extended with a number of residues from adjacent stem regions, are submitted to fold recognition. The alignments produced by fold recognition are integrated into the initial alignment to create an alignment between the target sequence and several structures, where gaps in the main structural template are covered by local structural templates. This one-to-many (1:N) alignment is used to create a protein model by existing protein-modeling techniques. Several alternative approaches were evaluated using a set of ten proteins. One approach was selected and evaluated using another set of 31 proteins. The most promising result was for gap regions not located at the C-terminus or N-terminus of a protein, where the method produced an average RMSD 12% lower than the loop modeling provided with the program MODELLER. This improvement is shown to be statistically significant.
Affiliation(s)
- Christer Levefelt
- School of Humanities and Informatics, University of Skövde, Box 408, 54128 Skövde, Sweden
|
17
|
Abstract
This report describes the assessment of the homology-based predictions submitted to the fifth edition of the Critical Assessment of Methods for Protein Structure Prediction (CASP5) experiment. We assessed the ability of the methods to predict the overall fold, the portions of the structure that differ substantially between the target protein and its closest structural homologue and the conformation of the side-chains. We also compared the results with those obtained in previous editions of the experiment and derived some general conclusions about the state of the art of comparative modeling methods and their usefulness for experimentalists.
Affiliation(s)
- Anna Tramontano
- Department of Biochemical Sciences, A. Rossi Fanelli, University of Rome La Sapienza, Rome, Italy.
|
18
|
Kinch LN, Wrabl JO, Krishna SS, Majumdar I, Sadreyev RI, Qi Y, Pei J, Cheng H, Grishin NV. CASP5 assessment of fold recognition target predictions. Proteins 2003; 53 Suppl 6:395-409. [PMID: 14579328 DOI: 10.1002/prot.10557] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
We present an overview of the fifth round of Critical Assessment of Protein Structure Prediction (CASP5) fold recognition category. Prediction models were evaluated by using six different structural measures and four different alignment measures, and these scores were compared to those assigned manually over a diverse subset of target domains. Scores were combined to compare overall performance of participating groups and to estimate rank significance. The methods used by a few groups outperformed all other methods in terms of the evaluated criteria and could be considered state-of-the-art in structure prediction. We discuss a few examples of difficult fold recognition targets to highlight the progress of ab initio-type methods on difficult structure analogs and the difficulties of predicting multidomain targets and selecting prediction models. We also compared the results of manual groups to those of automatic servers evaluated in parallel by CAFASP, showing that the top performing automated server structure predictions approached those of the best manual predictors.
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA.
|
19
|
Aloy P, Stark A, Hadley C, Russell RB. Predictions without templates: New folds, secondary structure, and contacts in CASP5. Proteins 2003; 53 Suppl 6:436-56. [PMID: 14579333 DOI: 10.1002/prot.10546] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
We present the assessment of CASP5 predictions in the new fold category. For coordinate predictions, we considered five targets with new folds and eight lying on the fold recognition borderline. We performed detailed visual and numerical comparisons between predicted and experimental structures to assess prediction accuracy. The two procedures largely agreed, but the visual inspection identified instances where metrics, such as GDT_TS, ranked what we considered incorrect predictions highly. We found the quality of the best predictions to be very good: for nearly every target at least one group predicted a structure close to the correct one. However, selection of the best of five models is still problematic. The group of David Baker once again proved to be best overall, with many individual highlights. However, high quality and consistency were also seen from others, suggesting that the community is moving toward general procedures to predict accurate structures for proteins showing no resemblance to anything seen before. Predictions for secondary structure showed at best limited progress since CASP4. The number of targets is probably too small to spot differences in performance between methods, suggesting that such predictions might be better evaluated with schemes involving more proteins. For contact predictions, accuracies are still low, although there were several instances of accurate and useful contacts predicted de novo, and new approaches hint at future progress.
|