Reference Citation Analysis

For: Feng ZK, Sippl MJ. Optimum superimposition of protein structures: ambiguities and implications. Fold Des 1996;1:123-32. [PMID: 9079372 DOI: 10.1016/s1359-0278(96)00021-1] [Citation(s) in RCA: 107] [Impact Index Per Article: 3.8] [Indexed: 02/04/2023]

Cited by Other Article(s)

Zhao T, Gussak A, van der Hee B, Brugman S, van Baarlen P, Wells JM. Identification of plasminogen-binding sites in Streptococcus suis enolase that contribute to bacterial translocation across the blood-brain barrier. Front Cell Infect Microbiol 2024;14:1356628. [PMID: 38456079 PMCID: PMC10919400 DOI: 10.3389/fcimb.2024.1356628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/06/2024] [Indexed: 03/09/2024]

Carpentier M, Chomilier J. Protein multiple alignments: sequence-based versus structure-based programs. Bioinformatics 2020;35:3970-3980. [PMID: 30942864 DOI: 10.1093/bioinformatics/btz236] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/04/2018] [Revised: 03/05/2019] [Accepted: 04/02/2019] [Indexed: 11/14/2022]

Collier JH, Allison L, Lesk AM, Garcia de la Banda M, Konagurthu AS. A new statistical framework to assess structural alignment quality using information compression. Bioinformatics 2015;30:i512-8. [PMID: 25161241 PMCID: PMC4147913 DOI: 10.1093/bioinformatics/btu460] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]

Abstract

Motivation: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments must be assessed has been identified as the main cause for the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field.

Results: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field.

Availability: http://lcb.infotech.monash.edu.au/I-value

Contact: arun.konagurthu@monash.edu

Supplementary information: Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html

Rodriguez A, Schmidler SC. Bayesian protein structure alignment. Ann Appl Stat 2014;8:2068-2095. [PMID: 26925188 DOI: 10.1214/14-aoas780] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]

Slater AW, Castellanos JI, Sippl MJ, Melo F. Towards the development of standardized methods for comparison, ranking and evaluation of structure alignments. Bioinformatics 2012;29:47-53. [PMID: 23060612 DOI: 10.1093/bioinformatics/bts600] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]

Abstract

MOTIVATION

Pairwise alignment of protein structures is a fundamental task in structural bioinformatics. There are numerous computer programs in the public domain that produce alignments for a given pair of protein structures, but the results obtained by the various programs generally differ substantially. Hence, when applying such programs, the question arises as to which programs are the most trustworthy in terms of overall performance, and which provide the best result for a given pair of proteins. The major problem in comparing, evaluating and judging alignment results is that there is no clear notion of the optimality of an alignment. As a consequence, the numeric criteria and scores reported by the individual structure alignment programs are largely incomparable.

RESULTS

Here we report on the development and application of a new approach for the evaluation of structure alignment results. The method uses the translation vector and rotation matrix reported by each program to generate the superposition of the two structures, but discards the alignment reported by the program. The optimal alignment is then generated in standardized form by a suitably implemented dynamic programming algorithm in which the length of the alignment is the single most informative parameter. We demonstrate that some of the most popular programs in protein structure research differ considerably in their overall performance. In particular, each of the programs investigated here produced, compared with all others, the best alignment in at least one case and the worst alignment in at least one case. Hence, at the current state of development of structure comparison techniques, it is advisable to use several programs in parallel and to choose the optimal alignment in the way reported here.

AVAILABILITY AND IMPLEMENTATION

The computer software that implements the method described here is freely available at http://melolab.org/stovca.
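The standardization step described above, rebuilding the alignment from the superposition alone with alignment length as the key quantity, can be illustrated with a small dynamic programming sketch. It is not the STOVCA implementation. It assumes Cα coordinates that are already superimposed and a hypothetical distance cutoff of 3.5 Å, and it returns the longest sequence-order-preserving set of residue pairs closer than that cutoff.

```python
import math


def standardized_alignment(coords_a, coords_b, cutoff=3.5):
    """Longest order-preserving alignment of residue pairs closer than `cutoff`.

    coords_a, coords_b: lists of (x, y, z) C-alpha positions, already superimposed.
    cutoff: hypothetical distance threshold in Angstrom (an assumption).
    Returns the list of aligned (i, j) index pairs in sequence order.
    """
    n, m = len(coords_a), len(coords_b)
    # score[i][j] = maximum number of aligned pairs using a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j], score[i][j - 1])
            if math.dist(coords_a[i - 1], coords_b[j - 1]) < cutoff:
                best = max(best, score[i - 1][j - 1] + 1)
            score[i][j] = best
    # Trace back through the table to recover the aligned pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if (math.dist(coords_a[i - 1], coords_b[j - 1]) < cutoff
                and score[i][j] == score[i - 1][j - 1] + 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

For example, with `a = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0)]` and `b = [(0.5, 0, 0), (20, 0, 0), (3.9, 0, 0), (7.7, 0, 0)]`, the call `standardized_alignment(a, b)` returns `[(0, 0), (1, 2), (2, 3)]`, treating residue 1 of `b` as an insertion.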

Sippl MJ, Wiederstein M. Detection of spatial correlations in protein structures and molecular complexes. Structure 2012;20:718-28. [PMID: 22483118 PMCID: PMC3320710 DOI: 10.1016/j.str.2012.01.024] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 10/14/2011] [Revised: 01/09/2012] [Accepted: 01/31/2012] [Indexed: 10/28/2022]

Sehnal D, Vařeková RS, Huber HJ, Geidl S, Ionescu CM, Wimmerová M, Koča J. SiteBinder: an improved approach for comparing multiple protein structural motifs. J Chem Inf Model 2012;52:343-59. [PMID: 22296449 DOI: 10.1021/ci200444d] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]

Abstract

There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever-growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or absent. We have implemented this methodology in the Web application SiteBinder, which can process up to thousands of protein structural motifs in a very short time and provides an intuitive, user-friendly interface. Our benchmarking analysis demonstrated the robustness, efficiency, and versatility of the methodology and its implementation through the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite binding different sugars. We then superimposed over 300 zinc finger central motifs and showed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.
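SiteBinder computes the RMSD for each candidate atom pairing by quaternion algebra. As a hedged illustration of that single step (the pairing search and the PDB-annotation filtering are not shown), the sketch below uses the standard quaternion formulation of optimal superposition with Horn's 4×4 key matrix. After both point sets are centred, the largest eigenvalue λmax of that matrix gives the minimal residual sum of squares E0 − 2λmax directly, without constructing the rotation. It assumes NumPy and is not SiteBinder's code.

```python
import numpy as np


def quaternion_rmsd(x, y):
    """Minimal RMSD between two paired point sets via the quaternion method.

    x, y: (N, 3) array-likes; row k of x is paired with row k of y.
    Only proper rotations are considered (no reflections).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove translation by centring both sets on their centroids
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # 3x3 correlation matrix: s[a, b] = sum_k x[k, a] * y[k, b]
    s = x.T @ y
    sxx, sxy, sxz = s[0]
    syx, syy, syz = s[1]
    szx, szy, szz = s[2]
    # Symmetric 4x4 key matrix; its top eigenvector is the optimal rotation quaternion
    key = np.array([
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ])
    lambda_max = np.linalg.eigvalsh(key)[-1]  # eigvalsh returns ascending eigenvalues
    e0 = (x ** 2).sum() + (y ** 2).sum()
    # Residual after the optimal rotation is e0 - 2 * lambda_max (clamped for round-off)
    return float(np.sqrt(max(e0 - 2.0 * lambda_max, 0.0) / len(x)))
```

Working with the eigenvalue alone is what makes this attractive for an exhaustive pairing search: each candidate pairing costs one 3×3 correlation matrix and one 4×4 symmetric eigenvalue problem.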

Poleksic A. Optimizing a widely used protein structure alignment measure in expected polynomial time. IEEE/ACM Trans Comput Biol Bioinform 2011;8:1716-1720. [PMID: 21904019 DOI: 10.1109/tcbb.2011.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]

Poleksic A. On complexity of protein structure alignment problem under distance constraint. IEEE/ACM Trans Comput Biol Bioinform 2011;9:511-516. [PMID: 22025757 DOI: 10.1109/tcbb.2011.133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]

Joseph AP, Srinivasan N, de Brevern AG. Improvement of protein structure comparison using a structural alphabet. Biochimie 2011;93:1434-45. [PMID: 21569819 DOI: 10.1016/j.biochi.2011.04.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 11/19/2010] [Accepted: 04/12/2011] [Indexed: 12/29/2022]

Shen YF, Li B, Liu ZP. Protein structure alignment based on internal coordinates. Interdiscip Sci 2010;2:308-19. [DOI: 10.1007/s12539-010-0019-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/06/2008] [Revised: 01/05/2010] [Accepted: 01/06/2010] [Indexed: 10/18/2022]

Shibuya T, Jansson J, Sadakane K. Linear-time protein 3-D structure searching with insertions and deletions. Algorithms Mol Biol 2010;5:7. [PMID: 20047663 PMCID: PMC2830924 DOI: 10.1186/1748-7188-5-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/08/2009] [Accepted: 01/04/2010] [Indexed: 11/10/2022]

Margraf T, Schenk G, Torda AE. The SALAMI protein structure search server. Nucleic Acids Res 2009;37:W480-4. [PMID: 19465380 PMCID: PMC2703935 DOI: 10.1093/nar/gkp431] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]

Tai CH, Vincent JJ, Kim C, Lee B. SE: an algorithm for deriving sequence alignment from a pair of superimposed structures. BMC Bioinformatics 2009;10 Suppl 1:S4. [PMID: 19208141 PMCID: PMC2648757 DOI: 10.1186/1471-2105-10-s1-s4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]

Abstract

Background

Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments.

Results

SE gave an average accuracy of 95.9% over 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP.

Conclusion

The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than the three other programs tested, all of which use a dynamic programming algorithm.
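The seed-and-extend idea described in the Background can be sketched as follows. The thresholds are hypothetical, because the abstract only says that the seed criteria are stringent. Extension uses a looser distance cutoff. The paper's conflict resolution by distance and amino acid similarity is replaced here by a simple first-come rule, so pairs taken from different diagonals are not checked for sequence-order consistency. This is an illustration under those assumptions, not the published SE program.

```python
import math


def seed_extension(coords_a, coords_b, seed_cutoff=1.5, extend_cutoff=3.5):
    """Seed finding and diagonal extension on two superimposed structures.

    coords_a, coords_b: lists of (x, y, z) C-alpha positions, already superimposed.
    seed_cutoff, extend_cutoff: hypothetical distance thresholds in Angstrom.
    Returns a sorted list of aligned (i, j) residue pairs.
    """
    n, m = len(coords_a), len(coords_b)

    def close(i, j, cutoff):
        # True if both indices are valid and the two residues lie within cutoff
        return 0 <= i < n and 0 <= j < m and math.dist(coords_a[i], coords_b[j]) < cutoff

    # Seeds: residue pairs that are very close after superposition
    seeds = {(i, j) for i in range(n) for j in range(m) if close(i, j, seed_cutoff)}

    aligned, used_a, used_b = set(), set(), set()
    for i, j in sorted(seeds):
        # A seed segment is three consecutive seeds on one diagonal (constant j - i)
        if (i + 1, j + 1) not in seeds or (i + 2, j + 2) not in seeds:
            continue
        offset = j - i
        start, end = i, i + 2
        # Extend towards the N-terminus while pairs stay within extend_cutoff
        while close(start - 1, start - 1 + offset, extend_cutoff):
            start -= 1
        # Extend towards the C-terminus in the same way
        while close(end + 1, end + 1 + offset, extend_cutoff):
            end += 1
        for k in range(start, end + 1):
            # Simplification: the first segment to claim a residue keeps it
            if k not in used_a and k + offset not in used_b:
                aligned.add((k, k + offset))
                used_a.add(k)
                used_b.add(k + offset)
    return sorted(aligned)
```

Note that no gap penalty appears anywhere: gaps arise only where extension stops because the superimposed residues drift apart.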

On distance and similarity in fold space. Bioinformatics 2008;24:872-3. [DOI: 10.1093/bioinformatics/btn040] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Indexed: 11/15/2022]

Sippl MJ, Suhrer SJ, Gruber M, Wiederstein M. A discrete view on fold space. Bioinformatics 2008;24:870-1. [PMID: 18218654 DOI: 10.1093/bioinformatics/btn020] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]

Sippl MJ, Wiederstein M. A note on difficult structure alignment problems. Bioinformatics 2008;24:426-7. [PMID: 18174182 DOI: 10.1093/bioinformatics/btm622] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.7] [Indexed: 11/13/2022]

Zemla AT, Zhou CLE. Structural Re-Alignment in an Immunogenic Surface Region of Ricin A Chain. Bioinform Biol Insights 2008;2:5-13. [PMID: 19812763 PMCID: PMC2735970 DOI: 10.4137/bbi.s437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Kim C, Lee B. Accuracy of structure-based sequence alignment of automatic methods. BMC Bioinformatics 2007;8:355. [PMID: 17883866 PMCID: PMC2039753 DOI: 10.1186/1471-2105-8-355] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 06/04/2007] [Accepted: 09/20/2007] [Indexed: 11/10/2022]

Abstract

Background

Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy.

Results

In this study, we evaluate CE, DaliLite, FAST, LOCK2, MATRAS, SHEBA and VAST in terms of the accuracy of the sequence alignments they produce, using sequence alignments from NCBI's human-curated Conserved Domain Database (CDD) as the standard of truth. We find that 4 to 9% of the residues on average are either not aligned or aligned with a shift error of more than 8 residues, and that an additional 6 to 14% of residues on average are misaligned by 1–8 residues, depending on the program and the data set used. The fraction of correctly aligned residues generally decreases as the sequence similarity decreases or as the RMSD between the Cα positions of the two structures increases. This fraction varies significantly across CDD superfamilies, whether or not shift error is allowed. Also, alignments with different shift errors occur between proteins within the same CDD superfamily, leading to inconsistent alignments between superfamily members. In general, residue pairs that are more than 3.0 Å apart in the reference alignment are heavily (≥25% on average) misaligned in the test alignments. In addition, each method shows a different pattern of relative weaknesses for different SCOP classes. CE gives relatively poor results for β-sheet-containing structures (all-β, α/β, and α+β classes), DaliLite for the "others" class, in which all but the major four classes are combined, and LOCK2 and VAST for the all-β and "others" classes.

Conclusion

When the sequence similarity is low, structure-based methods produce better sequence alignments than methods that rely on sequence similarity alone. However, current structure-based methods still misalign 11–19% of the conserved core residues when compared with the human-curated CDD alignments. The alignment quality of each program depends on the protein structural type and similarity, with DaliLite showing the most agreement with CDD on average.
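The error categories used in this abstract (exactly aligned, misaligned by 1–8 residues, and unaligned or shifted by more than 8 residues) can be computed with a short function. This sketch assumes alignments are given as dictionaries from residue indices of protein A to residue indices of protein B, and it measures the shift along protein B. The paper's exact bookkeeping, for example any restriction to conserved core residues, may differ.

```python
def alignment_accuracy(reference, test, max_shift=8):
    """Compare a test alignment with a reference alignment, residue by residue.

    reference, test: dicts mapping a residue index in protein A to its aligned
    residue index in protein B; unaligned residues are simply absent.
    Returns (exact, shifted, other) as fractions of the reference residues, where
    shifted means off by 1..max_shift residues and other means unaligned or worse.
    """
    exact = shifted = other = 0
    for i, j_ref in reference.items():
        j_test = test.get(i)
        if j_test == j_ref:
            exact += 1
        elif j_test is not None and abs(j_test - j_ref) <= max_shift:
            shifted += 1  # misaligned by a small shift
        else:
            other += 1    # not aligned, or shifted by more than max_shift
    total = len(reference)
    return exact / total, shifted / total, other / total
```

For instance, with the reference `{0: 0, 1: 1, 2: 2, 3: 3}` and the test alignment `{0: 0, 1: 1, 2: 4, 3: 20}`, the function returns `(0.5, 0.25, 0.25)`.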

Tidow H, Andreeva A, Rutherford TJ, Fersht AR. Solution structure of ASPP2 N-terminal domain (N-ASPP2) reveals a ubiquitin-like fold. J Mol Biol 2007;371:948-58. [PMID: 17594908 DOI: 10.1016/j.jmb.2007.05.024] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 03/24/2007] [Revised: 05/07/2007] [Accepted: 05/07/2007] [Indexed: 11/30/2022]

Suhrer SJ, Gruber M, Sippl MJ. QSCOP-BLAST--fast retrieval of quantified structural information for protein sequences of unknown structure. Nucleic Acids Res 2007;35:W411-5. [PMID: 17478501 PMCID: PMC1933160 DOI: 10.1093/nar/gkm264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]

Wang Y, Makedon F, Ford J, Huang H. A bipartite graph matching framework for finding correspondences between structural elements in two proteins. Conf Proc IEEE Eng Med Biol Soc 2007;2004:2972-5. [PMID: 17270902 DOI: 10.1109/iembs.2004.1403843] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/13/2023]

Suhrer SJ, Wiederstein M, Sippl MJ. QSCOP--SCOP quantified by structural relationships. Bioinformatics 2006;23:513-4. [PMID: 17127679 DOI: 10.1093/bioinformatics/btl594] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]

Andreeva A, Prlić A, Hubbard TJP, Murzin AG. SISYPHUS--structural alignments for proteins with non-trivial relationships. Nucleic Acids Res 2006;35:D253-9. [PMID: 17068077 PMCID: PMC1635320 DOI: 10.1093/nar/gkl746] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Indexed: 11/16/2022]

Shih ESC, Gan RCR, Hwang MJ. OPAAS: a web server for optimal, permuted, and other alternative alignments of protein structures. Nucleic Acids Res 2006;34:W95-8. [PMID: 16845117 PMCID: PMC1538888 DOI: 10.1093/nar/gkl264] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]

Wang G, Jin Y, Dunbrack RL. Assessment of fold recognition predictions in CASP6. Proteins 2006;61 Suppl 7:46-66. [PMID: 16187346 DOI: 10.1002/prot.20721] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/07/2022]

Roach J, Sharma S, Kapustina M, Carter CW. Structure alignment via Delaunay tetrahedralization. Proteins 2006;60:66-81. [PMID: 15856481 DOI: 10.1002/prot.20479] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022]

Qiu J, Elber R. SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs. Proteins 2006;62:881-91. [PMID: 16385554 DOI: 10.1002/prot.20854] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/07/2022]

Binkowski TA, Joachimiak A, Liang J. Protein surface analysis for function annotation in high-throughput structural genomics pipeline. Protein Sci 2006;14:2972-81. [PMID: 16322579 PMCID: PMC2253251 DOI: 10.1110/ps.051759005] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Indexed: 10/25/2022]

Scheeff ED, Bourne PE. Structural evolution of the protein kinase-like superfamily. PLoS Comput Biol 2005;1:e49. [PMID: 16244704 PMCID: PMC1261164 DOI: 10.1371/journal.pcbi.0010049] [Citation(s) in RCA: 185] [Impact Index Per Article: 9.7] [Received: 03/17/2005] [Accepted: 09/08/2005] [Indexed: 11/19/2022]

Abstract

The protein kinase family is large and important, but it is only one family in a larger superfamily of homologous kinases that phosphorylate a variety of substrates and play important roles in all three superkingdoms of life. We used a carefully constructed structural alignment of selected kinases as the basis for a study of the structural evolution of the protein kinase-like superfamily. The comparison of structures revealed a "universal core" domain consisting only of regions required for ATP binding and the phosphotransfer reaction. Remarkably, even within the universal core some kinase structures display notable changes, while still retaining essential activity. Hence, the protein kinase-like superfamily has undergone substantial structural and sequence revision over long evolutionary timescales. We constructed a phylogenetic tree for the superfamily using a novel approach that allowed for the combination of sequence and structure information into a unified quantitative analysis. When considered against the backdrop of species distribution and other metrics, our tree provides a compelling scenario for the development of the various kinase families from a shared common ancestor. We propose that most of the so-called "atypical kinases" are not intermittently derived from protein kinases, but rather diverged early in evolution to form a distinct phyletic group. Within the atypical kinases, the aminoglycoside and choline kinase families appear to share the closest relationship. These two families in turn appear to be the most closely related to the protein kinase family. In addition, our analysis suggests that the actin-fragmin kinase, an atypical protein kinase, is more closely related to the phosphoinositide-3 kinase family than to the protein kinase family. The two most divergent families, alpha-kinases and phosphatidylinositol phosphate kinases (PIPKs), appear to have distinct evolutionary histories. 
While the PIPKs probably have an evolutionary relationship with the rest of the kinase superfamily, the relationship appears to be very distant (and perhaps indirect). Conversely, the alpha-kinases appear to be an exception to the scenario of early divergence for the atypical kinases: they apparently arose relatively recently in eukaryotes. We present possible scenarios for the derivation of the alpha-kinases from an extant kinase fold.

Carpentier M, Brouillet S, Pothier J. YAKUSA: A fast structural database scanning method. Proteins 2005;61:137-51. [PMID: 16049912 DOI: 10.1002/prot.20517] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.9] [Indexed: 11/06/2022]

Beiko RG, Chan CX, Ragan MA. A word-oriented approach to alignment validation. Bioinformatics 2005;21:2230-9. [PMID: 15728118 DOI: 10.1093/bioinformatics/bti335] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]

Mestres J. Structure conservation in cytochromes P450. Proteins 2004;58:596-609. [DOI: 10.1002/prot.20354] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.2] [Indexed: 11/12/2022]

Shih ESC, Hwang MJ. Alternative alignments from comparison of protein structures. Proteins 2004;56:519-27. [PMID: 15229884 DOI: 10.1002/prot.20124] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022]

Nozaki Y, Bellgard M. Statistical evaluation and comparison of a pairwise alignment algorithm that a priori assigns the number of gaps rather than employing gap penalties. Bioinformatics 2004;21:1421-8. [PMID: 15591359 DOI: 10.1093/bioinformatics/bti198] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]

Shapiro J, Brutlag D. FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web. Nucleic Acids Res 2004;32:W536-41. [PMID: 15215444 PMCID: PMC441527 DOI: 10.1093/nar/gkh389] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022]

Kolodny R, Linial N. Approximate protein structural alignment in polynomial time. Proc Natl Acad Sci U S A 2004;101:12201-6. [PMID: 15304646 PMCID: PMC514457 DOI: 10.1073/pnas.0404383101] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Received: 12/11/2003] [Indexed: 11/18/2022]

Shapiro J, Brutlag D. FoldMiner: structural motif discovery using an improved superposition algorithm. Protein Sci 2004;13:278-94. [PMID: 14691242 PMCID: PMC2286532 DOI: 10.1110/ps.03239404] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]

Przybylski D, Rost B. Improving Fold Recognition Without Folds. J Mol Biol 2004;341:255-69. [PMID: 15312777 DOI: 10.1016/j.jmb.2004.05.041] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Received: 02/03/2004] [Revised: 05/18/2004] [Accepted: 05/18/2004] [Indexed: 11/21/2022]

Koike R, Kinoshita K, Kidera A. Probabilistic description of protein alignments for sequences and structures. Proteins 2004;56:157-66. [PMID: 15162495 DOI: 10.1002/prot.20067] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]

Constans P. On the functional significance of electron density protein structure alignments. Proteins 2004;55:646-55. [PMID: 15103628 DOI: 10.1002/prot.20059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

Lotan I, Schwarzer F. Approximation of Protein Structure for Fast Similarity Measures. J Comput Biol 2004;11:299-317. [PMID: 15285894 DOI: 10.1089/1066527041410355] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]

Ochagavía ME, Wodak S. Progressive combinatorial algorithm for multiple structural alignments: Application to distantly related proteins. Proteins 2004;55:436-54. [PMID: 15048834 DOI: 10.1002/prot.10587] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]

Raghava GPS, Searle SMJ, Audley PC, Barber JD, Barton GJ. OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics 2003;4:47. [PMID: 14552658 PMCID: PMC280650 DOI: 10.1186/1471-2105-4-47] [Citation(s) in RCA: 155] [Impact Index Per Article: 7.4] [Received: 05/06/2003] [Accepted: 10/10/2003] [Indexed: 11/10/2022]

Abstract

Background

The alignment of two or more protein sequences provides a powerful guide in the prediction of protein structure and in identifying key functional residues; however, the utility of any prediction is completely dependent on the accuracy of the alignment. In this paper we describe a suite of reference alignments derived from the comparison of protein three-dimensional structures, together with evaluation measures and software that allow automatically generated alignments to be benchmarked. We test the OXBench benchmark suite on alignments generated by the AMPS multiple alignment method, then apply the suite to compare eight different multiple alignment algorithms. The benchmark shows the current state of the art in alignment accuracy and provides a baseline against which new alignment algorithms may be judged.

Results

The simple hierarchical multiple alignment algorithm, AMPS, performed as well as or better than more modern methods such as CLUSTALW once the PAM250 pair-score matrix was replaced by a BLOSUM series matrix. AMPS gave an accuracy in Structurally Conserved Regions (SCRs) of 89.9% over a set of 672 alignments. On a data set of families with fewer than 8 sequences, the T-COFFEE method gave 91.4% accuracy, significantly better than CLUSTALW (88.9%) and all other methods considered here. The complete suite is available from .
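
The Results above report accuracy as agreement with a structure-derived reference alignment, measured in Structurally Conserved Regions (SCRs). As a rough illustration only, the sketch below scores a test alignment by the percentage of residue pairs aligned in the reference (optionally restricted to reference columns treated as SCRs) that the test alignment also aligns. The function names, the gap character `-`, and the convention that SCR columns are passed as a set of 0-based reference column indices are hypothetical; OXBench defines its own measures, and this generic pair-based score is not claimed to reproduce them.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, int, str, int]  # (seq_a, residue_index_a, seq_b, residue_index_b)


def aligned_pairs(alignment: Dict[str, str], columns: Optional[Set[int]] = None) -> Set[Pair]:
    """Residue pairs implied by a gapped alignment (residues numbered 0.. per ungapped sequence)."""
    names = sorted(alignment)
    length = len(next(iter(alignment.values())))
    counters = {n: 0 for n in names}
    pairs: Set[Pair] = set()
    for col in range(length):
        # Residue index of each sequence present in this column (gaps are skipped).
        present = {}
        for n in names:
            if alignment[n][col] != "-":
                present[n] = counters[n]
                counters[n] += 1
        # Counters advance for every column, but only selected columns contribute pairs.
        if columns is not None and col not in columns:
            continue
        keys = sorted(present)
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = keys[i], keys[j]
                pairs.add((a, present[a], b, present[b]))
    return pairs


def pair_accuracy(reference: Dict[str, str], test: Dict[str, str],
                  ref_columns: Optional[Set[int]] = None) -> float:
    """Percentage of reference residue pairs (optionally SCR columns only) reproduced by `test`."""
    ref = aligned_pairs(reference, ref_columns)
    if not ref:
        return 0.0
    return 100.0 * len(ref & aligned_pairs(test)) / len(ref)
```

For example, with `reference = {"s1": "ACD-EF", "s2": "A-DGEF"}` and an ungapped `test = {"s1": "ACDEF", "s2": "ADGEF"}`, three of the four reference pairs are reproduced, giving 75.0; restricting the score to reference columns `{0, 4, 5}` gives 100.0.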

Conclusions

The OXBench suite of reference alignments, evaluation software and results database provides a convenient method to assess progress in sequence alignment techniques. Evaluation measures that depend on comparison with a reference alignment were found to give good discrimination between methods. The STAMP S_c score, which is independent of a reference alignment, also gave good discrimination. Application of OXBench in this paper shows that, with the exception of T-COFFEE, most of the improvement in alignment accuracy seen since 1985 stems from improved pair-score matrices rather than algorithmic refinements. The maximum theoretical alignment accuracy obtained by pooling results over all methods was 94.5%, with 52.5% accuracy for alignments in the 0–10% sequence identity range. This suggests that further improvements in accuracy will be possible in the future.
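
The "maximum theoretical alignment accuracy obtained by pooling results over all methods" can be read as an oracle that, for each reference alignment, picks whichever method scored best, then averages those best scores. That reading is an assumption, not a statement of the paper's exact procedure; the function name and the input layout (one list of per-alignment percentages per method) are hypothetical.

```python
from typing import Dict, List


def pooled_max_accuracy(scores: Dict[str, List[float]]) -> float:
    """Mean over alignments of the best accuracy (%) achieved by any method.

    scores[method][k] is the (hypothetical) accuracy of `method` on reference alignment k.
    """
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every method must be scored on the same non-empty set of alignments")
    # zip(*...) groups the scores of all methods for the same alignment k.
    best = [max(per_alignment) for per_alignment in zip(*scores.values())]
    return sum(best) / len(best)
```

With made-up scores `{"A": [90.0, 50.0], "B": [80.0, 70.0]}`, the oracle keeps 90.0 and 70.0 and returns 80.0, higher than either method's own mean (70.0 and 75.0). This is why a pooled figure such as 94.5% indicates headroom that no single method yet reaches.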


Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31:3370-4. [PMID: 12824330 PMCID: PMC168977 DOI: 10.1093/nar/gkg571] [Citation(s) in RCA: 721] [Impact Index Per Article: 34.3] [Indexed: 11/13/2022]

Van Walle I, Lasters I, Wyns L. Consistency matrices: quantified structure alignments for sets of related proteins. Proteins 2003;51:1-9. [PMID: 12596259 DOI: 10.1002/prot.10293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]

Wallin S, Farwer J, Bastolla U. Testing similarity measures with continuous and discrete protein models. Proteins 2003;50:144-57. [PMID: 12471607 DOI: 10.1002/prot.10271] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]

Krebs WG, Tsai J, Alexandrov V, Junker J, Jansen R, Gerstein M. Tools and databases to analyze protein flexibility; approaches to mapping implied features onto sequences. Methods Enzymol 2003;374:544-84. [PMID: 14696388 DOI: 10.1016/s0076-6879(03)74023-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023]

Constans P. Linear scaling approaches to quantum macromolecular similarity: evaluating the similarity function. J Comput Chem 2002;23:1305-13. [PMID: 12214313 DOI: 10.1002/jcc.10140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]

Hill EE, Morea V, Chothia C. Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes. J Mol Biol 2002;322:205-33. [PMID: 12215425 DOI: 10.1016/s0022-2836(02)00653-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]

Abstract

Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites, and if they do, what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and the cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or undetectable. The cytokine superfamily contains a long-chain family and a short-chain family. The sequences of representative members of known structure from the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We then added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.), and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved.
These conserved CC hydrophobic residues form only 10-15% of all the residues in the individual structures.

A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function from that of the cytokines, gave very similar results. Again, some 30% of the CC sites have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues, bind the haem cofactor.
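
The "buried spine" sites described in this abstract follow a repeat whose spacings alternate between 3 and 4 residues (i, i+3, i+7, i+10, ... or i, i+4, i+7, i+11, ...). That averages 3.5 residues per step, close to the roughly 3.6 residues per turn of an α-helix, so the selected side chains stay on approximately one face of the helix. The small generator below only enumerates such positions for a helix span; the function name and its start/end parameters are hypothetical, and the authors identified the sites from structural alignments rather than by applying a fixed pattern like this.

```python
from typing import List


def spine_positions(start: int, helix_end: int, first_step: int = 3) -> List[int]:
    """Positions start, start+a, start+a+b, ... with steps alternating 3/4 (or 4/3).

    helix_end is inclusive; first_step selects the i, i+3, i+7, ... (3) or
    i, i+4, i+7, ... (4) variant of the pattern.
    """
    if first_step not in (3, 4):
        raise ValueError("first_step must be 3 or 4")
    steps = (first_step, 7 - first_step)  # the two spacings always sum to 7
    positions: List[int] = []
    pos, k = start, 0
    while pos <= helix_end:
        positions.append(pos)
        pos += steps[k % 2]
        k += 1
    return positions
```

For a helix spanning residues 0 to 11, `spine_positions(0, 11)` yields `[0, 3, 7, 10]` and `spine_positions(0, 11, first_step=4)` yields `[0, 4, 7, 11]`, matching the two patterns quoted above.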
