Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kim C, Lee B. Accuracy of structure-based sequence alignment of automatic methods. BMC Bioinformatics 2007;8:355. [PMID: 17883866 PMCID: PMC2039753 DOI: 10.1186/1471-2105-8-355] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2007] [Accepted: 09/20/2007] [Indexed: 11/10/2022] Open

For:	Kim C, Lee B. Accuracy of structure-based sequence alignment of automatic methods. BMC Bioinformatics 2007;8:355. [PMID: 17883866 PMCID: PMC2039753 DOI: 10.1186/1471-2105-8-355] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2007] [Accepted: 09/20/2007] [Indexed: 11/10/2022] Open

Number

Cited by Other Article(s)

Neuwald AF, Lanczycki CJ, Hodges TK, Marchler-Bauer A. Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021;2020:5850901. [PMID: 32500917 PMCID: PMC7297217 DOI: 10.1093/database/baaa042] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/26/2019] [Revised: 04/01/2020] [Accepted: 05/06/2020] [Indexed: 11/12/2022]

Abstract

For optimal performance, machine learning methods for protein sequence/structural analysis typically require as input a large multiple sequence alignment (MSA), which is often created using query-based iterative programs, such as PSI-BLAST or JackHMMER. However, because these programs align database sequences using a query sequence as a template, they may fail to detect or may tend to misalign sequences distantly related to the query. More generally, automated MSA programs often fail to align sequences correctly due to the unpredictable nature of protein evolution. Addressing this problem typically requires manual curation in the light of structural data. However, curated MSAs tend to contain too few sequences to serve as input for statistically based methods. We address these shortcomings by making publicly available a set of 252 curated hierarchical MSAs (hiMSAs), containing a total of 26 212 066 sequences, along with programs for generating from these extremely large MSAs. Each hiMSA consists of a set of hierarchically arranged MSAs representing individual subgroups within a superfamily along with template MSAs specifying how to align each subgroup MSA against MSAs higher up the hierarchy. Central to this approach is the MAPGAPS search program, which uses a hiMSA as a query to align (potentially vast numbers of) matching database sequences with accuracy comparable to that of the curated hiMSA. We illustrate this process for the exonuclease–endonuclease–phosphatase superfamily and for pleckstrin homology domains. A set of extremely large MSAs generated from the hiMSAs in this way is available as input for deep learning, big data analyses. MAPGAPS, auxiliary programs CDD2MGS, AddPhylum, PurgeMSA and ConvertMSA and links to National Center for Biotechnology Information data files are available at https://www.igs.umaryland.edu/labs/neuwald/software/mapgaps/.

Collapse

Aadland K, Kolaczkowski B. Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy. Genome Biol Evol 2021;12:1549-1565. [PMID: 32785673 PMCID: PMC7523730 DOI: 10.1093/gbe/evaa164] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/03/2020] [Indexed: 12/31/2022] Open

Abstract

Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.

Collapse

Neuwald AF, Kolaczkowski BD, Altschul SF. eCOMPASS: evaluative comparison of multiple protein alignments by statistical score. Bioinformatics 2021;37:3456-3463. [PMID: 33983436 PMCID: PMC8545322 DOI: 10.1093/bioinformatics/btab374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2020] [Revised: 03/31/2021] [Accepted: 05/12/2021] [Indexed: 11/21/2022] Open

Carpentier M, Chomilier J. Protein multiple alignments: sequence-based versus structure-based programs. Bioinformatics 2020;35:3970-3980. [PMID: 30942864 DOI: 10.1093/bioinformatics/btz236] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2018] [Revised: 03/05/2019] [Accepted: 04/02/2019] [Indexed: 11/14/2022] Open

Holm L. DALI and the persistence of protein shape. Protein Sci 2020;29:128-140. [PMID: 31606894 PMCID: PMC6933842 DOI: 10.1002/pro.3749] [Citation(s) in RCA: 467] [Impact Index Per Article: 116.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2019] [Revised: 10/08/2019] [Accepted: 10/09/2019] [Indexed: 12/30/2022]

High-Throughput Reconstruction of Ancestral Protein Sequence, Structure, and Molecular Function. Methods Mol Biol 2019;1851:135-170. [PMID: 30298396 DOI: 10.1007/978-1-4939-8736-8_8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Abstract

Ancestral protein sequence reconstruction is a powerful technique for explicitly testing hypotheses about the evolution of molecular function, allowing researchers to meticulously dissect how historical changes in protein sequence impacted functional repertoire by altering the protein's 3D structure. These techniques have provided concrete, experimentally validated insights into ancient evolutionary processes and help illuminate the complex relationship between protein sequence, structure, and function. Inferring the protein family phylogenies on which ancestral sequence reconstruction depends and reconstructing the sequences, themselves, are amenable to high-throughput computational analysis. However, determining the structures of ancestral-reconstructed proteins and characterizing their functions typically rely on time-consuming and expensive laboratory analyses, limiting most current studies to examining a relatively small number of specific hypotheses. For this reason, we have little detailed, unbiased information about how molecular function evolves across large protein family phylogenies. Here we describe a generalized protocol that integrates ancestral sequence reconstruction with structural homology modeling and structure-based molecular affinity prediction to characterize historical changes in protein function across families with thousands of individual sequences. We highlight key steps in the analysis protocol requiring particularly careful attention to avoid introducing potential errors as well as steps for which computationally efficient subroutines can be substituted for more intensive approaches, allowing researchers to scale the analysis up or down, depending on available resources and requirements for reproducibility and scientific rigor. In our view, this approach provides a compelling compliment to more laboratory-intensive procedures, generating important contextual information that can help guide detailed experiments.

Collapse

Ghosh P, Bhattacharyya T, Mathew OK, Sowdhamini R. PASS2 version 6: a database of structure-based sequence alignments of protein domain superfamilies in accordance with SCOPe. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2019;2019:5367127. [PMID: 30820573 PMCID: PMC6395796 DOI: 10.1093/database/baz028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/17/2018] [Revised: 02/03/2019] [Accepted: 02/06/2019] [Indexed: 11/15/2022]

Ebert MC, Pelletier JN. Computational tools for enzyme improvement: why everyone can - and should - use them. Curr Opin Chem Biol 2017;37:89-96. [PMID: 28231515 DOI: 10.1016/j.cbpa.2017.01.021] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2016] [Revised: 01/25/2017] [Accepted: 01/30/2017] [Indexed: 12/12/2022]

Mezulis S, Sternberg MJE, Kelley LA. PhyreStorm: A Web Server for Fast Structural Searches Against the PDB. J Mol Biol 2015;428:702-708. [PMID: 26517951 PMCID: PMC7610957 DOI: 10.1016/j.jmb.2015.10.017] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2015] [Revised: 10/13/2015] [Accepted: 10/18/2015] [Indexed: 11/10/2022]

Zhao C, Sacan A. UniAlign: protein structure alignment meets evolution. Bioinformatics 2015;31:3139-46. [PMID: 26059715 DOI: 10.1093/bioinformatics/btv354] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2015] [Accepted: 06/02/2015] [Indexed: 11/15/2022] Open

Minami S, Sawada K, Chikenji G. How a spatial arrangement of secondary structure elements is dispersed in the universe of protein folds. PLoS One 2014;9:e107959. [PMID: 25243952 PMCID: PMC4171485 DOI: 10.1371/journal.pone.0107959] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2014] [Accepted: 08/18/2014] [Indexed: 11/18/2022] Open

Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014;94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Madej T, Lanczycki CJ, Zhang D, Thiessen PA, Geer RC, Marchler-Bauer A, Bryant SH. MMDB and VAST+: tracking structural similarities between macromolecular complexes. Nucleic Acids Res 2013;42:D297-303. [PMID: 24319143 PMCID: PMC3965051 DOI: 10.1093/nar/gkt1208] [Citation(s) in RCA: 213] [Impact Index Per Article: 19.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open

Protein structure alignment beyond spatial proximity. Sci Rep 2013;3:1448. [PMID: 23486213 PMCID: PMC3596798 DOI: 10.1038/srep01448] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2012] [Accepted: 02/25/2013] [Indexed: 11/08/2022] Open

Topham CM, Rouquier M, Tarrat N, André I. Adaptive Smith-Waterman residue match seeding for protein structural alignment. Proteins 2013;81:1823-39. [DOI: 10.1002/prot.24327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2013] [Revised: 04/22/2013] [Accepted: 05/15/2013] [Indexed: 12/30/2022]

Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs. Appl Environ Microbiol 2013;79:3380-91. [PMID: 23524681 DOI: 10.1128/aem.03803-12] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open

Minami S, Sawada K, Chikenji G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, C(α) only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics 2013;14:24. [PMID: 23331634 PMCID: PMC3637537 DOI: 10.1186/1471-2105-14-24] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2012] [Accepted: 01/08/2013] [Indexed: 11/10/2022] Open

Abstract

Background

Protein pairs that have the same secondary structure packing arrangement but have different topologies have attracted much attention in terms of both evolution and physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as a insight into physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence order independent structure comparison methods are needed.

Results

We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, C_α only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed so as to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSE). One of the key feature of the algorithm is utilizing the multiple vector representation for each SSE, which enables us to correctly treat bent or twisted nature of long SSE. We compared MICAN with other 9 publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets which include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods for reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed the top level performance on the sequential test sets. We also show that MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here.

Conclusions

MICAN is the fastest and the most accurate program among non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have been identified manually by human experts so far. The source code of MICAN is freely download-able at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.

Collapse

Daniels NM, Nadimpalli S, Cowen LJ. Formatt: Correcting protein multiple structural alignments by incorporating sequence alignment. BMC Bioinformatics 2012;13:259. [PMID: 23039758 PMCID: PMC3585936 DOI: 10.1186/1471-2105-13-259] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2012] [Accepted: 10/01/2012] [Indexed: 11/10/2022] Open

Dickson RJ, Gloor GB. Protein sequence alignment analysis by local covariation: coevolution statistics detect benchmark alignment errors. PLoS One 2012;7:e37645. [PMID: 22715369 PMCID: PMC3371027 DOI: 10.1371/journal.pone.0037645] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2011] [Accepted: 04/26/2012] [Indexed: 11/19/2022] Open

Wong DY, Sept D. The interaction of cofilin with the actin filament. J Mol Biol 2011;413:97-105. [PMID: 21875597 DOI: 10.1016/j.jmb.2011.08.039] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2011] [Revised: 07/18/2011] [Accepted: 08/15/2011] [Indexed: 11/27/2022]

Daniluk P, Lesyng B. A novel method to compare protein structures using local descriptors. BMC Bioinformatics 2011;12:344. [PMID: 21849047 PMCID: PMC3179968 DOI: 10.1186/1471-2105-12-344] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2011] [Accepted: 08/17/2011] [Indexed: 11/15/2022] Open

Kalaimathy S, Sowdhamini R, Kanagarajadurai K. Critical assessment of structure-based sequence alignment methods at distant relationships. Brief Bioinform 2011;12:163-75. [PMID: 21422071 DOI: 10.1093/bib/bbq025] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Dickson RJ, Wahl LM, Fernandes AD, Gloor GB. Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. PLoS One 2010;5:e11082. [PMID: 20596526 PMCID: PMC2893159 DOI: 10.1371/journal.pone.0011082] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2010] [Accepted: 05/17/2010] [Indexed: 11/23/2022] Open

Abstract

Background

There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.

Methodology/Principal Findings

We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.

Conclusions/Significance

Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because of an emphasis on they emphasize pairwise covariation over group covariation.

Collapse

Detecting internally symmetric protein structures. BMC Bioinformatics 2010;11:303. [PMID: 20525292 PMCID: PMC2894822 DOI: 10.1186/1471-2105-11-303] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2010] [Accepted: 06/03/2010] [Indexed: 11/30/2022] Open

The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam. Comput Biol Chem 2010;34:210-4. [PMID: 20537955 DOI: 10.1016/j.compbiolchem.2010.04.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2010] [Revised: 04/09/2010] [Accepted: 04/25/2010] [Indexed: 11/21/2022]

Finding of residues crucial for supersecondary structure formation. Proc Natl Acad Sci U S A 2009;106:18996-9000. [PMID: 19855006 DOI: 10.1073/pnas.0909714106] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Micheletti C, Orland H. MISTRAL: a tool for energy-based multiple structural alignment of proteins. ACTA ACUST UNITED AC 2009;25:2663-9. [PMID: 19692555 DOI: 10.1093/bioinformatics/btp506] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Kim C, Tai CH, Lee B. Iterative refinement of structure-based sequence alignments by Seed Extension. BMC Bioinformatics 2009;10:210. [PMID: 19589133 PMCID: PMC2753854 DOI: 10.1186/1471-2105-10-210] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2009] [Accepted: 07/09/2009] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment.

RESULTS

RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs.

CONCLUSION

RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs.

Collapse

Sippl MJ. Fold space unlimited. Curr Opin Struct Biol 2009;19:312-20. [DOI: 10.1016/j.sbi.2009.03.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2009] [Revised: 02/16/2009] [Accepted: 03/16/2009] [Indexed: 11/25/2022]

Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol 2009;19:341-8. [PMID: 19481444 DOI: 10.1016/j.sbi.2009.04.003] [Citation(s) in RCA: 303] [Impact Index Per Article: 20.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2009] [Accepted: 04/16/2009] [Indexed: 11/30/2022]

Tai CH, Vincent JJ, Kim C, Lee B. SE: an algorithm for deriving sequence alignment from a pair of superimposed structures. BMC Bioinformatics 2009;10 Suppl 1:S4. [PMID: 19208141 PMCID: PMC2648757 DOI: 10.1186/1471-2105-10-s1-s4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

Abstract

Background

Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments.

Results

SE gave an average accuracy of 95.9% over 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP.

Conclusion

The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than three other programs tested that use dynamic programming algorithm.

Collapse

Goonesekere NC. Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST. Adv Appl Bioinform Chem 2009;2:71-8. [PMID: 21918617 PMCID: PMC3169949 DOI: 10.2147/aabc.s5553] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open

Gupta K, Sehgal V, Levchenko A. A method for probabilistic mapping between protein structure and function taxonomies through cross training. BMC STRUCTURAL BIOLOGY 2008;8:40. [PMID: 18834528 PMCID: PMC2573881 DOI: 10.1186/1472-6807-8-40] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/10/2008] [Accepted: 10/03/2008] [Indexed: 02/06/2023]