Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Pei J, Grishin NV. MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic Acids Res 2006;34:4364-74. [PMID: 16936316 PMCID: PMC1636350 DOI: 10.1093/nar/gkl514] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/14/2022] Open



Cited by Other Article(s)

Santus L, Garriga E, Deorowicz S, Gudyś A, Notredame C. Towards the accurate alignment of over a million protein sequences: Current state of the art. Curr Opin Struct Biol 2023;80:102577. [PMID: 37012200 DOI: 10.1016/j.sbi.2023.102577] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/20/2022] [Revised: 02/21/2023] [Accepted: 02/27/2023] [Indexed: 04/04/2023]

Nayeem MA, Bayzid MS, Rahman AH, Shahriyar R, Rahman MS. Multiobjective Formulation of Multiple Sequence Alignment for Phylogeny Inference. IEEE Trans Cybern 2022;52:2775-2786. [PMID: 33044939 DOI: 10.1109/tcyb.2020.3020308] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]

Zhang Y, Zhang Q, Zhou J, Zou Q. A survey on the algorithm and development of multiple sequence alignment. Brief Bioinform 2022;23:6546258. [PMID: 35272347 DOI: 10.1093/bib/bbac069] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/09/2021] [Revised: 01/30/2022] [Accepted: 02/09/2022] [Indexed: 12/21/2022] Open

A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm. Soft Comput 2020. [DOI: 10.1007/s00500-020-04917-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]

Gil N, Fiser A. Identifying functionally informative evolutionary sequence profiles. Bioinformatics 2018;34:1278-1286. [PMID: 29211823 PMCID: PMC5905606 DOI: 10.1093/bioinformatics/btx779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/28/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023] Open

Rubio-Largo A, Vanneschi L, Castelli M, Vega-Rodriguez MA. A Characteristic-Based Framework for Multiple Sequence Aligners. IEEE Trans Cybern 2018;48:41-51. [PMID: 27831898 DOI: 10.1109/tcyb.2016.2621129] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]

Rubio-Largo Á, Vanneschi L, Castelli M, Vega-Rodríguez MA. Using biological knowledge for multiple sequence aligner decision making. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.08.069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Chowdhury B, Garai G. A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics 2017;109:419-431. [PMID: 28669847 DOI: 10.1016/j.ygeno.2017.06.007] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 04/26/2017] [Revised: 05/27/2017] [Accepted: 06/27/2017] [Indexed: 01/04/2023]

Arribas-Gil A, Matias C. A time warping approach to multiple sequence alignment. Stat Appl Genet Mol Biol 2017;16:133-144. [PMID: 28593899 DOI: 10.1515/sagmb-2016-0043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Rani RR, Ramyachitra D. Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm. Biosystems 2016;150:177-189. [PMID: 27784624 DOI: 10.1016/j.biosystems.2016.10.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/11/2016] [Revised: 10/18/2016] [Accepted: 10/18/2016] [Indexed: 10/20/2022]

Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL. Hybrid multiobjective artificial bee colony for multiple sequence alignment. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.12.034] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 10/22/2022]

Ezawa K. Characterization of multiple sequence alignment errors using complete-likelihood score and position-shift map. BMC Bioinformatics 2016;17:133. [PMID: 26992851 PMCID: PMC4799563 DOI: 10.1186/s12859-016-0945-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/06/2015] [Accepted: 02/11/2016] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Reconstruction of multiple sequence alignments (MSAs) is a crucial step in most homology-based sequence analyses, which constitute an integral part of computational biology. To improve the accuracy of this crucial step, it is essential to better characterize errors that state-of-the-art aligners typically make. For this purpose, we here introduce two tools: the complete-likelihood score and the position-shift map.

RESULTS

The logarithm of the total probability of a MSA under a stochastic model of sequence evolution along a time axis via substitutions, insertions and deletions (called the "complete-likelihood score" here) can serve as an ideal score of the MSA. A position-shift map, which maps the difference in each residue's position between two MSAs onto one of them, can clearly visualize where and how MSA errors occurred and help disentangle composite errors. To characterize MSA errors using these tools, we constructed three sets of simulated MSAs of selectively neutral mammalian DNA sequences, with small, moderate and large divergences, under a stochastic evolutionary model with an empirically common power-law insertion/deletion length distribution. Then, we reconstructed MSAs using MAFFT and Prank as representative state-of-the-art single-optimum-search aligners. About 40-99% of the hundreds of thousands of gapped segments were involved in alignment errors. In a substantial fraction, from about 1/4 to over 3/4, of erroneously reconstructed segments, reconstructed MSAs by each aligner showed complete-likelihood scores not lower than those of the true MSAs. Out of the remaining errors, a majority by an iterative option of MAFFT showed discrepancies between the aligner-specific score and the complete-likelihood score, and a majority by Prank seemed due to inadequate exploration of the MSA space. Analyses by position-shift maps indicated that true MSAs are in considerable neighborhoods of reconstructed MSAs in about 80-99% of the erroneous segments for small and moderate divergences, but in only a minority for large divergences.
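As an illustration of the position-shift map idea described above, the following is a minimal sketch, not the author's implementation. It assumes that both MSAs contain the same sequences in the same row order, that gaps are written as "-", and that the "shift" of a residue is simply its column index in the reconstructed MSA minus its column index in the reference MSA. The function names are hypothetical.

```python
from typing import List, Optional


def residue_columns(row: str, gap: str = "-") -> List[int]:
    """Return the column index of every residue (non-gap character) in a row."""
    return [col for col, ch in enumerate(row) if ch != gap]


def position_shift_map(
    reference: List[str], reconstructed: List[str], gap: str = "-"
) -> List[List[Optional[int]]]:
    """Map each residue's column shift onto the layout of the reference MSA.

    Assumption: shift = column in reconstructed MSA - column in reference MSA.
    Gap cells of the reference MSA are reported as None.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("both MSAs must contain the same number of rows")
    shift_map: List[List[Optional[int]]] = []
    for ref_row, rec_row in zip(reference, reconstructed):
        if ref_row.replace(gap, "") != rec_row.replace(gap, ""):
            raise ValueError("paired rows must hold the same ungapped sequence")
        row_map: List[Optional[int]] = [None] * len(ref_row)
        # The k-th residue of a sequence is the same residue in both MSAs.
        for ref_col, rec_col in zip(
            residue_columns(ref_row, gap), residue_columns(rec_row, gap)
        ):
            row_map[ref_col] = rec_col - ref_col
        shift_map.append(row_map)
    return shift_map


if __name__ == "__main__":
    true_msa = ["AC-GT", "ACTGT"]
    reconstructed_msa = ["ACG-T", "ACTGT"]
    for row in position_shift_map(true_msa, reconstructed_msa):
        print(row)
```

In this toy case the only nonzero entry is the G of the first sequence, which sits one column to the left in the reconstructed alignment; runs of nonzero shifts would mark misaligned segments.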

CONCLUSIONS

The results of this study suggest that measures to further improve the accuracy of reconstructed MSAs would substantially differ depending on the types of aligners. They also re-emphasize the importance of obtaining a probability distribution of fairly likely MSAs, instead of just searching for a single optimum MSA.


Kuznetsov IB, McDuffie M. PR2ALIGN: a stand-alone software program and a web-server for protein sequence alignment using weighted biochemical properties of amino acids. BMC Res Notes 2015;8:187. [PMID: 25947299 PMCID: PMC4477417 DOI: 10.1186/s13104-015-1152-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/06/2014] [Accepted: 04/24/2015] [Indexed: 12/04/2022] Open

Abstract

Background

Alignment of amino acid sequences is the main sequence comparison method used in computational molecular biology. The selection of the amino acid substitution matrix best suited to a given alignment problem is one of the most important decisions the user has to make. In a conventional amino acid substitution matrix all elements are fixed and their values cannot be easily adjusted. Moreover, most existing amino acid substitution matrices account for the average (dis)similarities between amino acid types and do not distinguish the contribution of a specific biochemical property to these (dis)similarities.

Findings

PR2ALIGN is a stand-alone software program and a web-server that provide the functionality for implementing flexible user-specified alignment scoring functions and aligning pairs of amino acid sequences based on the comparison of the profiles of biochemical properties of these sequences. Unlike the conventional sequence alignment methods that use 20x20 fixed amino acid substitution matrices, PR2ALIGN uses a set of weighted biochemical properties of amino acids to measure the distance between pairs of aligned residues and to find an optimal minimal distance global alignment. The user can provide any number of amino acid properties and specify a weight for each property. The higher the weight for a given property, the more this property affects the final alignment. We show that in many cases the approach implemented in PR2ALIGN produces better quality pair-wise alignments than the conventional matrix-based approach.
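The general approach described above, a weighted property distance between residues combined with a minimal-distance global alignment, can be sketched as follows. This is not PR2ALIGN's code or its actual scoring function: the weighted absolute-difference distance, the linear gap cost, the property values, and all function names are assumptions made purely for illustration.

```python
from typing import Dict, Tuple

# Hypothetical, made-up property values for four residues (illustration only).
TOY_PROPERTIES: Dict[str, Dict[str, float]] = {
    "hydrophobicity": {"A": 0.6, "L": 1.0, "K": 0.0, "D": 0.1},
    "size": {"A": 0.1, "L": 0.6, "K": 0.7, "D": 0.4},
}


def weighted_distance(a: str, b: str, properties, weights: Dict[str, float]) -> float:
    """Assumed distance: weighted sum of absolute property differences."""
    return sum(w * abs(properties[p][a] - properties[p][b]) for p, w in weights.items())


def min_distance_global_alignment(
    s: str, t: str, properties, weights: Dict[str, float], gap_cost: float = 0.5
) -> Tuple[float, str, str]:
    """Needleman-Wunsch-style dynamic programming that minimizes total distance."""
    n, m = len(s), len(t)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * gap_cost, "up"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * gap_cost, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weighted_distance(s[i - 1], t[j - 1], properties, weights)
            cost[i][j], move[i][j] = min(
                (cost[i - 1][j - 1] + d, "diag"),   # align s[i-1] with t[j-1]
                (cost[i - 1][j] + gap_cost, "up"),   # s[i-1] against a gap
                (cost[i][j - 1] + gap_cost, "left"), # t[j-1] against a gap
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_s, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            aligned_s.append(s[i - 1]); aligned_t.append(t[j - 1]); i -= 1; j -= 1
        elif step == "up":
            aligned_s.append(s[i - 1]); aligned_t.append("-"); i -= 1
        else:
            aligned_s.append("-"); aligned_t.append(t[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(aligned_s)), "".join(reversed(aligned_t))


if __name__ == "__main__":
    weights = {"hydrophobicity": 1.0, "size": 0.5}
    total, top, bottom = min_distance_global_alignment("ALK", "AK", TOY_PROPERTIES, weights)
    print(total)
    print(top)
    print(bottom)
```

Raising the weight of one property makes differences in that property more expensive, which is the sense in which a higher weight lets a property influence the final alignment more strongly.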

Conclusions

PR2ALIGN will be helpful for researchers who wish to align amino acid sequences by using flexible user-specified alignment scoring functions based on the biochemical properties of amino acids instead of the amino acid substitution matrix. To the best of the authors’ knowledge, there are no existing stand-alone software programs or web-servers analogous to PR2ALIGN. The software is freely available from http://pr2align.rit.albany.edu.

Electronic supplementary material

The online version of this article (doi:10.1186/s13104-015-1152-6) contains supplementary material, which is available to authorized users.


PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information. Methods Mol Biol 2014;1079:263-71. [PMID: 24170408 DOI: 10.1007/978-1-62703-646-7_17] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Indexed: 12/12/2022]

Deng X, Cheng J. MSACompro: improving multiple protein sequence alignment by predicted structural features. Methods Mol Biol 2014;1079:273-283. [PMID: 24170409 DOI: 10.1007/978-1-62703-646-7_18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]

Sahraeian SME, Yoon BJ. PicXAA: a probabilistic scheme for finding the maximum expected accuracy alignment of multiple biological sequences. Methods Mol Biol 2014;1079:203-210. [PMID: 24170404 DOI: 10.1007/978-1-62703-646-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]

Heuristic alignment methods. Methods Mol Biol 2013;1079:29-43. [PMID: 24170393 DOI: 10.1007/978-1-62703-646-7_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]

Sievers F, Dineen D, Wilm A, Higgins DG. Making automated multiple alignments of very large numbers of protein sequences. Bioinformatics 2013;29:989-95. [PMID: 23428640 DOI: 10.1093/bioinformatics/btt093] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 12/27/2022]

A data parallel strategy for aligning multiple biological sequences on multi-core computers. Comput Biol Med 2013;43:350-61. [PMID: 23414778 DOI: 10.1016/j.compbiomed.2012.12.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 11/22/2011] [Revised: 06/22/2012] [Accepted: 12/25/2012] [Indexed: 11/21/2022]

Narimani Z, Beigy H, Abolhassani H. A new genetic algorithm for multiple sequence alignment. Int J Comput Intell Appl 2013. [DOI: 10.1142/s146902681250023x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]

Blackburne BP, Whelan S. Class of multiple sequence alignment algorithm affects genomic analysis. Mol Biol Evol 2012;30:642-53. [PMID: 23144040 DOI: 10.1093/molbev/mss256] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Indexed: 01/07/2023] Open

Abstract

Multiple sequence alignment (MSA) is the heart of comparative sequence analysis. Recent studies demonstrate that MSA algorithms can produce different outcomes when analyzing genomes, including phylogenetic tree inference and the detection of adaptive evolution. These studies also suggest that the difference between MSA algorithms is of a similar order to the uncertainty within an algorithm and suggest integrating across this uncertainty. In this study, we examine further the problem of disagreements between MSA algorithms and how they affect downstream analyses. We also investigate whether integrating across alignment uncertainty affects downstream analyses. We address these questions by analyzing 200 chordate gene families, with properties reflecting those used in large-scale genomic analyses. We find that newly developed distance metrics reveal two significantly different classes of MSA methods (MSAMs). The similarity-based class includes progressive aligners and consistency aligners, representing many methodological innovations for sequence alignment, whereas the evolution-based class includes phylogenetically aware alignment and statistical alignment. We proceed to show that the class of an MSAM has a substantial impact on downstream analyses. For phylogenetic inference, tree estimates and their branch lengths appear highly dependent on the class of aligner used. The number of families, and the sites within those families, inferred to have undergone adaptive evolution depend on the class of aligner used. Similarity-based aligners tend to identify more adaptive evolution. We also develop and test methods for incorporating MSA uncertainty when detecting adaptive evolution but find that although accounting for MSA uncertainty does affect downstream analyses, it appears less important than the class of aligner chosen. 
Our results demonstrate the critical role that MSA methodology has on downstream analysis, highlighting that the class of aligner chosen in an analysis has a demonstrable effect on its outcome.


Wu M, Chatterji S, Eisen JA. Accounting for alignment uncertainty in phylogenomics. PLoS One 2012;7:e30288. [PMID: 22272325 PMCID: PMC3260272 DOI: 10.1371/journal.pone.0030288] [Citation(s) in RCA: 127] [Impact Index Per Article: 10.6] [Received: 10/08/2011] [Accepted: 12/14/2011] [Indexed: 01/12/2023] Open

Lin HN, Notredame C, Chang JM, Sung TY, Hsu WL. Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words. PLoS One 2011;6:e27872. [PMID: 22163274 PMCID: PMC3229492 DOI: 10.1371/journal.pone.0027872] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/30/2011] [Accepted: 10/27/2011] [Indexed: 11/18/2022] Open

Abstract

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Currently available methods differ essentially in how they measure similarity between aligned residues: substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches.

In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins.

We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.


Deng X, Cheng J. MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts. BMC Bioinformatics 2011;12:472. [PMID: 22168237 PMCID: PMC3299741 DOI: 10.1186/1471-2105-12-472] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 04/16/2011] [Accepted: 12/14/2011] [Indexed: 11/20/2022] Open

Abstract

Background

Multiple Sequence Alignment (MSA) is a basic tool for bioinformatics research and analysis. It is used in almost all bioinformatics tasks, such as protein structure modeling, gene and protein function prediction, DNA motif recognition, and phylogenetic analysis. Therefore, improving the accuracy of multiple sequence alignment is important for advancing many bioinformatics fields.

Results

We designed and developed a new method, MSACompro, to synergistically incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the currently most accurate posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments. The method differs from multiple sequence alignment methods (e.g. 3D-Coffee) that use the tertiary structure information of some sequences, since the structural information in our method is fully predicted from sequences. To the best of our knowledge, applying predicted relative solvent accessibility and contact maps to multiple sequence alignment is novel. Rigorous benchmarking of our method on the standard benchmarks (i.e. BAliBASE, SABmark and OXBENCH) clearly demonstrated that incorporating predicted protein structural information improves multiple sequence alignment accuracy over the leading multiple protein sequence alignment tools that do not use this information, such as MSAProbs, ProbCons, Probalign, T-coffee, MAFFT and MUSCLE. The performance of the method is comparable to, though slightly lower than, that of the state-of-the-art method PROMALS, which uses structural features and additional homologous sequences.

Conclusion

MSACompro is an efficient and reliable multiple protein sequence alignment tool that can effectively incorporate predicted protein structural information into multiple sequence alignment. The software is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.


Protein sequence alignment with family-specific amino acid similarity matrices. BMC Res Notes 2011;4:296. [PMID: 21846354 PMCID: PMC3201029 DOI: 10.1186/1756-0500-4-296] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/09/2011] [Accepted: 08/16/2011] [Indexed: 11/16/2022] Open

Sahraeian SME, Yoon BJ. PicXAA-Web: a web-based platform for non-progressive maximum expected accuracy alignment of multiple biological sequences. Nucleic Acids Res 2011;39:W8-12. [PMID: 21515632 PMCID: PMC3125727 DOI: 10.1093/nar/gkr244] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023] Open

Hijikata A, Yura K, Noguti T, Go M. Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility. Proteins 2011;79:1868-77. [PMID: 21465562 PMCID: PMC3110861 DOI: 10.1002/prot.23011] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 09/25/2010] [Revised: 01/23/2011] [Accepted: 01/28/2011] [Indexed: 12/27/2022]

Belal NA, Heath LS. A theoretical model for whole genome alignment. J Comput Biol 2011;18:705-28. [PMID: 21210739 DOI: 10.1089/cmb.2010.0101] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open

Husain N, Obranic S, Koscinski L, Seetharaman J, Babic F, Bujnicki JM, Maravic-Vlahovicek G, Sivaraman J. Structural basis for the methylation of A1408 in 16S rRNA by a panaminoglycoside resistance methyltransferase NpmA from a clinical isolate and analysis of the NpmA interactions with the 30S ribosomal subunit. Nucleic Acids Res 2010;39:1903-18. [PMID: 21062819 PMCID: PMC3061052 DOI: 10.1093/nar/gkq1033] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 01/28/2023] Open

Aniba MR, Poch O, Thompson JD. Issues in bioinformatics benchmarking: the case study of multiple sequence alignment. Nucleic Acids Res 2010;38:7353-63. [PMID: 20639539 PMCID: PMC2995051 DOI: 10.1093/nar/gkq625] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 04/27/2010] [Revised: 06/10/2010] [Accepted: 06/29/2010] [Indexed: 11/13/2022] Open

Sahraeian SME, Yoon BJ. PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Res 2010;38:4917-28. [PMID: 20413579 PMCID: PMC2926610 DOI: 10.1093/nar/gkq255] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 12/23/2009] [Revised: 03/25/2010] [Accepted: 03/26/2010] [Indexed: 11/13/2022] Open

ElHefnawi MM, Zada S, El-Azab IA. Prediction of prognostic biomarkers for interferon-based therapy to hepatitis C virus patients: a meta-analysis of the NS5A protein in subtypes 1a, 1b, and 3a. Virol J 2010;7:130. [PMID: 20550652 PMCID: PMC3238222 DOI: 10.1186/1743-422x-7-130] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 04/10/2010] [Accepted: 06/15/2010] [Indexed: 12/19/2022] Open

Dessimoz C, Gil M. Phylogenetic assessment of alignments reveals neglected tree signal in gaps. Genome Biol 2010;11:R37. [PMID: 20370897 PMCID: PMC2884540 DOI: 10.1186/gb-2010-11-4-r37] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.8] [Received: 08/21/2009] [Revised: 01/26/2010] [Accepted: 04/06/2010] [Indexed: 01/08/2023] Open

Raman S, Vernon R, Thompson J, Tyka M, Sadreyev R, Pei J, Kim D, Kellogg E, DiMaio F, Lange O, Kinch L, Sheffler W, Kim BH, Das R, Grishin NV, Baker D. Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins 2010;77 Suppl 9:89-99. [PMID: 19701941 DOI: 10.1002/prot.22540] [Citation(s) in RCA: 367] [Impact Index Per Article: 26.2] [Indexed: 11/08/2022]

Edgar RC. Optimizing substitution matrix choice and gap parameters for sequence alignment. BMC Bioinformatics 2009;10:396. [PMID: 19954534 PMCID: PMC2791778 DOI: 10.1186/1471-2105-10-396] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 08/10/2009] [Accepted: 12/02/2009] [Indexed: 12/04/2022] Open

Knight HM, Pickard BS, Maclean A, Malloy MP, Soares DC, McRae AF, Condie A, White A, Hawkins W, McGhee K, van Beck M, MacIntyre DJ, Starr JM, Deary IJ, Visscher PM, Porteous DJ, Cannon RE, St Clair D, Muir WJ, Blackwood DH. A cytogenetic abnormality and rare coding variants identify ABCA13 as a candidate gene in schizophrenia, bipolar disorder, and depression. Am J Hum Genet 2009;85:833-46. [PMID: 19944402 PMCID: PMC2790560 DOI: 10.1016/j.ajhg.2009.11.003] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.9] [Received: 08/20/2009] [Revised: 10/14/2009] [Accepted: 11/02/2009] [Indexed: 01/22/2023] Open

Affiliation(s)

Helen M. Knight: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK; Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Benjamin S. Pickard: Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Alan Maclean: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK; Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Mary P. Malloy: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK; Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Dinesh C. Soares: Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Allan F. McRae: Queensland Institute of Medical Research, 300 Herston Road, Herston 4006, QLD, Australia
Alison Condie: Wellcome Trust Clinical Research Facility, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Angela White: Wellcome Trust Clinical Research Facility, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
William Hawkins: Wellcome Trust Clinical Research Facility, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Kevin McGhee: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK; Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Margaret van Beck: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK
Donald J. MacIntyre: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK
John M. Starr: Centre for Cognitive Ageing and Cognitive Epidemiology, Geriatric Medicine Unit, University of Edinburgh, Royal Victoria Hospital, Craigleith Road, Edinburgh EH4 2DN, UK
Ian J. Deary: Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, University of Edinburgh, 7 George Square, Edinburgh EH8 9JZ, UK
Peter M. Visscher: Queensland Institute of Medical Research, 300 Herston Road, Herston 4006, QLD, Australia
David J. Porteous: Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Ronald E. Cannon: Cancer Biology Group, National Center for Toxicogenomics, NIEHS, Research Triangle Park, NC 27709, USA
David St Clair: Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, UK
Walter J. Muir: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK; Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK
Douglas H.R. Blackwood: Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK; Medical Genetics, Institute of Genetics and Molecular Medicine, University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK

Kemena C, Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 2009;25:2455-65. [PMID: 19648142 PMCID: PMC2752613 DOI: 10.1093/bioinformatics/btp452] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Received: 04/03/2009] [Revised: 06/24/2009] [Accepted: 07/16/2009] [Indexed: 12/22/2022] Open

Taheri J, Zomaya AY. RBT-GA: a novel metaheuristic for solving the Multiple Sequence Alignment problem. BMC Genomics 2009;10 Suppl 1:S10. [PMID: 19594869 PMCID: PMC2709253 DOI: 10.1186/1471-2164-10-s1-s10] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022] Open

Fast statistical alignment. PLoS Comput Biol 2009;5:e1000392. [PMID: 19478997 PMCID: PMC2684580 DOI: 10.1371/journal.pcbi.1000392] [Citation(s) in RCA: 230] [Impact Index Per Article: 15.3] [Received: 10/22/2008] [Accepted: 04/20/2009] [Indexed: 02/01/2023] Open

Abstract

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.

Biological sequence alignment is one of the fundamental problems in comparative genomics, yet it remains unsolved. Over sixty sequence alignment programs are listed on Wikipedia, and many new programs are published every year. However, many popular programs suffer from pathologies such as aligning unrelated sequences and producing discordant alignments in protein (amino acid) and codon (nucleotide) space, casting doubt on the accuracy of the inferred alignments. Inaccurate alignments can introduce large and unknown systematic biases into downstream analyses such as phylogenetic tree reconstruction and substitution rate estimation. We describe a new program for multiple sequence alignment which can align protein, RNA and DNA sequence and improves on the accuracy of existing approaches on benchmarks of protein and RNA structural alignments and simulated mammalian and fly genomic alignments. Our approach, which seeks to find the alignment which is closest to the truth under our statistical model, leaves unrelated sequences largely unaligned and produces concordant alignments in protein and codon space. It is fast enough for difficult problems such as aligning orthologous genomic regions or aligning hundreds or thousands of proteins. It furthermore has a companion GUI for visualizing the estimated alignment reliability.

Mahalakshmi A, Sujatha K, Shenbagarathai R. Molecular modeling and characterization of the B. thuringiensis and B. thuringiensis LDC-9 cytolytic proteins. J Biomol Struct Dyn 2008;26:375-86. [PMID: 18808203 DOI: 10.1080/07391102.2008.10507252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]

Lu Y, Sze SH. Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues. Nucleic Acids Res 2008;37:463-72. [PMID: 19056820 PMCID: PMC2632924 DOI: 10.1093/nar/gkn945] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/04/2022] Open

Rausch T, Emde AK, Weese D, Döring A, Notredame C, Reinert K. Segment-based multiple sequence alignment. Bioinformatics 2008;24:i187-92. [PMID: 18689823 DOI: 10.1093/bioinformatics/btn281] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022] Open

Larrea AA, Pedroso IM, Malhotra A, Myers RS. Identification of two conserved aspartic acid residues required for DNA digestion by a novel thermophilic Exonuclease VII in Thermotoga maritima. Nucleic Acids Res 2008;36:5992-6003. [PMID: 18812402 PMCID: PMC2566859 DOI: 10.1093/nar/gkn588] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023] Open

Lu Y, Sze SH. Multiple Sequence Alignment Based on Profile Alignment of Intermediate Sequences. J Comput Biol 2008;15:767-77. [PMID: 18662101 DOI: 10.1089/cmb.2007.0132] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open

Lee BC, Park K, Kim D. Analysis of the residue-residue coevolution network and the functionally important residues in proteins. Proteins 2008;72:863-72. [PMID: 18275083 DOI: 10.1002/prot.21972] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 01/05/2023]

Ahola V, Aittokallio T, Vihinen M, Uusipaikka E. Model-based prediction of sequence alignment quality. Bioinformatics 2008;24:2165-71. [DOI: 10.1093/bioinformatics/btn414] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open

Pei J, Tang M, Grishin NV. PROMALS3D web server for accurate multiple protein sequence and structure alignments. Nucleic Acids Res 2008;36:W30-4. [PMID: 18503087 PMCID: PMC2447800 DOI: 10.1093/nar/gkn322] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.3] [Indexed: 11/25/2022] Open

Pei J. Multiple protein sequence alignment. Curr Opin Struct Biol 2008;18:382-6. [PMID: 18485694 DOI: 10.1016/j.sbi.2008.03.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Received: 01/13/2008] [Accepted: 03/18/2008] [Indexed: 11/16/2022]

Orlowski J, Bujnicki JM. Structural and evolutionary classification of Type II restriction enzymes based on theoretical and experimental analyses. Nucleic Acids Res 2008;36:3552-69. [PMID: 18456708 PMCID: PMC2441816 DOI: 10.1093/nar/gkn175] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Indexed: 01/05/2023] Open

Abstract

For a very long time, Type II restriction enzymes (REases) have been a paradigm of ORFans: proteins with no detectable similarity to each other and to any other protein in the database, despite common cellular and biochemical function. Crystallographic analyses published until January 2008 provided high-resolution structures for only 28 of 1637 Type II REase sequences available in the Restriction Enzyme database (REBASE). Among these structures, all but two possess catalytic domains with the common PD-(D/E)XK nuclease fold. Two structures are unrelated to the others: R.BfiI exhibits the phospholipase D (PLD) fold, while R.PabI has a new fold termed 'half-pipe'. Thus far, bioinformatic studies supported by site-directed mutagenesis have extended the number of tentatively assigned REase folds to five (now including also GIY-YIG and HNH folds identified earlier in homing endonucleases) and provided structural predictions for dozens of REase sequences without experimentally solved structures. Here, we present a comprehensive study of all Type II REase sequences available in REBASE together with their homologs detectable in the nonredundant and environmental samples databases at the NCBI. We present the summary and critical evaluation of structural assignments and predictions reported earlier, new classification of all REase sequences into families, domain architecture analysis and new predictions of three-dimensional folds. Among 289 experimentally characterized (not putative) Type II REases, whose apparently full-length sequences are available in REBASE, we assign 199 (69%) to contain the PD-(D/E)XK domain. The HNH domain is the second most common, with 24 (8%) members. When putative REases are taken into account, the fraction of PD-(D/E)XK and HNH folds changes to 48% and 30%, respectively. Fifty-six characterized (and 521 predicted) REases remain unassigned to any of the five REase folds identified so far, and may exhibit new architectures. These enzymes are proposed as the most interesting targets for structure determination by high-resolution experimental methods. Our analysis provides the first comprehensive map of sequence-structure relationships among Type II REases and will help to focus the efforts of structural and functional genomics of this large and biotechnologically important class of enzymes.

Perrodou E, Chica C, Poch O, Gibson TJ, Thompson JD. A new protein linear motif benchmark for multiple sequence alignment software. BMC Bioinformatics 2008;9:213. [PMID: 18439277 PMCID: PMC2374782 DOI: 10.1186/1471-2105-9-213] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 11/29/2007] [Accepted: 04/25/2008] [Indexed: 12/14/2022] Open

Abstract

BACKGROUND

Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.

RESULTS

We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments, however little correlation was observed between LM and overall alignment quality for individual alignment test cases.

CONCLUSION

We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
