Reference Citation Analysis

For: Schwartz AS, Pachter L. Multiple alignment by sequence annealing. Bioinformatics 2007;23:e24-9. [PMID: 17237099 DOI: 10.1093/bioinformatics/btl311] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]

Cited by Other Article(s)

Abdullahi KB. Kabirian-based optinalysis: A conceptually grounded framework for symmetry/asymmetry, similarity/dissimilarity and identity/unidentity estimations in mathematical structures and biological sequences. MethodsX 2023;11:102400. [PMID: 37928104 PMCID: PMC10622715 DOI: 10.1016/j.mex.2023.102400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/24/2023] [Indexed: 11/07/2023]

Abstract

This paper introduces "Kabirian-based optinalysis (KBO)," a pioneering framework that addresses the longstanding challenges in estimating symmetry/asymmetry, similarity/dissimilarity, and identity/unidentity within mathematical structures and biological sequences. The existing methods often lack a strong theoretical foundation, leading to inconsistencies and limitations. Kabirian-based optinalysis draws inspiration from isomorphism and automorphism, providing a theoretically grounded framework that unifies estimation methodologies. It introduces the concept of optiscale, autoreflective pairing, isoreflective pairing, and others ensuring invariance and robustness under various mathematical transformations and establishing functional bijectivity for isomorphic or automorphic structures. This not only overcomes previous limitations but also offers precise and interpretable estimations. Additionally, the framework introduces "geometrical pairwise analysis" to improve sensitivity to position-specific and character-specific variations in biological sequences. This novel approach enhances the accuracy of sequence similarity assessments, surpassing the constraints of conventional methods. The novelty of this work extends beyond mathematics and biology, impacting diverse fields such as computer science, data analysis, pattern recognition, and evolutionary biology. 
Kabirian-based optinalysis presents a holistic and theoretically grounded solution that has the potential to revolutionize the analysis of complex structures and sequences, opening new horizons for interdisciplinary research.

• Inspired by automorphism and isomorphism, Kabirian-based optinalysis introduces a new paradigm-shifting and unified approach to estimations in mathematical structures and biological sequences with a solid conceptual and theoretical foundation.
• The GPA method enhances pairwise sequence similarity estimation by being sensitive to position-specific and character-specific variations and providing a comprehensive characterization of these features.

Minkin I, Medvedev P. Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nat Commun 2020;11:6327. [PMID: 33303762 PMCID: PMC7728760 DOI: 10.1038/s41467-020-19777-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/28/2019] [Accepted: 10/29/2020] [Indexed: 11/29/2022]

Vialle RA, Tamuri AU, Goldman N. Alignment Modulates Ancestral Sequence Reconstruction Accuracy. Mol Biol Evol 2019;35:1783-1797. [PMID: 29618097 PMCID: PMC5995191 DOI: 10.1093/molbev/msy055] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 12/17/2022]

Dewey CN. Whole-Genome Alignment. Methods Mol Biol 2019;1910:121-147. [PMID: 31278663 DOI: 10.1007/978-1-4939-9074-0_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/09/2023]

Shim H, Larget B. BayesCAT: Bayesian co-estimation of alignment and tree. Biometrics 2017;74:270-279. [DOI: 10.1111/biom.12640] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/01/2014] [Revised: 06/01/2016] [Accepted: 06/01/2016] [Indexed: 11/30/2022]

Ye Y, Lam TW, Ting HF. PnpProbs: a better multiple sequence alignment tool by better handling of guide trees. BMC Bioinformatics 2016;17 Suppl 8:285. [PMID: 27585754 PMCID: PMC5009527 DOI: 10.1186/s12859-016-1121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

Barquist L, Burge SW, Gardner PP. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families. Curr Protoc Bioinformatics 2016;54:12.13.1-12.13.25. [PMID: 27322404 PMCID: PMC5010141 DOI: 10.1002/cpbi.4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]

Katoh K, Standley DM. A simple method to control over-alignment in the MAFFT multiple sequence alignment program. Bioinformatics 2016;32:1933-42. [PMID: 27153688 PMCID: PMC4920119 DOI: 10.1093/bioinformatics/btw108] [Citation(s) in RCA: 318] [Impact Index Per Article: 39.8] [Received: 10/05/2015] [Accepted: 02/19/2016] [Indexed: 12/17/2022]

Ip CL, Loose M, Tyson JR, de Cesare M, Brown BL, Jain M, Leggett RM, Eccles DA, Zalunin V, Urban JM, Piazza P, Bowden RJ, Paten B, Mwaigwisya S, Batty EM, Simpson JT, Snutch TP, Birney E, Buck D, Goodwin S, Jansen HJ, O'Grady J, Olsen HE. MinION Analysis and Reference Consortium: Phase 1 data release and analysis. F1000Res 2015;4:1075. [PMID: 26834992 PMCID: PMC4722697 DOI: 10.12688/f1000research.7201.1] [Citation(s) in RCA: 190] [Impact Index Per Article: 21.1] [Accepted: 10/09/2015] [Indexed: 11/20/2022]

Affiliation(s)

Camilla L.C. Ip: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Matthew Loose: School of Life Sciences, Queens Medical Centre, University of Nottingham, Nottingham, UK
John R. Tyson: Michael Smith Laboratories and Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada
Mariateresa de Cesare: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Bonnie L. Brown: Virginia Commonwealth University, Richmond, VA, USA
Miten Jain: University of California, Santa Cruz, Santa Cruz, CA, USA
Richard M. Leggett: The Genome Analysis Centre, Norwich Research Park, Norwich, UK
David A. Eccles: Malaghan Institute of Medical Research, Wellington, New Zealand
Vadim Zalunin: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
John M. Urban: Division of Biology and Medicine, Brown University, Providence, RI, USA
Paolo Piazza: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Rory J. Bowden: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Benedict Paten: University of California, Santa Cruz, Santa Cruz, CA, USA
Solomon Mwaigwisya: Norwich Medical School, University of East Anglia, Norwich, UK
Elizabeth M. Batty: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Jared T. Simpson: Informatics and Biocomputing, Ontario Institute for Cancer Research, ON, Canada
Terrance P. Snutch: Michael Smith Laboratories and Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada
Ewan Birney: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
David Buck: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
Sara Goodwin: Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
Hans J. Jansen: ZF-screens B.V., Leiden, Netherlands
Justin O'Grady: Norwich Medical School, University of East Anglia, Norwich, UK
Hugh E. Olsen: University of California, Santa Cruz, Santa Cruz, CA, USA
MinION Analysis and Reference Consortium: Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; School of Life Sciences, Queens Medical Centre, University of Nottingham, Nottingham, UK; Michael Smith Laboratories and Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada; Virginia Commonwealth University, Richmond, VA, USA; University of California, Santa Cruz, Santa Cruz, CA, USA; The Genome Analysis Centre, Norwich Research Park, Norwich, UK; Malaghan Institute of Medical Research, Wellington, New Zealand; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK; Division of Biology and Medicine, Brown University, Providence, RI, USA; Norwich Medical School, University of East Anglia, Norwich, UK; Informatics and Biocomputing, Ontario Institute for Cancer Research, ON, Canada; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; ZF-screens B.V., Leiden, Netherlands

Herman JL, Novák Á, Lyngsø R, Szabó A, Miklós I, Hein J. Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs. BMC Bioinformatics 2015;16:108. [PMID: 25888064 PMCID: PMC4395974 DOI: 10.1186/s12859-015-0516-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/24/2014] [Accepted: 02/24/2015] [Indexed: 11/30/2022]

Abstract

BACKGROUND

A standard procedure in many areas of bioinformatics is to use a single multiple sequence alignment (MSA) as the basis for various types of analysis. However, downstream results may be highly sensitive to the alignment used, and neglecting the uncertainty in the alignment can lead to significant bias in the resulting inference. In recent years, a number of approaches have been developed for probabilistic sampling of alignments, rather than simply generating a single optimum. However, this type of probabilistic information is currently not widely used in the context of downstream inference, since most existing algorithms are set up to make use of a single alignment.

RESULTS

In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased.

CONCLUSIONS

The alignment DAG provides a natural way to represent a distribution in the space of MSAs, and allows for existing algorithms to be efficiently scaled up to operate on large sets of alignments. As an example, we show how this can be used to compute marginal probabilities for tree topologies, averaging over a very large number of MSAs. This framework can also be used to generate a statistically meaningful summary alignment; example applications show that this summary alignment is consistently more accurate than the majority of the alignment samples, leading to improvements in downstream tree inference. Implementations of the methods described in this article are available at http://statalign.github.io/WeaveAlign.
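
The alignment DAG idea summarized in the abstract above can be illustrated with a small Python sketch. It is not the WeaveAlign implementation; the function names (`columns`, `build_dag`, `count_paths`) and the column encoding are assumptions made for illustration. Each column is identified, per sequence, by the number of residues consumed before it and by whether it holds a residue or a gap, so that two columns drawn from different samples can only be chained when they agree on where every sequence stands. Column probabilities are estimated as empirical frequencies, and counting START-to-END paths shows how the graph can encode more alignments than were sampled.

```python
from collections import Counter, defaultdict

def columns(msa):
    """Turn an MSA (equal-length gapped strings) into hashable column keys.

    Hypothetical encoding: per sequence, (residues consumed so far, is_residue).
    """
    consumed = [0] * len(msa)
    cols = []
    for j in range(len(msa[0])):
        col = []
        for i, row in enumerate(msa):
            is_residue = row[j] != "-"
            col.append((consumed[i], is_residue))
            if is_residue:
                consumed[i] += 1
        cols.append(tuple(col))
    return cols

def build_dag(samples):
    """Build edges between consecutive columns and empirical column frequencies."""
    freq = Counter()
    edges = defaultdict(set)
    for msa in samples:
        path = ["START"] + columns(msa) + ["END"]
        freq.update(path[1:-1])
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    probs = {col: k / len(samples) for col, k in freq.items()}
    return edges, probs

def count_paths(edges, node="START", memo=None):
    """Count distinct START-to-END paths, i.e. alignments encoded by the DAG."""
    if memo is None:
        memo = {}
    if node == "END":
        return 1
    if node not in memo:
        memo[node] = sum(count_paths(edges, nxt, memo) for nxt in edges[node])
    return memo[node]

# Two sampled alignments of the sequences ACT and ACT that share the C/C column.
samples = [["ACT", "ACT"], ["A-CT-", "-AC-T"]]
edges, probs = build_dag(samples)
print(count_paths(edges))  # more paths than the two sampled alignments
```

In this toy input the two samples disagree on both sides of a shared column, so the choices on either side can be recombined independently. This is the kind of increase in effective sample size that the abstract attributes to conditional independencies between columns.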

Jain M, Fiddes IT, Miga KH, Olsen HE, Paten B, Akeson M. Improved data analysis for the MinION nanopore sequencer. Nat Methods 2015;12:351-6. [PMID: 25686389 PMCID: PMC4907500 DOI: 10.1038/nmeth.3290] [Citation(s) in RCA: 377] [Impact Index Per Article: 41.9] [Received: 12/12/2014] [Accepted: 01/20/2015] [Indexed: 12/31/2022]

Nánási M, Vinař T, Brejová B. Probabilistic approaches to alignment with tandem repeats. Algorithms Mol Biol 2014;9:3. [PMID: 24580741 PMCID: PMC3975930 DOI: 10.1186/1748-7188-9-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/09/2013] [Accepted: 02/24/2014] [Indexed: 11/16/2022]

Sahraeian SME, Yoon BJ. PicXAA: a probabilistic scheme for finding the maximum expected accuracy alignment of multiple biological sequences. Methods Mol Biol 2014;1079:203-210. [PMID: 24170404 DOI: 10.1007/978-1-62703-646-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]

Roozgard A, Barzigar N, Wang S, Jiang X, Cheng S. Empirical Transition Probability Indexing Sparse-Coding Belief Propagation (ETPI-SCoBeP) Genome Sequence Alignment. Cancer Inform 2014;13:159-65. [PMID: 25983537 PMCID: PMC4426956 DOI: 10.4137/cin.s13887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2014] [Revised: 10/09/2014] [Accepted: 10/10/2014] [Indexed: 11/29/2022]

Fernandes CA, Comparetti EJ, Borges RJ, Huancahuire-Vega S, Ponce-Soto LA, Marangoni S, Soares AM, Fontes MR. Structural bases for a complete myotoxic mechanism: Crystal structures of two non-catalytic phospholipases A2-like from Bothrops brazili venom. Biochimica et Biophysica Acta - Proteins and Proteomics 2013;1834:2772-81. [DOI: 10.1016/j.bbapap.2013.10.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 09/06/2013] [Revised: 10/07/2013] [Accepted: 10/12/2013] [Indexed: 11/16/2022]

Vieira LF, Magro AJ, Fernandes CA, de Souza BM, Cavalcante WL, Palma MS, Rosa JC, Fuly AL, Fontes MR, Gallacci M, Butzke DS, Calderon LA, Stábeli RG, Giglio JR, Soares AM. Biochemical, functional, structural and phylogenetic studies on Intercro, a new isoform phospholipase A2 from Crotalus durissus terrificus snake venom. Biochimie 2013;95:2365-75. [DOI: 10.1016/j.biochi.2013.08.028] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/10/2012] [Accepted: 08/25/2013] [Indexed: 10/26/2022]

Cho SJ, Vallès Y, Weisblat DA. Differential expression of conserved germ line markers and delayed segregation of male and female primordial germ cells in a hermaphrodite, the leech Helobdella. Mol Biol Evol 2013;31:341-54. [PMID: 24217283 PMCID: PMC3907050 DOI: 10.1093/molbev/mst201] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 01/16/2023]

Heuristic alignment methods. Methods Mol Biol 2013;1079:29-43. [PMID: 24170393 DOI: 10.1007/978-1-62703-646-7_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]

Baumler DJ, Ma B, Reed JL, Perna NT. Inferring ancient metabolism using ancestral core metabolic models of enterobacteria. BMC Syst Biol 2013;7:46. [PMID: 23758866 PMCID: PMC3694032 DOI: 10.1186/1752-0509-7-46] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 12/03/2012] [Accepted: 06/06/2013] [Indexed: 11/30/2022]

Anderson JWJ, Novák Á, Sükösd Z, Golden M, Arunapuram P, Edvardsson I, Hein J. Quantifying variances in comparative RNA secondary structure prediction. BMC Bioinformatics 2013;14:149. [PMID: 23634662 PMCID: PMC3667108 DOI: 10.1186/1471-2105-14-149] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/30/2012] [Accepted: 03/21/2013] [Indexed: 11/11/2022]

Salvador GHM, Fernandes CAH, Magro AJ, Marchi-Salvador DP, Cavalcante WLG, Fernandez RM, Gallacci M, Soares AM, Oliveira CLP, Fontes MRM. Structural and phylogenetic studies with MjTX-I reveal a multi-oligomeric toxin--a novel feature in Lys49-PLA2s protein class. PLoS One 2013;8:e60610. [PMID: 23573271 PMCID: PMC3616104 DOI: 10.1371/journal.pone.0060610] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 12/18/2012] [Accepted: 02/28/2013] [Indexed: 11/19/2022]

Bouchard-Côté A, Jordan MI. Evolutionary inference via the Poisson Indel Process. Proc Natl Acad Sci U S A 2013;110:1160-6. [PMID: 23275296 PMCID: PMC3557041 DOI: 10.1073/pnas.1220450110] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022]

Hamada M, Asai K. A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA). J Comput Biol 2012;19:532-49. [PMID: 22313125 DOI: 10.1089/cmb.2011.0197] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]

Wu M, Chatterji S, Eisen JA. Accounting for alignment uncertainty in phylogenomics. PLoS One 2012;7:e30288. [PMID: 22272325 PMCID: PMC3260272 DOI: 10.1371/journal.pone.0030288] [Citation(s) in RCA: 127] [Impact Index Per Article: 10.6] [Received: 10/08/2011] [Accepted: 12/14/2011] [Indexed: 01/12/2023]

Löytynoja A. Alignment methods: strategies, challenges, benchmarking, and comparative overview. Methods Mol Biol 2012;855:203-35. [PMID: 22407710 DOI: 10.1007/978-1-61779-582-4_7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/27/2023]

Abstract

Comparative evolutionary analyses of molecular sequences are solely based on the identities and differences detected between homologous characters. Errors in this homology statement, that is errors in the alignment of the sequences, are likely to lead to errors in the downstream analyses. Sequence alignment and phylogenetic inference are tightly connected and many popular alignment programs use the phylogeny to divide the alignment problem into smaller tasks. They then neglect the phylogenetic tree, however, and produce alignments that are not evolutionarily meaningful. The use of phylogeny-aware methods reduces the error but the resulting alignments, with evolutionarily correct representation of homology, can challenge the existing practices and methods for viewing and visualising the sequences. The inter-dependency of alignment and phylogeny can be resolved by joint estimation of the two; methods based on statistical models allow for inferring the alignment parameters from the data and correctly take into account the uncertainty of the solution but remain computationally challenging. Widely used alignment methods are based on heuristic algorithms and unlikely to find globally optimal solutions. The whole concept of one correct alignment for the sequences is questionable, however, as there typically exist vast numbers of alternative, roughly equally good alignments that should also be considered. This uncertainty is hidden by many popular alignment programs and is rarely correctly taken into account in the downstream analyses. The quest for finding and improving the alignment solution is complicated by the lack of suitable measures of alignment goodness. The difficulty of comparing alternative solutions also affects benchmarks of alignment methods and the results strongly depend on the measure used. As the effects of alignment error cannot be predicted, comparing the alignments' performance in downstream analyses is recommended.

Wang LS, Leebens-Mack J, Kerr Wall P, Beckmann K, dePamphilis CW, Warnow T. The impact of multiple protein sequence alignment on phylogenetic estimation. IEEE/ACM Trans Comput Biol Bioinform 2011;8:1108-1119. [PMID: 21566256 DOI: 10.1109/tcbb.2009.68] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 05/30/2023]

Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res 2011;21:1512-28. [PMID: 21665927 DOI: 10.1101/gr.123356.111] [Citation(s) in RCA: 162] [Impact Index Per Article: 12.5] [Indexed: 11/24/2022]

Roskin KM, Paten B, Haussler D. Meta-alignment with crumble and prune: partitioning very large alignment problems for performance and parallelization. BMC Bioinformatics 2011;12:144. [PMID: 21569267 PMCID: PMC3114744 DOI: 10.1186/1471-2105-12-144] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/27/2010] [Accepted: 05/10/2011] [Indexed: 11/10/2022]

Hudek AK, Brown DG. FEAST: sensitive local alignment with multiple rates of evolution. IEEE/ACM Trans Comput Biol Bioinform 2011;8:698-709. [PMID: 20733242 DOI: 10.1109/tcbb.2010.76] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]

Sahraeian SME, Yoon BJ. PicXAA-Web: a web-based platform for non-progressive maximum expected accuracy alignment of multiple biological sequences. Nucleic Acids Res 2011;39:W8-12. [PMID: 21515632 PMCID: PMC3125727 DOI: 10.1093/nar/gkr244] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023]

Markova-Raina P, Petrov D. High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome Res 2011;21:863-74. [PMID: 21393387 DOI: 10.1101/gr.115949.110] [Citation(s) in RCA: 108] [Impact Index Per Article: 8.3] [Indexed: 12/25/2022]

Sahraeian SME, Yoon BJ. PicXAA-R: efficient structural alignment of multiple RNA sequences using a greedy approach. BMC Bioinformatics 2011;12 Suppl 1:S38. [PMID: 21342569 PMCID: PMC3044294 DOI: 10.1186/1471-2105-12-s1-s38] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/30/2023]

Santos-Filho NA, Fernandes CAH, Menaldo DL, Magro AJ, Fortes-Dias CL, Estevão-Costa MI, Fontes MRM, Santos CR, Murakami MT, Soares AM. Molecular cloning and biochemical characterization of a myotoxin inhibitor from Bothrops alternatus snake plasma. Biochimie 2010;93:583-92. [PMID: 21144879 DOI: 10.1016/j.biochi.2010.11.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 07/20/2010] [Accepted: 11/26/2010] [Indexed: 10/18/2022]

dos Santos JI, Cintra-Francischinelli M, Borges RJ, Fernandes CAH, Pizzo P, Cintra ACO, Braz ASK, Soares AM, Fontes MRM. Structural, functional, and bioinformatics studies reveal a new snake venom homologue phospholipase A2 class. Proteins 2010;79:61-78. [DOI: 10.1002/prot.22858] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 05/18/2010] [Revised: 07/22/2010] [Accepted: 08/13/2010] [Indexed: 11/09/2022]

Sahraeian SME, Yoon BJ. PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Res 2010;38:4917-28. [PMID: 20413579 PMCID: PMC2926610 DOI: 10.1093/nar/gkq255] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 12/23/2009] [Revised: 03/25/2010] [Accepted: 03/26/2010] [Indexed: 11/13/2022]

Sengupta R, Bastola DR, Ali HH. Classification and identification of fungal sequences using characteristic restriction endonuclease cut order. J Bioinform Comput Biol 2010;8:181-98. [PMID: 20401943 DOI: 10.1142/s0219720010004616] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/24/2009] [Revised: 10/18/2009] [Accepted: 10/18/2009] [Indexed: 11/18/2022]

Dessimoz C, Gil M. Phylogenetic assessment of alignments reveals neglected tree signal in gaps. Genome Biol 2010;11:R37. [PMID: 20370897 PMCID: PMC2884540 DOI: 10.1186/gb-2010-11-4-r37] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.8] [Received: 08/21/2009] [Revised: 01/26/2010] [Accepted: 04/06/2010] [Indexed: 01/08/2023]

Brandalise M, Severino FE, Maluf MP, Maia IG. The promoter of a gene encoding an isoflavone reductase-like protein in coffee (Coffea arabica) drives a stress-responsive expression in leaves. Plant Cell Rep 2009;28:1699-708. [PMID: 19756631 DOI: 10.1007/s00299-009-0769-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/26/2009] [Revised: 08/12/2009] [Accepted: 08/20/2009] [Indexed: 05/12/2023]

Bradley RK, Holmes I. Evolutionary triplet models of structured RNA. PLoS Comput Biol 2009;5:e1000483. [PMID: 19714212 PMCID: PMC2725318 DOI: 10.1371/journal.pcbi.1000483] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/21/2008] [Accepted: 07/23/2009] [Indexed: 12/31/2022]

Abstract

The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a "transducer composition" algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). 
For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.

Fast statistical alignment. PLoS Comput Biol 2009;5:e1000392. [PMID: 19478997 PMCID: PMC2684580 DOI: 10.1371/journal.pcbi.1000392] [Citation(s) in RCA: 230] [Impact Index Per Article: 15.3] [Received: 10/22/2008] [Accepted: 04/20/2009] [Indexed: 02/01/2023]

Abstract

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.

Biological sequence alignment is one of the fundamental problems in comparative genomics, yet it remains unsolved. Over sixty sequence alignment programs are listed on Wikipedia, and many new programs are published every year. However, many popular programs suffer from pathologies such as aligning unrelated sequences and producing discordant alignments in protein (amino acid) and codon (nucleotide) space, casting doubt on the accuracy of the inferred alignments. Inaccurate alignments can introduce large and unknown systematic biases into downstream analyses such as phylogenetic tree reconstruction and substitution rate estimation. We describe a new program for multiple sequence alignment which can align protein, RNA and DNA sequence and improves on the accuracy of existing approaches on benchmarks of protein and RNA structural alignments and simulated mammalian and fly genomic alignments. Our approach, which seeks to find the alignment which is closest to the truth under our statistical model, leaves unrelated sequences largely unaligned and produces concordant alignments in protein and codon space. It is fast enough for difficult problems such as aligning orthologous genomic regions or aligning hundreds or thousands of proteins. It furthermore has a companion GUI for visualizing the estimated alignment reliability.

Ashkenazy H, Unger R, Kliger Y. Optimal data collection for correlated mutation analysis. Proteins 2009;74:545-55. [PMID: 18655065 DOI: 10.1002/prot.22168] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Paten B, Herrero J, Beal K, Birney E. Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment. Bioinformatics 2008;25:295-301. [PMID: 19056777 DOI: 10.1093/bioinformatics/btn630] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Rausch T, Emde AK, Weese D, Döring A, Notredame C, Reinert K. Segment-based multiple sequence alignment. Bioinformatics 2008;24:i187-92. [PMID: 18689823 DOI: 10.1093/bioinformatics/btn281] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Bradley RK, Pachter L, Holmes I. Specific alignment of structured RNA: stochastic grammars and sequence annealing. Bioinformatics 2008;24:2677-83. [PMID: 18796475 DOI: 10.1093/bioinformatics/btn495] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]

Sanchez-Villeda H, Schroeder S, Flint-Garcia S, Guill KE, Yamasaki M, McMullen MD. DNAAlignEditor: DNA alignment editor tool. BMC Bioinformatics 2008;9:154. [PMID: 18366684 PMCID: PMC2322986 DOI: 10.1186/1471-2105-9-154] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2007] [Accepted: 03/19/2008] [Indexed: 12/02/2022] Open

Sanchez-Villeda H, Schroeder S, Flint-Garcia S, Guill KE, Yamasaki M, McMullen MD. DNAAlignEditor: DNA alignment editor tool. BMC Bioinformatics 2008. [PMID: 18366684 DOI: 10.1186/1471-2105-9-154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Do CB, Katoh K. Protein multiple sequence alignment. Methods Mol Biol 2008;484:379-413. [PMID: 18592193 DOI: 10.1007/978-1-59745-398-1_25] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]

Ruby JG, Stark A, Johnston WK, Kellis M, Bartel DP, Lai EC. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 2007;17:1850-64. [PMID: 17989254 DOI: 10.1101/gr.6597907] [Citation(s) in RCA: 462] [Impact Index Per Article: 27.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]

Martin W, Roettger M, Lockhart PJ. A reality check for alignments and trees. Trends Genet 2007;23:478-80. [PMID: 17825944 DOI: 10.1016/j.tig.2007.08.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2007] [Revised: 07/16/2007] [Accepted: 08/22/2007] [Indexed: 11/18/2022]