Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kupczok A, Haeseler AV, Klaere S. An Exact Algorithm for the Geodesic Distance between Phylogenetic Trees. J Comput Biol 2008;15:577-91. [DOI: 10.1089/cmb.2008.0068] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

For:	Kupczok A, Haeseler AV, Klaere S. An Exact Algorithm for the Geodesic Distance between Phylogenetic Trees. J Comput Biol 2008;15:577-91. [DOI: 10.1089/cmb.2008.0068] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Number

Cited by Other Article(s)

Brazão JM, Foster PG, Cox CJ. Data-specific substitution models improve protein-based phylogenetics. PeerJ 2023;11:e15716. [PMID: 37576497 PMCID: PMC10416777 DOI: 10.7717/peerj.15716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 06/16/2023] [Indexed: 08/15/2023] Open

Abstract

Calculating amino-acid substitution models that are specific for individual protein data sets is often difficult due to the computational burden of estimating large numbers of rate parameters. In this study, we tested the computational efficiency and accuracy of five methods used to estimate substitution models, namely Codeml, FastMG, IQ-TREE, P4 (maximum likelihood), and P4 (Bayesian inference). Data-specific substitution models were estimated from simulated alignments (with different lengths) that were generated from a known simulation model and simulation tree. Each of the resulting data-specific substitution models was used to calculate the maximum likelihood score of the simulation tree and simulated data that was used to calculate the model, and compared with the maximum likelihood scores of the known simulation model and simulation tree on the same simulated data. Additionally, the commonly-used empirical models, cpREV and WAG, were assessed similarly. Data-specific models performed better than the empirical models, which under-fitted the simulated alignments, had the highest difference to the simulation model maximum-likelihood score, clustered further from the simulation model in principal component analysis ordination, and inferred less accurate trees. Data-specific models and the simulation model shared statistically indistinguishable maximum-likelihood scores, indicating that the five methods were reasonably accurate at estimating substitution models by this measure. Nevertheless, tree statistics showed differences between optimal maximum likelihood trees. Unlike other model estimating methods, trees inferred using data-specific models generated with IQ-TREE and P4 (maximum likelihood) were not significantly different from the trees derived from the simulation model in each analysis, indicating that these two methods alone were the most accurate at estimating data-specific models. To show the benefits of using data-specific protein models several published data sets were reanalysed using IQ-TREE-estimated models. These newly estimated models were a better fit to the data than the empirical models that were used by the original authors, often inferred longer trees, and resulted in different tree topologies in more than half of the re-analysed data sets. The results of this study show that software availability and high computation burden are not limitations to generating better-fitting data-specific amino-acid substitution models for phylogenetic analyses.

Collapse

Rangel LT, Soucy SM, Setubal JC, Gogarten JP, Fournier GP. An efficient, non-phylogenetic method for detecting genes sharing evolutionary signals in phylogenomic datasets. Genome Biol Evol 2021;13:6352501. [PMID: 34390574 PMCID: PMC8483891 DOI: 10.1093/gbe/evab187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/11/2021] [Indexed: 11/25/2022] Open

Abstract

Assessing the compatibility between gene family phylogenies is a crucial and often computationally demanding step in many phylogenomic analyses. Here, we describe the Evolutionary Similarity Index (IES), a means to assess shared evolution between gene families using a weighted orthogonal distance regression model applied to sequence distances. The utilization of pairwise distance matrices circumvents comparisons between gene tree topologies, which are inherently uncertain and sensitive to evolutionary model choice, phylogenetic reconstruction artifacts, and other sources of error. Furthermore, IES enables the many-to-many pairing of multiple copies between similarly evolving gene families. This is done by selecting non-overlapping pairs of copies, one from each assessed family, and yielding the least sum of squared residuals. Analyses of simulated gene family data sets show that IES’s accuracy is on par with popular tree-based methods while also less susceptible to noise introduced by sequence alignment and evolutionary model fitting. Applying IES to an empirical data set of 1,322 genes from 42 archaeal genomes identified eight major clusters of gene families with compatible evolutionary trends. The most cohesive cluster consisted of 62 genes with compatible evolutionary signal, which occur as both single-copy and multiple homologs per genome; phylogenetic analysis of concatenated alignments from this cluster produced a tree closely matching previously published species trees for Archaea. Four other clusters are mainly composed of accessory genes with limited distribution among Archaea and enriched toward specific metabolic functions. Pairwise evolutionary distances obtained from these accessory gene clusters suggest patterns of interphyla horizontal gene transfer. An IES implementation is available at https://github.com/lthiberiol/evolSimIndex.

Collapse

Irisarri I, Strassert JFH, Burki F. Phylogenomic Insights into the Origin of Primary Plastids. Syst Biol 2021;71:105-120. [PMID: 33988690 DOI: 10.1093/sysbio/syab036] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 05/07/2021] [Accepted: 05/10/2021] [Indexed: 11/13/2022] Open

Bansal MS. Linear-time algorithms for phylogenetic tree completion under Robinson-Foulds distance. Algorithms Mol Biol 2020;15:6. [PMID: 32313549 PMCID: PMC7155338 DOI: 10.1186/s13015-020-00166-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2020] [Accepted: 04/04/2020] [Indexed: 11/10/2022] Open

Avino M, Ng GT, He Y, Renaud MS, Jones BR, Poon AFY. Tree shape-based approaches for the comparative study of cophylogeny. Ecol Evol 2019;9:6756-6771. [PMID: 31312429 PMCID: PMC6618157 DOI: 10.1002/ece3.5185] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Revised: 02/21/2019] [Accepted: 03/29/2019] [Indexed: 12/17/2022] Open

Blackburne BP, Whelan S. Class of multiple sequence alignment algorithm affects genomic analysis. Mol Biol Evol 2012;30:642-53. [PMID: 23144040 DOI: 10.1093/molbev/mss256] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open

Abstract

Multiple sequence alignment (MSA) is the heart of comparative sequence analysis. Recent studies demonstrate that MSA algorithms can produce different outcomes when analyzing genomes, including phylogenetic tree inference and the detection of adaptive evolution. These studies also suggest that the difference between MSA algorithms is of a similar order to the uncertainty within an algorithm and suggest integrating across this uncertainty. In this study, we examine further the problem of disagreements between MSA algorithms and how they affect downstream analyses. We also investigate whether integrating across alignment uncertainty affects downstream analyses. We address these questions by analyzing 200 chordate gene families, with properties reflecting those used in large-scale genomic analyses. We find that newly developed distance metrics reveal two significantly different classes of MSA methods (MSAMs). The similarity-based class includes progressive aligners and consistency aligners, representing many methodological innovations for sequence alignment, whereas the evolution-based class includes phylogenetically aware alignment and statistical alignment. We proceed to show that the class of an MSAM has a substantial impact on downstream analyses. For phylogenetic inference, tree estimates and their branch lengths appear highly dependent on the class of aligner used. The number of families, and the sites within those families, inferred to have undergone adaptive evolution depend on the class of aligner used. Similarity-based aligners tend to identify more adaptive evolution. We also develop and test methods for incorporating MSA uncertainty when detecting adaptive evolution but find that although accounting for MSA uncertainty does affect downstream analyses, it appears less important than the class of aligner chosen. Our results demonstrate the critical role that MSA methodology has on downstream analysis, highlighting that the class of aligner chosen in an analysis has a demonstrable effect on its outcome.

Collapse

Nye TMW. Principal components analysis in the space of phylogenetic trees. Ann Stat 2011. [DOI: 10.1214/11-aos915] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

Battagliero S, Puglia G, Vicario S, Rubino F, Scioscia G, Leo P. An efficient algorithm for approximating geodesic distances in tree space. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:1196-1207. [PMID: 21116041 DOI: 10.1109/tcbb.2010.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]

Huber KT, Spillner A, Suchecki R, Moulton V. Metrics on multilabeled trees: interrelationships and diameter bounds. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:1029-1040. [PMID: 21116046 DOI: 10.1109/tcbb.2010.122] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]

Huggins PM, Li W, Haws D, Friedrich T, Liu J, Yoshida R. Bayes estimators for phylogenetic reconstruction. Syst Biol 2011;60:528-40. [PMID: 21471560 DOI: 10.1093/sysbio/syr021] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Owen M, Provan JS. A fast algorithm for computing geodesic distances in tree space. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:2-13. [PMID: 21071792 DOI: 10.1109/tcbb.2010.3] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]

Sundberg K, Clement M, Snell Q. On the use of cartographic projections in visualizing phylo-genetic tree space. Algorithms Mol Biol 2010;5:26. [PMID: 20529355 PMCID: PMC2891783 DOI: 10.1186/1748-7188-5-26] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2009] [Accepted: 06/08/2010] [Indexed: 11/10/2022] Open