Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: DeGiorgio M, Degnan JH. Robustness to divergence time underestimation when inferring species trees from estimated gene trees. Syst Biol 2013;63:66-82. [PMID: 23988674 DOI: 10.1093/sysbio/syt059] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

For:	DeGiorgio M, Degnan JH. Robustness to divergence time underestimation when inferring species trees from estimated gene trees. Syst Biol 2013;63:66-82. [PMID: 23988674 DOI: 10.1093/sysbio/syt059] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Number

Cited by Other Article(s)

Arasti S, Tabaghi P, Tabatabaee Y, Mirarab S. Branch Length Transforms using Optimal Tree Metric Matching. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.11.13.566962. [PMID: 38746464 PMCID: PMC11092445 DOI: 10.1101/2023.11.13.566962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]

Abstract

The abundant discordance between evolutionary relationships across the genome has rekindled interest in ways of comparing and averaging trees on a shared leaf set. However, most attempts at reconciling trees have focused on tree topology, producing metrics for comparing topologies and methods for computing median tree topologies. Using branch lengths, however, has been more elusive, due to several challenges. Species tree branch lengths can be measured in many units, often different from gene trees. Moreover, rates of evolution change across the genome, the species tree, and specific branches of gene trees. These factors compound the stochasticity of coalescence times. Thus, branch lengths are highly heterogeneous across both the genome and the tree. For many downstream applications in phylogenomic analyses, branch lengths are as important as the topology, and yet, existing tools to compare and combine weighted trees are limited. In this paper, we make progress on the question of mapping one tree to another, incorporating both topology and branch length. We define a series of computational problems to formalize finding the best transformation of one tree to another while maintaining its topology and other constraints. We show that all these problems can be solved in quadratic time and memory using a linear algebraic formulation coupled with dynamic programming preprocessing. Our formulations lead to convex optimization problems, with efficient and theoretically optimal solutions. While many applications can be imagined for this framework, we apply it to measure species tree branch lengths in the unit of the expected number of substitutions per site while allowing divergence from ultrametricity across the tree. In these applications, our method matches or surpasses other methods designed directly for solving those problems. Thus, our approach provides a versatile toolkit that finds applications in similar evolutionary questions.

Code availability

The software is available at https://github.com/shayesteh99/TCMM.git .

Data availability

Data are available on Github https://github.com/shayesteh99/TCMM-Data.git .

Collapse

Frankel LE, Ané C. Summary Tests of Introgression Are Highly Sensitive to Rate Variation Across Lineages. Syst Biol 2023;72:1357-1369. [PMID: 37698548 DOI: 10.1093/sysbio/syad056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Revised: 07/07/2023] [Accepted: 08/29/2023] [Indexed: 09/13/2023] Open

Bernot JP, Owen CL, Wolfe JM, Meland K, Olesen J, Crandall KA. Major Revisions in Pancrustacean Phylogeny and Evidence of Sensitivity to Taxon Sampling. Mol Biol Evol 2023;40:msad175. [PMID: 37552897 PMCID: PMC10414812 DOI: 10.1093/molbev/msad175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2022] [Revised: 06/14/2023] [Accepted: 06/19/2023] [Indexed: 08/10/2023] Open

Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022;39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023] Open

Interpreting phylogenetic conflict: Hybridization in the most speciose genus of lichen-forming fungi. Mol Phylogenet Evol 2022;174:107543. [PMID: 35690378 DOI: 10.1016/j.ympev.2022.107543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2021] [Revised: 02/06/2022] [Accepted: 05/13/2022] [Indexed: 11/24/2022]

Abstract

While advances in sequencing technologies have been invaluable for understanding evolutionary relationships, increasingly large genomic data sets may result in conflicting evolutionary signals that are often caused by biological processes, including hybridization. Hybridization has been detected in a variety of organisms, influencing evolutionary processes such as generating reproductive barriers and mixing standing genetic variation. Here, we investigate the potential role of hybridization in the diversification of the most speciose genus of lichen-forming fungi, Xanthoparmelia. As Xanthoparmelia is projected to have gone through recent, rapid diversification, this genus is particularly suitable for investigating and interpreting the origins of phylogenomic conflict. Focusing on a clade of Xanthoparmelia largely restricted to the Holarctic region, we used a genome skimming approach to generate 962 single-copy gene regions representing over 2 Mbp of the mycobiont genome. From this genome-scale dataset, we inferred evolutionary relationships using both concatenation and coalescent-based species tree approaches. We also used three independent tests for hybridization. Although different species tree reconstruction methods recovered largely consistent and well-supported trees, there was widespread incongruence among individual gene trees. Despite challenges in differentiating hybridization from ILS in situations of recent rapid radiations, our genome-wide analyses detected multiple potential hybridization events in the Holarctic clade, suggesting one possible source of trait variability in this hyperdiverse genus. This study highlights the value in using a pluralistic approach for characterizing genome-scale conflict, even in groups with well-resolved phylogenies, while highlighting current challenges in detecting the specific impacts of hybridization.

Collapse

Willson J, Roddur MS, Liu B, Zaharias P, Warnow T. DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition. Syst Biol 2022;71:610-629. [PMID: 34450658 PMCID: PMC9016570 DOI: 10.1093/sysbio/syab070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/21/2022] Open

A stochastic Farris transform for genetic data under the multispecies coalescent with applications to data requirements. J Math Biol 2022;84:36. [PMID: 35394192 DOI: 10.1007/s00285-022-01731-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2020] [Revised: 02/15/2022] [Accepted: 02/17/2022] [Indexed: 10/18/2022]

Abstract

Species tree estimation faces many significant hurdles. Chief among them is that the trees describing the ancestral lineages of each individual gene-the gene trees-often differ from the species tree. The multispecies coalescent is commonly used to model this gene tree discordance, at least when it is believed to arise from incomplete lineage sorting, a population-genetic effect. Another significant challenge in this area is that molecular sequences associated to each gene typically provide limited information about the gene trees themselves. While the modeling of sequence evolution by single-site substitutions is well-studied, few species tree reconstruction methods with theoretical guarantees actually address this latter issue. Instead, a standard-but unsatisfactory-assumption is that gene trees are perfectly reconstructed before being fed into a so-called summary method. Hence much remains to be done in the development of inference methodologies that rigorously account for gene tree estimation error-or completely avoid gene tree estimation in the first place. In previous work, a data requirement trade-off was derived between the number of loci m needed for an accurate reconstruction and the length of the locus sequences k. It was shown that to reconstruct an internal branch of length f, one needs m to be of the order of [Formula: see text]. That previous result was obtained under the restrictive assumption that mutation rates as well as population sizes are constant across the species phylogeny. Here we further generalize this result beyond this assumption. Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent, which we refer to as a stochastic Farris transform. As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with [Formula: see text] species, the rooted topology of the species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.

Collapse

McLean BS, Bell KC, Cook JA. SNP-based Phylogenomic Inference in Holarctic Ground Squirrels (Urocitellus). Mol Phylogenet Evol 2022;169:107396. [PMID: 35031463 DOI: 10.1016/j.ympev.2022.107396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2021] [Revised: 12/02/2021] [Accepted: 12/08/2021] [Indexed: 11/24/2022]

Jiao X, Flouri T, Yang Z. Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Natl Sci Rev 2022;8:nwab127. [PMID: 34987842 PMCID: PMC8692950 DOI: 10.1093/nsr/nwab127] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2021] [Revised: 07/10/2021] [Accepted: 07/11/2021] [Indexed: 02/06/2023] Open

Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Koch H, DeGiorgio M. Maximum Likelihood Estimation of Species Trees from Gene Trees in the Presence of Ancestral Population Structure. Genome Biol Evol 2020;12:3977-3995. [PMID: 32022857 PMCID: PMC7061232 DOI: 10.1093/gbe/evaa022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/23/2020] [Indexed: 11/12/2022] Open

Mclean BS, Bell KC, Allen JM, Helgen KM, Cook JA. Impacts of Inference Method and Data set Filtering on Phylogenomic Resolution in a Rapid Radiation of Ground Squirrels (Xerinae: Marmotini). Syst Biol 2018;68:298-316. [DOI: 10.1093/sysbio/syy064] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2017] [Accepted: 09/12/2018] [Indexed: 12/20/2022] Open

Degnan JH. Modeling Hybridization Under the Network Multispecies Coalescent. Syst Biol 2018;67:786-799. [PMID: 29846734 PMCID: PMC6101600 DOI: 10.1093/sysbio/syy040] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2017] [Revised: 05/13/2018] [Accepted: 05/16/2018] [Indexed: 11/13/2022] Open

SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space. Mol Phylogenet Evol 2018. [DOI: 10.1016/j.ympev.2018.03.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Pei J, Wu Y. STELLS2: fast and accurate coalescent-based maximum likelihood inference of species trees from gene tree topologies. Bioinformatics 2018;33:1789-1797. [PMID: 28186220 DOI: 10.1093/bioinformatics/btx079] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2016] [Accepted: 02/07/2017] [Indexed: 11/14/2022] Open

Abstract

Motivation

It is well known that gene trees and species trees may have different topologies. One explanation is incomplete lineage sorting, which is commonly modeled by the coalescent process. In multispecies coalescent, a gene tree topology is observed with some probability (called the gene tree probability) for a given species tree. Gene tree probability is the main tool for the program STELLS, which finds the maximum likelihood estimate of the species tree from the given gene tree topologies. However, STELLS becomes slow when data size increases. Recently, several fast species tree inference methods have been developed, which can handle large data. However, these methods often do not fully utilize the information in the gene trees.

Results

In this paper, we present an algorithm (called STELLS2) for computing the gene tree probability more efficiently than the original STELLS. The key idea of STELLS2 is taking some 'shortcuts' during the computation and computing the gene tree probability approximately. We apply the STELLS2 algorithm in the species tree inference approach in the original STELLS, which leads to a new maximum likelihood species tree inference method (also called STELLS2). Through simulation we demonstrate that the gene tree probabilities computed by STELLS2 and STELLS have strong correlation. We show that STELLS2 is almost as accurate in species tree inference as STELLS. Also STELLS2 is usually more accurate than several existing methods when there is one allele per species, although STELLS2 is slower than these methods. STELLS2 outperforms these methods significantly when there are multiple alleles per species.

Availability and Implementation

The program STELLS2 is available for download at: https://github.com/yufengwudcs/STELLS2.

Contact

yufeng.wu@uconn.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Platt RN, Faircloth BC, Sullivan KAM, Kieran TJ, Glenn TC, Vandewege MW, Lee TE, Baker RJ, Stevens RD, Ray DA. Conflicting Evolutionary Histories of the Mitochondrial and Nuclear Genomes in New World Myotis Bats. Syst Biol 2018;67:236-249. [PMID: 28945862 PMCID: PMC5837689 DOI: 10.1093/sysbio/syx070] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2017] [Revised: 07/31/2017] [Accepted: 08/15/2017] [Indexed: 01/05/2023] Open

Zhu S, Degnan JH. Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent. Syst Biol 2018;66:283-298. [PMID: 27780899 DOI: 10.1093/sysbio/syw097] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2015] [Accepted: 03/08/2016] [Indexed: 11/13/2022] Open

Allman ES, Degnan JH, Rhodes JA. Species Tree Inference from Gene Splits by Unrooted STAR Methods. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:337-342. [PMID: 28113601 PMCID: PMC5388605 DOI: 10.1109/tcbb.2016.2604812] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]

Wen D, Nakhleh L. Coestimating Reticulate Phylogenies and Gene Trees from Multilocus Sequence Data. Syst Biol 2017;67:439-457. [DOI: 10.1093/sysbio/syx085] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2017] [Accepted: 10/24/2017] [Indexed: 11/13/2022] Open

Molloy EK, Warnow T. To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods. Syst Biol 2017;67:285-303. [DOI: 10.1093/sysbio/syx077] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2016] [Accepted: 09/13/2017] [Indexed: 01/27/2023] Open

Bhattacharyya S, Mukherjee J. IDXL: Species Tree Inference Using Internode Distance and Excess Gene Leaf Count. J Mol Evol 2017;85:57-78. [PMID: 28835989 DOI: 10.1007/s00239-017-9807-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2016] [Accepted: 08/09/2017] [Indexed: 11/28/2022]

Kamneva OK, Rosenberg NA. Simulation-Based Evaluation of Hybridization Network Reconstruction Methods in the Presence of Incomplete Lineage Sorting. Evol Bioinform Online 2017;13:1176934317691935. [PMID: 28469378 PMCID: PMC5395256 DOI: 10.1177/1176934317691935] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2016] [Accepted: 01/11/2017] [Indexed: 11/22/2022] Open

Yu Y, Jermaine C, Nakhleh L. Exploring phylogenetic hypotheses via Gibbs sampling on evolutionary networks. BMC Genomics 2016;17:784. [PMID: 28185563 PMCID: PMC5123299 DOI: 10.1186/s12864-016-3099-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Uricchio LH, Warnow T, Rosenberg NA. An analytical upper bound on the number of loci required for all splits of a species tree to appear in a set of gene trees. BMC Bioinformatics 2016;17:417. [PMID: 28185570 PMCID: PMC5123308 DOI: 10.1186/s12859-016-1266-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open

McLean BS, Jackson DJ, Cook JA. Rapid divergence and gene flow at high latitudes shape the history of Holarctic ground squirrels (Urocitellus). Mol Phylogenet Evol 2016;102:174-88. [DOI: 10.1016/j.ympev.2016.05.040] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2015] [Revised: 05/26/2016] [Accepted: 05/31/2016] [Indexed: 11/26/2022]

Bayesian Inference of Reticulate Phylogenies under the Multispecies Network Coalescent. PLoS Genet 2016;12:e1006006. [PMID: 27144273 PMCID: PMC4856265 DOI: 10.1371/journal.pgen.1006006] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2015] [Accepted: 04/04/2016] [Indexed: 11/19/2022] Open

Abstract

The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.

Trees have long formed in biology the basic structure with which to represent and understand evolutionary relationships. Mathematical models, computational methods, and software tools for inferring phylogenetic trees and studying their mathematical properties are currently the norm in biology. The availability of genomic data from closely related species, as well as from multiple individuals within species, have brought the two fields of phylogenetics and population genetics closer than ever. In particular, the last two decades have witnessed a great flourish in the development and implementation of phylogenetic methods based on the multispecies coalescent model to capture the intricate relationship between gene and genome evolution. However, when reticulation processes such as hybridization occur, the phylogenetic history is best represented by a network. In this work, we demonstrate how the multispecies coalescent model can be adapted to reticulate evolutionary histories and report on a Bayesian method for inference of such histories under this extended model. As networks subsume trees, the model and method provide a principled and unified statistical framework for inferring treelike and non-treelike evolutionary relationships.

Collapse

Consistency and inconsistency of consensus methods for inferring species trees from gene trees in the presence of ancestral population structure. Theor Popul Biol 2016;110:12-24. [PMID: 27086043 DOI: 10.1016/j.tpb.2016.02.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2014] [Revised: 12/22/2015] [Accepted: 02/05/2016] [Indexed: 11/21/2022]

Abstract

In the last few years, several statistically consistent consensus methods for species tree inference have been devised that are robust to the gene tree discordance caused by incomplete lineage sorting in unstructured ancestral populations. One source of gene tree discordance that has only recently been identified as a potential obstacle for phylogenetic inference is ancestral population structure. In this article, we describe a general model of ancestral population structure, and by relying on a single carefully constructed example scenario, we show that the consensus methods Democratic Vote, STEAC, STAR, R(∗) Consensus, Rooted Triple Consensus, Minimize Deep Coalescences, and Majority-Rule Consensus are statistically inconsistent under the model. We find that among the consensus methods evaluated, the only method that is statistically consistent in the presence of ancestral population structure is GLASS/Maximum Tree. We use simulations to evaluate the behavior of the various consensus methods in a model with ancestral population structure, showing that as the number of gene trees increases, estimates on the basis of GLASS/Maximum Tree approach the true species tree topology irrespective of the level of population structure, whereas estimates based on the remaining methods only approach the true species tree topology if the level of structure is low. However, through simulations using species trees both with and without ancestral population structure, we show that GLASS/Maximum Tree performs unusually poorly on gene trees inferred from alignments with little information. This practical limitation of GLASS/Maximum Tree together with the inconsistency of other methods prompts the need for both further testing of additional existing methods and development of novel methods under conditions that incorporate ancestral population structure.

Collapse

Wen D, Yu Y, Hahn MW, Nakhleh L. Reticulate evolutionary history and extensive introgression in mosquito species revealed by phylogenetic network analysis. Mol Ecol 2016;25:2361-72. [PMID: 26808290 DOI: 10.1111/mec.13544] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2015] [Revised: 12/15/2015] [Accepted: 01/06/2016] [Indexed: 12/27/2022]

The Pace of Hybrid Incompatibility Evolution in House Mice. Genetics 2015. [PMID: 26199234 DOI: 10.1534/genetics.115.179499] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open

Bayzid MS, Mirarab S, Boussau B, Warnow T. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses. PLoS One 2015;10:e0129183. [PMID: 26086579 PMCID: PMC4472720 DOI: 10.1371/journal.pone.0129183] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2014] [Accepted: 05/05/2015] [Indexed: 11/19/2022] Open

Abstract

Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning.

Collapse

van Tuinen M, Torres CR. Potential for bias and low precision in molecular divergence time estimation of the Canopy of Life: an example from aquatic bird families. Front Genet 2015;6:203. [PMID: 26106406 PMCID: PMC4459087 DOI: 10.3389/fgene.2015.00203] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2015] [Accepted: 05/25/2015] [Indexed: 11/13/2022] Open

Abstract

Uncertainty in divergence time estimation is frequently studied from many angles but rarely from the perspective of phylogenetic node age. If appropriate molecular models and fossil priors are used, a multi-locus, partitioned analysis is expected to equally minimize error in accuracy and precision across all nodes of a given phylogeny. In contrast, if available models fail to completely account for rate heterogeneity, substitution saturation and incompleteness of the fossil record, uncertainty in divergence time estimation may increase with node age. While many studies have stressed this concern with regard to deep nodes in the Tree of Life, the inference that molecular divergence time estimation of shallow nodes is less sensitive to erroneous model choice has not been tested explicitly in a Bayesian framework. Because of available divergence time estimation methods that permit fossil priors across any phylogenetic node and the present increase in efficient, cheap collection of species-level genomic data, insight is needed into the performance of divergence time estimation of shallow (<10 MY) nodes. Here, we performed multiple sensitivity analyses in a multi-locus data set of aquatic birds with six fossil constraints. Comparison across divergence time analyses that varied taxon and locus sampling, number and position of fossil constraint and shape of prior distribution showed various insights. Deviation from node ages obtained from a reference analysis was generally highest for the shallowest nodes but determined more by temporal placement than number of fossil constraints. Calibration with only the shallowest nodes significantly underestimated the aquatic bird fossil record, indicating the presence of saturation. Although joint calibration with all six priors yielded ages most consistent with the fossil record, ages of shallow nodes were overestimated. This bias was found in both mtDNA and nDNA regions. Thus, divergence time estimation of shallow nodes may suffer from bias and low precision, even when appropriate fossil priors and best available substitution models are chosen. Much care must be taken to address the possible ramifications of substitution saturation across the entire Tree of Life.

Collapse

Joly S, Bryant D, Lockhart PJ. Flexible methods for estimating genetic distances from single nucleotide polymorphisms. Methods Ecol Evol 2015. [DOI: 10.1111/2041-210x.12343] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Mirarab S, Reaz R, Bayzid MS, Zimmermann T, Swenson MS, Warnow T. ASTRAL: genome-scale coalescent-based species tree estimation. ACTA ACUST UNITED AC 2015;30:i541-8. [PMID: 25161245 PMCID: PMC4147915 DOI: 10.1093/bioinformatics/btu462] [Citation(s) in RCA: 717] [Impact Index Per Article: 79.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Affiliation(s)

S Mirarab Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
R Reaz Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
Md S Bayzid Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
T Zimmermann Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
M S Swenson Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA
T Warnow Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA, Departement d'informatique, Ecole Normale Superieure, 45 Rue d'Ulm, F-75230 Paris Cedex 05, France and Department of Electrical Engineering, The University of Southern California, Los Angeles, CA 90089, USA

Collapse

Mirarab S, Bayzid MS, Boussau B, Warnow T. Statistical binning enables an accurate coalescent-based estimation of the avian tree. Science 2014;346:1250463. [PMID: 25504728 DOI: 10.1126/science.1250463] [Citation(s) in RCA: 197] [Impact Index Per Article: 19.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Pyron RA, Hendry CR, Chou VM, Lemmon EM, Lemmon AR, Burbrink FT. Effectiveness of phylogenomic data and coalescent species-tree methods for resolving difficult nodes in the phylogeny of advanced snakes (Serpentes: Caenophidia). Mol Phylogenet Evol 2014;81:221-31. [DOI: 10.1016/j.ympev.2014.08.023] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2014] [Revised: 07/29/2014] [Accepted: 08/22/2014] [Indexed: 11/15/2022]

Gatesy J, Springer MS. Phylogenetic analysis at deep timescales: Unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Mol Phylogenet Evol 2014;80:231-66. [DOI: 10.1016/j.ympev.2014.08.013] [Citation(s) in RCA: 239] [Impact Index Per Article: 23.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2014] [Revised: 07/26/2014] [Accepted: 08/10/2014] [Indexed: 11/16/2022]

Jockusch EL, Martínez-Solano I, Timpe EK. The Effects of Inference Method, Population Sampling, and Gene Sampling on Species Tree Inferences: An Empirical Study in Slender Salamanders (Plethodontidae: Batrachoseps). Syst Biol 2014;64:66-83. [DOI: 10.1093/sysbio/syu078] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Mirarab S, Bayzid MS, Warnow T. Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting. Syst Biol 2014;65:366-80. [PMID: 25164915 DOI: 10.1093/sysbio/syu063] [Citation(s) in RCA: 181] [Impact Index Per Article: 18.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Accepted: 08/18/2014] [Indexed: 12/13/2022] Open

DeGiorgio M, Syring J, Eckert AJ, Liston A, Cronn R, Neale DB, Rosenberg NA. An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines. BMC Evol Biol 2014;14:67. [PMID: 24678701 PMCID: PMC4021425 DOI: 10.1186/1471-2148-14-67] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2013] [Accepted: 02/10/2014] [Indexed: 12/26/2022] Open

Abstract

Background

As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size.

Results

Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each “strategy” for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies.

Conclusions

When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.

Collapse