1
Doulcier G, Lambert A. Neutral diversity in experimental metapopulations. Theor Popul Biol 2024; 158:89-108. [PMID: 38493997 DOI: 10.1016/j.tpb.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 02/07/2024] [Accepted: 02/27/2024] [Indexed: 03/19/2024]
Abstract
New automated and high-throughput methods allow the manipulation and selection of numerous bacterial populations. In this manuscript, we are interested in the neutral diversity patterns that emerge from such a setup, in which many bacterial populations are grown in parallel serial transfers, in some cases with population-wide extinction and splitting events. We model bacterial growth by a birth-death process and use the theory of coalescent point processes. We show that there is a dilution factor that optimises the expected amount of neutral diversity for a given number of cycles, and we study the power-law behaviour of the mutation frequency spectrum for different experimental regimes. We also explore how neutral variation diverges between two recently split populations by establishing a new formula for the expected number of shared and private mutations. Finally, we show the usefulness of such a setup for selecting a phenotype of interest that requires multiple mutations.
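The coalescent point process (CPP) encoding mentioned above can be sketched in a few lines. In a CPP, the reconstructed tree of a population observed T time units after its founding is described by i.i.d. node depths H_1, H_2, …, the sequence stopping at the first depth that exceeds T; the coalescence time of leaves i < j (numbered from 0) is max(H_{i+1}, …, H_j). The sketch assumes a pure-birth (Yule) process with birth rate `lam`, for which the node depths are exponential with rate `lam`. It is not the authors' code: the function names and the mutation rate `theta` are hypothetical, and the serial-transfer dilution steps are not modelled.

```python
import random

def sample_cpp_depths(T, lam, rng):
    """Node depths H_1, ..., H_{n-1} of a CPP tree with n leaves.

    Assumption: a Yule (pure-birth) process with rate `lam` started from one
    lineage T time units ago, so depths are i.i.d. Exp(lam) and the sequence
    stops at the first depth that exceeds T.
    """
    depths = []
    while True:
        h = rng.expovariate(lam)
        if h > T:
            return depths
        depths.append(h)

def coalescence_time(depths, i, j):
    """Coalescence time of leaves i < j (0-based): the largest depth between them."""
    # depths[k] is the depth of the node separating leaf k from leaf k + 1
    return max(depths[i:j])

def total_branch_length(T, depths):
    """The first leaf's lineage has length T; leaf k + 1 attaches at depth depths[k]."""
    return T + sum(depths)

rng = random.Random(1)
depths = sample_cpp_depths(T=2.0, lam=1.0, rng=rng)
n_leaves = len(depths) + 1
theta = 0.5  # hypothetical neutral mutation rate per unit branch length
expected_mutations = theta * total_branch_length(2.0, depths)
```

With mutations assumed to fall as a Poisson process along branches, the expected number of neutral mutations is `theta` times the total branch length, which is why the sketch computes that length.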
Affiliation(s)
- Guilhem Doulcier
- Macquarie University, Department of Philosophy, Sydney, Australia; Max Planck Institute for Evolutionary Biology, Department of Theoretical Biology, Plön, Germany.
- Amaury Lambert
- SMILE - Stochastic Models for the Inference of Life Evolution, Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS UMR8197, INSERM U1024, France; Centre Interdisciplinaire de Recherche en Biologie (CIRB), Collège de France, CNRS UMR7241, INSERM U1050, PSL Université, Paris, France.
2
Bienvenu F, Steel M. 0-1 Laws for Pattern Occurrences in Phylogenetic Trees and Networks. Bull Math Biol 2024; 86:94. [PMID: 38896355 DOI: 10.1007/s11538-024-01316-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 05/26/2024] [Indexed: 06/21/2024]
Abstract
A recent paper posed the question of determining the fraction of binary trees that contain a fixed pattern known as the snowflake. We show that this fraction tends to 1, providing two very different proofs: a purely combinatorial one that is quantitative and specific to this problem, and a proof using branching process techniques that is less explicit but much more general, as it applies to any fixed pattern and can be extended to other trees and networks. In particular, it follows immediately from our second proof that the fraction of d-ary trees (resp. level-k networks) that contain a fixed d-ary tree (resp. level-k network) tends to 1 as the number of leaves grows.
Affiliation(s)
- François Bienvenu
- Institute for Theoretical Studies, ETH Zürich, 8092, Zürich, Switzerland
- Université de Franche-Comté, CNRS, LmB, F-25000, Besançon, France
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
3
Budd GE, Mann RP. Two Notorious Nodes: A Critical Examination of Relaxed Molecular Clock Age Estimates of the Bilaterian Animals and Placental Mammals. Syst Biol 2024; 73:223-234. [PMID: 37695319 PMCID: PMC11129587 DOI: 10.1093/sysbio/syad057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 08/30/2023] [Accepted: 09/08/2023] [Indexed: 09/12/2023]
Abstract
The popularity of relaxed clock Bayesian inference of clade origin timings has generated several recent publications with focal results considerably older than the fossils of the clades in question. Here, we critically examine two such clades: the animals (with a focus on the bilaterians) and the mammals (with a focus on the placentals). Each example displays a set of characteristic pathologies which, although much commented on, are rarely corrected for. We conclude that in neither case does the molecular clock analysis provide any evidence for an origin of the clade deeper than what is suggested by the fossil record. In addition, both these clades have other features (including, in the case of the placental mammals, proximity to a large mass extinction) that allow us to generate precise expectations of the timings of their origins. Thus, in these instances, the fossil record can provide a powerful test of molecular clock methodology and of why it goes astray, and we have every reason to think these problems are general. [Cambrian explosion; mammalian evolution; molecular clocks.]
Affiliation(s)
- Graham E Budd
- Department of Earth Sciences, Palaeobiology Programme, Uppsala University, Villavägen 16 SE 75236, Sweden
- Richard P Mann
- Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, UK
4
Diamantidis D, Fan WTL, Birkner M, Wakeley J. Bursts of coalescence within population pedigrees whenever big families occur. Genetics 2024; 227:iyae030. [PMID: 38408329 DOI: 10.1093/genetics/iyae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 01/23/2024] [Accepted: 02/18/2024] [Indexed: 02/28/2024]
Abstract
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
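The limiting conditional process described above (Kingman coalescence between big families plus a discrete jump at each big family) can be sketched as a survival function for the pairwise coalescence time T2. This is a minimal illustration, not the paper's derivation: it assumes time is measured in units where two lineages coalesce at rate 1 between big families, and the per-family coalescence probabilities `p` are hypothetical inputs rather than values derived from the model.

```python
import math

def pairwise_survival(t, big_families):
    """P(T2 > t | pedigree): Kingman rate 1 between big families, jumps at them.

    `big_families` is a list of (time, p) pairs: `time` is when a big family
    occurred (backward from the present, in coalescent units) and `p` is a
    hypothetical probability that the two sampled lineages coalesce in it.
    """
    survival = math.exp(-t)              # no Kingman coalescence during [0, t]
    for time, p in big_families:
        if time <= t:
            survival *= 1.0 - p          # the pair escaped coalescence in this family
    return survival

def pairwise_cdf(t, big_families):
    """P(T2 <= t | pedigree); it jumps upward at each big-family time."""
    return 1.0 - pairwise_survival(t, big_families)

pedigree = [(0.3, 0.2), (1.1, 0.4)]      # hypothetical big-family times and jump sizes
cdf_at_one = pairwise_cdf(1.0, pedigree)
```

Averaging such conditional laws over random pedigrees is what the usual multiple-merger coalescent models describe; the sketch keeps the pedigree fixed instead.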
Affiliation(s)
- Wai-Tong Louis Fan
- Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Matthias Birkner
- Institut für Mathematik, Johannes-Gutenberg-Universität, 55099 Mainz, Germany
- John Wakeley
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
5
Shao Y, Magee AF, Vasylyeva TI, Suchard MA. Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models. PLoS Comput Biol 2024; 20:e1011640. [PMID: 38551979 PMCID: PMC11006205 DOI: 10.1371/journal.pcbi.1011640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 04/10/2024] [Accepted: 03/10/2024] [Indexed: 04/09/2024]
Abstract
Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three different real-world data examples: the HIV epidemic in Odesa, Ukraine; seasonal influenza A/H3N2 virus dynamics in New York State, USA; and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of the viral effective reproductive number.
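The HMC sampler referred to above needs only a log density and its gradient. The one-dimensional sketch below shows the standard leapfrog integrator and Metropolis correction; in the EBDS setting `log_p` would be the birth-death-sampling log density and `grad_log_p` the linear-time gradient described in the abstract, whereas here a standard normal stands in as a hypothetical target. It is a generic HMC sketch, not the authors' implementation, which works on high-dimensional parameter vectors.

```python
import math
import random

def hmc_sample(log_p, grad_log_p, x0, n_samples, step_size, n_leapfrog, seed=0):
    """Minimal 1-D Hamiltonian Monte Carlo with unit mass and a leapfrog integrator."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                       # fresh momentum each iteration
        x_new, p_new = x, p
        # Leapfrog: half momentum step, alternating full steps, final half step.
        p_new += 0.5 * step_size * grad_log_p(x_new)
        for step in range(n_leapfrog):
            x_new += step_size * p_new
            if step < n_leapfrog - 1:
                p_new += step_size * grad_log_p(x_new)
        p_new += 0.5 * step_size * grad_log_p(x_new)
        # Metropolis correction on the Hamiltonian H = -log_p(x) + p^2 / 2.
        h_old = -log_p(x) + 0.5 * p * p
        h_new = -log_p(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples

samples = hmc_sample(
    log_p=lambda x: -0.5 * x * x,                     # hypothetical target: N(0, 1)
    grad_log_p=lambda x: -x,
    x0=0.0, n_samples=2000, step_size=0.2, n_leapfrog=10, seed=42,
)
```

The gradient is called `n_leapfrog + 1` times per iteration, which is why a cheap (linear-time) gradient matters so much for the overall cost.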
Affiliation(s)
- Yucai Shao
- Department of Biostatistics, University of California, Los Angeles, California, United States of America
- Andrew F. Magee
- Department of Biomathematics, University of California, Los Angeles, California, United States of America
- Tetyana I. Vasylyeva
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Department of Population Health and Disease Prevention, University of California Irvine, Irvine, California, United States of America
- Marc A. Suchard
- Department of Biostatistics, University of California, Los Angeles, California, United States of America
- Department of Biomathematics, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
6
Shao Y, Magee AF, Vasylyeva TI, Suchard MA. Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models. bioRxiv: The Preprint Server for Biology 2023:2023.10.31.564882. [PMID: 37961423 PMCID: PMC10634968 DOI: 10.1101/2023.10.31.564882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three different real-world data examples: the HIV epidemic in Odesa, Ukraine; seasonal influenza A/H3N2 virus dynamics in New York State, USA; and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of the viral effective reproductive number.
Affiliation(s)
- Yucai Shao
- Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States
- Andrew F. Magee
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
- Tetyana I. Vasylyeva
- Department of Medicine, University of California San Diego, La Jolla, United States
- Department of Population Health and Disease Prevention, University of California Irvine, Irvine, United States
- Marc A. Suchard
- Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
- Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
7
Luo A, Zhang C, Zhou QS, Ho SYW, Zhu CD. Impacts of Taxon-Sampling Schemes on Bayesian Tip Dating Under the Fossilized Birth-Death Process. Syst Biol 2023; 72:781-801. [PMID: 36919368 PMCID: PMC10405359 DOI: 10.1093/sysbio/syad011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 01/18/2023] [Accepted: 03/14/2023] [Indexed: 03/16/2023]
Abstract
Evolutionary timescales can be inferred by molecular-clock analyses of genetic data and fossil evidence. Bayesian phylogenetic methods such as tip dating provide a powerful framework for inferring evolutionary timescales, but the most widely used priors for tree topologies and node times often assume that present-day taxa have been sampled randomly or exhaustively. In practice, taxon sampling is often carried out so as to include representatives of major lineages, such as orders or families. We examined the impacts of different densities of diversified sampling on Bayesian tip dating on unresolved fossilized birth-death (FBD) trees, in which fossil taxa are topologically constrained but their exact placements are averaged out. We used synthetic data generated by simulations of nucleotide sequence evolution, fossil occurrences, and diversified taxon sampling. Our analyses under the diversified-sampling FBD process show that increasing taxon-sampling density does not necessarily improve divergence-time estimates. However, when informative priors were specified for the root age or when tree topologies were fixed to those used for simulation, tip dating on unresolved FBD trees maintained or improved its accuracy and precision as taxon-sampling density increased. By exploring three situations in which models are mismatched, we found that including all relevant fossils, without pruning those that are incompatible with the diversified-sampling FBD process, can lead to underestimation of divergence times. Our reanalysis of a eutherian mammal data set confirms some of the findings from our simulation study, and reveals the complexity of diversified taxon sampling in phylogenomic data sets. In highlighting the interplay of taxon-sampling density and other factors, the results of our study have practical implications for using Bayesian tip dating to infer evolutionary timescales across the Tree of Life.
[Bayesian tip dating; eutherian mammals; fossilized birth-death process; phylogenomics; taxon sampling.].
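Diversified taxon sampling can be made concrete with a small sketch. It assumes one common definition, which may differ in detail from the study's simulation protocol: to sample m extant tips, keep the m − 1 oldest divergences of the full tree and pick one representative tip from each of the m subclades hanging below them. Trees are nested dicts in a hypothetical format (internal nodes carry `age` and `children`, tips carry `name`), and internal-node ages are assumed distinct.

```python
import heapq

def diversified_sample(root, m):
    """Return m tip names that keep the m - 1 oldest divergences of the tree.

    Nodes are dicts (hypothetical format): internal nodes have "age" and
    "children"; tips have "name". Assumes distinct internal-node ages.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    # Max-heap on age (negated); the counter breaks ties so dicts are never compared.
    frontier = [(-root.get("age", 0.0), 0, root)]
    counter = 1
    for _ in range(m - 1):
        _, _, node = heapq.heappop(frontier)
        if "children" not in node:
            raise ValueError("m exceeds the number of tips")
        # Expanding the oldest pending node keeps the kept divergences closed
        # under ancestry, because every parent is older than its children.
        for child in node["children"]:
            heapq.heappush(frontier, (-child.get("age", 0.0), counter, child))
            counter += 1
    # Each subtree left in the frontier contributes one representative tip.
    return [first_tip(node) for _, _, node in frontier]

def first_tip(node):
    """Follow first children down to a tip (an arbitrary but deterministic choice)."""
    while "children" in node:
        node = node["children"][0]
    return node["name"]

tree = {"age": 3.0, "children": [
    {"age": 2.0, "children": [{"name": "A"}, {"name": "B"}]},
    {"age": 1.0, "children": [{"name": "C"}, {"name": "D"}]},
]}
print(sorted(diversified_sample(tree, 3)))  # ['A', 'B', 'C']
```

Choosing which tip represents each subclade does not change the sampled divergence times, so the sketch simply takes the first tip.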
Affiliation(s)
- Arong Luo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Chi Zhang
- Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing 100044, China
- Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, Beijing 100044, China
- Qing-Song Zhou
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
- Chao-Dong Zhu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- State Key Laboratory of Integrated Pest Management, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- International College, University of Chinese Academy of Sciences, Beijing, 100049, China
8
Disanto F, Fuchs M, Paningbatan AR, Rosenberg NA. The distributions under two species-tree models of the number of root ancestral configurations for matching gene trees and species trees. Ann Appl Probab 2022. [DOI: 10.1214/22-aap1791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
Affiliation(s)
- Michael Fuchs
- Department of Mathematical Sciences, National Chengchi University
9
Soewongsono AC, Holland BR, O’Reilly MM. The Shape of Phylogenies Under Phase-Type Distributed Times to Speciation and Extinction. Bull Math Biol 2022; 84:118. [PMID: 36103093 PMCID: PMC9474389 DOI: 10.1007/s11538-022-01072-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Accepted: 08/29/2022] [Indexed: 11/26/2022]
Abstract
Phylogenetic trees describe relationships between extant species, but beyond that their shape and their relative branch lengths can provide information on broader evolutionary processes of speciation and extinction. However, many of the most widely used macroevolutionary models currently make predictions about the shapes of phylogenetic trees that differ considerably from what is observed in empirical phylogenies. Here, we propose a flexible and biologically plausible macroevolutionary model for phylogenetic trees where times to speciation or extinction events are drawn from a Coxian phase-type (PH) distribution. First, we show that different choices of parameters in our model lead to a range of tree balances as measured by Aldous’ β statistic. In particular, we demonstrate that it is possible to find parameters that correspond well to empirical tree balance. Next, we provide a natural extension of the β statistic to sets of trees. This extension produces less biased estimates of β compared to using the median β values from individual trees. Furthermore, we derive a likelihood expression for the probability of observing an edge-weighted tree under a model with speciation but no extinction. Finally, we illustrate the application of our model by performing both absolute and relative goodness-of-fit tests for two large empirical phylogenies (squamates and angiosperms) that compare models with Coxian PH distributed times to speciation with models that assume exponential or Weibull distributed waiting times. In our numerical analysis, we found that, in most cases, models assuming a Coxian PH distribution provided the best fit.
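A Coxian phase-type distribution, as used above for times to speciation or extinction, is the absorption time of a chain of exponential phases visited in order: after each phase the process either moves on to the next phase or is absorbed, and the last phase always ends in absorption. With a single phase it reduces to the exponential distribution. The sketch below assumes this parameterisation (exit rates `rates` and continuation probabilities `continue_probs`, both hypothetical names) and checks a Monte Carlo mean against the closed-form mean; it does not reproduce the paper's tree model or likelihood.

```python
import random

def sample_coxian(rates, continue_probs, rng):
    """Draw one waiting time from a Coxian phase-type distribution.

    rates[i] is the exit rate of phase i. After phase i (i < k - 1) the process
    moves to phase i + 1 with probability continue_probs[i], else it is absorbed.
    """
    t = 0.0
    for i, rate in enumerate(rates):
        t += rng.expovariate(rate)                 # holding time in phase i
        is_last = i == len(rates) - 1
        if is_last or rng.random() >= continue_probs[i]:
            break                                  # absorption: the event happens
    return t

def coxian_mean(rates, continue_probs):
    """Closed-form mean: sum over phases of P(phase i is reached) / rates[i]."""
    mean, reach = 0.0, 1.0
    for i, rate in enumerate(rates):
        mean += reach / rate
        if i < len(rates) - 1:
            reach *= continue_probs[i]
    return mean

rng = random.Random(0)
draws = [sample_coxian([2.0, 1.0], [0.5], rng) for _ in range(50000)]
mc_mean = sum(draws) / len(draws)
exact_mean = coxian_mean([2.0, 1.0], [0.5])
```

Adding phases lets the waiting-time density depart from the memoryless exponential shape, which is what makes the Coxian family flexible enough to compare with exponential or Weibull waiting times.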
Affiliation(s)
- Albert Ch. Soewongsono
- School of Natural Sciences (Discipline of Mathematics), University of Tasmania, Hobart, 7005 Australia
- Barbara R. Holland
- School of Natural Sciences (Discipline of Mathematics), University of Tasmania, Hobart, 7005 Australia
- Małgorzata M. O’Reilly
- School of Natural Sciences (Discipline of Mathematics), University of Tasmania, Hobart, 7005 Australia
10
Ho LST, Dinh V. When can we reconstruct the ancestral state? A unified theory. Theor Popul Biol 2022; 148:22-27. [PMID: 36167107 DOI: 10.1016/j.tpb.2022.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Revised: 09/09/2022] [Accepted: 09/19/2022] [Indexed: 10/14/2022]
Abstract
Ancestral state reconstruction is one of the most important tasks in evolutionary biology. Conditions under which we can reliably reconstruct the ancestral state have been studied for both discrete and continuous traits. However, the connection between these results is unclear, and it seems that each model needs different conditions. In this work, we provide a unifying theory on the consistency of ancestral state reconstruction for various types of trait evolution models. Notably, we show that for a sequence of nested trees with bounded heights, the necessary and sufficient conditions for the existence of a consistent ancestral state reconstruction method under discrete models, the Brownian motion model, and the threshold model are equivalent. When tree heights are unbounded, we provide a simple counter-example to show that this equivalence is no longer valid.
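A simple special case illustrates what consistency means for the Brownian motion model; this is an illustration, not the paper's general theorem. On a star tree whose n tips lie at distances t_1, …, t_n from the root, tip values are independent normal draws centred on the root state with variances σ²t_i, so the maximum-likelihood root estimate is the inverse-variance weighted mean of the tip values, with variance σ²/Σ(1/t_i). When heights stay bounded as tips are added, Σ(1/t_i) grows without bound and the estimate converges to the true root state. The function name and simulated values are hypothetical.

```python
import math
import random

def bm_root_estimate(tip_values, tip_depths, sigma2=1.0):
    """ML root state and its variance under Brownian motion on a star tree."""
    weights = [1.0 / t for t in tip_depths]       # tip i has variance sigma2 * t_i
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, tip_values)) / total
    return estimate, sigma2 / total

# Hypothetical check: 400 tips at depth 1 around a true root state of 2.0.
rng = random.Random(3)
depths = [1.0] * 400
tips = [rng.gauss(2.0, math.sqrt(d)) for d in depths]
est, var = bm_root_estimate(tips, depths)
```

Doubling the number of bounded-depth tips halves the variance, whereas tips whose depths grow quickly enough can keep Σ(1/t_i) bounded, which is the kind of situation in which consistency can fail.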
Affiliation(s)
- Lam Si Tung Ho
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
- Vu Dinh
- Department of Mathematical Sciences, University of Delaware, USA.
11
Shchur V, Spirin V, Sirotkin D, Burovski E, De Maio N, Corbett-Detig R. VGsim: Scalable viral genealogy simulator for global pandemic. PLoS Comput Biol 2022; 18:e1010409. [PMID: 36001646 PMCID: PMC9447924 DOI: 10.1371/journal.pcbi.1010409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Revised: 09/06/2022] [Accepted: 07/18/2022] [Indexed: 11/24/2022]
Abstract
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions in the world. More than 5.5 million viral sequences were publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator, VGsim, which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run, the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run, a coalescent-like approach generates a tree genealogy of samples conditioned on the chain of population-level events generated during the forward run. Our software can model complex population structure, epistasis and immunity escape.
We develop a fast and flexible simulation software package, VGsim, for modeling epidemiological processes and generating genealogies of large pathogen samples. The software takes into account host population structure, pathogen evolution, host immunity and some other epidemiological aspects. The computational efficiency of the package makes it possible to simulate genealogies of tens of millions of samples, which is important, e.g., for SARS-CoV-2 genome studies.
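The forward run described above builds on the Gillespie algorithm. The sketch below is a plain, non-hierarchical Gillespie simulation of an SIR epidemic in a single well-mixed population; this is an assumption for illustration, since VGsim itself uses a hierarchical version with population structure, pathogen evolution and immunity. It records the chain of population-level events, the kind of record a backward, coalescent-like pass could then condition on. Names and parameter values are hypothetical.

```python
import random

def gillespie_sir(n, i0, beta, gamma, t_max, seed=0):
    """Plain (non-hierarchical) Gillespie simulation of an SIR epidemic.

    Returns the chain of events as (time, kind) pairs and the final (S, I, R).
    """
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    t = 0.0
    events = []
    while i > 0:
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total = infection_rate + recovery_rate
        t += rng.expovariate(total)                  # waiting time to the next event
        if t > t_max:
            break
        if rng.random() * total < infection_rate:    # pick the event type by its rate
            s, i = s - 1, i + 1
            events.append((t, "infection"))
        else:
            i, r = i - 1, r + 1
            events.append((t, "recovery"))
    return events, (s, i, r)

events, final = gillespie_sir(n=1000, i0=5, beta=0.3, gamma=0.1, t_max=200.0, seed=1)
```

A hierarchical variant, as the abstract describes, would first choose which subpopulation or event class fires and only then the individual event, which keeps each step cheap when there are many populations.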
Affiliation(s)
- Vladimir Shchur
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Vadim Spirin
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Dmitry Sirotkin
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Russell Corbett-Detig
- Department of Biomolecular Engineering and Genomics Institute, UC Santa Cruz, California, United States of America
12
Bocharov S, Harris S, Kominek E, Mooers AØ, Steel M. Predicting long pendant edges in model phylogenies, with applications to biodiversity and tree inference. Syst Biol 2022:6671239. [PMID: 35980265 DOI: 10.1093/sysbio/syac059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2021] [Revised: 08/01/2022] [Accepted: 08/08/2022] [Indexed: 11/12/2022]
Abstract
In the simplest phylogenetic diversification model (the pure-birth Yule process), lineages split independently at a constant rate λ for time t. The length of a randomly chosen edge (either interior or pendant) in the resulting tree has an expected value that rapidly converges to 1/(2λ) as t grows, and is thus essentially independent of t. However, the length L of the longest pendant edge behaves remarkably differently: L converges to t/2 as the expected number of leaves grows. Extending this model to allow an extinction rate μ (where μ < λ), we also establish a similar result for birth-death trees, except that t/2 is replaced by t/2 ⋅ (1 - μ/λ). This 'complete' tree may contain subtrees that have died out before time t; for the 'reduced tree' that involves only the leaves present at time t and their direct ancestors, the longest pendant edge length L again converges to t/2. Thus, there is likely to be at least one extant species whose associated pendant branch attaches to the tree approximately half-way back in time to the origin of the entire clade. We also briefly consider the length of the shortest edges. Our results are relevant to phylogenetic diversity indices in biodiversity conservation, and to quantifying the length of aligned sequences required to correctly infer a tree. We compare our theoretical results with simulations, and with the branch lengths from a recent phylogenetic tree of all mammals.
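The Yule-process claims above can be checked by simulation. This sketch grows a pure-birth tree from one lineage for time t and returns the pendant-edge lengths: each split turns the splitting tip's pendant edge into an interior edge and starts two new pendant edges. The function name and parameter values are illustrative, and the simulation is not taken from the paper.

```python
import random

def yule_pendant_edges(lam, t, seed=0):
    """Pendant-edge lengths of a Yule tree grown from a single lineage for time t."""
    rng = random.Random(seed)
    # birth[k]: time at which tip lineage k started, i.e. where its pendant edge begins
    birth = [0.0]
    now = 0.0
    while True:
        # with n lineages, the next split happens after an Exp(n * lam) waiting time
        now += rng.expovariate(lam * len(birth))
        if now > t:
            break
        k = rng.randrange(len(birth))
        # lineage k splits: its pendant edge becomes interior, two new tips start now
        birth[k] = now
        birth.append(now)
    return [t - b for b in birth]

edges = yule_pendant_edges(lam=1.0, t=8.0, seed=3)
longest = max(edges)
ratio = longest / 8.0   # compare with the predicted limit of 1/2
```

Pooling pendant edges over many replicate trees should give a mean close to 1/(2λ), while `ratio` from a single run is only a rough indication of the t/2 behaviour of the longest pendant edge.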
Affiliation(s)
- Sergey Bocharov
- Department of Foundational Mathematics, Xian Jiaotong-Liverpool University, Suzhou, China
- Simon Harris
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Emma Kominek
- Biological Sciences, Simon Fraser University, 8888 Univ. Drive, Burnaby BC Canada V5A 1S6
- Arne Ø Mooers
- Biological Sciences, Simon Fraser University, 8888 Univ. Drive, Burnaby BC Canada V5A 1S6
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
13
Kurpas MK, Kimmel M. Modes of Selection in Tumors as Reflected by Two Mathematical Models and Site Frequency Spectra. Front Ecol Evol 2022; 10:889438. [PMID: 37333691 PMCID: PMC10275603 DOI: 10.3389/fevo.2022.889438] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/28/2024]
Abstract
The tug-of-war model was developed in a series of papers by McFarland and co-authors to account for the existence of mutually counteracting rare advantageous driver mutations and more frequent slightly deleterious passenger mutations in cancer. In its original version, it was a state-dependent branching process. Because of its formulation, the tug-of-war model is important for tackling the question of whether the evolution of cancerous tumors is "Darwinian" or "non-Darwinian." We define two continuous-time Markov chain versions of the model, which share identical mutation processes but adopt different drift and selection components. In Model A, the drift and selection process preserves expected fitness, whereas in Model B it leads to non-decreasing expected fitness. We investigate these properties using mathematical analysis and extensive simulations, which detect the effect of the so-called drift barrier in Model B but not in Model A. These effects are reflected in different structures of clone genealogies in the two models. Our work is related to past theoretical work in evolutionary genetics concerning the interplay among mutation, drift and selection in the absence of recombination (asexual reproduction), where epistasis plays a major role. Finally, we use the statistics of mutation frequencies known as the Site Frequency Spectra (SFS) to compare the variant frequencies in DNA of sequenced HER2+ breast cancers with those based on Model A and B simulations. The tumor-based SFS are better reproduced by Model A, suggesting a possible selection pattern in HER2+ tumor evolution. To put our models in context, we carried out an exploratory study of how publicly accessible data from breast, prostate, skin and ovarian cancers fit a range of models found in the literature.
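For reference, the site frequency spectrum (SFS) mentioned above counts, for each i = 1, …, n − 1, the number of segregating sites at which the derived allele is carried by exactly i of n sampled genomes. The sketch below computes this unfolded SFS from a 0/1 genotype matrix, assuming 1 codes the derived allele. Bulk tumour sequencing usually yields variant allele frequencies instead, which are typically binned into a histogram, so this shows the textbook definition rather than the authors' pipeline; the function name is hypothetical.

```python
from collections import Counter

def unfolded_sfs(genotypes):
    """Unfolded SFS from a 0/1 matrix: rows are n sampled genomes, columns are sites.

    Entry i - 1 of the result counts sites where the derived allele (coded 1)
    is carried by exactly i samples, for i = 1, ..., n - 1. Sites where the
    derived allele is absent or fixed in the sample are not segregating and
    are left out.
    """
    n = len(genotypes)
    n_sites = len(genotypes[0]) if n else 0
    derived_counts = Counter(sum(row[j] for row in genotypes) for j in range(n_sites))
    return [derived_counts.get(i, 0) for i in range(1, n)]

genotypes = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
]
print(unfolded_sfs(genotypes))  # [2, 1]
```

Comparing such spectra between data and simulations, as in the abstract, amounts to comparing these count vectors (often after normalisation).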
Affiliation(s)
- Monika K. Kurpas
  - Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
- Marek Kimmel
  - Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
  - Department of Statistics and Bioengineering, Rice University, Houston, TX, United States

14
Diao J, O'Reilly MM, Holland B. A subfunctionalisation model of gene family evolution predicts balanced tree shapes. Mol Phylogenet Evol 2022; 176:107566. [PMID: 35810972 DOI: 10.1016/j.ympev.2022.107566] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/22/2021] [Revised: 05/18/2022] [Accepted: 05/25/2022] [Indexed: 11/26/2022]
Abstract
We consider a subfunctionalisation model of gene family evolution. A family of $n$ genes that perform $z$ functions is represented by an $n \times z$ binary matrix $Y_t$, where a 1 in the $(i,j)$th position indicates that gene $i$ can perform function $j$. $Y_t$ evolves according to a continuous-time Markov chain (CTMC) that represents the processes of gene duplication, coding region loss and regulatory region loss, with the restriction that each function is protected by selection, meaning that each column of the matrix must contain at least one 1. We generate gene trees based on the CTMC $\{Y_t, t \geq 0\}$. We analyse the long-run behaviour of the model and specify the conditions under which we expect gene trees to continue to grow and those under which we expect them to have a stable number of genes. We show that different choices of rate parameters for the processes of duplication and loss lead to different tree shapes as measured by two common tree-shape statistics: the β-statistic for measuring tree balance and the γ-statistic for assessing diversification rate. We use an extension of β that is estimated from sets of trees. This extension is less biased than using the average β value of individual trees. When the rate of gene duplication dominates the rate of gene loss, the process is unstable and the distribution of tree shapes is close to the uniform ranked tree shape (URT) distribution. However, when the process is stable, gene trees are predicted to have positive values of β, indicating balanced trees, and negative values of γ, indicating that diversification occurs more towards the root of the tree. The results of our analyses suggest that comparing the tree-shape statistics of empirical gene trees to the predictions presented here will provide a test of the subfunctionalisation model.
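The γ-statistic mentioned above can be computed from the internode intervals of an ultrametric tree. The sketch below implements the standard Pybus and Harvey (2000) formula, where $g_k$ is the time during which the tree has $k$ lineages and $T = \sum_{j=2}^{n} j\,g_j$:

$$\gamma = \frac{\frac{1}{n-2}\sum_{i=2}^{n-1}\left(\sum_{k=2}^{i} k\,g_k\right) - \frac{T}{2}}{T\sqrt{\frac{1}{12(n-2)}}}$$

The function name and the list convention (`g[0]` holds $g_2$, `g[1]` holds $g_3$, and so on) are assumptions for illustration, not code from the paper.

```python
import math
from typing import Sequence


def gamma_statistic(g: Sequence[float]) -> float:
    """Pybus-Harvey gamma from internode intervals g_2, ..., g_n.

    g[k - 2] is the length of the interval during which the tree has k lineages,
    so a tree with n tips has len(g) == n - 1 intervals.
    """
    n = len(g) + 1
    if n < 3:
        raise ValueError("gamma needs at least 3 tips")
    # Each interval weighted by its number of lineages: k * g_k for k = 2..n.
    weighted = [k * g_k for k, g_k in zip(range(2, n + 1), g)]
    total = sum(weighted)  # T, the total tree length
    # Accumulate the running sums sum_{k=2}^{i} k * g_k for i = 2..n-1.
    partial, running = 0.0, 0.0
    for i in range(n - 2):
        running += weighted[i]
        partial += running
    numerator = partial / (n - 2) - total / 2
    denominator = total * math.sqrt(1.0 / (12 * (n - 2)))
    return numerator / denominator


# Short early intervals and a long final one: splits concentrated near the root.
print(gamma_statistic([1.0, 1.0, 10.0]))  # negative
# A long first interval: splits concentrated near the tips.
print(gamma_statistic([10.0, 1.0, 1.0]))  # positive
```

The sign convention matches the abstract: negative γ means internal nodes sit closer to the root than under a pure-birth (Yule) tree.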
Affiliation(s)
- Jiahao Diao
  - Discipline of Mathematics, University of Tasmania, Australia; Australian Research Council Centre of Excellence for Plant Success, Australia.
- Małgorzata M O'Reilly
  - Discipline of Mathematics, University of Tasmania, Australia; Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers, Australia.
- Barbara Holland
  - Discipline of Mathematics, University of Tasmania, Australia; Australian Research Council Centre of Excellence for Plant Success, Australia.

15
Hua X, Herdha T, Burden C. Protracted speciation under the state-dependent speciation and extinction approach. Syst Biol 2022; 71:1362-1377. [PMID: 35699529 PMCID: PMC9558848 DOI: 10.1093/sysbio/syac041] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/16/2021] [Revised: 05/16/2022] [Accepted: 06/07/2022] [Indexed: 11/17/2022]
Abstract
How long does speciation take? The answer to this important question in evolutionary biology lies in the genetic difference not only among species, but also among lineages within each species. With the advance of genome sequencing in non-model organisms and the statistical tools to improve accuracy in inferring evolutionary histories among recently diverged lineages, we now have the lineage-level trees to answer these questions. However, we do not yet have an analytical tool for inferring speciation processes from these trees. What is needed is a model of speciation processes that generates both the trees and species identities of extant lineages. The model should allow calculation of the probability that certain lineages belong to certain species and have an evolutionary history consistent with the tree. Here, we propose such a model and test the model performance on both simulated data and real data. We show that maximum-likelihood estimates of the model are highly accurate and give estimates from real data that generate patterns consistent with observations. We discuss how to extend the model to account for different rates and types of speciation processes across lineages in a species group. By linking evolutionary processes on lineage level to species level, the model provides a new phylogenetic approach to study not just when speciation happened, but how speciation happened. [Micro–macro evolution; Protracted birth–death process; speciation completion rate; SSE approach.]
Affiliation(s)
- Xia Hua
  - Mathematical Sciences Institute, Australian National University, Canberra ACT 0200 Australia
- Tyara Herdha
  - Mathematical Sciences Institute, Australian National University, Canberra ACT 0200 Australia
- Conrad Burden
  - Mathematical Sciences Institute, Australian National University, Canberra ACT 0200 Australia

16
Cheek D. The coalescent tree of a Markov branching process with generalised logistic growth. J Math Biol 2022; 84:33. [PMID: 35380291 PMCID: PMC10362510 DOI: 10.1007/s00285-022-01735-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2021] [Revised: 03/02/2022] [Accepted: 03/12/2022] [Indexed: 11/25/2022]
Abstract
We consider a class of density-dependent branching processes which generalises exponential, logistic and Gompertz growth. A population begins with a single individual, grows exponentially initially, and then growth may slow down as the population size moves towards a carrying capacity. At a time while the population is still growing superlinearly, a fixed number of individuals are sampled and their coalescent tree is drawn. Taking the sampling time and carrying capacity simultaneously to infinity, we prove convergence of the coalescent tree to a limiting tree which is in a sense universal over our class of models.
Affiliation(s)
- David Cheek
  - Massachusetts General Hospital, Harvard Medical School, 149 13th St., Charlestown, MA, 02129, USA.

17
Cornuault J, Sanmartín I. A road map for phylogenetic models of species trees. Mol Phylogenet Evol 2022; 173:107483. [DOI: 10.1016/j.ympev.2022.107483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 10/18/2022]

18
Krivina ES, Temraleeva AD, Bukin YS. Species delimitation and microalgal cryptic diversity analysis of the genus Micractinium (Chlorophyta). Vavilovskii Zhurnal Genet Selektsii 2022; 26:74-85. [PMID: 35342860 PMCID: PMC8894098 DOI: 10.18699/vjgb-22-11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2021] [Revised: 07/28/2021] [Accepted: 08/06/2021] [Indexed: 11/19/2022]
Abstract
In this article, the taxonomic system of the green microalgal genus Micractinium is considered on the basis of morphological, physiological, ecological and molecular data. The main diagnostic species characteristics and the taxonomic placement of some taxa are also discussed. Phylogenetic analysis showed that the genus Micractinium is characterized by high cryptic diversity. The species delimitation algorithms differed in the number of potentially species-level clusters they recovered. The ABGD method was less “sensitive”. The tree-based approaches GMYC and PTP produced a more plausible taxonomy of the genus Micractinium and proved to be an effective additional tool for distinguishing species. The clustering obtained by these two methods is in good agreement with morphological (cell size and shape, ability to form colonies, production of bristles, chloroplast type), physiological (vitamin requirements, reaction to high and low temperatures), molecular (presence of introns, level of genetic differences, presence of CBCs or special features of the secondary structure of ITS1 and ITS2) and ecological (habitat) characteristics. The polyphyly of the type species of the genus, M. pusillum, as well as of M. belenophorum, is shown. Intron presence was effective as an additional tool for distinguishing species, and the results of intron analysis should be considered together with other characteristics. The CBC approach, based on the search for compensatory base changes in conserved ITS2 regions, was successful only for distinguishing cryptic species from “true” members of M. pusillum. Therefore, to distinguish species, it is more effective to take into account all CBCs in ITS1 and ITS2 and to analyze characteristic structural differences (molecular signatures) in the secondary structure of the internal transcribed spacers. Analysis of genetic distances between 18S–ITS1–5.8S–ITS2 nucleotide sequences showed that intraspecific differences in the genus ranged from 0 to 0.5% and interspecific differences from 0.6 to 4.7%. Using this polyphasic approach, it was possible to characterize 29 clusters and phylogenetic lineages at the species level within the genus Micractinium and to make tentative inferences about the species they represent.
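Percentage differences like those reported above are often computed as uncorrected pairwise p-distances. The abstract does not state which distance measure was used, so the following Python sketch is an assumption: it computes an uncorrected p-distance between two already-aligned sequences, skipping columns with gaps or ambiguity codes (pairwise deletion). The function name is hypothetical, and the 0 to 0.5% and 0.6 to 4.7% ranges are the authors' observations, not a classification rule implemented here.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance (%) between two aligned nucleotide sequences.

    Alignment columns where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # gap ('-') or ambiguity code: not comparable
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 10.0
```

Model-corrected distances (for example Kimura two-parameter) would give slightly larger values at higher divergence; the uncorrected form is shown only because it is the simplest.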
Affiliation(s)
- E. S. Krivina
  - Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”
- A. D. Temraleeva
  - Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”
- Yu. S. Bukin
  - Limnological Institute of the Siberian Branch of the Russian Academy of Sciences

19
Shchur V, Spirin V, Sirotkin D, Burovski E, De Maio N, Corbett-Detig R. VGsim: scalable viral genealogy simulator for global pandemic. medRxiv: The Preprint Server for Health Sciences 2021:2021.04.21.21255891. [PMID: 33948608 PMCID: PMC8095227 DOI: 10.1101/2021.04.21.21255891] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As part of the effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions of the world. More than 5.5 million viral sequences were publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes, including natural selection, for example allowing the identification of new variants with transmissibility and immune-evasion advantages. To our knowledge, no existing framework is both efficient and flexible enough to simulate world-scale pandemic scenarios and generate viral genealogies of millions of samples. Here, we introduce VGsim, a new fast simulator that addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run, the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run, a coalescent-like approach generates a tree genealogy of samples conditioned on the chain of population-level events generated during the forward run. Our software can model complex population structure, epistasis and immunity escape. The code is freely available at https://github.com/Genomics-HSE/VGsim.
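The forward phase described above relies on the Gillespie algorithm. A minimal single-population version for a linear birth-death process is sketched below in Python. VGsim itself uses a hierarchical variant with population structure and several event types, so this shows only the basic technique, and the function and parameter names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def gillespie_birth_death(
    n0: int, birth: float, death: float, t_max: float, seed: Optional[int] = None
) -> List[Tuple[float, int]]:
    """Exact stochastic simulation of a linear birth-death process.

    Each individual gives birth at rate `birth` and dies at rate `death`.
    Returns (time, population size) after every event, starting at (0, n0).
    Stops at extinction or when the next event would fall after t_max.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    per_capita = birth + death
    while n > 0 and per_capita > 0:
        # Waiting time to the next event is exponential with the total event rate.
        t += rng.expovariate(per_capita * n)
        if t > t_max:
            break
        # Pick which event fires, with probability proportional to its rate.
        if rng.random() < birth / per_capita:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


traj = gillespie_birth_death(n0=1, birth=1.0, death=0.5, t_max=5.0, seed=42)
print(len(traj) - 1, "events; final size", traj[-1][1])
```

Because every event changes the total rate, the waiting time is redrawn after each event; a hierarchical scheme such as VGsim's organises the rate bookkeeping so this stays fast with many populations and event types.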
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Russell Corbett-Detig
  - HSE University, Russian Federation
  - Department of Biomolecular Engineering and Genomics Institute, UC Santa Cruz, California 95064

20
Kayondo HW, Ssekagiri A, Nabakooza G, Bbosa N, Ssemwanga D, Kaleebu P, Mwalili S, Mango JM, Leigh Brown AJ, Saenz RA, Galiwango R, Kitayimbwa JM. Employing phylogenetic tree shape statistics to resolve the underlying host population structure. BMC Bioinformatics 2021; 22:546. [PMID: 34758743 PMCID: PMC8579572 DOI: 10.1186/s12859-021-04465-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2020] [Accepted: 10/29/2021] [Indexed: 12/24/2022]
Abstract
Background: Host population structure is a key determinant of pathogen and infectious disease transmission patterns. Pathogen phylogenetic trees are useful tools to reveal the population structure underlying an epidemic. Determining whether a population is structured or not is useful in informing the type of phylogenetic methods to be used in a given study. We employ tree statistics derived from phylogenetic trees and machine learning classification techniques to reveal an underlying population structure.
Results: In this paper, we simulate phylogenetic trees from both structured and non-structured host populations. We compute eight statistics for the simulated trees: the number of cherries; the Sackin, Colless and total cophenetic indices; ladder length; maximum depth; maximum width; and width-to-depth ratio. Based on the estimated tree statistics, we classify the simulated trees as coming from either a non-structured or a structured population using decision tree (DT), K-nearest neighbor (KNN) and support vector machine (SVM) classifiers. We incorporate the basic reproductive number ($R_0$) in our tree simulation procedure. Sensitivity analysis is done to investigate whether the classifiers are robust to different choices of model parameters and to tree size. Cross-validated results for the area under the curve (AUC) of receiver operating characteristic (ROC) curves yield mean values of over 0.9 for most of the classification models.
Conclusions: Our classification procedure distinguishes well between trees from structured and non-structured populations using the classifiers, the two-sample Kolmogorov-Smirnov, Cucconi and Podgor-Gastwirth tests, and box plots. SVM models were more robust to changes in model parameters and tree size than KNN and DT classifiers. Our classification procedure was applied to real-world data, and the structured population was revealed with a high accuracy of 92.3% using the SVM-polynomial classifier.
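Three of the eight tree statistics listed above (number of cherries, Sackin index, Colless index) plus maximum depth can be sketched in a few lines of Python. The nested-tuple tree representation and all function names are assumptions for illustration; depth is counted in edges from the root, which is the usual convention for the Sackin index. The total cophenetic index, ladder length and width-based statistics are omitted.

```python
from typing import Tuple, Union

# A rooted binary tree as nested 2-tuples; leaves are strings (tip labels).
Tree = Union[str, Tuple["Tree", "Tree"]]


def n_leaves(tree: Tree) -> int:
    if isinstance(tree, str):
        return 1
    left, right = tree
    return n_leaves(left) + n_leaves(right)


def cherries(tree: Tree) -> int:
    """Number of internal nodes whose two children are both leaves."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return 1
    return cherries(left) + cherries(right)


def sackin(tree: Tree, depth: int = 0) -> int:
    """Sum over leaves of the number of edges between the leaf and the root."""
    if isinstance(tree, str):
        return depth
    left, right = tree
    return sackin(left, depth + 1) + sackin(right, depth + 1)


def colless(tree: Tree) -> int:
    """Sum over internal nodes of |leaves(left subtree) - leaves(right subtree)|."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return abs(n_leaves(left) - n_leaves(right)) + colless(left) + colless(right)


def max_depth(tree: Tree) -> int:
    """Largest number of edges from the root to any leaf."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + max(max_depth(left), max_depth(right))


caterpillar: Tree = ((("a", "b"), "c"), "d")
balanced: Tree = (("a", "b"), ("c", "d"))
for name, t in [("caterpillar", caterpillar), ("balanced", balanced)]:
    print(name, cherries(t), sackin(t), colless(t), max_depth(t))
# caterpillar 1 9 3 3
# balanced 2 8 0 2
```

Vectors of such statistics computed over many simulated trees are the kind of features a DT, KNN or SVM classifier would then be trained on.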
Affiliation(s)
- Hassan W Kayondo
  - Institute of Basic Sciences, Technology and Innovation (PAUSTI), Pan African University, Nairobi, Kenya
  - Department of Mathematics, Makerere University, Kampala, Uganda
- Alfred Ssekagiri
  - Uganda Virus Research Institute (UVRI), Entebbe, Uganda
  - Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda
- Grace Nabakooza
  - Department of Immunology and Molecular Biology, Makerere University, Kampala, Uganda
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Entebbe, Uganda
  - Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
- Nicholas Bbosa
  - Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
- Deogratius Ssemwanga
  - Uganda Virus Research Institute (UVRI), Entebbe, Uganda
  - Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
- Pontiano Kaleebu
  - Uganda Virus Research Institute (UVRI), Entebbe, Uganda
  - Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene and Tropical Medicine (LSHTM) Uganda Research Unit, Entebbe, Uganda
- Samuel Mwalili
  - Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- John M Mango
  - Department of Mathematics, Makerere University, Kampala, Uganda
- Ronald Galiwango
  - Centre for Computational Biology, Uganda Christian University, Mukono, Uganda
- John M Kitayimbwa
  - Centre for Computational Biology, Uganda Christian University, Mukono, Uganda

21
Upham NS, Esselstyn JA, Jetz W. Molecules and fossils tell distinct yet complementary stories of mammal diversification. Curr Biol 2021; 31:4195-4206.e3. [PMID: 34329589 DOI: 10.1016/j.cub.2021.07.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/05/2021] [Revised: 05/05/2021] [Accepted: 07/07/2021] [Indexed: 11/25/2022]
Abstract
Reconstructing the tempo at which biodiversity arose is a fundamental goal of evolutionary biologists, yet the relative merits of evolutionary-rate estimates are debated based on whether they are derived from the fossil record or time-calibrated phylogenies (timetrees) of living species. Extinct lineages unsampled in timetrees are known to "pull" speciation rates downward, but the temporal scale at which this bias matters is unclear. To investigate this problem, we compare mammalian diversification-rate signatures in a credible set of molecular timetrees (n = 5,911 species, ∼70% from DNA) to those in fossil genus durations (n = 5,320). We use fossil extinction rates to correct or "push" the timetree-based (pulled) speciation-rate estimates, finding a surge of speciation during the Paleocene (∼66-56 million years ago, Ma) between the Cretaceous-Paleogene (K-Pg) boundary and the Paleocene-Eocene Thermal Maximum (PETM). However, about two-thirds of the K-Pg-to-PETM originating taxa did not leave modern descendants, indicating that this rate signature is likely undetectable from extant lineages alone. For groups without substantial fossil records, thankfully all is not lost. Pushed and pulled speciation rates converge starting ∼10 Ma and are equal at the present day when recent evolutionary processes can be estimated without bias using species-specific "tip" rates of speciation. Clade-wide moments of tip rates also enable enriched inference, as the skewness of tip rates is shown to approximate a clade's extent of past diversification-rate shifts. Molecular timetrees need fossil-correction to address deep-time questions, but they are sufficient for shallower time questions where extinctions are fewer.
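The abstract refers to species-specific tip rates and to their skewness. One widely used tip-rate metric is the DR statistic of Jetz et al. (2012): the inverse of the sum of branch lengths on the path from a tip to the root, with each branch down-weighted by a further factor of 1/2 per node passed. Assuming that is the kind of tip rate meant (the abstract does not say), the Python sketch below computes DR and a moment-based skewness; the function names and the tip-to-root input format are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def dr_statistic(path_lengths: Sequence[float]) -> float:
    """Tip DR rate from the branch lengths on the path from a tip to the root.

    path_lengths[0] is the tip's terminal branch; the j-th branch away from the
    tip (counting from 0) is down-weighted by 1 / 2**j.
    """
    equal_splits = sum(length / 2 ** j for j, length in enumerate(path_lengths))
    return 1.0 / equal_splits


def sample_skewness(values: Sequence[float]) -> float:
    """Moment-based (Fisher-Pearson) skewness g1 = m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Toy ultrametric tree ((a:1, b:1):1, c:2), written as tip-to-root branch lengths.
paths: Dict[str, List[float]] = {"a": [1.0, 1.0], "b": [1.0, 1.0], "c": [2.0]}
tip_rates = {tip: dr_statistic(p) for tip, p in paths.items()}
print(tip_rates)  # a and b: 2/3, c: 0.5
print(sample_skewness(list(tip_rates.values())))  # about -0.707
```

On a real timetree the paths would be read from the tree structure, and the skewness would be computed across all tips of a clade.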
Affiliation(s)
- Nathan S Upham
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Center for Biodiversity and Global Change, Yale University, New Haven, CT 06511, USA; School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA.
- Jacob A Esselstyn
  - Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Walter Jetz
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, USA; Center for Biodiversity and Global Change, Yale University, New Haven, CT 06511, USA.

22
Helmstetter AJ, Glemin S, Käfer J, Zenil-Ferguson R, Sauquet H, de Boer H, Dagallier LPMJ, Mazet N, Reboud EL, Couvreur TLP, Condamine FL. Pulled Diversification Rates, Lineages-Through-Time Plots and Modern Macroevolutionary Modelling. Syst Biol 2021; 71:758-773. [PMID: 34613395 PMCID: PMC9016617 DOI: 10.1093/sysbio/syab083] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/15/2021] [Revised: 09/29/2021] [Accepted: 09/30/2021] [Indexed: 11/29/2022]
Abstract
Estimating time-dependent rates of speciation and extinction from dated phylogenetic trees of extant species (timetrees), and determining how and why they vary, is key to understanding how ecological and evolutionary processes shape biodiversity. Due to an increasing availability of phylogenetic trees, a growing number of process-based methods relying on the birth–death model have been developed in the last decade to address a variety of questions in macroevolution. However, this methodological progress has regularly been criticized such that one may wonder how reliable the estimations of speciation and extinction rates are. In particular, using lineages-through-time (LTT) plots, a recent study has shown that there are an infinite number of equally likely diversification scenarios that can generate any timetree. This has led to questioning whether or not diversification rates should be estimated at all. Here, we summarize, clarify, and highlight technical considerations on recent findings regarding the capacity of models to disentangle diversification histories. Using simulations, we illustrate the characteristics of newly proposed “pulled rates” and their utility. We recognize that the recent findings are a step forward in understanding the behavior of macroevolutionary modeling, but they in no way suggest we should abandon diversification modeling altogether. On the contrary, the study of macroevolution using phylogenetic trees has never been more exciting and promising than today. We still face important limitations in regard to data availability and methods, but by acknowledging them we can better target our joint efforts as a scientific community. [Birth–death models; extinction; phylogenetics; speciation.]
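The "pulled diversification rate" behind these congruence results is defined (Louca and Pennell, 2020) as $r_p(\tau) = \lambda(\tau) - \mu(\tau) + \frac{1}{\lambda(\tau)}\frac{d\lambda}{d\tau}$, with $\tau$ measured as time before present. The Python sketch below, with hypothetical helper names, evaluates $r_p$ numerically and then builds an extinction curve $\mu^*$ for an arbitrarily chosen $\lambda^*$ so that the pair reproduces the same $r_p$. Together with the same present-day speciation rate and sampling fraction, such scenarios are congruent, i.e. they cannot be told apart from an extant timetree alone. The central finite difference is an implementation convenience, not part of the theory.

```python
from typing import Callable, List, Sequence

Rate = Callable[[float], float]  # a rate as a function of age tau (time before present)


def derivative(f: Rate, tau: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of df/dtau."""
    return (f(tau + h) - f(tau - h)) / (2 * h)


def pulled_diversification_rate(lam: Rate, mu: Rate, taus: Sequence[float]) -> List[float]:
    """r_p(tau) = lam(tau) - mu(tau) + lam'(tau) / lam(tau), evaluated at each age."""
    return [lam(t) - mu(t) + derivative(lam, t) / lam(t) for t in taus]


def congruent_extinction_rate(r_p: Rate, lam_alt: Rate) -> Rate:
    """Extinction rate mu* that, paired with lam_alt, has pulled rate r_p."""
    def mu_alt(tau: float) -> float:
        return lam_alt(tau) - r_p(tau) + derivative(lam_alt, tau) / lam_alt(tau)
    return mu_alt


# Reference scenario: constant speciation 1.0 and extinction 0.5, so r_p = 0.5.
def lam_ref(tau: float) -> float:
    return 1.0


def mu_ref(tau: float) -> float:
    return 0.5


# Alternative: speciation rising linearly into the past, same value at tau = 0.
def lam_alt(tau: float) -> float:
    return 1.0 + 0.5 * tau


mu_alt = congruent_extinction_rate(lambda tau: 0.5, lam_alt)

ages = [0.5, 1.0, 2.0, 4.0]
print(pulled_diversification_rate(lam_ref, mu_ref, ages))  # all 0.5
print(pulled_diversification_rate(lam_alt, mu_alt, ages))  # all ~0.5, different lam and mu
print([round(mu_alt(t), 3) for t in ages])  # extinction now changes with age
```

Both histories share the same $r_p$ and the same $\lambda(0)$, which is exactly the kind of ambiguity discussed in the abstract; extra information (fossils, priors, constraints) is needed to choose between them.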
Affiliation(s)
- Andrew J Helmstetter
  - Fondation pour la Recherche sur la Biodiversité - Centre for the Synthesis and Analysis of Biodiversity, 34000 Montpellier, France
- Sylvain Glemin
  - CNRS, Ecosystèmes Biodiversité Evolution (Université de Rennes), 35000 Rennes, France
- Jos Käfer
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France
- Hervé Sauquet
  - National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, 2000, Australia
  - Evolution and Ecology Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, Australia
- Hugo de Boer
  - Natural History Museum, University of Oslo, 0318 Oslo, Norway
- Nathan Mazet
  - CNRS, Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier), Place Eugène Bataillon, 34095 Montpellier, France
- Eliette L Reboud
  - CNRS, Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier), Place Eugène Bataillon, 34095 Montpellier, France
- Fabien L Condamine
  - CNRS, Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier), Place Eugène Bataillon, 34095 Montpellier, France

23
Chazot N, Condamine FL, Dudas G, Peña C, Kodandaramaiah U, Matos-Maraví P, Aduse-Poku K, Elias M, Warren AD, Lohman DJ, Penz CM, DeVries P, Fric ZF, Nylin S, Müller C, Kawahara AY, Silva-Brandão KL, Lamas G, Kleckova I, Zubek A, Ortiz-Acevedo E, Vila R, Vane-Wright RI, Mullen SP, Jiggins CD, Wheat CW, Freitas AVL, Wahlberg N. Conserved ancestral tropical niche but different continental histories explain the latitudinal diversity gradient in brush-footed butterflies. Nat Commun 2021; 12:5717. [PMID: 34588433 PMCID: PMC8481491 DOI: 10.1038/s41467-021-25906-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/12/2020] [Accepted: 09/07/2021] [Indexed: 02/08/2023]
Abstract
The global increase in species richness toward the tropics across continents and taxonomic groups, referred to as the latitudinal diversity gradient, stimulated the formulation of many hypotheses to explain the underlying mechanisms of this pattern. We evaluate several of these hypotheses to explain spatial diversity patterns in a butterfly family, the Nymphalidae, by assessing the contributions of speciation, extinction, and dispersal, and also the extent to which these processes differ among regions at the same latitude. We generate a time-calibrated phylogeny containing 2,866 nymphalid species (~45% of extant diversity). Neither speciation nor extinction rate variations consistently explain the latitudinal diversity gradient among regions because temporal diversification dynamics differ greatly across longitude. The Neotropical diversity results from low extinction rates, not high speciation rates, and biotic interchanges with other regions are rare. Southeast Asia is also characterized by a low speciation rate but, unlike the Neotropics, is the main source of dispersal events through time. Our results suggest that global climate change throughout the Cenozoic, combined with tropical niche conservatism, played a major role in generating the modern latitudinal diversity gradient of nymphalid butterflies.
Affiliation(s)
- Nicolas Chazot
  - Department of Ecology, Swedish University of Agricultural Sciences, Ulls väg 16, 75651, Uppsala, Sweden.
  - Systematic Biology Group, Department of Biology, Lund University, Lund, Sweden.
  - Gothenburg Global Biodiversity Centre, Gothenburg, Sweden.
- Fabien L Condamine
  - CNRS, UMR 5554 Institut des Sciences de l'Evolution de Montpellier (Université de Montpellier|CNRS|IRD|EPHE), Place Eugene Bataillon, 34095, Montpellier, France
- Gytis Dudas
  - Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Carlos Peña
  - Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Ullasa Kodandaramaiah
  - IISER-TVM Centre for Research and Education in Ecology and Evolution (ICREEE), School of Biology, Indian Institute of Science Education and Research Thiruvananthapuram, Thiruvananthapuram, India
- Pável Matos-Maraví
  - Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
  - Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic
- Kwaku Aduse-Poku
  - Department of Life and Earth Sciences, Perimeter College, Georgia State University, 33 Gilmer Street, Atlanta, GA, 30303, USA
- Marianne Elias
  - ISYEB, CNRS, MNHN, Sorbonne Université, EPHE, Université des Antilles, 57 rue Cuvier, Paris, 75005, France
- Andrew D Warren
  - McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- David J Lohman
  - City College of New York and Graduate Center, CUNY, New York, NY, USA
  - National Museum of Natural History, Manila, Philippines
- Carla M Penz
  - Department of Biological Sciences, University of New Orleans, New Orleans, LA, USA
- Phil DeVries
  - Department of Biological Sciences, University of New Orleans, New Orleans, LA, USA
- Zdenek F Fric
  - Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic
- Soren Nylin
  - Department of Zoology, Stockholm University, 10691, Stockholm, Sweden
- Chris Müller
  - Australian Museum, 6 College Street, Sydney, NSW, 2010, Australia
- Akito Y Kawahara
  - McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- Karina L Silva-Brandão
  - Universidade Estadual de Campinas, Centro de Biologia Molecular e Engenharia Genética, Av. Candido Rondom, 400, 13083-875, Campinas, SP, Brazil
- Gerardo Lamas
  - Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Irena Kleckova
  - Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic
- Anna Zubek
  - Nature Education Centre, Jagiellonian University, ul. Gronostajowa 5, 30-387, Kraków, Poland
- Elena Ortiz-Acevedo
  - McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
  - Departamento de Química y Biología, Universidad del Norte, Barranquilla, Colombia
- Roger Vila
  - Institut de Biologia Evolutiva (CSIC-UPF), Barcelona, Spain
- Richard I Vane-Wright
  - Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK
  - Durrell Institute of Conservation and Ecology (DICE), University of Kent, Canterbury, CT2 7NR, UK
- Sean P Mullen
  - 5 Cummington Street, Department of Biology, Boston University, Boston, MA, 02215, USA
- Chris D Jiggins
  - Department of Zoology, University of Cambridge, Downing St., Cambridge, CB2 3EJ, UK
  - Smithsonian Tropical Research Institute, Gamboa, Panama
- Andre V L Freitas
  - Departamento de Biologia Animal, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), 13083-862, Campinas, SP, Brazil
- Niklas Wahlberg
  - Systematic Biology Group, Department of Biology, Lund University, Lund, Sweden

24
Coalescent models derived from birth-death processes. Theor Popul Biol 2021; 142:1-11. [PMID: 34563554 DOI: 10.1016/j.tpb.2021.09.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 11/21/2022]
Abstract
A coalescent model of a sample of size $n$ is derived from a birth-death process that originates at a random time in the past from a single founder individual. Over time, the descendants of the founder evolve into a population of large (infinite) size from which a sample of size $n$ is taken. The parameters and time of the birth-death process are scaled in $N_0$, the size of the present-day population, while letting $N_0 \to \infty$, similarly to how the standard Kingman coalescent process arises from the Wright-Fisher model. The model is named the Limit Birth-Death (LBD) coalescent model. Simulations from the LBD coalescent model with sample size $n$ are computationally slow compared to standard coalescent models. Therefore, we suggest different approximations to the LBD coalescent model that assume the population size is a deterministic function of time rather than a stochastic process. Furthermore, we introduce a hybrid LBD coalescent model that combines the exactness of the LBD coalescent model with the speed of the approximations.
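The approximations above treat population size as a deterministic function of time. As a hedged illustration of that general idea (not the LBD model or the paper's specific approximations), the Python sketch below simulates coalescence times for a population that has grown exponentially, so that looking backwards $N(t) = N_0 e^{-g t}$ and, while $k$ lineages remain, pairs coalesce at rate $\binom{k}{2}/N(t)$. Each waiting time comes from the usual time change: draw $E \sim \mathrm{Exp}\big(\binom{k}{2}\big)$ on the constant-size scale and solve $\int_t^{t'} e^{g x}/N_0\,dx = E$, which gives $t' = \frac{1}{g}\log\left(e^{g t} + g N_0 E\right)$. Parameter names are illustrative.

```python
import math
import random
from typing import List, Optional


def coalescence_times_exponential_growth(
    n: int, pop_size_now: float, growth_rate: float, seed: Optional[int] = None
) -> List[float]:
    """Times (backwards from the present) of the n - 1 coalescences in a sample of n.

    Population size looking back in time is pop_size_now * exp(-growth_rate * t);
    growth_rate == 0 recovers the constant-size Kingman coalescent.
    """
    if growth_rate < 0:
        raise ValueError("this sketch only handles non-negative growth rates")
    rng = random.Random(seed)
    t = 0.0
    times: List[float] = []
    for k in range(n, 1, -1):
        # Waiting time on the constant-size scale: exponential with rate k choose 2.
        e = rng.expovariate(k * (k - 1) / 2)
        if growth_rate == 0:
            t += pop_size_now * e
        else:
            # Time change: solve integral_t^{t'} exp(g x) / N0 dx = e for t'.
            t = math.log(math.exp(growth_rate * t) + growth_rate * pop_size_now * e) / growth_rate
        times.append(t)
    return times


flat = coalescence_times_exponential_growth(10, 1000.0, 0.0, seed=7)
grown = coalescence_times_exponential_growth(10, 1000.0, 0.05, seed=7)
print(flat[-1], grown[-1])  # time to the most recent common ancestor in each case
```

With the same random draws, growth always brings coalescences closer to the present, because the population was smaller in the past; very large `growth_rate * t` values would overflow `math.exp`, which a production simulator would need to guard against.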
25
Louca S, McLaughlin A, MacPherson A, Joy JB, Pennell MW. Fundamental Identifiability Limits in Molecular Epidemiology. Mol Biol Evol 2021; 38:4010-4024. [PMID: 34009339 PMCID: PMC8382926 DOI: 10.1093/molbev/msab149] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/23/2022]
Abstract
Viral phylogenies provide crucial information on the spread of infectious diseases, and many studies fit mathematical models to phylogenetic data to estimate epidemiological parameters such as the effective reproduction ratio (Re) over time. Such phylodynamic inferences often complement or even substitute for conventional surveillance data, particularly when sampling is poor or delayed. It remains generally unknown, however, how robust phylodynamic epidemiological inferences are, especially when there is uncertainty regarding pathogen prevalence and sampling intensity. Here, we use recently developed mathematical techniques to fully characterize the information that can possibly be extracted from serially collected viral phylogenetic data, in the context of the commonly used birth-death-sampling model. We show that for any candidate epidemiological scenario, there exists a myriad of alternative, markedly different, and yet plausible "congruent" scenarios that cannot be distinguished using phylogenetic data alone, no matter how large the data set. In the absence of strong constraints or rate priors across the entire study period, neither maximum-likelihood fitting nor Bayesian inference can reliably reconstruct the true epidemiological dynamics from phylogenetic data alone; rather, estimators can only converge to the "congruence class" of the true dynamics. We propose concrete and feasible strategies for making more robust epidemiological inferences from viral phylogenetic data.
Affiliation(s)
- Stilianos Louca
- Department of Biology, University of Oregon, Eugene, OR, USA
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
- Angela McLaughlin
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
- Bioinformatics, University of British Columbia, Vancouver, BC, Canada
- Ailene MacPherson
- Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Department of Zoology, University of British Columbia, Vancouver, BC, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, ON, Canada
- Jeffrey B Joy
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
- Bioinformatics, University of British Columbia, Vancouver, BC, Canada
- Department of Medicine, University of British Columbia, Vancouver, BC, Canada
- Matthew W Pennell
- Biodiversity Research Centre, University of British Columbia, Vancouver, BC, Canada
- Department of Zoology, University of British Columbia, Vancouver, BC, Canada
26
MacPherson A, Louca S, McLaughlin A, Joy JB, Pennell MW. Unifying Phylogenetic Birth-Death Models in Epidemiology and Macroevolution. Syst Biol 2021; 71:172-189. [PMID: 34165577 PMCID: PMC8972974 DOI: 10.1093/sysbio/syab049] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/08/2021] [Revised: 06/09/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022]
Abstract
Birth–death stochastic processes are the foundations of many phylogenetic models and are widely used to make inferences about epidemiological and macroevolutionary dynamics. A large number of birth–death model variants have been developed; these impose different assumptions about the temporal dynamics of the parameters and about the sampling process. As each of these variants was individually derived, it has been difficult to understand the relationships between them as well as their precise biological and mathematical assumptions. Without a common mathematical foundation, deriving new models is nontrivial. Here, we unify these models into a single framework, prove that many previously developed epidemiological and macroevolutionary models are all special cases of a more general model, and illustrate the connections between these variants. This unification includes both models where the process is the same for all lineages and those in which it varies across types. We also outline a straightforward procedure for deriving likelihood functions for arbitrarily complex birth–death(-sampling) models that will hopefully allow researchers to explore a wider array of scenarios than was previously possible. By rederiving existing single-type birth–death sampling models, we clarify and synthesize the range of explicit and implicit assumptions made by these models.
[Birth–death processes; epidemiology; macroevolution; phylogenetics; statistical inference.]
Affiliation(s)
- Ailene MacPherson
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Stilianos Louca
- Department of Biology, University of Oregon, USA; Institute of Ecology and Evolution, University of Oregon, USA
- Angela McLaughlin
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada; Bioinformatics, University of British Columbia, Vancouver, Canada
- Jeffrey B Joy
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada; Bioinformatics, University of British Columbia, Vancouver, Canada; Department of Medicine, University of British Columbia, Vancouver, Canada
- Matthew W Pennell
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
27
Gascuel O, Steel M. A Darwinian Uncertainty Principle. Syst Biol 2020; 69:521-529. [PMID: 31432087 PMCID: PMC7188465 DOI: 10.1093/sysbio/syz054] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/24/2019] [Accepted: 08/15/2019] [Indexed: 02/04/2023]
Abstract
Reconstructing ancestral characters and traits along a phylogenetic tree is central to evolutionary biology. It is the key to understanding morphology changes among species, inferring ancestral biochemical properties of life, or recovering migration routes in phylogeography. The goal is 2-fold: to reconstruct the character state at the tree root (e.g., the region of origin of some species) and to understand the process of state changes along the tree (e.g., species flow between countries). We deal here with discrete characters, which are “unique,” as opposed to sequence characters (nucleotides or amino-acids), where we assume the same model for all the characters (or for large classes of characters with site-dependent models) and thus benefit from multiple information sources. In this framework, we use mathematics and simulations to demonstrate that although each goal can be achieved with high accuracy individually, it is generally impossible to accurately estimate both the root state and the rates of state changes along the tree branches, from the observed data at the tips of the tree. This is because the global rates of state changes along the branches that are optimal for the two estimation tasks have opposite trends, leading to a fundamental trade-off in accuracy. This inherent “Darwinian uncertainty principle” concerning the simultaneous estimation of “patterns” and “processes” governs ancestral reconstructions in biology. For certain tree shapes (typically speciation trees) the uncertainty of simultaneous estimation is reduced when more tips are present; however, for other tree shapes it does not (e.g., coalescent trees used in population genetics).
Affiliation(s)
- Olivier Gascuel
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
28
Puillandre N, Brouillet S, Achaz G. ASAP: assemble species by automatic partitioning. Mol Ecol Resour 2020; 21:609-620. [PMID: 33058550 DOI: 10.1111/1755-0998.13281] [Citation(s) in RCA: 397] [Impact Index Per Article: 99.3] [Received: 11/06/2019] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 11/27/2022]
Abstract
Here, we describe Assemble Species by Automatic Partitioning (ASAP), a new method to build species partitions from single locus sequence alignments (i.e., barcode data sets). ASAP is efficient enough to split data sets as large as 10⁴ sequences into putative species in several minutes. Although grounded in evolutionary theory, ASAP is the implementation of a hierarchical clustering algorithm that only uses pairwise genetic distances, avoiding the computational burden of phylogenetic reconstruction. Importantly, ASAP proposes species partitions ranked by a new scoring system that uses no biological prior insight of intraspecific diversity. ASAP is a stand-alone program that can be used either through a graphical web interface or downloaded and compiled for local use. We have assessed its power along with three other programs (ABGD, PTP and GMYC) on 10 real COI barcode data sets representing various degrees of challenge (from small and easy cases to large and complicated data sets). We also used Monte-Carlo simulations of a multispecies coalescent framework to assess the strengths and weaknesses of ASAP and the other programs. Through these analyses, we demonstrate that ASAP has the potential to become a major tool for taxonomists, as it rapidly proposes relevant species hypotheses, in a fully graphical exploratory interface, as a first step of the integrative taxonomy process.
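ASAP's core idea of hierarchical clustering on pairwise distances, with no tree reconstruction, can be illustrated with a small sketch. The sketch assumes uncorrected p-distances and single linkage, and it enumerates one candidate partition per distinct pairwise distance. These are our illustrative choices: ASAP's actual clustering details and its partition score are not reproduced here, and all function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def threshold_partition(seqs, threshold):
    """Groups of sequence indices linked by distances <= threshold.

    This equals cutting a single-linkage dendrogram at `threshold`.
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)  # union the two groups

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def candidate_partitions(seqs):
    """One partition per distinct pairwise distance, from finest to coarsest."""
    cuts = sorted({p_distance(a, b) for a, b in combinations(seqs, 2)})
    return [(d, threshold_partition(seqs, d)) for d in cuts]


barcodes = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTTA", "TTGTACCTAC", "TTGTACCTAA"]
for d, part in candidate_partitions(barcodes):
    print(f"threshold {d:.2f}: {len(part)} groups {part}")
```

A real tool would then rank these candidate partitions with a score. Here they are only listed, which is where a method such as ASAP adds its own scoring system.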
Affiliation(s)
- Nicolas Puillandre
- Institut Systématique Evolution Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Sophie Brouillet
- Institut Systématique Evolution Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Guillaume Achaz
- Institut Systématique Evolution Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France; SMILE Group, CIRB, UMR 7241, Collège de France, CNRS, INSERM, Paris, France; Éco-anthropologie, Muséum National d'Histoire Naturelle, CNRS UMR 7206, Université de Paris, Paris, France
29
Abstract
Genealogical tree modeling is essential for estimating evolutionary parameters in population genetics and phylogenetics. Recent mathematical results concerning ranked genealogies without leaf labels unlock opportunities in the analysis of evolutionary trees. In particular, comparisons between ranked genealogies facilitate the study of evolutionary processes of different organisms sampled at multiple time periods. We propose metrics on ranked tree shapes and ranked genealogies for lineages isochronously and heterochronously sampled. Our proposed tree metrics make it possible to conduct statistical analyses of ranked tree shapes and timed ranked tree shapes or ranked genealogies. Such analyses allow us to assess differences in tree distributions, quantify estimation uncertainty, and summarize tree distributions. We show the utility of our metrics via simulations and an application in infectious diseases.
Affiliation(s)
- Jaehee Kim
- Department of Biology, Stanford University, Stanford, CA 94305
- Julia A Palacios
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA 94305
30
Manceau M, Marin J, Morlon H, Lambert A. Model-Based Inference of Punctuated Molecular Evolution. Mol Biol Evol 2020; 37:3308-3323. [PMID: 32521005 DOI: 10.1093/molbev/msaa144] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022]
Abstract
In standard models of molecular evolution, DNA sequences evolve through asynchronous substitutions according to Poisson processes with a constant rate (called the molecular clock) or a rate that can vary (relaxed clock). However, DNA sequences can also undergo episodes of fast divergence that will appear as synchronous substitutions affecting several sites simultaneously at the macroevolutionary timescale. Here, we develop a model, which we call the Relaxed Clock with Spikes model, combining basal, clock-like molecular substitutions with episodes of fast divergence called spikes arising at speciation events. Given a multiple sequence alignment and its time-calibrated species phylogeny, our model is able to detect speciation events (including hidden ones) cooccurring with spike events and to estimate the probability and amplitude of these spikes on the phylogeny. We identify the conditions under which spikes can be distinguished from the natural variance of the clock-like component of molecular substitutions and from variations of the clock. We apply the method to genes underlying snake venom proteins and identify several spikes at gene-specific locations in the phylogeny. This work should pave the way for analyses relying on whole genomes to inform on modes of species diversification.
Affiliation(s)
- Marc Manceau
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS UMR 7241, INSERM U 1050, PSL Research University, Paris, France; IBENS, Ecole Normale Supérieure, UMR 8197 CNRS, Paris, France; DBSSE, ETH Zürich, Basel, Switzerland
- Julie Marin
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS UMR 7241, INSERM U 1050, PSL Research University, Paris, France
- Hélène Morlon
- IBENS, Ecole Normale Supérieure, UMR 8197 CNRS, Paris, France
- Amaury Lambert
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS UMR 7241, INSERM U 1050, PSL Research University, Paris, France; Laboratoire de Probabilités, Statistique et Modélisation (LPSM), Sorbonne Université, CNRS UMR 8001, Paris, France
31
Budd GE, Mann RP. Survival and selection biases in early animal evolution and a source of systematic overestimation in molecular clocks. Interface Focus 2020; 10:20190110. [PMID: 32637066 PMCID: PMC7333906 DOI: 10.1098/rsfs.2019.0110] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Accepted: 05/06/2020] [Indexed: 12/21/2022]
Abstract
Important evolutionary events such as the Cambrian Explosion have inspired many attempts at explanation: why do they happen when they do? What shapes them, and why do they eventually come to an end? However, much less attention has been paid to the idea of a 'null hypothesis': that certain features of such diversifications arise simply through their statistical structure. Such statistical features also appear to influence our perception of the timing of these events. Here, we show in particular that study of unusually large clades leads to systematic overestimates of clade ages from some types of molecular clocks, and that the size of this effect may be enough to account for the puzzling mismatches seen between these molecular clocks and the fossil record. Our analysis of the fossil record of the late Ediacaran to Cambrian suggests that it is likely to be recording a true evolutionary radiation of the bilaterians at this time, and that explanations involving various sorts of cryptic origins for the bilaterians do not seem to be necessary.
Affiliation(s)
- Graham E. Budd
- Department of Earth Sciences, Palaeobiology, Uppsala University, Villavägen 16, Uppsala 752 36, Sweden
- Richard P. Mann
- Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, UK
- The Alan Turing Institute, London NW1 2DB, UK
32
: A unifying framework for modelling evolutionary trees. Theor Popul Biol 2020; 133:38-39. [DOI: 10.1016/j.tpb.2019.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/08/2019] [Revised: 06/18/2019] [Accepted: 07/01/2019] [Indexed: 11/20/2022]
33
Harris SC, Johnston SGG, Roberts MI. The coalescent structure of continuous-time Galton–Watson trees. Ann Appl Probab 2020. [DOI: 10.1214/19-aap1532] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
34
A characterisation of the reconstructed birth-death process through time rescaling. Theor Popul Biol 2020; 134:61-76. [PMID: 32439294 DOI: 10.1016/j.tpb.2020.05.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/10/2019] [Revised: 04/15/2020] [Accepted: 05/05/2020] [Indexed: 11/23/2022]
Abstract
The dynamics of a population exhibiting exponential growth can be modelled as a birth-death process, which naturally captures the stochastic variation in population size over time. In this article, we consider a supercritical birth-death process, started at a random time in the past, and conditioned to have n sampled individuals at the present. The genealogy of individuals sampled at the present time is then described by the reversed reconstructed process (RRP), which traces the ancestry of the sample backwards from the present. We show that a simple, analytic, time rescaling of the RRP provides a straightforward way to derive its inter-event times. The same rescaling characterises other distributions underlying this process, obtained elsewhere in the literature via more cumbersome calculations. We also consider the case of incomplete sampling of the population, in which each leaf of the genealogy is retained with an independent Bernoulli trial with probability ψ, and we show that corresponding results for Bernoulli-sampled RRPs can be derived using time rescaling, for any values of the underlying parameters. A central result is the derivation of a scaling limit as ψ approaches 0, corresponding to the underlying population growing to infinity, using the time rescaling formalism. We show that in this setting, after a linear time rescaling, the event times are the order statistics of n logistic random variables with mode log(1∕ψ); moreover, we show that the inter-event times are approximately exponentially distributed.
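The scaling-limit statement in the final sentence can be sampled directly. The sketch below draws n i.i.d. logistic variables with mode log(1/ψ) via the inverse CDF and sorts them. It assumes a unit scale parameter in the rescaled time units, which the abstract does not state, and `rescaled_event_times` is a hypothetical name.

```python
import math
import random


def rescaled_event_times(n, psi, seed=None):
    """Sorted sample of n i.i.d. logistic variables with mode log(1/psi).

    Assumption: unit scale parameter in the rescaled time units.
    """
    if not 0.0 < psi < 1.0:
        raise ValueError("psi must lie strictly between 0 and 1")
    rng = random.Random(seed)
    mode = math.log(1.0 / psi)
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # random() can return 0.0, and log(0) is undefined
            u = rng.random()
        # Inverse CDF of the logistic distribution: mode + log(u / (1 - u)).
        draws.append(mode + math.log(u / (1.0 - u)))
    return sorted(draws)


print(rescaled_event_times(5, psi=0.01, seed=0))
```

As ψ shrinks, the mode log(1/ψ) grows, so the sampled event times shift further back in rescaled time.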
35
Extant timetrees are consistent with a myriad of diversification histories. Nature 2020; 580:502-505. [PMID: 32322065 DOI: 10.1038/s41586-020-2176-1] [Citation(s) in RCA: 226] [Impact Index Per Article: 56.5] [Received: 09/14/2019] [Accepted: 03/10/2020] [Indexed: 11/09/2022]
Abstract
Time-calibrated phylogenies of extant species (referred to here as 'extant timetrees') are widely used for estimating diversification dynamics¹. However, there has been considerable debate surrounding the reliability of these inferences²⁻⁵ and, to date, this critical question remains unresolved. Here we clarify the precise information that can be extracted from extant timetrees under the generalized birth-death model, which underlies most existing methods of estimation. We prove that, for any diversification scenario, there exists an infinite number of alternative diversification scenarios that are equally likely to have generated any given extant timetree. These 'congruent' scenarios cannot possibly be distinguished using extant timetrees alone, even in the presence of infinite data. Importantly, congruent diversification scenarios can exhibit markedly different and yet similarly plausible dynamics, which suggests that many previous studies may have over-interpreted phylogenetic evidence. We introduce identifiable and easily interpretable variables that contain all available information about past diversification dynamics, and demonstrate that these can be estimated from extant timetrees. We suggest that measuring and modelling these identifiable variables offers a more robust way to study historical diversification dynamics. Our findings also make it clear that palaeontological data will continue to be crucial for answering some macroevolutionary questions.
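For the constant-rate special case, the congruence idea can be checked numerically. The sketch assumes the usual definition of the pulled speciation rate, λ_p(τ) = λ(1 − E(τ)). Here E(τ) solves dE/dτ = μ − (λ + μ)E + λE² with E(0) = 1 − ρ, ρ is the sampling fraction, and τ is age before the present. The ODE is integrated with classic RK4. Two scenarios that share λ − μ and ρλ give the same λ_p even though their speciation and extinction rates differ. The parameter values and function name are arbitrary. The article's results cover time-varying rates, which this sketch does not.

```python
def pulled_speciation_rate(lam, mu, rho, ages, dt=1e-3):
    """lambda_p(tau) = lam * (1 - E(tau)) for constant lam, mu and sampling fraction rho.

    E(tau) is the probability that a lineage alive at age tau (time before present)
    has no sampled descendant. It solves dE/dtau = mu - (lam + mu) E + lam E^2
    with E(0) = 1 - rho. `ages` must be sorted in increasing order.
    """
    def dE(e):
        return mu - (lam + mu) * e + lam * e * e

    tau, e = 0.0, 1.0 - rho
    out = []
    for target in ages:
        while tau < target - 1e-12:
            h = min(dt, target - tau)
            # Classic fourth-order Runge-Kutta step.
            k1 = dE(e)
            k2 = dE(e + 0.5 * h * k1)
            k3 = dE(e + 0.5 * h * k2)
            k4 = dE(e + h * k3)
            e += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            tau += h
        out.append(lam * (1.0 - e))
    return out


ages = [0.0, 0.5, 1.0, 2.0, 5.0]
scenario_a = pulled_speciation_rate(lam=3.0, mu=1.0, rho=0.5, ages=ages)
scenario_b = pulled_speciation_rate(lam=2.0, mu=0.0, rho=0.75, ages=ages)
for tau, a, b in zip(ages, scenario_a, scenario_b):
    print(f"tau={tau}: {a:.6f} vs {b:.6f}")
```

Both scenarios start at λ_p(0) = ρλ = 1.5 and approach λ − μ = 2 at large ages. Their λ_p curves coincide, even though one scenario has no extinction at all.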
36
Abstract
The year 2020 marks the 50th anniversary of Theoretical Population Biology. This special issue examines the past and continuing contributions of the journal. We identify some of the most important developments that have taken place in the pages of TPB, connecting them to current research and to the numerous forms of significance achieved by theory in population biology.
37
Budd GE, Mann RP. The dynamics of stem and crown groups. Sci Adv 2020; 6:eaaz1626. [PMID: 32128421 PMCID: PMC7030935 DOI: 10.1126/sciadv.aaz1626] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 08/17/2019] [Accepted: 12/03/2019] [Indexed: 05/22/2023]
Abstract
The fossil record of the origins of major groups such as animals and birds has generated considerable controversy, especially when it conflicts with timings based on molecular clock estimates. Here, we model the diversity of "stem" (basal) and "crown" (modern) members of groups using a "birth-death model," the results of which qualitatively match many large-scale patterns seen in the fossil record. Typically, the stem group diversifies rapidly until the crown group emerges, at which point its diversity collapses, followed shortly by its extinction. Mass extinctions can disturb this pattern and create long stem groups such as the dinosaurs. Crown groups are unlikely to emerge either cryptically or just before mass extinctions, in contradiction to popular hypotheses such as the "phylogenetic fuse". The patterns revealed provide an essential context for framing ecological and evolutionary explanations for how major groups originate, and strengthen our confidence in the reliability of the fossil record.
Affiliation(s)
- Graham E. Budd
- Department of Earth Sciences, Palaeobiology Programme, Uppsala University, Uppsala, Sweden
- Corresponding author.
- Richard P. Mann
- Department of Statistics, School of Mathematics, University of Leeds, Leeds, UK
- The Alan Turing Institute, London, UK
38
Dinh KN, Jaksik R, Kimmel M, Lambert A, Tavaré S. Statistical Inference for the Evolutionary History of Cancer Genomes. Stat Sci 2020. [DOI: 10.1214/19-sts7561] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]
39
Laudanno G, Haegeman B, Etienne RS. Additional Analytical Support for a New Method to Compute the Likelihood of Diversification Models. Bull Math Biol 2020; 82:22. [PMID: 31970528 PMCID: PMC6976549 DOI: 10.1007/s11538-020-00698-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/04/2019] [Accepted: 12/02/2019] [Indexed: 11/24/2022]
Abstract
Molecular phylogenies have been increasingly recognized as an important source of information on species diversification. For many models of macroevolution, analytical likelihood formulas have been derived to infer macroevolutionary parameters from phylogenies. A few years ago, a general framework to numerically compute such likelihood formulas was proposed, which accommodates models that allow speciation and/or extinction rates to depend on diversity. This framework calculates the likelihood as the probability of the diversification process being consistent with the phylogeny from the root to the tips. However, while some readers found the framework presented in Etienne et al. (Proc R Soc Lond B Biol Sci 279(1732):1300-1309, 2012) convincing, others still questioned it (personal communication), despite numerical evidence that, in special cases, the framework yields the same numerical value for the likelihood (i.e., within double precision) as analytical formulas derived independently for those cases. Here we prove analytically that the likelihoods calculated in the new framework are correct for all special cases with a known analytical likelihood formula. Our results thus add substantial mathematical support for the overall coherence of the general framework.
Affiliation(s)
- Giovanni Laudanno
- Groningen Institute for Evolutionary Life Sciences, Box 11103, 9700 CC, Groningen, The Netherlands.
- Bart Haegeman
- Theoretical and Experimental Ecology Station, CNRS and Paul Sabatier University, Moulis, France
- Rampal S Etienne
- Groningen Institute for Evolutionary Life Sciences, Box 11103, 9700 CC, Groningen, The Netherlands
40
Stadler T, Steel M. Swapping Birth and Death: Symmetries and Transformations in Phylodynamic Models. Syst Biol 2020; 68:852-858. [PMID: 31135030 PMCID: PMC6701459 DOI: 10.1093/sysbio/syz039] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/12/2018] [Accepted: 05/17/2019] [Indexed: 11/30/2022]
Abstract
Stochastic birth–death models provide the foundation for studying and simulating evolutionary trees in phylodynamics. A curious feature of such models is that they exhibit fundamental symmetries when the birth and death rates are interchanged. In this article, we first provide intuitive reasons for these known transformational symmetries. We then show that these transformational symmetries (encoded in algebraic identities) are preserved even when individuals at the present are sampled with some probability. However, these extended symmetries require the death rate parameter to sometimes take a negative value. In the last part of this article, we describe the relevance of these transformations and their application to computational phylodynamics, particularly to maximum likelihood and Bayesian inference methods, as well as to model selection.
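A concrete case where a transformation pushes the death rate below zero is a commonly used mapping from a model with sampling fraction ρ to one with complete sampling: λ' = ρλ, μ' = μ − (1 − ρ)λ, ρ' = 1. We assume this mapping for illustration. It is not necessarily the exact set of identities given in the article. The sketch checks that the constant-rate pulled speciation rate λ_p(τ) = r / (1 + (r/(ρλ) − 1)e^(−rτ)), with r = λ − μ, is unchanged by the mapping, and that μ' can be negative. λ_p depends on the parameters only through r and ρλ. The sketch assumes λ ≠ μ, and both function names are hypothetical.

```python
import math


def to_complete_sampling(lam, mu, rho):
    """Map (lam, mu, rho) to (rho*lam, mu - (1 - rho)*lam, 1).

    The map keeps lam - mu and rho*lam fixed; the new death rate can be negative.
    """
    return rho * lam, mu - (1.0 - rho) * lam, 1.0


def pulled_speciation_closed_form(lam, mu, rho, tau):
    """Constant-rate pulled speciation rate at age tau (assumes lam != mu).

    It depends on the parameters only through r = lam - mu and rho*lam.
    """
    r = lam - mu
    return r / (1.0 + (r / (rho * lam) - 1.0) * math.exp(-r * tau))


original = (2.0, 0.2, 0.5)
transformed = to_complete_sampling(*original)
print("transformed (lam, mu, rho):", transformed)
for tau in (0.0, 1.0, 3.0):
    print(tau, pulled_speciation_closed_form(*original, tau),
          pulled_speciation_closed_form(*transformed, tau))
```

Here the transformed death rate is about −0.8. The model still has a well-defined likelihood for reconstructed trees, but it can no longer be read as a literal extinction rate.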
Affiliation(s)
- Tanja Stadler
- Department for Biosystems Science and Engineering, ETH Zürich, Basel 4058, Switzerland
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch 4800, New Zealand
41
Ho LST, Dinh V, Matsen FA, Suchard MA. On the convergence of the maximum likelihood estimator for the transition rate under a 2-state symmetric model. J Math Biol 2019; 80:1119-1138. [PMID: 31754778 DOI: 10.1007/s00285-019-01453-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2019] [Revised: 11/04/2019] [Indexed: 10/25/2022]
Abstract
Maximum likelihood estimators are used extensively to estimate unknown parameters of stochastic trait evolution models on phylogenetic trees. Although the MLE has been proven to converge to the true value in the independent-sample case, we cannot appeal to this result because trait values of different species are correlated due to shared evolutionary history. In this paper, we consider a 2-state symmetric model for a single binary trait and investigate the theoretical properties of the MLE for the transition rate in the large-tree limit. Here, the large-tree limit is a theoretical scenario where the number of taxa increases to infinity and we can observe the trait values for all species. Specifically, we prove that the MLE converges to the true value under some regularity conditions. These conditions ensure that the tree shape is not too irregular, and they hold for many practical scenarios such as trees with bounded edges, trees generated from the Yule (pure birth) process, and trees generated from the coalescent point process. Our result also provides an upper bound for the distance between the MLE and the true value.
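The sketch below makes the estimation problem concrete. It computes the likelihood of binary tip states under the 2-state symmetric model with Felsenstein's pruning algorithm, using P(same state after time t) = 1/2 + e^(−2qt)/2 and a uniform root prior, and then finds the MLE by grid search. The tree encoding, grid, and function names are our own assumptions. The article studies the asymptotic behaviour of this estimator, not how it is computed. For the four-tip tree used here, the likelihood works out to ((1 + x²)² − 4x⁶)/16 with x = e^(−q), which peaks at q = ln(2)/2 ≈ 0.347.

```python
import math

# Hypothetical tree encoding: a leaf is ("leaf", state, branch_length) and an
# internal node is ("node", [children], branch_length). The root's own branch
# length is ignored.


def p_same(q, t):
    """Probability that a symmetric 2-state chain with rate q shows the same state after t."""
    return 0.5 + 0.5 * math.exp(-2.0 * q * t)


def partials(tree, q):
    """Pruning step: likelihood of the tips below `tree` given each state (0, 1) at its top."""
    if tree[0] == "leaf":
        return [1.0 if s == tree[1] else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child in tree[1]:
        below = partials(child, q)
        same = p_same(q, child[2])  # child[2] is the child's branch length
        for s in (0, 1):
            result[s] *= same * below[s] + (1.0 - same) * below[1 - s]
    return result


def log_likelihood(tree, q):
    root = partials(tree, q)
    return math.log(0.5 * root[0] + 0.5 * root[1])  # uniform root prior


def mle_rate(tree, grid):
    """Grid-search maximum likelihood estimate of the transition rate q."""
    return max(grid, key=lambda q: log_likelihood(tree, q))


example_tree = ("node", [
    ("node", [("leaf", 0, 0.5), ("leaf", 0, 0.5)], 1.0),
    ("node", [("leaf", 1, 0.5), ("leaf", 1, 0.5)], 1.0),
], 0.0)
grid = [k / 1000 for k in range(1, 5001)]
print("q_hat =", mle_rate(example_tree, grid))
```

If all tips share one state, the likelihood increases as q decreases, so the grid MLE sits at the smallest grid value. This boundary behaviour is one reason asymptotic guarantees such as those in the article are non-trivial.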
Affiliation(s)
- Lam Si Tung Ho
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada.
- Vu Dinh
- Department of Mathematical Sciences, University of Delaware, Newark, USA
- Frederick A Matsen
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, USA
- Marc A Suchard
- Departments of Biomathematics, Biostatistics and Human Genetics, University of California, Los Angeles, USA
42
Moshiri N, Mirarab S. A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Syst Biol 2018; 67:475-489. [PMID: 29165679 DOI: 10.1093/sysbio/syx088] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/27/2017] [Accepted: 11/15/2017] [Indexed: 11/14/2022]
Abstract
Models of tree evolution have mostly focused on capturing the cladogenesis processes behind speciation. Processes that drive the evolution of genomic elements, such as repeats, are not necessarily captured by these existing models. In this article, we design a model of tree evolution that we call the dual-birth model, and we show how it can be useful in studying the evolution of short Alu repeats found in abundance in the human genome. The dual-birth model extends the traditional birth-only model to have two rates of propagation: one for active nodes, which propagate often, and another for inactive nodes, which activate at a lower rate and then start propagating. Adjusting the ratio of the rates controls the expected tree balance. We present several theoretical results under the dual-birth model, introduce parameter estimation techniques, and study the properties of the model in simulations. We then use the dual-birth model to estimate the number of active Alu elements and their rates of propagation and activation in the human genome, based on a large phylogenetic tree that we build from close to one million Alu sequences.
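The following is a minimal sketch of the two-rate idea under an assumed reading that the abstract does not spell out. In this reading, every split yields one child that will next split at the high rate `lam_a` ("active") and one that will next split at the low rate `lam_b` ("inactive"). The sketch grows the tree with the Gillespie algorithm until it has `n_leaves` leaves and reports the elapsed time and the number of leaves of each type. The function name and parameters are hypothetical.

```python
import random


def simulate_dual_birth(lam_a, lam_b, n_leaves, seed=None):
    """Grow a tree until it has n_leaves leaves; return (time, n_active, n_inactive).

    Assumed reading: each split replaces the splitting lineage with one child
    that splits at rate lam_a and one child that splits at rate lam_b.
    """
    if lam_a <= 0:
        raise ValueError("lam_a must be positive")
    rng = random.Random(seed)
    t, n_active, n_inactive = 0.0, 1, 0  # start from a single active lineage
    while n_active + n_inactive < n_leaves:
        rate_active = n_active * lam_a
        rate_inactive = n_inactive * lam_b
        total = rate_active + rate_inactive  # > 0 because n_active never drops below 1
        t += rng.expovariate(total)
        if rng.random() < rate_active / total:
            # An active lineage splits: it is replaced by one active and one inactive child.
            n_inactive += 1
        else:
            # An inactive lineage splits: it is replaced by one active and one inactive child.
            n_active += 1
    return t, n_active, n_inactive


print(simulate_dual_birth(lam_a=1.0, lam_b=0.1, n_leaves=20, seed=0))
```

With `lam_b = 0`, only the single active lineage ever splits, which produces a maximally unbalanced (caterpillar) tree. With `lam_a == lam_b`, every lineage splits at the same rate as in a Yule tree. Intermediate ratios interpolate between the two, which is one way to see why the rate ratio controls balance.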
Affiliation(s)
- Niema Moshiri
- Bioinformatics and Systems Biology Graduate Program, UC San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
43
Li J, Huang JP, Sukumaran J, Knowles LL. Microevolutionary processes impact macroevolutionary patterns. BMC Evol Biol 2018; 18:123. [PMID: 30097006 PMCID: PMC6086068 DOI: 10.1186/s12862-018-1236-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/11/2018] [Accepted: 08/01/2018] [Indexed: 01/23/2023]
Abstract
BACKGROUND: Macroevolutionary modeling of species diversification plays important roles in inferring large-scale biodiversity patterns. It allows estimation of speciation and extinction rates and statistical testing of their relationships with different ecological factors. However, macroevolutionary patterns are ultimately generated by microevolutionary processes acting at population levels, especially when speciation and extinction are considered protracted rather than point events. Neglecting the connection between micro- and macroevolution may hinder our ability to fully understand the underlying mechanisms that drive the observed patterns.
RESULTS: In this simulation study, we used the protracted speciation framework to demonstrate that distinct microevolutionary scenarios can generate very similar biodiversity patterns (e.g., latitudinal diversity gradient). We also showed that current macroevolutionary models may not be able to distinguish these different scenarios.
CONCLUSIONS: Given the compounded nature of speciation and extinction rates, one needs to be cautious when inferring causal relationships between ecological factors and macroevolutionary rates. Future studies that incorporate microevolutionary processes into current modeling approaches are needed.
Affiliation(s)
- Jingchun Li
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, USA; Museum of Natural History, University of Colorado Boulder, Boulder, USA; Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, USA.
- Jen-Pen Huang
- Integrative Research Center, The Field Museum, Chicago, USA; Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, USA
- Jeet Sukumaran
- Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, USA
- L Lacey Knowles
- Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, USA
44
Wiuf C. Some properties of the conditioned reconstructed process with Bernoulli sampling. Theor Popul Biol 2018; 122:36-45. [DOI: 10.1016/j.tpb.2018.02.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
45
46
Hagen O, Andermann T, Quental TB, Antonelli A, Silvestro D. Estimating Age-Dependent Extinction: Contrasting Evidence from Fossils and Phylogenies. Syst Biol 2018; 67:458-474. [PMID: 29069434 PMCID: PMC5920349 DOI: 10.1093/sysbio/syx082] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/19/2016] [Revised: 03/03/2017] [Accepted: 10/15/2017] [Indexed: 01/12/2023]
Abstract
The estimation of diversification rates is one of the most vividly debated topics in modern systematics, with considerable controversy surrounding the power of phylogenetic and fossil-based approaches in estimating extinction. Van Valen's seminal work from 1973 proposed the "Law of constant extinction," which states that the probability of extinction of taxa is not dependent on their age. This assumption of age-independent extinction has prevailed for decades with its assessment based on survivorship curves, which, however, do not directly account for the incompleteness of the fossil record, and have rarely been applied at the species level. Here, we present a Bayesian framework to estimate extinction rates from the fossil record accounting for age-dependent extinction (ADE). Our approach, unlike previous implementations, explicitly models unobserved species and accounts for the effects of fossil preservation on the observed longevity of sampled lineages. We assess the performance and robustness of our method through extensive simulations and apply it to a fossil data set of terrestrial Carnivora spanning the past 40 myr. We find strong evidence of ADE, as we detect the extinction rate to be highest in young species and declining with increasing species age. For comparison, we apply a recently developed analogous ADE model to a dated phylogeny of extant Carnivora. Although the phylogeny-based analysis also infers ADE, it indicates that the extinction rate, instead, increases with increasing taxon age. The estimated mean species longevity also differs substantially, with the fossil-based analyses estimating 2.0 myr, in contrast to 9.8 myr derived from the phylogeny-based inference. Scrutinizing these discrepancies, we find that both fossil and phylogeny-based ADE models are prone to high error rates when speciation and extinction rates increase or decrease through time. 
However, analyses of simulated and empirical data show that fossil-based inferences are more robust. This study shows that an accurate estimation of ADE from incomplete fossil data is possible when the effects of preservation are jointly modeled, thus allowing for a reassessment of Van Valen's model as a general rule in macroevolution.
Affiliation(s)
- Oskar Hagen
- Swiss Federal Research Institute WSL, 8903 Birmensdorf, Switzerland
- Landscape Ecology, Institute of Terrestrial Ecosystems, ETH Zurich, 8092 Zurich, Switzerland
- Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
- Tobias Andermann
- Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
- Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Göteborg, Sweden
- Tiago B Quental
- Departamento de Ecologia, Universidade de São Paulo, 05508-900 São Paulo, Brazil
- Alexandre Antonelli
- Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
- Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Göteborg, Sweden
- Gothenburg Botanical Garden, Carl Skottsbergs gata 22A, SE-413 19 Göteborg, Sweden
- Daniele Silvestro
- Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
- Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Göteborg, Sweden
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
47
Maliet O, Gascuel F, Lambert A. Ranked Tree Shapes, Nonrandom Extinctions, and the Loss of Phylogenetic Diversity. Syst Biol 2018; 67:1025-1040. [DOI: 10.1093/sysbio/syy030] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/03/2016] [Accepted: 04/08/2018] [Indexed: 11/13/2022]
Affiliation(s)
- Odile Maliet
- Institut de Biologie de l’École Normale Supérieure (IBENS), École Normale Supérieure, CNRS, INSERM, PSL Research University, Paris, France
- ED 227, Sorbonne Universités, Paris, France
- Fanny Gascuel
- Institut de Biologie de l’École Normale Supérieure (IBENS), École Normale Supérieure, CNRS, INSERM, PSL Research University, Paris, France
- ED 227, Sorbonne Universités, Paris, France
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University, Paris, France
- Amaury Lambert
- Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS, INSERM, PSL Research University, Paris, France
- Laboratoire Probabilités, Statistique et Modélisation (LPSM), Sorbonne Université, CNRS, Paris, France
48
Colijn C, Plazzotta G. A Metric on Phylogenetic Tree Shapes. Syst Biol 2018; 67:113-126. [PMID: 28472435 PMCID: PMC5790134 DOI: 10.1093/sysbio/syx046] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 07/29/2016] [Accepted: 04/11/2017] [Indexed: 11/15/2022]
Abstract
The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees’ branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes.
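The characterization in this abstract rests on labelling each rooted binary tree shape with a positive integer. As we understand the paper, a tip gets label 1 and a tree whose two subtrees carry labels k >= j gets label k(k-1)/2 + j + 1, giving a one-to-one correspondence between shapes and positive integers. The sketch below assumes that scheme and represents a shape as nested tuples (`()` for a tip, `(left, right)` for an internal node). To turn labels into a distance, it takes the Euclidean distance between vectors counting how often each label occurs among a tree's subtrees; treat this as one illustrative choice, not necessarily the exact metric defined in the paper. All function names are hypothetical.

```python
import math
from collections import Counter

# A tree shape: () is a tip, (left, right) is an internal node.


def _label(tree: tuple, counts: Counter) -> int:
    """Label `tree`, recording the label of every subtree (tips included) in `counts`."""
    if tree == ():
        lab = 1
    else:
        # Order the two child labels so that k >= j; mirror images get the same label.
        k, j = sorted((_label(tree[0], counts), _label(tree[1], counts)), reverse=True)
        lab = k * (k - 1) // 2 + j + 1
    counts[lab] += 1
    return lab


def label(tree: tuple) -> int:
    """Integer label of a rooted binary tree shape (assumed Colijn-Plazzotta scheme)."""
    return _label(tree, Counter())


def label_counts(tree: tuple) -> Counter:
    """How many subtrees of `tree` carry each label."""
    counts = Counter()
    _label(tree, counts)
    return counts


def shape_distance(t1: tuple, t2: tuple) -> float:
    """Euclidean distance between the label-count vectors of two shapes."""
    c1, c2 = label_counts(t1), label_counts(t2)
    # Counter returns 0 for missing labels, so absent entries count as zero.
    return math.sqrt(sum((c1[k] - c2[k]) ** 2 for k in c1.keys() | c2.keys()))


# Usage: the two 4-tip shapes.
cherry = ((), ())
balanced = (cherry, cherry)
caterpillar = (((), cherry), ())
print(label(balanced), label(caterpillar), shape_distance(balanced, caterpillar))  # 4 5 2.0
```

Because labels are assigned from unordered child pairs, the distance between a tree and its mirror image is zero, so the sketch compares shapes rather than particular drawings of them.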
Affiliation(s)
- C Colijn
- Department of Mathematics, Imperial College, 180 Queen's Gate, London SW7 2AZ, UK
- G Plazzotta
- Department of Mathematics, Imperial College, 180 Queen's Gate, London SW7 2AZ, UK
49
Simonet C, Scherrer R, Rego-Costa A, Etienne RS. Robustness of the approximate likelihood of the protracted speciation model. J Evol Biol 2017; 31:469-479. [PMID: 29274113 DOI: 10.1111/jeb.13233] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/16/2017] [Revised: 11/22/2017] [Accepted: 12/04/2017] [Indexed: 11/29/2022]
Abstract
The protracted speciation model presents a realistic and parsimonious explanation for the observed slowdown in lineage accumulation through time, by accounting for the fact that speciation takes time. A method to compute the likelihood for this model given a phylogeny is available and allows estimation of its parameters (rate of initiation of speciation, rate of completion of speciation and extinction rate) and statistical comparison of this model to other proposed models of diversification. However, this likelihood computation method makes an approximation of the protracted speciation model to be mathematically tractable: it sometimes counts fewer species than one would do from a biological perspective. This approximation may have large consequences for likelihood-based inferences: it may render any conclusions based on this method completely irrelevant. Here, we study to what extent this approximation affects parameter estimations. We simulated phylogenies from which we reconstructed the tree of extant species according to the original, biologically meaningful protracted speciation model and according to the approximation. We then compared the resulting parameter estimates. We found that the differences were larger for high values of extinction rates and small values of speciation-completion rates. Indeed, a long speciation-completion time and a high extinction rate promote the appearance of cases to which the approximation applies. However, surprisingly, the deviation introduced is largely negligible over the parameter space explored, suggesting that this approximate likelihood can be applied reliably in practice to estimate biologically relevant parameters under the original protracted speciation model.
Affiliation(s)
- C Simonet
- Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- R Scherrer
- Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- A Rego-Costa
- Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- R S Etienne
- Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
50
Steel M, Pourfaraj V, Chaudhary A, Mooers A. Evolutionary isolation and phylogenetic diversity loss under random extinction events. J Theor Biol 2017; 438:151-155. [PMID: 29146280 DOI: 10.1016/j.jtbi.2017.11.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/29/2017] [Revised: 11/06/2017] [Accepted: 11/08/2017] [Indexed: 11/27/2022]
Abstract
The extinction of species at the present leads to the loss of 'phylogenetic diversity' (PD) from the evolutionary tree in which these species lie. Prior to extinction, the total PD present can be divided up among the species in various ways using measures of evolutionary isolation (such as 'fair proportion' and 'equal splits'). However, the loss of PD when certain combinations of species become extinct can be either larger or smaller than the cumulative loss of the isolation values associated with the extinct species. In this paper, we show that for trees generated under neutral evolutionary models, the loss of PD under a null model of random extinction at the present can be predicted from the loss of the cumulative isolation values, by applying a non-linear transformation that is independent of the tree. Moreover, the error in the prediction provably converges to zero as the size of the tree grows, with simulations showing good agreement even for moderate sized trees (n=64).
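To make the quantities in this abstract concrete, the sketch below computes rooted PD loss for a set of extinct tips alongside the two isolation indices it names. It assumes the usual definitions: under fair proportion, each edge's length is shared equally among all tips below it; under equal splits, an edge's length is divided equally among the children at each branching point on the way down to a tip. PD lost is taken as the total length of edges whose descendant tips are all extinct. The example tree and the names `Node`, `fair_proportion`, `equal_splits` and `pd_loss` are hypothetical, and the paper's non-linear transformation is not implemented.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    length: float                                 # length of the edge above this node
    name: str = ""                                # tip label (unused for internal nodes)
    children: list = field(default_factory=list)


def n_tips(node: Node) -> int:
    return 1 if not node.children else sum(n_tips(c) for c in node.children)


def fair_proportion(root: Node) -> dict:
    """Each edge's length is split equally among all tips below it."""
    scores = {}

    def visit(node: Node, acc: float) -> None:
        acc += node.length / n_tips(node)
        if not node.children:
            scores[node.name] = acc
        for child in node.children:
            visit(child, acc)

    visit(root, 0.0)
    return scores


def equal_splits(root: Node) -> dict:
    """Each edge's length is divided equally among the children at every branching below it."""
    scores = {}

    def visit(node: Node, inherited: float) -> None:
        total = inherited + node.length
        if not node.children:
            scores[node.name] = total
        for child in node.children:
            visit(child, total / len(node.children))

    visit(root, 0.0)
    return scores


def pd_loss(root: Node, extinct: set) -> float:
    """Total length of edges whose descendant tips are all in `extinct`."""
    def visit(node: Node):
        if not node.children:
            below, lost = {node.name}, 0.0
        else:
            below, lost = set(), 0.0
            for child in node.children:
                child_below, child_lost = visit(child)
                below |= child_below
                lost += child_lost
        if below <= extinct:
            lost += node.length
        return below, lost

    return visit(root)[1]


# Usage on the hypothetical tree (((a:1,b:1):1,c:1):4,d:1) with a zero-length root edge.
tree = Node(0.0, children=[
    Node(4.0, children=[
        Node(1.0, children=[Node(1.0, "a"), Node(1.0, "b")]),
        Node(1.0, "c"),
    ]),
    Node(1.0, "d"),
])
fp, es = fair_proportion(tree), equal_splits(tree)
extinct = {"a", "c"}
print(pd_loss(tree, extinct), sum(fp[s] for s in extinct), sum(es[s] for s in extinct))
```

Both indices sum to the total tree length (9 here), yet losing tips a and c removes only 2 units of PD while their summed fair-proportion and equal-splits scores are roughly 5.17 and 5.5, which illustrates why cumulative isolation values do not directly equal PD loss.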
Affiliation(s)
- Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
- Vahab Pourfaraj
- Department of Biological Sciences and IRMACS, Simon Fraser University, Burnaby, British Columbia, V5A1S6, Canada.
- Abhishek Chaudhary
- Institute of Food, Nutrition and Health, ETH Zurich, Schmelzbergstrasse 9, Zurich 8092, Switzerland
- Arne Mooers
- Department of Biological Sciences and IRMACS, Simon Fraser University, Burnaby, British Columbia, V5A1S6, Canada