1. Khurana MP, Scheidwasser-Clow N, Penn MJ, Bhatt S, Duchêne DA. The Limits of the Constant-rate Birth-Death Prior for Phylogenetic Tree Topology Inference. Syst Biol 2024; 73:235-246. [PMID: 38153910 PMCID: PMC11129600 DOI: 10.1093/sysbio/syad075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 12/20/2023] [Accepted: 12/27/2023] [Indexed: 12/30/2023]
Abstract
Birth-death models are stochastic processes describing speciation and extinction through time and across taxa and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example, with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used Bayesian methods and crBD priors with those that used other non-crBD priors and non-Bayesian methods (i.e., maximum likelihood methods), we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. 
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.
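To illustrate the kind of topology index this abstract refers to, here is a minimal Python sketch. It is not the authors' pipeline. It computes the Colless imbalance index (the sum, over internal nodes, of the absolute difference between the tip counts of the two child subtrees) and applies it to a tree grown by repeatedly splitting a uniformly chosen tip, used here as a simple pure-birth stand-in for crBD-like topologies. The function names and the dictionary-based tree encoding are assumptions made for this example.

```python
import random


def yule_topology(n_leaves, rng):
    """Grow a rooted binary topology by splitting a uniformly chosen tip.

    Returns a dict mapping each internal node id to its (left, right) child ids.
    Node 0 is the root. This encoding is an assumption made for the sketch.
    """
    children = {}
    tips = [0]
    next_id = 1
    while len(tips) < n_leaves:
        parent = tips.pop(rng.randrange(len(tips)))
        children[parent] = (next_id, next_id + 1)
        tips.extend((next_id, next_id + 1))
        next_id += 2
    return children


def caterpillar(n_leaves):
    """Build the maximally imbalanced (ladder-like) topology in the same encoding."""
    children = {}
    node, next_id = 0, 1
    for _ in range(n_leaves - 1):
        children[node] = (next_id, next_id + 1)
        node = next_id + 1  # keep splitting the right-hand child
        next_id += 2
    return children


def colless_index(children, root=0):
    """Sum over internal nodes of |tips under left child - tips under right child|."""
    n_tips = {}
    total = 0
    stack = [(root, False)]  # iterative post-order traversal
    while stack:
        node, expanded = stack.pop()
        if node not in children:
            n_tips[node] = 1
        elif not expanded:
            stack.append((node, True))
            stack.extend((child, False) for child in children[node])
        else:
            left, right = children[node]
            total += abs(n_tips[left] - n_tips[right])
            n_tips[node] = n_tips[left] + n_tips[right]
    return total


if __name__ == "__main__":
    rng = random.Random(2024)
    n = 50
    simulated = [colless_index(yule_topology(n, rng)) for _ in range(200)]
    print("mean Colless index, uniform-split trees:", sum(simulated) / len(simulated))
    print("Colless index, caterpillar tree:", colless_index(caterpillar(n)))
```

Comparing the simulated values with the caterpillar value, which is the maximum for a given number of tips, gives a rough sense of how far a tree of that size can sit from the balance expected under uniform splitting.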
Affiliation(s)
- Mark P Khurana
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
- Neil Scheidwasser-Clow
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
- Matthew J Penn
  - Department of Statistics, University of Oxford, OX1 3LB, Oxford, UK
- Samir Bhatt
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, SW7 2AZ, London, UK
- David A Duchêne
  - Centre for Evolutionary Hologenomics, University of Copenhagen, 1352 Copenhagen, Denmark
2. Henao-Diaz LF, Pennell M. The Major Features of Macroevolution. Syst Biol 2023; 72:1188-1198. [PMID: 37248967 DOI: 10.1093/sysbio/syad032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 05/02/2023] [Accepted: 05/29/2023] [Indexed: 05/31/2023]
Abstract
Evolutionary dynamics operating across deep time leave footprints in the shapes of phylogenetic trees. For the last several decades, researchers have used increasingly large and robust phylogenies to study the evolutionary history of individual clades and to investigate the causes of the glaring disparities in diversity among groups. Whereas typically not the focal point of individual clade-level studies, many researchers have remarked on recurrent patterns that have been observed across many different groups and at many different time scales. Whereas previous studies have documented various such regularities in topology and branch length distributions, they have typically focused on a single pattern and used a disparate collection (oftentimes, of quite variable reliability) of trees to assess it. Here we take advantage of modern megaphylogenies and unify previous disparate observations about the shapes embedded in the Tree of Life to create a catalog of the "major features of macroevolution." By characterizing such a large swath of subtrees in a consistent way, we hope to provide a set of phenomena that process-based macroevolutionary models of diversification ought to seek to explain.
Affiliation(s)
- L Francisco Henao-Diaz
  - Department of Ecology and Evolution, University of Chicago, Chicago, USA
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
- Matt Pennell
  - Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
  - Department of Biological Sciences, University of Southern California, Los Angeles, USA
3. Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
INTRODUCTION: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION: Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
  - C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
  - Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
4. Moshiri N, Mirarab S. A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition. Syst Biol 2018; 67:475-489. [PMID: 29165679 DOI: 10.1093/sysbio/syx088] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/27/2017] [Accepted: 11/15/2017] [Indexed: 11/14/2022]
Abstract
Models of tree evolution have mostly focused on capturing the cladogenesis processes behind speciation. Processes that drive the evolution of genomic elements, such as repeats, are not necessarily captured by these existing models. In this article, we design a model of tree evolution that we call the dual-birth model, and we show how it can be useful in studying the evolution of short Alu repeats found in the human genome in abundance. The dual-birth model extends the traditional birth-only model to have two rates of propagation: one for active nodes, which propagate often, and another for inactive nodes, which activate at a lower rate and then start propagating. Adjusting the ratio of the rates controls the expected tree balance. We present several theoretical results under the dual-birth model, introduce parameter estimation techniques, and study the properties of the model in simulations. We then use the dual-birth model to estimate the number of active Alu elements and their rates of propagation and activation in the human genome based on a large phylogenetic tree that we build from close to one million Alu sequences.
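The effect of the rate ratio on tree balance described in this abstract can be explored with a small simulation. The following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that every split replaces the chosen tip with one active and one inactive child, and that a tip is chosen with probability proportional to its rate. That splitting rule is not spelled out in the abstract. The function name `dual_birth_leaf_depths` and its parameters are hypothetical. Only tip depths are tracked, since mean tip depth already reflects balance: ladder-like trees have deep tips.

```python
import random


def dual_birth_leaf_depths(n_leaves, rate_active, rate_inactive, rng):
    """Simulate tip depths under a two-rate splitting process (illustrative only).

    Assumption: each split yields one active and one inactive child, and the
    next tip to split is drawn with probability proportional to its rate.
    """
    if n_leaves < 2:
        raise ValueError("n_leaves must be at least 2")
    # Each tip is (depth, is_active); the root has already split once.
    tips = [(1, True), (1, False)]
    while len(tips) < n_leaves:
        weights = [rate_active if active else rate_inactive for _, active in tips]
        idx = rng.choices(range(len(tips)), weights=weights, k=1)[0]
        depth, _ = tips.pop(idx)
        tips.append((depth + 1, True))
        tips.append((depth + 1, False))
    return [depth for depth, _ in tips]


if __name__ == "__main__":
    rng = random.Random(7)
    n = 100
    for ratio in (1.0, 0.1, 0.01):
        # ratio = inactive rate / active rate; 1.0 treats all tips alike.
        means = []
        for _ in range(50):
            depths = dual_birth_leaf_depths(n, 1.0, ratio, rng)
            means.append(sum(depths) / n)
        print(f"rate ratio {ratio}: mean tip depth {sum(means) / len(means):.1f}")
```

With equal rates the process reduces to uniform tip splitting. With an inactive rate of zero only the single active tip can ever split, which produces a fully ladder-like tree.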
Affiliation(s)
- Niema Moshiri
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
5. Hagen O, Andermann T, Quental TB, Antonelli A, Silvestro D. Estimating Age-Dependent Extinction: Contrasting Evidence from Fossils and Phylogenies. Syst Biol 2018; 67:458-474. [PMID: 29069434 PMCID: PMC5920349 DOI: 10.1093/sysbio/syx082] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/19/2016] [Revised: 03/03/2017] [Accepted: 10/15/2017] [Indexed: 01/12/2023]
Abstract
The estimation of diversification rates is one of the most vividly debated topics in modern systematics, with considerable controversy surrounding the power of phylogenetic and fossil-based approaches in estimating extinction. Van Valen's seminal work from 1973 proposed the "Law of constant extinction," which states that the probability of extinction of taxa is not dependent on their age. This assumption of age-independent extinction has prevailed for decades with its assessment based on survivorship curves, which, however, do not directly account for the incompleteness of the fossil record, and have rarely been applied at the species level. Here, we present a Bayesian framework to estimate extinction rates from the fossil record accounting for age-dependent extinction (ADE). Our approach, unlike previous implementations, explicitly models unobserved species and accounts for the effects of fossil preservation on the observed longevity of sampled lineages. We assess the performance and robustness of our method through extensive simulations and apply it to a fossil data set of terrestrial Carnivora spanning the past 40 myr. We find strong evidence of ADE, as we detect the extinction rate to be highest in young species and declining with increasing species age. For comparison, we apply a recently developed analogous ADE model to a dated phylogeny of extant Carnivora. Although the phylogeny-based analysis also infers ADE, it indicates that the extinction rate, instead, increases with increasing taxon age. The estimated mean species longevity also differs substantially, with the fossil-based analyses estimating 2.0 myr, in contrast to 9.8 myr derived from the phylogeny-based inference. Scrutinizing these discrepancies, we find that both fossil and phylogeny-based ADE models are prone to high error rates when speciation and extinction rates increase or decrease through time. 
However, analyses of simulated and empirical data show that fossil-based inferences are more robust. This study shows that an accurate estimation of ADE from incomplete fossil data is possible when the effects of preservation are jointly modeled, thus allowing for a reassessment of Van Valen's model as a general rule in macroevolution.
Affiliation(s)
- Oskar Hagen
  - Swiss Federal Research Institute WSL, 8903 Birmensdorf, Switzerland
  - Landscape Ecology, Institute of Terrestrial Ecosystems, ETH Zurich, 8092 Zurich, Switzerland
  - Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
- Tobias Andermann
  - Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
  - Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Göteborg, Sweden
- Tiago B Quental
  - Departamento de Ecologia, Universidade de São Paulo, 05508-900 São Paulo, Brazil
- Alexandre Antonelli
  - Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
  - Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Göteborg, Sweden
  - Gothenburg Botanical Garden, Carl Skottsbergs gata 22A, SE-413 19 Göteborg, Sweden
- Daniele Silvestro
  - Department of Biological and Environmental Sciences, University of Gothenburg, SE-405 30 Göteborg, Sweden
  - Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Göteborg, Sweden
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
6. Holman EW. Age-Dependent and Lineage-Dependent Speciation and Extinction in the Imbalance of Phylogenetic Trees. Syst Biol 2017; 66:912-916. [PMID: 28169404 DOI: 10.1093/sysbio/syx031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/17/2016] [Accepted: 01/25/2017] [Indexed: 11/13/2022]
Abstract
It is known that phylogenetic trees are more imbalanced than expected from a birth-death model with constant rates of speciation and extinction, and also that imbalance can be better fit by allowing the rate of speciation to decrease as the age of the parent species increases. If imbalance is measured in more detail, at nodes within trees as a function of the number of species descended from the nodes, age-dependent models predict levels of imbalance comparable to real trees for small numbers of descendent species, but predicted imbalance approaches an asymptote not found in real trees as the number of descendent species becomes large. Age-dependence must therefore be complemented by another process such as inheritance of different rates along different lineages, which is known to predict insufficient imbalance at nodes with few descendent species, but can predict increasing imbalance with increasing numbers of descendent species. [Crump-Mode-Jagers process; diversification; macroevolution; taxon sampling; tree of life]
Affiliation(s)
- Eric W Holman
  - Department of Psychology, University of California, Los Angeles, CA 90095, USA
7. Sainudiin R, Véber A. A Beta-splitting model for evolutionary trees. Royal Society Open Science 2016; 3:160016. [PMID: 27293780 PMCID: PMC4892442 DOI: 10.1098/rsos.160016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/08/2016] [Accepted: 04/13/2016] [Indexed: 06/06/2023]
Abstract
In this article, we construct a generalization of the Blum-François Beta-splitting model for evolutionary trees, which was itself inspired by Aldous' Beta-splitting model on cladograms. The novelty of our approach allows for asymmetric shares of diversification rates (or diversification 'potential') between two sister species in an evolutionarily interpretable manner, as well as the addition of extinction to the model in a natural way. We describe the incremental evolutionary construction of a tree with n leaves by splitting or freezing extant lineages through the generating, organizing and deleting processes. We then give the probability of any (binary rooted) tree under this model with no extinction, at several resolutions: ranked planar trees giving asymmetric roles to the first and second offspring species of a given species and keeping track of the order of the speciation events occurring during the creation of the tree, unranked planar trees, ranked non-planar trees and finally (unranked non-planar) trees. We also describe a continuous-time equivalent of the generating, organizing and deleting processes where tree topology and branch lengths are jointly modelled and provide code in SageMath/Python for these algorithms.
Affiliation(s)
- Raazesh Sainudiin
  - School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch 8041, New Zealand
- Amandine Véber
  - CMAP-CNRS, Ecole Polytechnique, 91128 Palaiseau Cedex, France
8. Keller-Schmidt S, Tuğrul M, Eguíluz VM, Hernández-García E, Klemm K. Anomalous scaling in an age-dependent branching model. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2015; 91:022803. [PMID: 25768548 DOI: 10.1103/physreve.91.022803] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/04/2013] [Indexed: 06/04/2023]
Abstract
We introduce a one-parametric family of tree growth models, in which branching probabilities decrease with branch age $\tau$ as $\tau^{-\alpha}$. Depending on the exponent $\alpha$, the scaling of tree depth with tree size $n$ displays a transition between the logarithmic scaling of random trees and an algebraic growth. At the transition ($\alpha = 1$) tree depth grows as $(\log n)^2$. This anomalous scaling is in good agreement with the trend observed in evolution of biological species, thus providing a theoretical support for age-dependent speciation and associating it to the occurrence of a critical point.
Affiliation(s)
- Stephanie Keller-Schmidt
  - Bioinformatics, Institute of Computer Science, University Leipzig, Härtelstr. 16-18, 04107 Leipzig, Germany
- Murat Tuğrul
  - IST Austria, Am Campus 1, 3400 Klosterneuburg, Austria
- Víctor M Eguíluz
  - IFISC (CSIC-UIB), Instituto de Física Interdisciplinar y Sistemas Complejos, E-07122 Palma de Mallorca, Spain
- Emilio Hernández-García
  - IFISC (CSIC-UIB), Instituto de Física Interdisciplinar y Sistemas Complejos, E-07122 Palma de Mallorca, Spain
- Konstantin Klemm
  - Bioinformatics, Institute of Computer Science, University Leipzig, Härtelstr. 16-18, 04107 Leipzig, Germany
  - Bioinformatics and Computational Biology, University of Vienna, Währingerstraße 29, 1090 Vienna, Austria
  - Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Vienna, Austria
  - School of Science and Technology, Nazarbayev University, Kabanbay Batyr Ave. 53, 010000 Astana, Kazakhstan
9. Kim KM, Nasir A, Caetano-Anollés G. The importance of using realistic evolutionary models for retrodicting proteomes. Biochimie 2014; 99:129-37. [DOI: 10.1016/j.biochi.2013.11.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 08/26/2013] [Accepted: 11/22/2013] [Indexed: 01/16/2023]
10. Root location in random trees: a polarity property of all sampling consistent phylogenetic models except one. Mol Phylogenet Evol 2012; 65:345-8. [PMID: 22772025 DOI: 10.1016/j.ympev.2012.06.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/26/2012] [Revised: 06/06/2012] [Accepted: 06/25/2012] [Indexed: 11/21/2022]
Abstract
Neutral macroevolutionary models, such as the Yule model, give rise to a probability distribution on the set of discrete rooted binary trees over a given leaf set. Such models can provide a signal as to the approximate location of the root when only the unrooted phylogenetic tree is known, and this signal becomes relatively more significant as the number of leaves grows. In this short note, we show that among models that treat all taxa equally, and are sampling consistent (i.e. the distribution on trees is not affected by taxa yet to be included), all such models, except one (the so-called PDA model), convey some information as to the location of the ancestral root in an unrooted tree.