1
Zhang W, Kenney T, Ho LST. Evolutionary shift detection with ensemble variable selection. BMC Ecol Evol 2024; 24:11. [PMID: 38245667 PMCID: PMC10800078 DOI: 10.1186/s12862-024-02201-w] [Received: 01/19/2023] [Accepted: 01/10/2024] [Indexed: 01/22/2024]
Abstract
Abrupt environmental changes can lead to evolutionary shifts in trait evolution. Identifying these shifts is an important step in understanding the evolutionary history of phenotypes. The detection performances of different methods are influenced by many factors, including different numbers of shifts, shift sizes, where a shift occurs on a tree, and the types of phylogenetic structure. Furthermore, the model assumptions are oversimplified, so are likely to be violated in real data, which could cause the methods to fail. We perform simulations to assess the effect of these factors on the performance of shift detection methods. To make the comparisons more complete, we also propose an ensemble variable selection method (R package ELPASO) and compare it with existing methods (R packages ℓ1ou and PhylogeneticEM). The performances of methods are highly dependent on the selection criterion. ℓ1ou+pBIC is usually the most conservative method and it performs well when signal sizes are large. ℓ1ou+BIC is the least conservative method and it performs well when signal sizes are small. The ensemble method provides more balanced choices between those two methods. Moreover, the performances of all methods are heavily impacted by measurement error, tree reconstruction error and shifts in variance.
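The role of a selection criterion can be illustrated with the standard BIC, which trades goodness of fit against the number of fitted parameters. The following is a minimal sketch, not the implementation of any package named in the abstract: the function names, the log-likelihood values, and the way parameters are counted per shift are all hypothetical. The pBIC criterion is not shown because its formula is not given in this listing.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: -2 * log-likelihood + (number of parameters) * ln(sample size)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_by_bic(candidates, n_obs):
    """Return the candidate (label, log_likelihood, n_params) with the lowest BIC."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n_obs))

# Hypothetical candidate shift configurations on a 100-tip tree.
# Assumption: each extra shift costs two parameters (position and magnitude).
candidates = [
    ("0 shifts", -150.0, 3),
    ("1 shift", -140.0, 5),
    ("2 shifts", -139.0, 7),
]
best = select_by_bic(candidates, n_obs=100)
print(best[0])
```

In this made-up example the second shift improves the log-likelihood by only one unit, which does not offset the added penalty. A criterion with a heavier penalty selects fewer shifts, which is the sense in which one criterion is more conservative than another.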
Affiliation(s)
- Wensha Zhang
  - Department of Mathematics and Statistics, Dalhousie University, Nova Scotia, Canada
- Toby Kenney
  - Department of Mathematics and Statistics, Dalhousie University, Nova Scotia, Canada
- Lam Si Tung Ho
  - Department of Mathematics and Statistics, Dalhousie University, Nova Scotia, Canada
2
Weil SS, Gallien L, Nicolaï MPJ, Lavergne S, Börger L, Allen WL. Body size and life history shape the historical biogeography of tetrapods. Nat Ecol Evol 2023; 7:1467-1479. [PMID: 37604875 PMCID: PMC10482685 DOI: 10.1038/s41559-023-02150-5] [Received: 01/17/2023] [Accepted: 07/04/2023] [Indexed: 08/23/2023]
Abstract
Dispersal across biogeographic barriers is a key process determining global patterns of biodiversity as it allows lineages to colonize and diversify in new realms. Here we demonstrate that past biogeographic dispersal events often depended on species' traits, by analysing 7,009 tetrapod species in 56 clades. Biogeographic models incorporating body size or life history accrued more statistical support than trait-independent models in 91% of clades. In these clades, dispersal rates increased by 28-32% for lineages with traits favouring successful biogeographic dispersal. Differences between clades in the effect magnitude of life history on dispersal rates are linked to the strength and type of biogeographic barriers and intra-clade trait variability. In many cases, large body sizes and fast life histories facilitate dispersal success. However, species with small bodies and/or slow life histories, or those with average traits, have an advantage in a minority of clades. Body size-dispersal relationships were related to a clade's average body size and life history strategy. These results provide important new insight into how traits have shaped the historical biogeography of tetrapod lineages and may impact present-day and future biogeographic dispersal.
Affiliation(s)
- Sarah-Sophie Weil
  - CNRS, Laboratoire d'Ecologie Alpine, University Savoie Mont Blanc, University Grenoble Alpes, Grenoble, France
  - Department of Biosciences, Swansea University, Swansea, UK
- Laure Gallien
  - CNRS, Laboratoire d'Ecologie Alpine, University Savoie Mont Blanc, University Grenoble Alpes, Grenoble, France
- Michaël P J Nicolaï
  - Biology Department, Evolution and Optics of Nanostructures Group, Ghent University, Ghent, Belgium
- Sébastien Lavergne
  - CNRS, Laboratoire d'Ecologie Alpine, University Savoie Mont Blanc, University Grenoble Alpes, Grenoble, France
- Luca Börger
  - Department of Biosciences, Swansea University, Swansea, UK
3
Martin BS, Bradburd GS, Harmon LJ, Weber MG. Modeling the Evolution of Rates of Continuous Trait Evolution. Syst Biol 2022:6830631. [PMID: 36380474 DOI: 10.1093/sysbio/syac068] [Received: 03/25/2022] [Indexed: 11/17/2022]
Abstract
Rates of phenotypic evolution vary markedly across the tree of life, from the accelerated evolution apparent in adaptive radiations to the remarkable evolutionary stasis exhibited by so-called "living fossils". Such rate variation has important consequences for large-scale evolutionary dynamics, generating vast disparities in phenotypic diversity across space, time, and taxa. Despite this, most methods for estimating trait evolution rates assume rates vary deterministically with respect to some variable of interest or change infrequently during a clade's history. These assumptions may cause underfitting of trait evolution models and mislead hypothesis testing. Here, we develop a new trait evolution model that allows rates to vary gradually and stochastically across a clade. Further, we extend this model to accommodate generally decreasing or increasing rates over time, allowing for flexible modeling of "early/late bursts" of trait evolution. We implement a Bayesian method, termed "evolving rates" (evorates for short), to efficiently fit this model to comparative data. Through simulation, we demonstrate that evorates can reliably infer both how and in which lineages trait evolution rates varied during a clade's history. We apply this method to body size evolution in cetaceans, recovering substantial support for an overall slowdown in body size evolution over time with recent bursts among some oceanic dolphins and relative stasis among beaked whales of the genus Mesoplodon. These results unify and expand on previous research, demonstrating the empirical utility of evorates.
Affiliation(s)
- B S Martin
  - Department of Plant Biology, Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824, USA
- G S Bradburd
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- L J Harmon
  - Department of Biological Sciences, Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID 83843, USA
- M G Weber
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
4
Fisher AA, Hassler GW, Ji X, Baele G, Suchard MA, Lemey P. Scalable Bayesian phylogenetics. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210242. [PMID: 35989603 PMCID: PMC9393558 DOI: 10.1098/rstb.2021.0242] [Indexed: 02/01/2023]
Abstract
Recent advances in Bayesian phylogenetics offer substantial computational savings to accommodate increased genomic sampling that challenges traditional inference methods. In this review, we begin with a brief summary of the Bayesian phylogenetic framework, and then conceptualize a variety of methods to improve posterior approximations via Markov chain Monte Carlo (MCMC) sampling. Specifically, we discuss methods to improve the speed of likelihood calculations, reduce MCMC burn-in, and generate better MCMC proposals. We apply several of these techniques to study the evolution of HIV virulence along a 1536-tip phylogeny and estimate the internal node heights of a 1000-tip SARS-CoV-2 phylogenetic tree in order to illustrate the speed-up of such analyses using current state-of-the-art approaches. We conclude our review with a discussion of promising alternatives to MCMC that approximate the phylogenetic posterior. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Gabriel W. Hassler
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
- Xiang Ji
  - Department of Mathematics, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, 3000 Leuven, Belgium
- Marc A. Suchard
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, 3000 Leuven, Belgium
5
Hassler GW, Magee A, Zhang Z, Baele G, Lemey P, Ji X, Fourment M, Suchard MA. Data integration in Bayesian phylogenetics. Annual Review of Statistics and Its Application 2022; 10:353-377. [PMID: 38774036 PMCID: PMC11108065 DOI: 10.1146/annurev-statistics-033021-112532] [Indexed: 05/24/2024]
Abstract
Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.
Affiliation(s)
- Gabriel W Hassler
  - Department of Computational Medicine, University of California, Los Angeles, USA, 90095
- Andrew Magee
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Zhenyu Zhang
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Guy Baele
  - Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Philippe Lemey
  - Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Xiang Ji
  - Department of Mathematics, Tulane University, New Orleans, USA, 70118
- Mathieu Fourment
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo NSW, Australia, 2007
- Marc A Suchard
  - Department of Computational Medicine, University of California, Los Angeles, USA, 90095
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
  - Department of Human Genetics, University of California, Los Angeles, USA, 90095
6
Dellicour S, Gill MS, Faria NR, Rambaut A, Pybus OG, Suchard MA, Lemey P. Relax, Keep Walking - A Practical Guide to Continuous Phylogeographic Inference with BEAST. Mol Biol Evol 2021; 38:3486-3493. [PMID: 33528560 PMCID: PMC8321535 DOI: 10.1093/molbev/msab031] [Indexed: 02/07/2023]
Abstract
Spatially explicit phylogeographic analyses can be performed with an inference framework that employs relaxed random walks to reconstruct phylogenetic dispersal histories in continuous space. This core model was first implemented 10 years ago and has opened up new opportunities in the field of phylodynamics, allowing researchers to map and analyze the spatial dissemination of rapidly evolving pathogens. We here provide a detailed and step-by-step guide on how to set up, run, and interpret continuous phylogeographic analyses using the programs BEAUti, BEAST, Tracer, and TreeAnnotator.
Affiliation(s)
- Simon Dellicour
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Bruxelles, Belgium
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Mandev S Gill
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Nuno R Faria
  - MRC Centre for Global Infectious Disease Analysis, J-IDEA, Imperial College London, London, United Kingdom
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
  - Instituto de Medicina Tropical, Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
7
Fisher AA, Ji X, Zhang Z, Lemey P, Suchard MA. Relaxed Random Walks at Scale. Syst Biol 2021; 70:258-267. [PMID: 32687171 PMCID: PMC7875444 DOI: 10.1093/sysbio/syaa056] [Received: 10/22/2019] [Accepted: 07/09/2020] [Indexed: 11/14/2022]
Abstract
Relaxed random walk (RRW) models of trait evolution introduce branch-specific rate multipliers to modulate the variance of a standard Brownian diffusion process along a phylogeny and more accurately model overdispersed biological data. Increased taxonomic sampling challenges inference under RRWs as the number of unknown parameters grows with the number of taxa. To solve this problem, we present a scalable method to efficiently fit RRWs and infer this branch-specific variation in a Bayesian framework. We develop a Hamiltonian Monte Carlo (HMC) sampler to approximate the high-dimensional, correlated posterior that exploits a closed-form evaluation of the gradient of the trait data log-likelihood with respect to all branch-rate multipliers simultaneously. Our gradient calculation achieves computational complexity that scales only linearly with the number of taxa under study. We compare the efficiency of our HMC sampler to the previously standard univariable Metropolis-Hastings approach while studying the spatial emergence of the West Nile virus in North America in the early 2000s. Our method achieves at least a 6-fold speed increase over the univariable approach. Additionally, we demonstrate the scalability of our method by applying the RRW to study the correlation between five mammalian life history traits in a phylogenetic tree with 3650 tips. [Bayesian inference; BEAST; Hamiltonian Monte Carlo; life history; phylodynamics; relaxed random walk.]
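The idea of branch-specific rate multipliers can be sketched as a forward simulation of a single trait. This is an illustrative toy, not the authors' BEAST implementation or their HMC sampler: the tree, the multiplier values, and the function name `simulate_rrw` are hypothetical, and the convention that each multiplier scales the branch variance directly is an assumption made for this sketch.

```python
import random

def simulate_rrw(tree, root_value, sigma2, multipliers, rng):
    """Simulate one trait under Brownian diffusion with per-branch rate multipliers.

    tree maps each node to (parent, branch_length) and must list parents
    before their children. The increment along a branch is Gaussian with
    variance sigma2 * multiplier * branch_length (assumed parameterisation).
    """
    values = {"root": root_value}
    for node, (parent, length) in tree.items():
        variance = sigma2 * multipliers[node] * length
        values[node] = values[parent] + rng.gauss(0.0, variance ** 0.5)
    return values

# Hypothetical four-branch tree with tips A, B and C.
tree = {
    "n1": ("root", 1.0),
    "A": ("n1", 0.5),
    "B": ("n1", 0.5),
    "C": ("root", 1.5),
}
# A multiplier of 1.0 everywhere recovers standard Brownian diffusion.
multipliers = {"n1": 1.0, "A": 4.0, "B": 0.25, "C": 1.0}

traits = simulate_rrw(tree, 0.0, 1.0, multipliers, random.Random(42))
print({tip: round(traits[tip], 3) for tip in ("A", "B", "C")})
```

Because every branch carries its own multiplier, the number of unknowns grows with the number of taxa, which is the scaling problem the abstract describes.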
Affiliation(s)
- Alexander A Fisher
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
- Xiang Ji
  - Department of Mathematics, School of Science & Engineering, Tulane University, USA
- Zhenyu Zhang
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, CA, USA
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA