1. Barido-Sottani J, Schwery O, Warnock RCM, Zhang C, Wright AM. Practical guidelines for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC). Open Research Europe 2024; 3:204. [PMID: 38481771 PMCID: PMC10933576 DOI: 10.12688/openreseurope.16679.1] [Accepted: 07/30/2024]
Abstract
Phylogenetic estimation is, and has always been, a complex endeavor. Estimating a phylogenetic tree involves evaluating many possible solutions and possible evolutionary histories that could explain a set of observed data, typically by using a model of evolution. Values for all model parameters need to be evaluated as well. Modern statistical methods involve not just the estimation of a tree, but also solutions to more complex models involving fossil record information and other data sources. Markov chain Monte Carlo (MCMC) is a leading method for approximating the posterior distribution of parameters in a mathematical model. It is deployed in all Bayesian phylogenetic tree estimation software. While many researchers use MCMC in phylogenetic analyses, interpreting results and diagnosing problems with MCMC remain vexing issues to many biologists. In this manuscript, we will offer an overview of how MCMC is used in Bayesian phylogenetic inference, with a particular emphasis on complex hierarchical models, such as the fossilized birth-death (FBD) model. We will discuss strategies to diagnose common MCMC problems and troubleshoot difficult analyses, in particular convergence issues. We will show how the study design, the choice of models and priors, but also technical features of the inference tools themselves can all be adjusted to obtain the best results. Finally, we will also discuss the unique challenges created by the incorporation of fossil information in phylogenetic inference, and present tips to address them.
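As a rough illustration of the MCMC sampling and convergence diagnosis this abstract refers to, the sketch below runs a random-walk Metropolis sampler on a toy one-dimensional target and estimates the effective sample size (ESS) of the chain. It is not taken from the paper: the standard-normal target, the function names, the step size, and the burn-in length are all assumptions chosen for illustration, and real phylogenetic analyses sample trees and many parameters jointly.

```python
import math
import random


def log_posterior(x):
    # Toy target (assumption): standard normal log-density up to a constant.
    return -0.5 * x * x


def metropolis(n_steps, step_size=1.0, start=5.0, seed=1):
    """Random-walk Metropolis sampler; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    x = start
    samples = []
    accepted = 0
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_ratio = log_posterior(proposal) - log_posterior(x)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def effective_sample_size(samples):
    """Crude ESS: n / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


if __name__ == "__main__":
    chain, acceptance = metropolis(5000)
    kept = chain[500:]  # discard burn-in (length is an arbitrary choice here)
    print("acceptance rate:", round(acceptance, 3))
    print("posterior mean estimate:", round(sum(kept) / len(kept), 3))
    print("ESS:", round(effective_sample_size(kept), 1))
```

A low ESS relative to the chain length indicates strongly autocorrelated samples, which is one of the symptoms a convergence check is meant to catch.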
Affiliation(s)
- Joëlle Barido-Sottani
  - Institut de Biologie de l’ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, Paris, Île-de-France, 75005, France
- Orlando Schwery
  - Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, 70402, USA
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, 24061, USA
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, 70803, USA
- Rachel C. M. Warnock
  - GeoZentrum Nordbayern, Department of Geography and Geosciences, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Bavaria, 91054, Germany
- Chi Zhang
  - Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, 100044, China
- April Marie Wright
  - Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, 70402, USA
2. Khurana MP, Scheidwasser-Clow N, Penn MJ, Bhatt S, Duchêne DA. The Limits of the Constant-rate Birth-Death Prior for Phylogenetic Tree Topology Inference. Syst Biol 2024; 73:235-246. [PMID: 38153910 PMCID: PMC11129600 DOI: 10.1093/sysbio/syad075] [Received: 02/06/2023] [Revised: 12/20/2023] [Accepted: 12/27/2023]
Abstract
Birth-death models are stochastic processes describing speciation and extinction through time and across taxa and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example, with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used Bayesian methods and crBD priors with those that used other non-crBD priors and non-Bayesian methods (i.e., maximum likelihood methods), we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. 
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.
Affiliation(s)
- Mark P Khurana
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
- Neil Scheidwasser-Clow
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
- Matthew J Penn
  - Department of Statistics, University of Oxford, OX1 3LB, Oxford, UK
- Samir Bhatt
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, SW7 2AZ, London, UK
- David A Duchêne
  - Centre for Evolutionary Hologenomics, University of Copenhagen, 1352 Copenhagen, Denmark
3. Billenstein RJ, Höhna S. Comparison of Bayesian Coalescent Skyline Plot Models for Inferring Demographic Histories. Mol Biol Evol 2024; 41:msae073. [PMID: 38630635 PMCID: PMC11068272 DOI: 10.1093/molbev/msae073] [Received: 09/26/2023] [Revised: 02/16/2024] [Accepted: 04/01/2024]
Abstract
Bayesian coalescent skyline plot models are widely used to infer demographic histories. The first (non-Bayesian) coalescent skyline plot model assumed a known genealogy as data, while subsequent models and implementations jointly inferred the genealogy and demographic history from sequence data, including heterochronous samples. Overall, there exist multiple different Bayesian coalescent skyline plot models which mainly differ in two key aspects: (i) how changes in population size are modeled through independent or autocorrelated prior distributions, and (ii) how many change-points in the demographic history are used, where they occur and if the number is pre-specified or inferred. The specific impact of each of these choices on the inferred demographic history is not known because of two reasons: first, not all models are implemented in the same software, and second, each model implementation makes specific choices that the biologist cannot influence. To facilitate a detailed evaluation of Bayesian coalescent skyline plot models, we implemented all currently described models in a flexible design into the software RevBayes. Furthermore, we evaluated models and choices on an empirical dataset of horses supplemented by a small simulation study. We find that estimated demographic histories can be grouped broadly into two groups depending on how change-points in the demographic history are specified (either independent of or at coalescent events). Our simulations suggest that models using change-points at coalescent events produce spurious variation near the present, while most models using independent change-points tend to over-smooth the inferred demographic history.
Affiliation(s)
- Ronja J Billenstein
  - GeoBio-Center, Ludwig-Maximilians-Universität München, Munich 80333, Germany
  - Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Munich 80333, Germany
- Sebastian Höhna
  - GeoBio-Center, Ludwig-Maximilians-Universität München, Munich 80333, Germany
  - Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Munich 80333, Germany
4. Vaughan TG. ReMASTER: improved phylodynamic simulation for BEAST 2.7. Bioinformatics 2024; 40:btae015. [PMID: 38195927 PMCID: PMC10796175 DOI: 10.1093/bioinformatics/btae015] [Received: 10/05/2023] [Revised: 12/30/2023] [Accepted: 01/08/2024]
Abstract
Summary: Phylodynamic models link phylogenetic trees to biologically-relevant parameters such as speciation and extinction rates (macroevolution), effective population sizes and migration rates (ecology and phylogeography), and transmission and removal/recovery rates (epidemiology) to name a few. Being able to simulate phylogenetic trees and population dynamics under these models is the basis for (i) developing and testing of phylodynamic inference algorithms, (ii) performing simulation studies which quantify the biases stemming from model-misspecification, and (iii) performing so-called model adequacy assessments by simulating samples from the posterior predictive distribution. Here I introduce ReMASTER, a package for the phylogenetic inference platform BEAST 2 that provides a simple and efficient approach to specifying and simulating the phylogenetic trees and population dynamics arising from phylodynamic models. Being a component of BEAST 2 allows ReMASTER to also form the basis of joint simulation and inference analyses. ReMASTER is a complete rewrite of an earlier package, MASTER, and boasts improved efficiency, ease of use, flexibility of model specification, and deeper integration with BEAST 2.
Availability and implementation: ReMASTER can be installed directly from the BEAST 2 package manager, and its documentation is available online at https://tgvaughan.github.io/remaster. ReMASTER is free software, and is distributed under version 3 of the GNU General Public License. The Java source code for ReMASTER is available from https://github.com/tgvaughan/remaster.
Affiliation(s)
- Timothy G Vaughan
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4056, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
5. May MR, Rothfels CJ. Diversification Models Conflate Likelihood and Prior, and Cannot be Compared Using Conventional Model-Comparison Tools. Syst Biol 2023; 72:713-722. [PMID: 36897743 DOI: 10.1093/sysbio/syad010] [Received: 08/10/2022] [Revised: 02/14/2023] [Accepted: 02/28/2023]
Abstract
Time-calibrated phylogenetic trees are a tremendously powerful tool for studying evolutionary, ecological, and epidemiological phenomena. Such trees are predominantly inferred in a Bayesian framework, with the phylogeny itself treated as a parameter with a prior distribution (a "tree prior"). However, we show that the tree "parameter" consists, in part, of data, in the form of taxon samples. Treating the tree as a parameter fails to account for these data and compromises our ability to compare among models using standard techniques (e.g., marginal likelihoods estimated using path-sampling and stepping-stone sampling algorithms). Since accuracy of the inferred phylogeny strongly depends on how well the tree prior approximates the true diversification process that gave rise to the tree, the inability to accurately compare competing tree priors has broad implications for applications based on time-calibrated trees. We outline potential remedies to this problem, and provide guidance for researchers interested in assessing the fit of tree models. [Bayes factors; Bayesian model comparison; birth-death models; divergence-time estimation; lineage diversification].
Affiliation(s)
- Michael R May
  - Department of Integrative Biology, University of California, Berkeley, CA, USA
  - University Herbarium and Department of Integrative Biology, University of California, Berkeley, CA, USA
- Carl J Rothfels
  - University Herbarium and Department of Integrative Biology, University of California, Berkeley, CA, USA
  - Intermountain Herbarium, Ecology Center, and Biology Department, Utah State University, Logan, UT, USA
6. Quintero I, Landis MJ, Jetz W, Morlon H. The build-up of the present-day tropical diversity of tetrapods. Proc Natl Acad Sci U S A 2023; 120:e2220672120. [PMID: 37159475 PMCID: PMC10194011 DOI: 10.1073/pnas.2220672120] [Received: 12/08/2022] [Accepted: 04/04/2023]
Abstract
The extraordinary number of species in the tropics when compared to the extra-tropics is probably the most prominent and consistent pattern in biogeography, suggesting that overarching processes regulate this diversity gradient. A major challenge to characterizing which processes are at play relies on quantifying how the frequency and determinants of tropical and extra-tropical speciation, extinction, and dispersal events shaped evolutionary radiations. We address this question by developing and applying spatiotemporal phylogenetic and paleontological models of diversification for tetrapod species incorporating paleoenvironmental variation. Our phylogenetic model results show that area, energy, or species richness did not uniformly affect speciation rates across tetrapods and dispute expectations of a latitudinal gradient in speciation rates. Instead, both neontological and fossil evidence coincide in underscoring the role of extra-tropical extinctions and the outflow of tropical species in shaping biodiversity. These diversification dynamics accurately predict present-day levels of species richness across latitudes and uncover temporal idiosyncrasies but spatial generality across the major tetrapod radiations.
Affiliation(s)
- Ignacio Quintero
  - Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université Paris Science & Lettres, Paris 75005, France
- Michael J. Landis
  - Landis Lab, Department of Biology, Washington University in St. Louis, St. Louis, MO 63130
- Walter Jetz
  - Jetz Lab, Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06511
  - Center for Biodiversity and Global Change, Yale University, New Haven, CT 06511
- Hélène Morlon
  - Institut de Biologie de l’ENS, Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université Paris Science & Lettres, Paris 75005, France
7. Didelot X, Franceschi V, Frost SDW, Dennis A, Volz EM. Model design for nonparametric phylodynamic inference and applications to pathogen surveillance. Virus Evol 2023; 9:vead028. [PMID: 37229349 PMCID: PMC10205094 DOI: 10.1093/ve/vead028] [Received: 11/02/2022] [Revised: 04/17/2023] [Accepted: 04/26/2023]
Abstract
Inference of effective population size from genomic data can provide unique information about demographic history and, when applied to pathogen genetic data, can also provide insights into epidemiological dynamics. The combination of nonparametric models for population dynamics with molecular clock models which relate genetic data to time has enabled phylodynamic inference based on large sets of time-stamped genetic sequence data. The methodology for nonparametric inference of effective population size is well-developed in the Bayesian setting, but here we develop a frequentist approach based on nonparametric latent process models of population size dynamics. We appeal to statistical principles based on out-of-sample prediction accuracy in order to optimize parameters that control shape and smoothness of the population size over time. Our methodology is implemented in a new R package entitled mlesky. We demonstrate the flexibility and speed of this approach in a series of simulation experiments and apply the methodology to a dataset of HIV-1 in the USA. We also estimate the impact of non-pharmaceutical interventions for COVID-19 in England using thousands of SARS-CoV-2 sequences. By incorporating a measure of the strength of these interventions over time within the phylodynamic model, we estimate the impact of the first national lockdown in the UK on the epidemic reproduction number.
Affiliation(s)
- Xavier Didelot
  - School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom
- Vinicius Franceschi
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- Ann Dennis
  - Department of Medicine, University of North Carolina, USA
- Erik M Volz
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
8. Duvvuri VR, Hicks JT, Damodaran L, Grunnill M, Braukmann T, Wu J, Gubbay JB, Patel SN, Bahl J. Comparing the transmission potential from sequence and surveillance data of 2009 North American influenza pandemic waves. Infect Dis Model 2023; 8:240-252. [PMID: 36844759 PMCID: PMC9944206 DOI: 10.1016/j.idm.2023.02.003] [Received: 12/02/2022] [Revised: 02/10/2023] [Accepted: 02/15/2023]
Abstract
Technological advancements in phylodynamic modeling coupled with the accessibility of real-time pathogen genetic data are increasingly important for understanding infectious disease transmission dynamics. In this study, we compare the transmission potential of North American influenza A(H1N1)pdm09 derived from sequence data with that derived from surveillance data. The impact of the choice of tree priors, informative epidemiological priors, and evolutionary parameters on the transmission potential estimation is evaluated. North American influenza A(H1N1)pdm09 hemagglutinin (HA) gene sequences are analyzed using the coalescent and birth-death tree prior models to estimate the basic reproduction number (R0). Epidemiological priors gathered from published literature are used to simulate the birth-death skyline models. Path-sampling marginal likelihood estimation is conducted to assess model fit. A bibliographic search was conducted to gather surveillance-based R0 values. R0 estimates were consistently lower (mean ≤ 1.2) when estimated by coalescent models than by the birth-death models with informative priors on the duration of infectiousness (mean ≥ 1.3 to ≤ 2.88 days). The user-defined informative priors for use in the birth-death model shift the directionality of epidemiological and evolutionary parameters compared to non-informative estimates. While there was no certain impact of clock rate and tree height on the R0 estimation, an opposite relationship was observed between coalescent and birth-death tree priors. There was no significant difference (p = 0.46) between the birth-death model and surveillance R0 estimates. This study concludes that tree-prior methodological differences may have a substantial impact on the transmission potential estimation as well as the evolutionary parameters. The study also reports a consensus between the sequence-based R0 estimation and surveillance-based R0 estimates.
Altogether, these outcomes shed light on the potential role of phylodynamic modeling to augment existing surveillance and epidemiological activities to better assess and respond to emerging infectious diseases.
Affiliation(s)
- Venkata R. Duvvuri
  - Public Health Ontario, Toronto, Ontario, Canada
  - Department of Laboratory Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
  - Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada
  - Center for the Ecology of Infectious Disease, Department of Infectious Diseases, Institute of Bioinformatics, University of Georgia, Athens, Georgia
  - Department of Epidemiology and Biostatistics, Institute of Bioinformatics, University of Georgia, Athens, Georgia
  - Corresponding author. Public Health Ontario, Toronto, Ontario, Canada.
- Joseph T. Hicks
  - Center for the Ecology of Infectious Disease, Department of Infectious Diseases, Institute of Bioinformatics, University of Georgia, Athens, Georgia
  - Department of Epidemiology and Biostatistics, Institute of Bioinformatics, University of Georgia, Athens, Georgia
- Lambodhar Damodaran
  - Center for the Ecology of Infectious Disease, Department of Infectious Diseases, Institute of Bioinformatics, University of Georgia, Athens, Georgia
  - Department of Epidemiology and Biostatistics, Institute of Bioinformatics, University of Georgia, Athens, Georgia
- Martin Grunnill
  - Public Health Ontario, Toronto, Ontario, Canada
  - Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada
- Jianhong Wu
  - Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada
- Jonathan B. Gubbay
  - Public Health Ontario, Toronto, Ontario, Canada
  - Department of Laboratory Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Samir N. Patel
  - Public Health Ontario, Toronto, Ontario, Canada
  - Department of Laboratory Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Justin Bahl
  - Center for the Ecology of Infectious Disease, Department of Infectious Diseases, Institute of Bioinformatics, University of Georgia, Athens, Georgia
  - Department of Epidemiology and Biostatistics, Institute of Bioinformatics, University of Georgia, Athens, Georgia
  - Duke-NUS Graduate Medical School, Singapore
  - Corresponding author. Center for the Ecology of Infectious Disease, Department of Infectious Diseases, Institute of Bioinformatics, University of Georgia, Athens, Georgia, USA.
9. Seidel S, Stadler T. TiDeTree: a Bayesian phylogenetic framework to estimate single-cell trees and population dynamic parameters from genetic lineage tracing data. Proc Biol Sci 2022; 289:20221844. [PMID: 36350216 PMCID: PMC9653226 DOI: 10.1098/rspb.2022.1844]
Abstract
The development of organisms and tissues is dictated by an elaborate balance between cell division, apoptosis and differentiation: the cell population dynamics. To quantify these dynamics, we propose a phylodynamic inference approach based on single-cell lineage recorder data. We developed a Bayesian phylogenetic framework, time-scaled developmental trees (TiDeTree), that uses lineage recorder data to estimate time-scaled single-cell trees. By implementing TiDeTree within BEAST 2, we enable joint inference of the time-scaled trees and the cell population dynamics. We validated TiDeTree using simulations and showed that performance further improves when including multiple independent sources of information into the inference, such as frequencies of editing outcomes or experimental replicates. We benchmarked TiDeTree against state-of-the-art methods and show comparable performance in terms of tree topology, plus direct assessment of uncertainty and co-estimation of additional parameters. To demonstrate TiDeTree's use in practice, we analysed a public dataset containing lineage data from approximately 100 stem cell colonies. We estimated a time-scaled phylogeny for each colony, as well as the cell division and apoptosis rates underlying the growth dynamics of all colonies. We envision that TiDeTree will find broad application in the analysis of single-cell lineage tracing data, which will improve our understanding of cellular processes during development.
Affiliation(s)
- Sophie Seidel
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
10. Revell LJ, Toyama KS, Mahler DL. A simple hierarchical model for heterogeneity in the evolutionary correlation on a phylogenetic tree. PeerJ 2022; 10:e13910. [PMID: 35999851 PMCID: PMC9393011 DOI: 10.7717/peerj.13910] [Received: 05/07/2021] [Accepted: 07/27/2022]
Abstract
Numerous questions in phylogenetic comparative biology revolve around the correlated evolution of two or more phenotypic traits on a phylogeny. In many cases, it may be sufficient to assume a constant value for the evolutionary correlation between characters across all the clades and branches of the tree. Under other circumstances, however, it is desirable or necessary to account for the possibility that the evolutionary correlation differs through time or in different sections of the phylogeny. Here, we present a method designed to fit a hierarchical series of models for heterogeneity in the evolutionary rates and correlation of two quantitative traits on a phylogenetic tree. We apply the method to two datasets: one for different attributes of the buccal morphology in sunfishes (Centrarchidae); and a second for overall body length and relative body depth in rock- and non-rock-dwelling South American iguanian lizards. We also examine the performance of the method for parameter estimation and model selection using a small set of numerical simulations.
Affiliation(s)
- Liam J. Revell
  - Department of Biology, University of Massachusetts Boston, Boston, MA, USA
  - Facultad de Ciencias, Universidad Católica de la Santísima Concepción, Concepción, Chile
- Ken S. Toyama
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- D. Luke Mahler
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
11. Fonseca EM, Duckett DJ, Almeida FG, Smith ML, Thomé MTC, Carstens BC. Assessing model adequacy for Bayesian Skyline plots using posterior predictive simulation. PLoS One 2022; 17:e0269438. [PMID: 35877611 PMCID: PMC9312427 DOI: 10.1371/journal.pone.0269438] [Received: 11/22/2021] [Accepted: 05/23/2022]
Abstract
Bayesian skyline plots (BSPs) are a useful tool for making inferences about demographic history. For example, researchers typically apply BSPs to test hypotheses regarding how climate changes have influenced intraspecific genetic diversity over time. Like any method, BSP has assumptions that may be violated in some empirical systems (e.g., the absence of population genetic structure), and the naïve analysis of data collected from these systems may lead to spurious results. To address these issues, we introduce P2C2M.Skyline, an R package designed to assess model adequacy for BSPs using posterior predictive simulation. P2C2M.Skyline uses a phylogenetic tree and the log file output from Bayesian Skyline analyses to simulate posterior predictive datasets and then compares this null distribution to statistics calculated from the empirical data to check for model violations. P2C2M.Skyline was able to correctly identify model violations when simulated datasets were generated assuming genetic structure, which is a clear violation of BSP model assumptions. Conversely, P2C2M.Skyline showed low rates of false positives when models were simulated under the BSP model. We also evaluate the P2C2M.Skyline performance in empirical systems, where we detected model violations when DNA sequences from multiple populations were lumped together. P2C2M.Skyline represents a user-friendly and computationally efficient resource for researchers aiming to make inferences from BSP.
Affiliation(s)
- Emanuel M. Fonseca
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, United States of America
  - Museum of Biological Diversity, The Ohio State University, Columbus, OH, United States of America
- Drew J. Duckett
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, United States of America
  - Museum of Biological Diversity, The Ohio State University, Columbus, OH, United States of America
- Filipe G. Almeida
  - Department of Zoology, Federal University at Juiz de Fora, Juiz de Fora, Minas Gerais, Brazil
- Megan L. Smith
  - Department of Biology and Department of Computer Science, Indiana University, Bloomington, IN, United States of America
- Maria Tereza C. Thomé
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, United States of America
  - Museum of Biological Diversity, The Ohio State University, Columbus, OH, United States of America
- Bryan C. Carstens
  - Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH, United States of America
  - Museum of Biological Diversity, The Ohio State University, Columbus, OH, United States of America
12. Featherstone LA, Zhang JM, Vaughan TG, Duchene S. Epidemiological inference from pathogen genomes: A review of phylodynamic models and applications. Virus Evol 2022; 8:veac045. [PMID: 35775026 PMCID: PMC9241095 DOI: 10.1093/ve/veac045] [Received: 12/15/2021] [Revised: 05/23/2022] [Accepted: 06/02/2022]
Abstract
Phylodynamics requires an interdisciplinary understanding of phylogenetics, epidemiology, and statistical inference. It has also experienced more intense application than ever before amid the SARS-CoV-2 pandemic. In light of this, we present a review of phylodynamic models beginning with foundational models and assumptions. Our target audience is public health researchers, epidemiologists, and biologists seeking a working knowledge of the links between epidemiology, evolutionary models, and resulting epidemiological inference. We discuss the assumptions linking evolutionary models of pathogen population size to epidemiological models of the infected population size. We then describe statistical inference for phylodynamic models and list how output parameters can be rearranged for epidemiological interpretation. We go on to cover more sophisticated models and finish by highlighting future directions.
Affiliation(s)
- Leo A Featherstone
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
- Joshua M Zhang
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
- Timothy G Vaughan
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland
  - Swiss Institute of Bioinformatics, Geneva 1015, Switzerland
- Sebastian Duchene
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
|
13
|
Carstens BC, Smith ML, Duckett DJ, Fonseca EM, Thomé MTC. Assessing model adequacy leads to more robust phylogeographic inference. Trends Ecol Evol 2022; 37:402-410. [PMID: 35027224 DOI: 10.1016/j.tree.2021.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 12/06/2021] [Accepted: 12/14/2021] [Indexed: 11/29/2022]
Abstract
Phylogeographic studies base inferences on large data sets and complex demographic models, but these models are applied in ways that could mislead researchers and compromise their inference. Researchers face three challenges associated with the use of models: (i) 'model selection', or the identification of an appropriate model for analysis; (ii) 'evaluation of analytical results', or the interpretation of the biological significance of the resulting parameter estimates, delimitations, and topologies; and (iii) 'model evaluation', or the use of statistical approaches to assess the fit of the model to the data. The field collectively invests most of its energy in point (ii) without considering the other points; we argue that attention to points (i) and (iii) is essential to phylogeographic inference.
Affiliation(s)
- Bryan C Carstens
  - Department of Evolution, Ecology, and Organismal Biology at The Ohio State University, Columbus, OH, USA
- Megan L Smith
  - Department of Biology, Indiana University, Bloomington, IN, USA
- Drew J Duckett
  - Department of Evolution, Ecology, and Organismal Biology at The Ohio State University, Columbus, OH, USA
- Emanuel M Fonseca
  - Department of Evolution, Ecology, and Organismal Biology at The Ohio State University, Columbus, OH, USA
- M Tereza C Thomé
  - Department of Evolution, Ecology, and Organismal Biology at The Ohio State University, Columbus, OH, USA
|
14
|
Winkworth RC, Bellgard SE, McLenachan PA, Lockhart PJ. The mitogenome of Phytophthora agathidicida: Evidence for a not so recent arrival of the "kauri killing" Phytophthora in New Zealand. PLoS One 2021; 16:e0250422. [PMID: 34019564 PMCID: PMC8139493 DOI: 10.1371/journal.pone.0250422] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/21/2020] [Accepted: 04/06/2021] [Indexed: 12/13/2022]
Abstract
Phytophthora agathidicida is associated with a root rot that threatens the long-term survival of the iconic New Zealand kauri. Although it is widely assumed that this pathogen arrived in New Zealand post-1945, this hypothesis has yet to be formally tested. Here we describe evolutionary analyses aimed at evaluating this and two alternative hypotheses. As a basis for our analyses, we assembled complete mitochondrial genome sequences from 16 accessions representing the geographic range of P. agathidicida as well as those of five other members of Phytophthora clade 5. All 21 mitogenome sequences were very similar, differing little in size and sharing the same gene content and arrangement. We first examined the temporal origins of genetic diversity using a pair of calibration schemes. Both resulted in similar age estimates; specifically, a mean age of 303.0-304.4 years and 95% HPDs of 206.9-414.6 years for the most recent common ancestor of the included isolates. We then used phylogenetic tree building and network analyses to investigate the geographic distribution of the genetic diversity. Four geographically distinct genetic groups were recognised within P. agathidicida. Taken together, the inferred age and geographic distribution of the sampled mitogenome diversity suggest that this pathogen diversified following arrival in New Zealand several hundred to several thousand years ago. This conclusion is consistent with the emergence of kauri dieback disease being a consequence of recent changes in the relationship between the pathogen, host, and environment rather than a post-1945 introduction of the causal pathogen into New Zealand.
Affiliation(s)
- Richard C. Winkworth
  - Bio-Protection Research Centre, Massey University, Palmerston North, New Zealand
  - School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Stanley E. Bellgard
  - School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Peter J. Lockhart
  - Bio-Protection Research Centre, Massey University, Palmerston North, New Zealand
  - School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
|
15
|
The Impacts of Low Diversity Sequence Data on Phylodynamic Inference during an Emerging Epidemic. Viruses 2021; 13:v13010079. [PMID: 33430050 PMCID: PMC7826997 DOI: 10.3390/v13010079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/11/2020] [Revised: 01/05/2021] [Accepted: 01/05/2021] [Indexed: 01/06/2023]
Abstract
Phylodynamic inference is a pivotal tool in understanding transmission dynamics of viral outbreaks. These analyses are strongly guided by the input of an epidemiological model as well as sequence data that must contain sufficient intersequence variability in order to be informative. These criteria, however, may not be met during the early stages of an outbreak. Here we investigate the impact of low diversity sequence data on phylodynamic inference using the birth–death and coalescent exponential models. Our simulation study shows that estimating the molecular evolutionary rate requires sufficient sequence diversity and is an essential first step for any phylodynamic inference. Following this, the birth–death model outperforms the coalescent exponential model in estimating epidemiological parameters when faced with low diversity sequence data, because it explicitly exploits the sampling times. In contrast, the coalescent model requires additional samples, and therefore more variability in the sequence data, before accurate estimates can be obtained. These findings were also supported by our empirical analyses of Australian and New Zealand cluster outbreaks of SARS-CoV-2. Overall, the birth–death model is more robust when applied to datasets with low sequence diversity, provided that sampling is specified, and this should be considered for future viral outbreak investigations.
|
16
|
Squaring within the Colless index yields a better balance index. Math Biosci 2020; 331:108503. [PMID: 33253745 DOI: 10.1016/j.mbs.2020.108503] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2020] [Revised: 10/28/2020] [Accepted: 10/28/2020] [Indexed: 11/24/2022]
Abstract
The Colless index for bifurcating phylogenetic trees, introduced by Colless (1982), is defined as the sum, over all internal nodes v of the tree, of the absolute value of the difference of the sizes of the clades defined by the children of v. It is one of the most popular phylogenetic balance indices, because, in addition to measuring the balance of a tree in a very simple and intuitive way, it turns out to be one of the most powerful and discriminating phylogenetic shape indices. But it has some drawbacks. On the one hand, although its minimum value is reached at the so-called maximally balanced trees, it is almost always reached also at trees that are not maximally balanced. On the other hand, its definition as a sum of absolute values of differences makes it difficult to study analytically its distribution under probabilistic models of bifurcating phylogenetic trees. In this paper we show that if we replace in its definition the absolute values of the differences of clade sizes by the squares of these differences, all these drawbacks are overcome and the resulting index is still more powerful and discriminating than the original Colless index.
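The two definitions in this abstract (a sum over internal nodes of the absolute difference of the child clade sizes, and the same sum with squared differences) can be sketched as follows. This is a minimal illustration, assuming a bifurcating tree encoded as nested 2-tuples with any non-tuple value as a leaf. The encoding and function name are choices made for this example, not taken from the paper.

```python
def colless_indices(tree):
    """Return (n_leaves, colless, quadratic_colless) for a bifurcating tree.

    A leaf is any non-tuple value; an internal node is a 2-tuple
    (left_subtree, right_subtree). This encoding is an assumption
    made for illustration.
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0
    left, right = tree
    n_left, c_left, q_left = colless_indices(left)
    n_right, c_right, q_right = colless_indices(right)
    diff = abs(n_left - n_right)
    return (
        n_left + n_right,
        c_left + c_right + diff,          # Colless: sum of |differences|
        q_left + q_right + diff * diff,   # squared variant
    )


# Six leaves: a maximally balanced tree (3|3 split at the root) and a
# tree with a 4|2 split at the root.
balanced = ((("a", "b"), "c"), (("d", "e"), "f"))
other = ((("a", "b"), ("c", "d")), ("e", "f"))
print(colless_indices(balanced))  # (6, 2, 2)
print(colless_indices(other))     # (6, 2, 4)
```

On this small example the Colless index is 2 for both trees, while the squared variant separates them (2 versus 4). This illustrates the drawback the abstract describes for the original index.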
|
17
|
Bilderbeek RJC, Laudanno G, Etienne RS. Quantifying the impact of an inference model in Bayesian phylogenetics. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13514] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/20/2023]
Affiliation(s)
- Richèl J. C. Bilderbeek
  - Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Giovanni Laudanno
  - Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Rampal S. Etienne
  - Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
|
18
|
Coronado TM, Mir A, Rosselló F, Rotger L. On Sackin's original proposal: the variance of the leaves' depths as a phylogenetic balance index. BMC Bioinformatics 2020; 21:154. [PMID: 32326884 PMCID: PMC7181513 DOI: 10.1186/s12859-020-3405-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/21/2019] [Accepted: 02/11/2020] [Indexed: 12/21/2022]
Abstract
Background The Sackin index S of a rooted phylogenetic tree, defined as the sum of its leaves’ depths, is one of the most popular balance indices in phylogenetics, and Sackin’s paper (Syst Zool 21:225–6, 1972) is usually cited as the source for this index. However, what Sackin actually proposed in his paper as a measure of the imbalance of a rooted tree was not the sum of its leaves’ depths, but their “variation”. This proposal was later implemented as the variance of the leaves’ depths by Kirkpatrick and Slatkin in (Evolution 47:1171–81, 1993), where they also posed the problem of finding a closed formula for its expected value under the Yule model. Nowadays, Sackin’s original proposal seems to have passed into oblivion in the phylogenetics literature, replaced by the index bearing his name, which, in fact, was introduced a decade later by Sokal. Results In this paper we study the properties of the variance of the leaves’ depths, V, as a balance index. Firstly, we prove that the rooted trees with n leaves and maximum V value are exactly the combs with n leaves. But although V achieves its minimum value on every space $\mathcal{BT}_n$ of bifurcating rooted phylogenetic trees with n≤183 leaves at the so-called “maximally balanced trees” with n leaves, this property fails for almost every n≥184. We provide then an algorithm that finds the trees in $\mathcal{BT}_n$ with minimum V value in time O(n log(n)). Secondly, we obtain closed formulas for the expected V value of a bifurcating rooted tree with any number n of leaves under the Yule and the uniform models and, as a by-product of the computations leading to these formulas, we also obtain closed formulas for the variance under the uniform model of the Sackin index and the total cophenetic index (Mir et al., Math Biosci 241:125–36, 2013) of a bifurcating rooted tree, as well as of their covariance, thus filling this gap in the literature. Conclusion The phylogenetics community has been wise in preferring the sum S(T) of the leaves’ depths of a phylogenetic tree T over their variance V(T) as a balance index, because the latter does not seem to capture correctly the notion of balance of large bifurcating rooted trees. But it is still a valid and useful shape index.
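The two quantities contrasted in this abstract, the sum S of the leaves' depths and their variance V, can be computed as in the sketch below. The nested-tuple tree encoding and the function names are assumptions made for illustration. So is the use of the population variance (dividing by the number of leaves), since the abstract does not state the normalisation.

```python
from statistics import pvariance


def leaf_depths(tree, depth=0):
    """List the depth of every leaf of a rooted tree.

    A leaf is any non-tuple value; an internal node is a tuple of
    subtrees. This encoding is an assumption made for illustration.
    """
    if not isinstance(tree, tuple):
        return [depth]
    depths = []
    for child in tree:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def sackin_index(tree):
    """Sum of the leaves' depths (the index S)."""
    return sum(leaf_depths(tree))


def depth_variance(tree):
    """Variance of the leaves' depths (the index V), dividing by n."""
    return pvariance(leaf_depths(tree))


comb = ((("a", "b"), "c"), "d")        # fully unbalanced, 4 leaves
balanced = (("a", "b"), ("c", "d"))    # fully balanced, 4 leaves
print(sackin_index(comb), depth_variance(comb))          # 9 0.6875
print(sackin_index(balanced), depth_variance(balanced))  # 8 0
```

On four leaves both indices rank the comb as less balanced than the symmetric tree, in line with the abstract's result that combs maximise V.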
Affiliation(s)
- Tomás M Coronado
  - Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, E-07122, Spain
  - Balearic Islands Health Research Institute (IdISBa), Palma, E-07010, Spain
- Arnau Mir
  - Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, E-07122, Spain
  - Balearic Islands Health Research Institute (IdISBa), Palma, E-07010, Spain
- Francesc Rosselló
  - Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, E-07122, Spain
  - Balearic Islands Health Research Institute (IdISBa), Palma, E-07010, Spain
- Lucía Rotger
  - Dept. of Mathematics and Computing, University of La Rioja, Logroño, E-26004, Spain
|
19
|
On the minimum value of the Colless index and the bifurcating trees that achieve it. J Math Biol 2020; 80:1993-2054. [DOI: 10.1007/s00285-020-01488-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/11/2019] [Revised: 03/19/2020] [Indexed: 12/20/2022]
|
20
|
Chen W, Kenney T, Bielawski J, Gu H. Testing adequacy for DNA substitution models. BMC Bioinformatics 2019; 20:349. [PMID: 31221105 PMCID: PMC6585133 DOI: 10.1186/s12859-019-2905-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/25/2018] [Accepted: 05/17/2019] [Indexed: 12/22/2022]
Abstract
Background Testing model adequacy is important before a DNA substitution model is chosen for phylogenetic inference. Using a mis-specified model can negatively impact phylogenetic inference; for example, the maximum likelihood method can be inconsistent when the DNA sequences are generated under a tree topology in the Felsenstein zone and analyzed with a mis-specified or inadequate model. However, model adequacy testing in phylogenetics is underdeveloped. Results Here we develop a simple, general, powerful and robust model test based on Pearson’s goodness-of-fit test and binning of site patterns. We demonstrate through simulation that this test is robust in its high power to reject inadequate models across a large range of different ways of binning site patterns, while the Type I error is controlled well. In the real data analysis we discovered many cases where models chosen by another method can be rejected by this new test; in particular, our proposed test rejects the most complex DNA model (GTR+I+Γ) while the Goldman-Cox test fails to reject the commonly used simple models. Conclusions Model adequacy testing and the bootstrap should be used together to assess the reliability of conclusions after model selection and model fitting have been applied to choose the model and fit it. The new goodness-of-fit test proposed in this paper is a simple and powerful model adequacy testing method serving such a regular model checking purpose. We caution against deriving strong conclusions from analyses based on inadequate models. At a minimum, results derived from inadequate models can now be readily flagged using the new test, and reported as such.
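The general idea of a Pearson goodness-of-fit statistic on binned site patterns can be sketched as below. This is only an illustration of the technique named in the abstract, not the authors' procedure. The two-bin scheme (constant versus variable sites), the function names, and the expected probabilities are hypothetical. In practice the expected bin probabilities would come from the fitted substitution model, and the abstract does not describe the binning actually used.

```python
from collections import Counter


def site_pattern_counts(alignment):
    """Count each site pattern (alignment column) in a list of sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    return Counter(zip(*alignment))


def bin_constant_vs_variable(pattern_counts):
    """Hypothetical binning: [constant sites, variable sites]."""
    constant = sum(c for p, c in pattern_counts.items() if len(set(p)) == 1)
    variable = sum(pattern_counts.values()) - constant
    return [constant, variable]


def pearson_statistic(observed, expected_probs):
    """Pearson's X^2 = sum over bins of (O - E)^2 / E, with E = n * p."""
    n = sum(observed)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs)
    )


# Toy usage: two aligned sequences and made-up model probabilities.
alignment = ["AAC", "AAG"]
observed = bin_constant_vs_variable(site_pattern_counts(alignment))
print(observed, pearson_statistic(observed, [0.5, 0.5]))
```

Turning the statistic into a p-value requires a reference distribution with appropriate degrees of freedom, which is left out of this sketch.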
Affiliation(s)
- Wei Chen
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Toby Kenney
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
- Joseph Bielawski
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
  - Department of Biology, Dalhousie University, Halifax, Canada
- Hong Gu
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
|
21
|
Qiu X, Duvvuri VR, Bahl J. Computational Approaches and Challenges to Developing Universal Influenza Vaccines. Vaccines (Basel) 2019; 7:E45. [PMID: 31141933 PMCID: PMC6631137 DOI: 10.3390/vaccines7020045] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/10/2019] [Revised: 05/15/2019] [Accepted: 05/23/2019] [Indexed: 12/25/2022]
Abstract
The traditional design of effective vaccines for rapidly evolving pathogens, such as influenza A virus, has failed to provide broad-spectrum and long-lasting protection. With low-cost whole genome sequencing technology and powerful computing capabilities, novel computational approaches have demonstrated the potential to facilitate the design of a universal influenza vaccine. However, few studies have integrated computational optimization in the design and discovery of new vaccines. Understanding the potential of computational vaccine design is necessary before these approaches can be implemented on a broad scale. This review summarizes some promising computational approaches under current development, including computationally optimized broadly reactive antigens with consensus sequences, phylogenetic model-based ancestral sequence reconstruction, and immunomics to compute conserved cross-reactive T-cell epitopes. Interactions among virus, host, and environment determine the evolvability of the influenza population. We propose that with the development of novel technologies that allow the integration of data sources such as protein structural modeling, host antibody repertoire analysis and advanced phylodynamic modeling, computational approaches will be crucial for the development of a long-lasting universal influenza vaccine. Taken together, computational approaches are powerful and promising tools for the development of a universal influenza vaccine with durable and broad protection.
Affiliation(s)
- Xueting Qiu
  - Center for Ecology of Infectious Diseases, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA 30602, USA
- Venkata R Duvvuri
  - Center for Ecology of Infectious Diseases, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA 30602, USA
- Justin Bahl
  - Center for Ecology of Infectious Diseases, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA 30602, USA
  - Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA 30606, USA
  - Duke-NUS Graduate Medical School, Singapore 169857, Singapore
|
22
|
Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, Heled J, Jones G, Kühnert D, De Maio N, Matschiner M, Mendes FK, Müller NF, Ogilvie HA, du Plessis L, Popinga A, Rambaut A, Rasmussen D, Siveroni I, Suchard MA, Wu CH, Xie D, Zhang C, Stadler T, Drummond AJ. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2019; 15:e1006650. [PMID: 30958812 PMCID: PMC6472827 DOI: 10.1371/journal.pcbi.1006650] [Citation(s) in RCA: 1697] [Impact Index Per Article: 339.4] [Received: 11/14/2018] [Revised: 04/18/2019] [Accepted: 02/04/2019] [Indexed: 11/18/2022]
Abstract
Elaboration of Bayesian phylogenetic inference methods has continued apace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data in a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.
Affiliation(s)
- Remco Bouckaert
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
  - Max Planck Institute for the Science of Human History, Jena, Germany
- Timothy G. Vaughan
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Joëlle Barido-Sottani
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sebastián Duchêne
  - Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia
- Mathieu Fourment
  - ithree institute, University of Technology Sydney, Sydney, Australia
- Graham Jones
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE 405 30 Göteborg, Sweden
- Denise Kühnert
  - Max Planck Institute for the Science of Human History, Jena, Germany
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridgeshire, UK
- Michael Matschiner
  - Department of Environmental Sciences, University of Basel, 4051 Basel, Switzerland
- Fábio K. Mendes
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Nicola F. Müller
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Huw A. Ogilvie
  - Department of Computer Science, Rice University, Houston, TX 77005-1892, USA
- Louis du Plessis
  - Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK
- Alex Popinga
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, EH9 3FL UK
- David Rasmussen
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27695, USA
- Igor Siveroni
  - Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK
- Marc A. Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Chieh-Hsi Wu
  - Department of Statistics, University of Oxford, OX1 3LB, UK
- Dong Xie
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
- Chi Zhang
  - Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- Tanja Stadler
  - ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alexei J. Drummond
  - Centre of Computational Evolution, University of Auckland, Auckland, New Zealand
|