1
|
Lin Q, Goldberg EE, Leitner T, Molina-París C, King AA, Romero-Severson EO. The Number and Pattern of Viral Genomic Reassortments are not Necessarily Identifiable from Segment Trees. Mol Biol Evol 2024; 41:msae078. [PMID: 38648521 DOI: 10.1093/molbev/msae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 02/23/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024] Open
Abstract
Reassortment is an evolutionary process common in viruses with segmented genomes. These viruses can swap whole genomic segments during cellular co-infection, giving rise to novel progeny formed from the mixture of parental segments. Since large-scale genome rearrangements have the potential to generate new phenotypes, reassortment is important to both evolutionary biology and public health research. However, statistical inference of the pattern of reassortment events from phylogenetic data is exceptionally difficult, potentially involving inference of general graphs in which individual segment trees are embedded. In this paper, we argue that, in general, the number and pattern of reassortment events are not identifiable from segment trees alone, even with theoretically ideal data. We call this fact the fundamental problem of reassortment, which we illustrate using the concept of the "first-infection tree," a potentially counterfactual genealogy that would have been observed in the segment trees had no reassortment occurred. Further, we illustrate four additional problems that can arise logically in the inference of reassortment events and show, using simulated data, that these problems are not rare and can potentially distort our observation of reassortment even in small data sets. Finally, we discuss how existing methods can be augmented or adapted to account for not only the fundamental problem of reassortment, but also the four additional situations that can complicate the inference of reassortment.
Affiliation(s)
- Qianying Lin
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Emma E Goldberg
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Thomas Leitner
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Carmen Molina-París
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Aaron A King
  - Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
  - Department of Mathematics, University of Michigan, Ann Arbor, MI, USA
  - Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI, USA
  - Santa Fe Institute, Santa Fe, NM, USA
- Ethan O Romero-Severson
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
|
2
|
Shao Y, Magee AF, Vasylyeva TI, Suchard MA. Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models. PLoS Comput Biol 2024; 20:e1011640. [PMID: 38551979 PMCID: PMC11006205 DOI: 10.1371/journal.pcbi.1011640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 04/10/2024] [Accepted: 03/10/2024] [Indexed: 04/09/2024] Open
Abstract
Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three real-world data examples: the HIV epidemic in Odesa, Ukraine; seasonal influenza A/H3N2 virus dynamics in New York State, USA; and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit-time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of the viral effective reproductive number.
Affiliation(s)
- Yucai Shao
  - Department of Biostatistics, University of California, Los Angeles, California, United States of America
- Andrew F. Magee
  - Department of Biomathematics, University of California, Los Angeles, California, United States of America
- Tetyana I. Vasylyeva
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
  - Department of Population Health and Disease Prevention, University of California Irvine, Irvine, California, United States of America
- Marc A. Suchard
  - Department of Biostatistics, University of California, Los Angeles, California, United States of America
  - Department of Biomathematics, University of California, Los Angeles, California, United States of America
  - Department of Human Genetics, University of California, Los Angeles, California, United States of America
|
3
|
Shao Y, Magee AF, Vasylyeva TI, Suchard MA. Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models. bioRxiv 2023:2023.10.31.564882. [PMID: 37961423 PMCID: PMC10634968 DOI: 10.1101/2023.10.31.564882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three real-world data examples: the HIV epidemic in Odesa, Ukraine; seasonal influenza A/H3N2 virus dynamics in New York State, USA; and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit-time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of the viral effective reproductive number.
Affiliation(s)
- Yucai Shao
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States
- Andrew F. Magee
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
- Tetyana I. Vasylyeva
  - Department of Medicine, University of California San Diego, La Jolla, United States
  - Department of Population Health and Disease Prevention, University of California Irvine, Irvine, United States
- Marc A. Suchard
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States
  - Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States
|
4
|
Carnegie L, Raghwani J, Fournié G, Hill SC. Phylodynamic approaches to studying avian influenza virus. Avian Pathol 2023; 52:289-308. [PMID: 37565466 DOI: 10.1080/03079457.2023.2236568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 06/23/2023] [Accepted: 07/07/2023] [Indexed: 08/12/2023]
Abstract
Avian influenza viruses can cause severe disease in domestic and wild birds and are a pandemic threat. Phylodynamics is the study of how epidemiological, evolutionary, and immunological processes can interact to shape viral phylogenies. This review summarizes how phylodynamic methods have contributed, and could contribute, to the study of avian influenza viruses. Specifically, we assess how phylodynamics can be used to examine viral spread within and between wild or domestic bird populations at various geographical scales, identify factors associated with virus dispersal, and determine the order and timing of virus lineage movement between geographic regions or poultry production systems. We discuss factors that can complicate the interpretation of phylodynamic results and identify how future methodological developments could contribute to improved control of the virus.
Affiliation(s)
- L Carnegie
  - Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK
- J Raghwani
  - Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK
- G Fournié
  - Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK
  - Université de Lyon, INRAE, VetAgro Sup, UMR EPIA, Marcy l'Etoile, France
  - Université Clermont Auvergne, INRAE, VetAgro Sup, UMR EPIA, Saint Genes Champanelle, France
- S C Hill
  - Department of Pathobiology and Population Sciences, Royal Veterinary College (RVC), Hatfield, UK
|
5
|
Lin Q, Goldberg EE, Leitner T, Molina-París C, King AA, Romero-Severson EO. Modeling the evolution of segment trees reveals deficiencies in current inferential methods for genomic reassortment. bioRxiv 2023:2023.09.20.558687. [PMID: 37790507 PMCID: PMC10542121 DOI: 10.1101/2023.09.20.558687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Reassortment is an evolutionary process common in viruses with segmented genomes. These viruses can swap whole genomic segments during cellular co-infection, giving rise to new viral variants. Large-scale genome rearrangements, such as reassortment, have the potential to quickly generate new phenotypes, making the understanding of viral reassortment important to both evolutionary biology and public health research. In this paper, we argue that reassortment cannot be reliably inferred from incongruities between segment phylogenies using the established remove-and-rejoin or coalescent approaches. We instead show that reassortment must be considered in the context of a broader population process that includes the dynamics of the infected hosts. Using illustrative examples and simulation we identify four types of evolutionary events that are difficult or impossible to reconstruct with incongruence-based methods. Further, we show that these specific situations are very common and will likely occur even in small samples. Finally, we argue that existing methods can be augmented or modified to account for all the problematic situations that we identify in this paper. Robust assessment of the role of reassortment in viral evolution is difficult, and we hope to provide conceptual clarity on some important methodological issues that can arise in the development of the next generation of tools for studying reassortment.
|
6
|
Cappello L, Kim J, Palacios JA. adaPop: Bayesian inference of dependent population dynamics in coalescent models. PLoS Comput Biol 2023; 19:e1010897. [PMID: 36940209 PMCID: PMC10063170 DOI: 10.1371/journal.pcbi.1010897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 03/30/2023] [Accepted: 01/25/2023] [Indexed: 03/21/2023] Open
Abstract
The coalescent is a powerful statistical framework that allows us to infer past population dynamics leveraging the ancestral relationships reconstructed from sampled molecular sequence data. In many biomedical applications, such as in the study of infectious diseases, cell development, and tumorigenesis, several distinct populations share evolutionary history and therefore become dependent. The inference of such dependence is a highly important yet challenging problem. With advances in sequencing technologies, we are well positioned to exploit the wealth of high-resolution biological data for tackling this problem. Here, we present adaPop, a probabilistic model to estimate past population dynamics of dependent populations and to quantify their degree of dependence. An essential feature of our approach is the ability to track the time-varying association between the populations while making minimal assumptions on their functional shapes via Markov random field priors. We provide nonparametric estimators, extensions of our base model that integrate multiple data sources, and fast scalable inference algorithms. We test our method using simulated data under various dependent population histories and demonstrate the utility of our model in shedding light on evolutionary histories of different variants of SARS-CoV-2.
Affiliation(s)
- Lorenzo Cappello
  - Departments of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain
- Jaehee Kim
  - Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Julia A. Palacios
  - Departments of Statistics and Biomedical Data Science, Stanford University, Stanford, California, United States of America
|
7
|
Nie J, Wang Q, Jin S, Yao X, Xu L, Chang Y, Ding F, Li Z, Sun L, Shi Y, Shan Y. Self-assembled multiepitope nanovaccine based on NoV P particles induces effective and lasting protection against H3N2 influenza virus. Nano Res 2023; 16:7337-7346. [PMID: 36820263 PMCID: PMC9933037 DOI: 10.1007/s12274-023-5395-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/26/2022] [Revised: 12/04/2022] [Accepted: 12/08/2022] [Indexed: 05/24/2023]
Abstract
Current seasonal influenza vaccines confer only limited coverage of virus strains due to the frequent genetic and antigenic variability of influenza virus (IV). Epitope vaccines that accurately target conserved domains provide a promising approach to increase the breadth of protection; however, poor immunogenicity greatly hinders their application. The protruding (P) domain of the norovirus (NoV), which can self-assemble into a 24-mer particle called the NoV P particle, offers an ideal antigen presentation platform. In this study, a multiepitope nanovaccine displaying influenza epitopes (HMN-PP) was constructed based on the NoV P particle nanoplatform. Large amounts of HMN-PP were easily expressed in Escherichia coli in soluble form. Animal experiments showed that the adjuvanted HMN-PP nanovaccine induced epitope-specific antibodies and haemagglutinin (HA)-specific neutralizing antibodies, and the antibodies could persist for at least three months after the last immunization. Furthermore, HMN-PP induced matrix protein 2 extracellular domain (M2e)-specific antibody-dependent cell-mediated cytotoxicity, CD4+ and CD8+ T-cell responses, and a nucleoprotein (NP)-specific cytotoxic T lymphocyte (CTL) response. These results indicated that the combination of a multiepitope vaccine and self-assembled NoV P particles may be an ideal and effective vaccine strategy for highly variable viruses such as IV and SARS-CoV-2. Electronic Supplementary Material: Supplementary material is available in the online version of this article at 10.1007/s12274-023-5395-6.
Affiliation(s)
- Jiaojiao Nie
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Qingyu Wang
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Shenghui Jin
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Xin Yao
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Lipeng Xu
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Yaotian Chang
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Fan Ding
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Zeyu Li
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Lulu Sun
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Yuhua Shi
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
- Yaming Shan
  - National Engineering Laboratory for AIDS Vaccine, School of Life Sciences, Jilin University, Jilin, 130012 China
  - Key Laboratory for Molecular Enzymology and Engineering, The Ministry of Education, School of Life Sciences, Jilin University, Jilin, 130012 China
|
8
|
Inward RPD, Parag KV, Faria NR. Using multiple sampling strategies to estimate SARS-CoV-2 epidemiological parameters from genomic sequencing data. Nat Commun 2022; 13:5587. [PMID: 36151084 PMCID: PMC9508174 DOI: 10.1038/s41467-022-32812-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/04/2022] [Accepted: 08/16/2022] [Indexed: 11/09/2022] Open
Abstract
The choice of viral sequences used in genetic and epidemiological analysis is important as it can induce biases that detract from the value of these rich datasets. This raises questions about how a set of sequences should be chosen for analysis. We provide insights on these largely understudied problems using SARS-CoV-2 genomic sequences from Hong Kong, China, and the Amazonas State, Brazil. We consider multiple sampling schemes which were used to estimate Rt and rt as well as related R0 and date of origin parameters. We find that both Rt and rt are sensitive to changes in sampling whilst R0 and the date of origin are relatively robust. Moreover, we find that analyses using unsampled datasets result in the most biased Rt and rt estimates for both our Hong Kong and Amazonas case studies. We highlight that sampling strategy choices may be an influential yet neglected component of sequencing analysis pipelines.
Affiliation(s)
- Kris V Parag
  - MRC Centre of Global Infectious Disease Analysis, Jameel Institute for Disease and Emergency Analytics, Imperial College London, London, UK
  - NIHR Health Protection Research Unit in Behavioural Science and Evaluation, University of Bristol, Bristol, UK
- Nuno R Faria
  - Department of Zoology, University of Oxford, Oxford, UK
  - MRC Centre of Global Infectious Disease Analysis, Jameel Institute for Disease and Emergency Analytics, Imperial College London, London, UK
  - Instituto de Medicina Tropical, Faculdade de Medicina da Universidade de Sao Paulo, Sao Paulo, Brazil
|
9
|
Attwood SW, Hill SC, Aanensen DM, Connor TR, Pybus OG. Phylogenetic and phylodynamic approaches to understanding and combating the early SARS-CoV-2 pandemic. Nat Rev Genet 2022; 23:547-562. [PMID: 35459859 PMCID: PMC9028907 DOI: 10.1038/s41576-022-00483-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Accepted: 03/23/2022] [Indexed: 01/05/2023]
Abstract
Determining the transmissibility, prevalence and patterns of movement of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections is central to our understanding of the impact of the pandemic and to the design of effective control strategies. Phylogenies (evolutionary trees) have provided key insights into the international spread of SARS-CoV-2 and enabled investigation of individual outbreaks and transmission chains in specific settings. Phylodynamic approaches combine evolutionary, demographic and epidemiological concepts and have helped track virus genetic changes, identify emerging variants and inform public health strategy. Here, we review and synthesize studies that illustrate how phylogenetic and phylodynamic techniques were applied during the first year of the pandemic, and summarize their contributions to our understanding of SARS-CoV-2 transmission and control.
Affiliation(s)
- Stephen W Attwood
  - Department of Zoology, University of Oxford, Oxford, UK
  - Pathogen Genomics Unit, Public Health Wales NHS Trust, Cardiff, UK
- Sarah C Hill
  - Department of Pathobiology and Population Sciences, Royal Veterinary College, University of London, London, UK
- David M Aanensen
  - Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Thomas R Connor
  - Pathogen Genomics Unit, Public Health Wales NHS Trust, Cardiff, UK
  - School of Biosciences, Cardiff University, Cardiff, UK
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Oxford, UK
  - Department of Pathobiology and Population Sciences, Royal Veterinary College, University of London, London, UK
|
10
|
Featherstone LA, Zhang JM, Vaughan TG, Duchene S. Epidemiological Inference From Pathogen Genomes: A Review of Phylodynamic Models and Applications. Virus Evol 2022; 8:veac045. [PMID: 35775026 PMCID: PMC9241095 DOI: 10.1093/ve/veac045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 05/23/2022] [Accepted: 06/02/2022] [Indexed: 11/24/2022] Open
Abstract
Phylodynamics requires an interdisciplinary understanding of phylogenetics, epidemiology, and statistical inference. It has also experienced more intense application than ever before amid the SARS-CoV-2 pandemic. In light of this, we present a review of phylodynamic models beginning with foundational models and assumptions. Our target audience is public health researchers, epidemiologists, and biologists seeking a working knowledge of the links between epidemiology, evolutionary models, and resulting epidemiological inference. We discuss the assumptions linking evolutionary models of pathogen population size to epidemiological models of the infected population size. We then describe statistical inference for phylodynamic models and list how output parameters can be rearranged for epidemiological interpretation. We go on to cover more sophisticated models and finish by highlighting future directions.
Affiliation(s)
- Leo A Featherstone
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
- Joshua M Zhang
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
- Timothy G Vaughan
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics
- Sebastian Duchene
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
|
11
|
Andréoletti J, Zwaans A, Warnock RCM, Aguirre-Fernández G, Barido-Sottani J, Gupta A, Stadler T, Manceau M. The Occurrence Birth-Death Process for combined-evidence analysis in macroevolution and epidemiology. Syst Biol 2022; 71:1440-1452. [PMID: 35608305 PMCID: PMC9558841 DOI: 10.1093/sysbio/syac037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 11/28/2022] Open
Abstract
Phylodynamic models generally aim at jointly inferring phylogenetic relationships, model parameters, and more recently, the number of lineages through time, based on molecular sequence data. In the fields of epidemiology and macroevolution, these models can be used to estimate, respectively, the past number of infected individuals (prevalence) or the past number of species (paleodiversity) through time. Recent years have seen the development of “total-evidence” analyses, which combine molecular and morphological data from extant and past sampled individuals in a unified Bayesian inference framework. Even sampled individuals characterized only by their sampling time, that is, lacking morphological and molecular data, which we call occurrences, provide invaluable information to estimate the past number of lineages. Here, we present new methodological developments around the fossilized birth–death process enabling us to (i) incorporate occurrence data in the likelihood function; (ii) consider piecewise-constant birth, death, and sampling rates; and (iii) estimate the past number of lineages, with or without knowledge of the underlying tree. We implement our method in the RevBayes software environment, enabling its use along with a large set of models of molecular and morphological evolution, and validate the inference workflow using simulations under a wide range of conditions. We finally illustrate our new implementation using two empirical data sets stemming from the fields of epidemiology and macroevolution. In epidemiology, we infer the prevalence of the coronavirus disease 2019 outbreak on the Diamond Princess ship, by taking into account jointly the case count record (occurrences) along with viral sequences for a fraction of infected individuals. In macroevolution, we infer the diversity trajectory of cetaceans using molecular and morphological data from extant taxa, morphological data from fossils, as well as numerous fossil occurrences. 
The joint modeling of occurrences and trees holds the promise to further bridge the gap between traditional epidemiology and pathogen genomics, as well as paleontology and molecular phylogenetics. [Birth–death model; epidemiology; fossils; macroevolution; occurrences; phylogenetics; skyline.]
Affiliation(s)
- Jérémy Andréoletti
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Antoine Zwaans
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Rachel C M Warnock
  - GeoZentrum Nordbayern, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
- Joëlle Barido-Sottani
  - Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, USA
- Ankit Gupta
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Marc Manceau
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
|
12
|
Cappello L, Kim J, Liu S, Palacios JA. Statistical Challenges in Tracking the Evolution of SARS-CoV-2. Stat Sci 2022; 37:162-182. [PMID: 36034090 PMCID: PMC9409356 DOI: 10.1214/22-sts853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
Abstract
Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, has provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of governments, laboratories, and researchers to collect and sequence molecular data, many challenges remain in analyzing and interpreting the data collected. Here, we describe the models and methods currently used to monitor the spread of SARS-CoV-2, discuss long-standing and new statistical challenges, and propose a method for tracking the rise of novel variants during the epidemic.
Affiliation(s)
- Lorenzo Cappello
  - Lorenzo Cappello is Assistant Professor, Departments of Economics and Business, Universitat Pompeu Fabra, 08005, Spain
- Jaehee Kim
  - Jaehee Kim is Assistant Professor, Department of Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Sifan Liu
  - Sifan Liu is a Ph.D. student, Department of Statistics, Stanford University, Stanford, California 94305, USA
- Julia A. Palacios
  - Julia A. Palacios is Assistant Professor, Departments of Statistics and Biomedical Data Sciences, Stanford University, Stanford, California 94305, USA
|
13
|
Bouckaert RR. An Efficient Coalescent Epoch Model for Bayesian Phylogenetic Inference. Syst Biol 2022; 71:1549-1560. [PMID: 35212733 PMCID: PMC9773037 DOI: 10.1093/sysbio/syac015] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/29/2021] [Revised: 01/24/2022] [Accepted: 02/22/2022] [Indexed: 12/25/2022] Open
Abstract
We present a two-headed approach called Bayesian Integrated Coalescent Epoch PlotS (BICEPS) for efficient inference of coalescent epoch models. Firstly, we integrate out population size parameters, and secondly, we introduce a set of more powerful Markov chain Monte Carlo (MCMC) proposals for flexing and stretching trees. Even though population sizes are integrated out and not explicitly sampled through MCMC, we are still able to generate samples from the population size posteriors. This allows demographic reconstruction through time and estimating the timing and magnitude of population bottlenecks and full population histories. Altogether, BICEPS can be considered a more muscular version of the popular Bayesian skyline model. We demonstrate its power and correctness by a well-calibrated simulation study. Furthermore, we demonstrate with an application to SARS-CoV-2 genomic data that some analyses that have trouble converging with the traditional Bayesian skyline prior and standard MCMC proposals can do well with the BICEPS approach. BICEPS is available as an open-source package for BEAST 2 under the GPL license and has a user-friendly graphical user interface. [Bayesian phylogenetics; BEAST 2; BICEPS; coalescent model.]
Affiliation(s)
- Remco R Bouckaert
  - Correspondence to be sent to: University of Auckland, Thomas Building, Room 407, 3 Symonds St, Auckland 1010, New Zealand
|
14
|
Zarebski AE, du Plessis L, Parag KV, Pybus OG. A computationally tractable birth-death model that combines phylogenetic and epidemiological data. PLoS Comput Biol 2022; 18:e1009805. [PMID: 35148311 PMCID: PMC8903285 DOI: 10.1371/journal.pcbi.1009805] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Revised: 03/08/2022] [Accepted: 01/05/2022] [Indexed: 11/19/2022] Open
Abstract
Inferring the dynamics of pathogen transmission during an outbreak is an important problem in infectious disease epidemiology. In mathematical epidemiology, estimates are often informed by time series of confirmed cases, while in phylodynamics genetic sequences of the pathogen, sampled through time, are the primary data source. Each type of data provides different, and potentially complementary, insight. Recent studies have recognised that combining data sources can improve estimates of the transmission rate and the number of infected individuals. However, inference methods are typically highly specialised and field-specific and are either computationally prohibitive or require intensive simulation, limiting their real-time utility. We present a novel birth-death phylogenetic model and derive a tractable analytic approximation of its likelihood, the computational complexity of which is linear in the size of the dataset. This approach combines epidemiological and phylodynamic data to produce estimates of key parameters of transmission dynamics and the unobserved prevalence. Using simulated data, we (a) show that the approximation agrees well with existing methods, (b) validate the claim of linear complexity, and (c) explore robustness to model misspecification. This approximation facilitates inference on large datasets, which is increasingly important as large genomic sequence datasets become commonplace.
Affiliation(s)
- Louis du Plessis
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
- Kris Varun Parag
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom
|
15
|
Accounting for spatial sampling patterns in Bayesian phylogeography. Proc Natl Acad Sci U S A 2021; 118:2105273118. [PMID: 34930835 PMCID: PMC8719894 DOI: 10.1073/pnas.2105273118] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/09/2021] [Indexed: 12/13/2022] Open
Abstract
Statistical phylogeography has led to substantial progress in our understanding of the pace and means by which organisms colonize their habitats. Yet, inference from these models often relies on implicit assumptions pertaining to spatial sampling design, potentially leading to biased estimation of key biological parameters. While sampled locations sometimes convey signal about the processes that shape spatial biodiversity, they do not always do so. We present a statistical approach that permits accurate estimation of dispersal rates, even in cases where spatial sampling is driven by practical motivations unrelated to the outcome of the evolutionary process. The proposed framework paves the way to further developments in phylogeography with key applications, including the efficient monitoring of pandemics and invasive species during the course of their evolution.
Statistical phylogeography provides useful tools to characterize and quantify the spread of organisms during the course of evolution. Analyzing georeferenced genetic data often relies on the assumption that samples are preferentially collected in densely populated areas of the habitat. Deviation from this assumption negatively impacts the inference of the spatial and demographic dynamics. This issue is pervasive in phylogeography. It affects analyses that approximate the habitat as a set of discrete demes as well as those that treat it as a continuum. The present study introduces a Bayesian modeling approach that explicitly accommodates spatial sampling strategies. An original inference technique, based on recent advances in statistical computing, is then described that is most suited to modeling data where sequences are preferentially collected at certain locations, independently of the outcome of the evolutionary process. The analysis of georeferenced genetic sequences from the West Nile virus in North America along with simulated data shows how assumptions about spatial sampling may impact our understanding of the forces shaping biodiversity across time and space.
|
16
|
Cappello L, Palacios JA. Adaptive Preferential Sampling in Phylodynamics With an Application to SARS-CoV-2. J Comput Graph Stat 2021; 31:541-552. [PMID: 36035966 PMCID: PMC9409340 DOI: 10.1080/10618600.2021.1987256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Longitudinal molecular data of rapidly evolving viruses and pathogens provide information about disease spread and complement traditional surveillance approaches based on case count data. The coalescent is used to model the genealogy that represents the sample ancestral relationships. The basic assumption is that coalescent events occur at a rate inversely proportional to the effective population size Ne(t), a time-varying measure of genetic diversity. When the sampling process (collection of samples over time) depends on Ne(t), the coalescent and the sampling processes can be jointly modeled to improve estimation of Ne(t). Failing to do so can lead to bias due to model misspecification. However, the way that the sampling process depends on the effective population size may vary over time. We introduce an approach where the sampling process is modeled as an inhomogeneous Poisson process with rate equal to the product of Ne(t) and a time-varying coefficient, making minimal assumptions on their functional shapes via Markov random field priors. We provide efficient algorithms for inference, show the model performance vis-a-vis alternative methods in a simulation study, and apply our model to SARS-CoV-2 sequences from Los Angeles and Santa Clara counties. The methodology is implemented and available in the R package adapref. Supplementary files for this article are available online.
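As a rough illustration of the sampling model described in this abstract, the sketch below draws sampling times from an inhomogeneous Poisson process whose rate is the product of Ne(t) and a time-varying coefficient. It uses standard thinning and is not taken from the adapref package; the function names, the trajectories `ne` and `beta`, and the bound `rate_bound` are all hypothetical choices made for the example.

```python
import math
import random


def simulate_sampling_times(ne, beta, t_max, rate_bound, seed=0):
    """Draw event times on [0, t_max] from an inhomogeneous Poisson process
    with intensity beta(t) * ne(t), using thinning.

    rate_bound must be an upper bound on beta(t) * ne(t) over [0, t_max].
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        # Candidate events come from a homogeneous process at the bounding rate.
        t += rng.expovariate(rate_bound)
        if t > t_max:
            break
        intensity = beta(t) * ne(t)
        if intensity > rate_bound:
            raise ValueError("rate_bound is smaller than the intensity")
        # Keep each candidate with probability intensity / rate_bound.
        if rng.random() < intensity / rate_bound:
            times.append(t)
    return times


# Hypothetical trajectories, chosen only for illustration.
def ne(t):
    return 50.0 * math.exp(-0.3 * t)  # effective population size, time running backwards


def beta(t):
    return 0.2 + 0.1 * t  # time-varying sampling coefficient


# The product beta(t) * ne(t) peaks near 11.2 on [0, 10], so 12 is a valid bound.
sampling_times = simulate_sampling_times(ne, beta, t_max=10.0, rate_bound=12.0)
print(len(sampling_times), "sampling times drawn")
```

Setting `beta` to a constant recovers the simpler case in which the sampling rate is directly proportional to Ne(t).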
Affiliation(s)
- Julia A. Palacios
  - Department of Statistics, Stanford University, Stanford, CA
  - Department of Biomedical Data Science, Stanford Medicine, Stanford, CA
|
17
|
Duchêne S, Ho SYW, Carmichael AG, Holmes EC, Poinar H. The Recovery, Interpretation and Use of Ancient Pathogen Genomes. Curr Biol 2021; 30:R1215-R1231. [PMID: 33022266 PMCID: PMC7534838 DOI: 10.1016/j.cub.2020.08.081] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
The ability to sequence genomes from ancient biological material has provided a rich source of information for evolutionary biology and engaged considerable public interest. Although most studies of ancient genomes have focused on vertebrates, particularly archaic humans, newer technologies allow the capture of microbial pathogens and microbiomes from ancient and historical human and non-human remains. This coming of age has been made possible by techniques that allow the preferential capture and amplification of discrete genomes from a background of predominantly host and environmental DNA. There are now near-complete ancient genome sequences for three pathogens of considerable historical interest — pre-modern bubonic plague (Yersinia pestis), smallpox (Variola virus) and cholera (Vibrio cholerae) — and for three equally important endemic human disease agents — Mycobacterium tuberculosis (tuberculosis), Mycobacterium leprae (leprosy) and Treponema pallidum pallidum (syphilis). Genomic data from these pathogens have extended earlier work by paleopathologists. There have been efforts to sequence the genomes of additional ancient pathogens, with the potential to broaden our understanding of the infectious disease burden common to past populations from the Bronze Age to the early 20th century. In this review we describe the state-of-the-art of this rapidly developing field, highlight the contributions of ancient pathogen genomics to multidisciplinary endeavors and describe some of the limitations in resolving questions about the emergence and long-term evolution of pathogens.
Affiliation(s)
- Sebastián Duchêne
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia.
- Simon Y W Ho
  - School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
- Edward C Holmes
  - Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and Environmental Sciences and School of Medical Sciences, University of Sydney, Sydney, NSW 2006, Australia.
- Hendrik Poinar
  - McMaster Ancient DNA Centre, Departments of Anthropology and Biochemistry, McMaster University, 1280 Main St. W., Hamilton, ON L8S 4L9, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, 1280 Main St. W., Hamilton, ON L8S 4L8, Canada
  - Humans and the Microbiome Program, Canadian Institute for Advanced Research, Toronto, Canada.
|
18
|
Featherstone LA, Di Giallonardo F, Holmes EC, Vaughan TG, Duchêne S. Infectious disease phylodynamics with occurrence data. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13620] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Leo A. Featherstone
  - Department of Microbiology and Immunology Peter Doherty Institute for Infection and Immunity University of Melbourne Melbourne Vic. Australia
- Edward C. Holmes
  - Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney Sydney NSW Australia
  - Charles Perkins Centre School of Life and Environmental Sciences The University of Sydney Sydney NSW Australia
  - School of Medical Sciences The University of Sydney Sydney NSW Australia
- Timothy G. Vaughan
  - Department of Biosystems Science and Engineering ETH Zurich Basel Switzerland
  - Swiss Institute of Bioinformatics (SIB) Lausanne Switzerland
- Sebastián Duchêne
  - Department of Microbiology and Immunology Peter Doherty Institute for Infection and Immunity University of Melbourne Melbourne Vic. Australia
|
19
|
Parag KV, Pybus OG, Wu CH. Are Skyline Plot-Based Demographic Estimates Overly Dependent on Smoothing Prior Assumptions? Syst Biol 2021; 71:121-138. [PMID: 33989428 PMCID: PMC8677568 DOI: 10.1093/sysbio/syab037] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2020] [Revised: 05/07/2021] [Accepted: 05/08/2021] [Indexed: 11/13/2022] Open
Abstract
In Bayesian phylogenetics, the coalescent process provides an informative framework for inferring changes in the effective size of a population from a phylogeny (or tree) of sequences sampled from that population. Popular coalescent inference approaches such as the Bayesian Skyline Plot, Skyride, and Skygrid all model these population size changes with a discontinuous, piecewise-constant function but then apply a smoothing prior to ensure that their posterior population size estimates transition gradually with time. These prior distributions implicitly encode extra population size information that is not available from the observed coalescent data or tree. Here, we present a novel statistic, $\Omega$, to quantify and disaggregate the relative contributions of the coalescent data and prior assumptions to the resulting posterior estimate precision. Our statistic also measures the additional mutual information introduced by such priors. Using $\Omega$ we show that, because it is surprisingly easy to overparametrize piecewise-constant population models, common smoothing priors can lead to overconfident and potentially misleading inference, even under robust experimental designs. We propose $\Omega$ as a useful tool for detecting when effective population size estimates are overly reliant on prior assumptions and for improving quantification of the uncertainty in those estimates. [Coalescent processes; effective population size; information theory; phylodynamics; prior assumptions; skyline plots.]
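For context on the piecewise-constant models this abstract refers to, the sketch below computes the classic, unsmoothed skyline estimate for an isochronously sampled tree. While k lineages are present, coalescences occur at rate k(k-1)/(2N), so the maximum-likelihood estimate of N for that interval is k(k-1)/2 times the waiting time. This is only the textbook estimator with no smoothing prior, not the $\Omega$ statistic proposed in the paper; the function name and the coalescent times are hypothetical.

```python
from math import comb


def classic_skyline(coalescent_times):
    """Piecewise-constant effective population size estimates for a tree
    whose tips are all sampled at time 0.

    coalescent_times: strictly increasing times (measured backwards from the
    present) of the n - 1 coalescent events of an n-tip tree.
    Returns a list of (interval_start, interval_end, n_hat) tuples.
    """
    n_tips = len(coalescent_times) + 1
    estimates = []
    start = 0.0
    for i, end in enumerate(coalescent_times):
        lineages = n_tips - i  # lineages present during this interval
        wait = end - start
        if wait <= 0:
            raise ValueError("coalescent times must be strictly increasing")
        # MLE of N when the coalescent rate is comb(lineages, 2) / N.
        estimates.append((start, end, comb(lineages, 2) * wait))
        start = end
    return estimates


# Hypothetical coalescent times for a five-tip tree.
skyline = classic_skyline([0.1, 0.3, 0.7, 1.5])
for start, end, n_hat in skyline:
    print(f"[{start:.1f}, {end:.1f}): N_hat = {n_hat:.2f}")
```

Each interval here is estimated from a single waiting time, which is why these raw estimates are noisy and why the approaches named in the abstract group intervals or add smoothing priors.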
Affiliation(s)
- Kris V Parag
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London W2 1PG, UK
  - Department of Zoology, University of Oxford, Oxford OX1 3SY, UK
  - Correspondence to be sent to: MRC Centre for Global Infectious Disease Analysis, Imperial College London, London W2 1PG, UK
- Oliver G Pybus
  - Department of Zoology, University of Oxford, Oxford OX1 3SY, UK
- Chieh-Hsi Wu
  - Mathematical Sciences, University of Southampton, Highfield, Southampton SO17 1BJ, UK
|
20
|
Abstract
Genealogical tree modeling is essential for estimating evolutionary parameters in population genetics and phylogenetics. Recent mathematical results concerning ranked genealogies without leaf labels unlock opportunities in the analysis of evolutionary trees. In particular, comparisons between ranked genealogies facilitate the study of evolutionary processes of different organisms sampled at multiple time periods. We propose metrics on ranked tree shapes and ranked genealogies for lineages isochronously and heterochronously sampled. Our proposed tree metrics make it possible to conduct statistical analyses of ranked tree shapes and timed ranked tree shapes or ranked genealogies. Such analyses allow us to assess differences in tree distributions, quantify estimation uncertainty, and summarize tree distributions. We show the utility of our metrics via simulations and an application in infectious diseases.
Affiliation(s)
- Jaehee Kim
  - Department of Biology, Stanford University, Stanford, CA 94305
- Julia A Palacios
  - Department of Statistics, Stanford University, Stanford, CA 94305
  - Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA 94305
|
21
|
Parag KV, Donnelly CA, Jha R, Thompson RN. An exact method for quantifying the reliability of end-of-epidemic declarations in real time. PLoS Comput Biol 2020; 16:e1008478. [PMID: 33253158 PMCID: PMC7717584 DOI: 10.1371/journal.pcbi.1008478] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2020] [Revised: 12/04/2020] [Accepted: 10/28/2020] [Indexed: 12/13/2022] Open
Abstract
We derive and validate a novel and analytic method for estimating the probability that an epidemic has been eliminated (i.e. that no future local cases will emerge) in real time. When this probability crosses 0.95 an outbreak can be declared over with 95% confidence. Our method is easy to compute, only requires knowledge of the incidence curve and the serial interval distribution, and evaluates the statistical lifetime of the outbreak of interest. Using this approach, we show how the time-varying under-reporting of infected cases will artificially inflate the inferred probability of elimination, leading to premature (false-positive) end-of-epidemic declarations. Contrastingly, we prove that incorrectly identifying imported cases as local will deceptively decrease this probability, resulting in delayed (false-negative) declarations. Failing to sustain intensive surveillance during the later phases of an epidemic can therefore substantially mislead policymakers on when it is safe to remove travel bans or relax quarantine and social distancing advisories. World Health Organisation guidelines recommend fixed (though disease-specific) waiting times for end-of-epidemic declarations that cannot accommodate these variations. Consequently, there is an unequivocal need for more active and specialised metrics for reliably identifying the conclusion of an epidemic.
Affiliation(s)
- Kris V. Parag
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, UK
- Christl A. Donnelly
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, UK
  - Department of Statistics, University of Oxford, Oxford, UK
- Rahul Jha
  - Department of Applied Math and Theoretical Physics, University of Cambridge, Cambridge, UK
|
22
|
Parag KV, Donnelly CA. Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models. Syst Biol 2020; 69:1163-1179. [PMID: 32333789 PMCID: PMC7584150 DOI: 10.1093/sysbio/syaa035] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 04/14/2020] [Accepted: 04/16/2020] [Indexed: 11/12/2022] Open
Abstract
Estimating temporal changes in a target population from phylogenetic or count data is an important problem in ecology and epidemiology. Reliable estimates can provide key insights into the climatic and biological drivers influencing the diversity or structure of that population and evidence hypotheses concerning its future growth or decline. In infectious disease applications, the individuals infected across an epidemic form the target population. The renewal model estimates the effective reproduction number, R, of the epidemic from counts of observed incident cases. The skyline model infers the effective population size, N, underlying a phylogeny of sequences sampled from that epidemic. Practically, R measures ongoing epidemic growth while N informs on historical caseload. While both models solve distinct problems, the reliability of their estimates depends on p-dimensional piecewise-constant functions. If p is misspecified, the model might underfit significant changes or overfit noise and promote a spurious understanding of the epidemic, which might misguide intervention policies or misinform forecasts. Surprisingly, no transparent yet principled approach for optimizing p exists. Usually, p is heuristically set, or obscurely controlled via complex algorithms. We present a computable and interpretable p-selection method based on the minimum description length (MDL) formalism of information theory. Unlike many standard model selection techniques, MDL accounts for the additional statistical complexity induced by how parameters interact. As a result, our method optimizes p so that R and N estimates properly and meaningfully adapt to available data. It also outperforms comparable Akaike and Bayesian information criteria on several classification problems, given minimal knowledge of the parameter space, and exposes statistical similarities among renewal, skyline, and other models in biology. 
Rigorous and interpretable model selection is necessary if trustworthy and justifiable conclusions are to be drawn from piecewise models. [Coalescent processes; epidemiology; information theory; model selection; phylodynamics; renewal models; skyline plots].
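To make the role of p in the renewal model concrete, the sketch below estimates a piecewise-constant R from an incidence curve by splitting the time series into at most p consecutive blocks. Within a block, assuming Poisson-distributed incidence with mean R times the total infectiousness, the maximum-likelihood estimate is the block's case count divided by its summed total infectiousness. This illustrates only how the choice of p shapes the estimate, not the MDL selection method of the paper; the function names, the incidence data, and the one-day serial interval are hypothetical.

```python
def total_infectiousness(incidence, serial_interval):
    """Lambda_t = sum over s >= 1 of I_{t-s} * w_s, where w_s = serial_interval[s - 1]."""
    lam = []
    for t in range(len(incidence)):
        max_lag = min(t, len(serial_interval))
        lam.append(sum(incidence[t - s] * serial_interval[s - 1]
                       for s in range(1, max_lag + 1)))
    return lam


def piecewise_r(incidence, serial_interval, p):
    """Estimate R on at most p consecutive blocks of days.

    Day 0 is skipped because it has no past cases to generate infections.
    """
    lam = total_infectiousness(incidence, serial_interval)
    days = list(range(1, len(incidence)))
    block_size = -(-len(days) // p)  # ceiling division
    estimates = []
    for start in range(0, len(days), block_size):
        block = days[start:start + block_size]
        cases = sum(incidence[t] for t in block)
        pressure = sum(lam[t] for t in block)
        estimates.append(cases / pressure if pressure > 0 else float("nan"))
    return estimates


# Toy data: cases double daily and the serial interval is exactly one day,
# so every block should recover R = 2.
incidence = [1, 2, 4, 8, 16, 32, 64]
r_hat = piecewise_r(incidence, serial_interval=[1.0], p=2)
print(r_hat)
```

A small p averages over long blocks and can hide real changes in transmission, while a large p leaves few cases per block and tracks noise, which is the underfitting and overfitting trade-off the abstract describes.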
Affiliation(s)
- Kris V Parag
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, W2 1PG, UK
- Christl A Donnelly
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, W2 1PG, UK
  - Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
|