1
Xiao S, Abade A, Boru W, Kasambara W, Mwaba J, Ongole F, Mmanywa M, Trovão NS, Chilengi R, Kwenda G, Orach CG, Chibwe I, Bwire G, Stine OC, Milstone AM, Lessler J, Azman AS, Luo W, Murt K, Sack DA, Debes AK, Wohl S. New Vibrio cholerae sequences from Eastern and Southern Africa alter our understanding of regional cholera transmission. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.28.24302717. [PMID: 38585829 PMCID: PMC10996759 DOI: 10.1101/2024.03.28.24302717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Despite ongoing containment and vaccination efforts, cholera remains prevalent in many countries in sub-Saharan Africa. Part of the difficulty in containing cholera comes from our lack of understanding of how it circulates throughout the region. To better characterize regional transmission, we generated and analyzed 118 Vibrio cholerae genomes collected between 2007-2019 from five different countries in Southern and Eastern Africa. We showed that V. cholerae sequencing can be successful from a variety of sample types and filled in spatial and temporal gaps in our understanding of circulating lineages, including providing some of the first sequences from the 2018-2019 outbreaks in Uganda, Kenya, Tanzania, Zambia, and Malawi. Our results present a complex picture of cholera transmission in the region, with multiple lineages found to be co-circulating within several countries. We also find evidence that previously identified sporadic cases may be from larger, undersampled outbreaks, highlighting the need for careful examination of sampling biases and underscoring the need for continued and expanded cholera surveillance across the African continent.
Affiliation(s)
- Shaoming Xiao
  - Division of Pediatric Infectious Disease, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Ahmed Abade
  - Ministry of Health, Dar es Salaam, Tanzania
  - Field Epidemiology and Laboratory Training Program, Nairobi, Kenya
- Waqo Boru
  - Field Epidemiology and Laboratory Training Program, Nairobi, Kenya
- John Mwaba
  - Center for Infectious Disease Research, Zambia
  - Department of Pathology and Microbiology, University Teaching Hospital, Lusaka, Zambia
- Roma Chilengi
  - Zambia National Public Health Institute, Lusaka, Zambia
- O Colin Stine
  - University of Maryland School of Medicine, Baltimore, USA
- Aaron M Milstone
  - Division of Pediatric Infectious Disease, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Justin Lessler
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Epidemiology, Gillings School of Public Health, University of North Carolina, Chapel Hill, NC, USA
  - Carolina Population Center, University of North Carolina, Chapel Hill, NC, USA
- Andrew S Azman
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Division of Tropical and Humanitarian Medicine, Geneva University Hospitals, Geneva, Switzerland
  - Geneva Centre for Emerging Viral Diseases, Geneva University Hospitals, Geneva, Switzerland
- Wensheng Luo
  - Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kelsey Murt
  - Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Biomedical Sciences, School of Health Sciences, University of Zambia, Lusaka, Zambia
- David A Sack
  - Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Amanda K Debes
  - Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Shirlee Wohl
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA, USA
2
Collienne L, Whidden C, Gavryushkin A. Ranked Subtree Prune and Regraft. Bull Math Biol 2024; 86:24. [PMID: 38294587 PMCID: PMC10830682 DOI: 10.1007/s11538-023-01244-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 12/06/2023] [Indexed: 02/01/2024]
Abstract
Phylogenetic trees are a mathematical formalisation of evolutionary histories between organisms, species, genes, cancer cells, etc. For many applications, e.g. when analysing virus transmission trees or cancer evolution, (phylogenetic) time trees are of interest, where branch lengths represent times. Computational methods for reconstructing time trees from (typically molecular) sequence data, for example Bayesian phylogenetic inference using Markov Chain Monte Carlo (MCMC) methods, rely on algorithms that sample the treespace. They employ tree rearrangement operations such as SPR (Subtree Prune and Regraft) and NNI (Nearest Neighbour Interchange) or, in the case of time tree inference, versions of these that take times of internal nodes into account. While the classic SPR tree rearrangement is well-studied, its variants for time trees are less understood, limiting comparative analysis for time tree methods. In this paper we consider a modification of the classical SPR rearrangement on the space of ranked phylogenetic trees, which are trees equipped with a ranking of all internal nodes. This modification results in two novel treespaces, which we propose to study. We begin this study by discussing algorithmic properties of these treespaces, focusing on those relating to the complexity of computing distances under the ranked SPR operations as well as similarities and differences to known tree rearrangement based treespaces. Surprisingly, we show the counterintuitive result that adding leaves to trees can actually decrease their ranked SPR distance, which may have an impact on the results of time tree sampling algorithms given uncertain "rogue taxa".
Affiliation(s)
- Lena Collienne
  - Biological Data Science Laboratory, School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Chris Whidden
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
- Alex Gavryushkin
  - Biological Data Science Laboratory, School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
  - Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
3
Kramer AM, Thornlow B, Ye C, De Maio N, McBroome J, Hinrichs AS, Lanfear R, Turakhia Y, Corbett-Detig R. Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations. Syst Biol 2023; 72:1039-1051. [PMID: 37232476 PMCID: PMC10627557 DOI: 10.1093/sysbio/syad031] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/29/2021] [Revised: 05/14/2023] [Accepted: 06/22/2023] [Indexed: 05/27/2023]
Abstract
Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. 
MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.
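To make the parsimony criterion discussed in this abstract concrete, the following is a minimal sketch of Fitch's small-parsimony count for a single alignment site on a fixed rooted binary tree. It is not the UShER or matOptimize implementation; the nested-tuple tree encoding and the function name `fitch_score` are assumptions chosen only for illustration.

```python
def fitch_score(tree):
    """Return (candidate_states, changes) for one site on a rooted binary tree.

    Hypothetical encoding: a leaf is a one-character string (the observed
    base); an internal node is a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        # A leaf contributes its observed base and no changes.
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch_score(left)
    right_states, right_changes = fitch_score(right)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one base: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: keep the union and count one substitution.
    return left_states | right_states, left_changes + right_changes + 1


# Five samples with bases A, A, C, A, C at one site.
site_tree = (("A", "A"), ("C", ("A", "C")))
states, changes = fitch_score(site_tree)
print(changes)  # minimum number of substitutions at this site
```

Summing this count over all sites gives the parsimony score of a tree, which is the quantity a parsimony-based optimizer tries to reduce when it rearranges the tree.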
Affiliation(s)
- Alexander M Kramer
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cheng Ye
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Jakob McBroome
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
4
Truszkowski J, Perrigo A, Broman D, Ronquist F, Antonelli A. Online tree expansion could help solve the problem of scalability in Bayesian phylogenetics. Syst Biol 2023; 72:1199-1206. [PMID: 37498209 PMCID: PMC10627553 DOI: 10.1093/sysbio/syad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 06/22/2023] [Accepted: 07/11/2023] [Indexed: 07/28/2023]
Abstract
Bayesian phylogenetics is now facing a critical point. Over the last 20 years, Bayesian methods have reshaped phylogenetic inference and gained widespread popularity due to their high accuracy, the ability to quantify the uncertainty of inferences and the possibility of accommodating multiple aspects of evolutionary processes in the models that are used. Unfortunately, Bayesian methods are computationally expensive, and typical applications involve at most a few hundred sequences. This is problematic in the age of rapidly expanding genomic data and increasing scope of evolutionary analyses, forcing researchers to resort to less accurate but faster methods, such as maximum parsimony and maximum likelihood. Does this spell doom for Bayesian methods? Not necessarily. Here, we discuss some recently proposed approaches that could help scale up Bayesian analyses of evolutionary problems considerably. We focus on two particular aspects: online phylogenetics, where new data sequences are added to existing analyses, and alternatives to Markov chain Monte Carlo (MCMC) for scalable Bayesian inference. We identify 5 specific challenges and discuss how they might be overcome. We believe that online phylogenetic approaches and Sequential Monte Carlo hold great promise and could potentially speed up tree inference by orders of magnitude. We call for collaborative efforts to speed up the development of methods for real-time tree expansion through online phylogenetics.
Affiliation(s)
- Jakub Truszkowski
  - Department of Biological and Environmental Sciences, University of Gothenburg, P. O. Box 461, SE.405 30 Gothenburg, Sweden
  - Gothenburg Global Biodiversity Centre, Box 461, 405 30 Gothenburg, Sweden
- Allison Perrigo
  - Department of Biological and Environmental Sciences, University of Gothenburg, P. O. Box 461, SE.405 30 Gothenburg, Sweden
  - Gothenburg Global Biodiversity Centre, Box 461, 405 30 Gothenburg, Sweden
- David Broman
  - Department of Computer Science and Digital Futures, KTH Royal Institute of Technology, SE.100 44 Stockholm, Sweden
- Fredrik Ronquist
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, P. O. Box 50007, SE.104 05 Stockholm, Sweden
- Alexandre Antonelli
  - Department of Biological and Environmental Sciences, University of Gothenburg, P. O. Box 461, SE.405 30 Gothenburg, Sweden
  - Gothenburg Global Biodiversity Centre, Box 461, 405 30 Gothenburg, Sweden
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, UK
5
Orf GS, Pérez LJ, Ciuoderis K, Cardona A, Villegas S, Hernández-Ortiz JP, Baele G, Mohaimani A, Osorio JE, Berg MG, Cloherty GA. The Principles of SARS-CoV-2 Intervariant Competition Are Exemplified in the Pre-Omicron Era of the Colombian Epidemic. Microbiol Spectr 2023; 11:e0534622. [PMID: 37191534 PMCID: PMC10269686 DOI: 10.1128/spectrum.05346-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 04/25/2023] [Indexed: 05/17/2023]
Abstract
The first 18 months of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections in Colombia were characterized by three epidemic waves. During the third wave, from March through August 2021, intervariant competition resulted in Mu replacing Alpha and Gamma. We employed Bayesian phylodynamic inference and epidemiological modeling to characterize the variants in the country during this period of competition. Phylogeographic analysis indicated that Mu did not emerge in Colombia but acquired increased fitness there through local transmission and diversification, contributing to its export to North America and Europe. Despite not having the highest transmissibility, Mu's genetic composition and ability to evade preexisting immunity facilitated its domination of the Colombian epidemic landscape. Our results support previous modeling studies demonstrating that both intrinsic factors (transmissibility and genetic diversity) and extrinsic factors (time of introduction and acquired immunity) influence the outcome of intervariant competition. This analysis will help set practical expectations about the inevitable emergences of new variants and their trajectories. IMPORTANCE Before the appearance of the Omicron variant in late 2021, numerous SARS-CoV-2 variants emerged, were established, and declined, often with different outcomes in different geographic areas. In this study, we considered the trajectory of the Mu variant, which only successfully dominated the epidemic landscape of a single country: Colombia. We demonstrate that Mu competed successfully there due to its early and opportune introduction time in late 2020, combined with its ability to evade immunity granted by prior infection or the first generation of vaccines. Mu likely did not effectively spread outside of Colombia because other immune-evading variants, such as Delta, had arrived in those locales and established themselves first. 
On the other hand, Mu's early spread within Colombia may have prevented the successful establishment of Delta there. Our analysis highlights the geographic heterogeneity of early SARS-CoV-2 variant spread and helps to reframe the expectations for the competition behaviors of future variants.
Affiliation(s)
- Gregory S. Orf
  - Infectious Disease Research, Abbott Diagnostics Division, Abbott Laboratories, Abbott Park, Illinois, USA
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
- Lester J. Pérez
  - Infectious Disease Research, Abbott Diagnostics Division, Abbott Laboratories, Abbott Park, Illinois, USA
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
- Karl Ciuoderis
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
  - UW-GHI One Health Colombia, Universidad Nacional de Colombia Sede en Medellín, Medellín, Colombia
- Andrés Cardona
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
  - UW-GHI One Health Colombia, Universidad Nacional de Colombia Sede en Medellín, Medellín, Colombia
- Simón Villegas
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
  - UW-GHI One Health Colombia, Universidad Nacional de Colombia Sede en Medellín, Medellín, Colombia
- Juan P. Hernández-Ortiz
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
  - UW-GHI One Health Colombia, Universidad Nacional de Colombia Sede en Medellín, Medellín, Colombia
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Laboratory of Clinical and Evolutionary Virology, Rega Institute, KU Leuven, Leuven, Belgium
- Aurash Mohaimani
  - Infectious Disease Research, Abbott Diagnostics Division, Abbott Laboratories, Abbott Park, Illinois, USA
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
- Jorge E. Osorio
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
  - UW-GHI One Health Colombia, Universidad Nacional de Colombia Sede en Medellín, Medellín, Colombia
  - UW-GHI One Health Colombia, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Michael G. Berg
  - Infectious Disease Research, Abbott Diagnostics Division, Abbott Laboratories, Abbott Park, Illinois, USA
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
- Gavin A. Cloherty
  - Infectious Disease Research, Abbott Diagnostics Division, Abbott Laboratories, Abbott Park, Illinois, USA
  - Abbott Pandemic Defense Coalition (APDC), Abbott Park, Illinois, USA
6
Yu X. Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences. BMC Bioinformatics 2023; 24:218. [PMID: 37254048 DOI: 10.1186/s12859-023-05356-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 05/25/2023] [Indexed: 06/01/2023]
Abstract
BACKGROUND Viral genomics and epidemiology have been increasingly important tools for analysing the spread of key pathogens affecting daily lives of individuals worldwide. With the rapidly expanding scale of pathogen genome sequencing efforts for epidemics and outbreaks efficient workflows in extracting genomic information are becoming increasingly important for answering key research questions. RESULTS Here we present Genofunc, a toolkit offering a range of command line orientated functions for processing of raw virus genome sequences into aligned and annotated data ready for analysis. The tool contains functions such as genome annotation, feature extraction etc. for processing of large genomic datasets both manual or as part of pipeline such as Snakemake or Nextflow ready for down-stream phylogenetic analysis. Originally designed for a large-scale HIV sequencing project, Genofunc has been benchmarked against annotated sequence gene coordinates from the Los Alamos HIV database as validation with downstream phylogenetic analysis result comparable to past literature as case study. CONCLUSION Genofunc is implemented fully in Python and licensed under the MIT license. Source code and documentation is available at: https://github.com/xiaoyu518/genofunc.
Affiliation(s)
- Xiaoyu Yu
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL, Scotland, UK
7
Fisher AA, Hassler GW, Ji X, Baele G, Suchard MA, Lemey P. Scalable Bayesian phylogenetics. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210242. [PMID: 35989603 PMCID: PMC9393558 DOI: 10.1098/rstb.2021.0242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/01/2023]
Abstract
Recent advances in Bayesian phylogenetics offer substantial computational savings to accommodate increased genomic sampling that challenges traditional inference methods. In this review, we begin with a brief summary of the Bayesian phylogenetic framework, and then conceptualize a variety of methods to improve posterior approximations via Markov chain Monte Carlo (MCMC) sampling. Specifically, we discuss methods to improve the speed of likelihood calculations, reduce MCMC burn-in, and generate better MCMC proposals. We apply several of these techniques to study the evolution of HIV virulence along a 1536-tip phylogeny and estimate the internal node heights of a 1000-tip SARS-CoV-2 phylogenetic tree in order to illustrate the speed-up of such analyses using current state-of-the-art approaches. We conclude our review with a discussion of promising alternatives to MCMC that approximate the phylogenetic posterior. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Gabriel W. Hassler
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
- Xiang Ji
  - Department of Mathematics, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, 3000 Leuven, Belgium
- Marc A. Suchard
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA 90095, USA
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, 3000 Leuven, Belgium
8
Ye C, Thornlow B, Hinrichs A, Kramer A, Mirchandani C, Torvi D, Lanfear R, Corbett-Detig R, Turakhia Y. matOptimize: a parallel tree optimization method enables online phylogenetics for SARS-CoV-2. Bioinformatics 2022; 38:3734-3740. [PMID: 35731204 PMCID: PMC9344837 DOI: 10.1093/bioinformatics/btac401] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/06/2022] [Revised: 05/21/2022] [Accepted: 06/16/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic. RESULTS Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods. We have developed this method particularly to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping refine on a daily basis possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences. AVAILABILITY AND IMPLEMENTATION The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts we used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cheng Ye
  - Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA 92093, USA
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Angie Hinrichs
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Alexander Kramer
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Cade Mirchandani
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Devika Torvi
  - Department of Bioengineering, University of California, San Diego, San Diego, CA 92093, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA 92093, USA
9
McBroome J, Martin J, de Bernardi Schneider A, Turakhia Y, Corbett-Detig R. Identifying SARS-CoV-2 regional introductions and transmission clusters in real time. Virus Evol 2022; 8:veac048. [PMID: 35769891 PMCID: PMC9214145 DOI: 10.1093/ve/veac048] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/02/2022] [Revised: 04/04/2022] [Accepted: 06/13/2022] [Indexed: 12/31/2022]
Abstract
The unprecedented severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) global sequencing effort has suffered from an analytical bottleneck. Many existing methods for phylogenetic analysis are designed for sparse, static datasets and are too computationally expensive to apply to densely sampled, rapidly expanding datasets when results are needed immediately to inform public health action. For example, public health is often concerned with identifying clusters of closely related samples, but the sheer scale of the data prevents manual inspection and the current computational models are often too expensive in time and resources. Even when results are available, intuitive data exploration tools are of critical importance to effective public health interpretation and action. To help address this need, we present a phylogenetic heuristic that quickly and efficiently identifies newly introduced strains in a region, resulting in clusters of infected individuals, and their putative geographic origins. We show that this approach performs well on simulated data and yields results largely congruent with more sophisticated Bayesian phylogeographic modeling approaches. We also introduce Cluster-Tracker (https://clustertracker.gi.ucsc.edu/), a novel interactive web-based tool to facilitate effective and intuitive SARS-CoV-2 geographic data exploration and visualization across the USA. Cluster-Tracker is updated daily and automatically identifies and highlights groups of closely related SARS-CoV-2 infections resulting from the transmission of the virus between two geographic areas by travelers, streamlining public health tracking of local viral diversity and emerging infection clusters. The site is open-source and designed to be easily configured to analyze any chosen region, making it a useful resource globally. 
The combination of these open-source tools will empower detailed investigations of the geographic origins and spread of SARS-CoV-2 and other densely sampled pathogens.
Affiliation(s)
- Jakob McBroome
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA
- Jennifer Martin
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA
- Adriano de Bernardi Schneider
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA
- Yatish Turakhia
  - Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
- Russell Corbett-Detig
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA
10
Featherstone LA, Zhang JM, Vaughan TG, Duchene S. Epidemiological Inference From Pathogen Genomes: A Review of Phylodynamic Models and Applications. Virus Evol 2022; 8:veac045. [PMID: 35775026 PMCID: PMC9241095 DOI: 10.1093/ve/veac045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 05/23/2022] [Accepted: 06/02/2022] [Indexed: 11/24/2022]
Abstract
Phylodynamics requires an interdisciplinary understanding of phylogenetics, epidemiology, and statistical inference. It has also experienced more intense application than ever before amid the SARS-CoV-2 pandemic. In light of this, we present a review of phylodynamic models beginning with foundational models and assumptions. Our target audience is public health researchers, epidemiologists, and biologists seeking a working knowledge of the links between epidemiology, evolutionary models, and resulting epidemiological inference. We discuss the assumptions linking evolutionary models of pathogen population size to epidemiological models of the infected population size. We then describe statistical inference for phylodynamic models and list how output parameters can be rearranged for epidemiological interpretation. We go on to cover more sophisticated models and finish by highlighting future directions.
Affiliation(s)
- Leo A Featherstone
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
- Joshua M Zhang
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
- Timothy G Vaughan
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics
- Sebastian Duchene
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia
11
Thornlow B, Kramer A, Ye C, De Maio N, McBroome J, Hinrichs AS, Lanfear R, Turakhia Y, Corbett-Detig R. Online Phylogenetics using Parsimony Produces Slightly Better Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Approaches. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2021.12.02.471004. [PMID: 35611334 PMCID: PMC9128781 DOI: 10.1101/2021.12.02.471004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
Abstract
Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 datasets do not fit this mould. There are currently over 10 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) methods are more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger datasets. Here, we evaluate the performance of de novo and online phylogenetic approaches, and ML and MP frameworks, for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimizations produce more accurate SARS-CoV-2 phylogenies than do ML optimizations. 
Since MP is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo, we propose that, in the context of comprehensive genomic epidemiology of SARS-CoV-2, MP online phylogenetics approaches should be favored.
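The parsimony criterion discussed in this abstract can be illustrated with a minimal sketch of Fitch's small-parsimony algorithm for a single alignment site on a fixed, rooted binary tree. This is a textbook illustration only and not the authors' pipeline; the function name `fitch_score`, the nested-tuple tree encoding, and the toy sample names are assumptions made for this example.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical tree encoding: a leaf is a sample name (str),
# an internal node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate states at this node, minimum changes in the subtree)."""
    if isinstance(tree, str):
        # A leaf contributes its observed nucleotide and no changes.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: keep the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


# Toy example: four samples, one site, a single G among three A's.
tree = (("s1", "s2"), ("s3", "s4"))
site = {"s1": "A", "s2": "A", "s3": "G", "s4": "A"}
root_states, changes = fitch_score(tree, site)
print(root_states, changes)  # {'A'} 1
```

Because each site needs only one bottom-up pass of set intersections and unions, scoring a tree this way is cheap compared with evaluating a likelihood, which is consistent with the speed advantage the abstract reports for MP.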
Affiliation(s)
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Alexander Kramer
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Cheng Ye
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus; Cambridge CB10 1SD, UK
- Jakob McBroome
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Angie S. Hinrichs
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University; Canberra, ACT 2601, Australia
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
12
Wirth W, Duchene S. Real-time and remote MCMC trace inspection with Beastiary. Mol Biol Evol 2022; 39:6584747. [PMID: 35552742 PMCID: PMC9156035 DOI: 10.1093/molbev/msac095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Bayesian phylogenetics has gained substantial popularity in the last decade, with most implementations relying on Markov chain Monte Carlo (MCMC). The computational demands of MCMC mean that remote servers are increasingly used. We present Beastiary, a package for real-time and remote inspection of log files generated by MCMC analyses. Beastiary is an easily deployed web-app that can be used to summarize and visualize the output of many popular software packages including BEAST, BEAST2, RevBayes, and MrBayes via a web browser. We describe the design and implementation of Beastiary and some typical use-cases, with a focus on real-time remote monitoring.
Affiliation(s)
- Wytamma Wirth
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
- Sebastian Duchene
  - Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia
13
Cappello L, Kim J, Liu S, Palacios JA. Statistical Challenges in Tracking the Evolution of SARS-CoV-2. Stat Sci 2022; 37:162-182. [PMID: 36034090 PMCID: PMC9409356 DOI: 10.1214/22-sts853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
Abstract
Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, has provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of governments, laboratories, and researchers to collect and sequence molecular data, many challenges remain in analyzing and interpreting the data collected. Here, we describe the models and methods currently used to monitor the spread of SARS-CoV-2, discuss long-standing and new statistical challenges, and propose a method for tracking the rise of novel variants during the epidemic.
Affiliation(s)
- Lorenzo Cappello
  - Lorenzo Cappello is Assistant Professor, Departments of Economics and Business, Universitat Pompeu Fabra, 08005, Spain
- Jaehee Kim
  - Jaehee Kim is Assistant Professor, Department of Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Sifan Liu
  - Sifan Liu is a Ph.D. student, Department of Statistics, Stanford University, Stanford, California 94305, USA
- Julia A. Palacios
  - Julia A. Palacios is Assistant Professor, Departments of Statistics and Biomedical Data Sciences, Stanford University, Stanford, California 94305, USA
14
Progress and challenges in virus genomic epidemiology. Trends Parasitol 2021; 37:1038-1049. [PMID: 34620561 DOI: 10.1016/j.pt.2021.08.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/21/2021] [Revised: 08/24/2021] [Accepted: 08/26/2021] [Indexed: 12/18/2022]
Abstract
Genomic epidemiology, which links pathogen genomes with associated metadata to understand disease transmission, has become a key component of outbreak response. Decreasing costs of genome sequencing and increasing computational power provide opportunities to generate and analyse large viral genomic datasets that aim to uncover the spatial scales of transmission, the demographics contributing to transmission patterns, and to forecast epidemic trends. Emerging sources of genomic data and associated metadata provide new opportunities to further unravel transmission patterns. Key challenges include how to integrate genomic data with metadata from multiple sources, how to generate efficient computational algorithms to cope with large datasets, and how to establish sampling frameworks to enable robust conclusions.
15
Exploiting genomic surveillance to map the spatio-temporal dispersal of SARS-CoV-2 spike mutations in Belgium across 2020. Sci Rep 2021; 11:18580. [PMID: 34535691 PMCID: PMC8448849 DOI: 10.1038/s41598-021-97667-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/18/2021] [Accepted: 08/24/2021] [Indexed: 11/21/2022]
Abstract
At the end of 2020, several new variants of SARS-CoV-2—designated variants of concern—were detected and quickly suspected to be associated with a higher transmissibility and possible escape of vaccine-induced immunity. In Belgium, this discovery has motivated the initiation of a more ambitious genomic surveillance program, which is drastically increasing the number of SARS-CoV-2 genomes to analyse for monitoring the circulation of viral lineages and variants of concern. In order to efficiently analyse the massive collection of genomic data that are the result of such increased sequencing efforts, streamlined analytical strategies are crucial. In this study, we illustrate how to efficiently map the spatio-temporal dispersal of target mutations at a regional level. As a proof of concept, we focus on the Belgian province of Liège that has been consistently sampled throughout 2020, but was also one of the main epicenters of the second European epidemic wave. Specifically, we employ a recently developed phylogeographic workflow to infer the regional dispersal history of viral lineages associated with three specific mutations on the spike protein (S98F, A222V and S477N) and to quantify their relative importance through time. Our analytical pipeline enables analysing large data sets and has the potential to be quickly applied and updated to track target mutations in space and time throughout the course of an epidemic.
16
Abidi SH, Nduva GM, Siddiqui D, Rafaqat W, Mahmood SF, Siddiqui AR, Nathwani AA, Hotwani A, Shah SA, Memon S, Sheikh SA, Khan P, Esbjörnsson J, Ferrand RA, Mir F. Phylogenetic and Drug-Resistance Analysis of HIV-1 Sequences From an Extensive Paediatric HIV-1 Outbreak in Larkana, Pakistan. Front Microbiol 2021; 12:658186. [PMID: 34484134 PMCID: PMC8415901 DOI: 10.3389/fmicb.2021.658186] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/25/2021] [Accepted: 07/21/2021] [Indexed: 12/01/2022]
Abstract
Introduction: In April 2019, an HIV-1 outbreak among children occurred in Larkana, Pakistan, affecting more than a thousand children. It was assumed that the outbreak originated from a single source, namely a doctor at a private health facility. In this study, we performed subtype distribution, phylogenetic and drug-resistance analysis of HIV-1 sequences from the 2019 outbreak in Larkana, Pakistan. Methods: A total of 401 blood samples were collected between April and June 2019 from children infected with HIV-1 aged 0–15 years recruited into a case-control study to investigate the risk factors for HIV-1 transmission. Partial HIV-1 pol sequences were generated from 344 blood plasma samples to determine HIV-1 subtype and drug resistance mutations (DRM). Maximum-likelihood phylogenetics based on outbreak and reference sequences was used to identify transmission clusters and assess the relationship between outbreak and key population sequences between and within the determined clusters. Bayesian analysis was employed to identify the time to the most recent common ancestor (tMRCA) of the main Pakistani clusters. Results: The HIV-1 circulating recombinant form (CRF) 02_AG and subtype A1 were most common among the outbreak sequences. Of the treatment-naïve participants, the two most common mutations were RT: E138A (8%) and RT: K219Q (8%). Four supported clusters within the outbreak were identified, and the median tMRCAs of the Larkana outbreak sequences were estimated to be 2016 for both the CRF02_AG and the subtype A1 clusters. Furthermore, outbreak sequences exhibited no phylogenetic mixing with sequences from other high-risk groups of Pakistan. Conclusion: The presence of multiple clusters indicated a multi-source outbreak, rather than a single-source outbreak from a single health practitioner as previously suggested.
The multiple introductions were likely a consequence of ongoing transmission within the high-risk groups of Larkana, and it is possible that the so-called Larkana strain was introduced into the general population through poor infection prevention control practices in healthcare settings. The study highlights the need to scale up HIV-1 prevention programmes among key population groups and to improve infection prevention control in Pakistan.
Affiliation(s)
- Syed Hani Abidi
  - Department of Biological and Biomedical Sciences, Aga Khan University, Karachi, Pakistan
- George Makau Nduva
  - Department of Translational Medicine, Lund University, Lund, Sweden
  - Kenya Medical Research Institute-Wellcome Trust Research Programme, Kilifi, Kenya
- Dilsha Siddiqui
  - Department of Biological and Biomedical Sciences, Aga Khan University, Karachi, Pakistan
- Apsara Ali Nathwani
  - Department of Pediatrics and Child Health, Aga Khan University, Karachi, Pakistan
- Aneeta Hotwani
  - Department of Pediatrics and Child Health, Aga Khan University, Karachi, Pakistan
- Sikander Memon
  - Sindh AIDS Control Program, Ministry of Health, Karachi, Pakistan
- Saqib Ali Sheikh
  - Sindh AIDS Control Program, Ministry of Health, Karachi, Pakistan
- Palwasha Khan
  - Department of Clinical Research, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Joakim Esbjörnsson
  - Department of Translational Medicine, Lund University, Lund, Sweden
  - The Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Rashida Abbas Ferrand
  - Department of Pediatrics and Child Health, Aga Khan University, Karachi, Pakistan
  - Department of Clinical Research, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Fatima Mir
  - Department of Pediatrics and Child Health, Aga Khan University, Karachi, Pakistan
17
Lemey P, Ruktanonchai N, Hong SL, Colizza V, Poletto C, Van den Broeck F, Gill MS, Ji X, Levasseur A, Oude Munnink BB, Koopmans M, Sadilek A, Lai S, Tatem AJ, Baele G, Suchard MA, Dellicour S. Untangling introductions and persistence in COVID-19 resurgence in Europe. Nature 2021; 595:713-717. [PMID: 34192736 PMCID: PMC8324533 DOI: 10.1038/s41586-021-03754-2] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 02/04/2021] [Accepted: 06/22/2021] [Indexed: 11/09/2022]
Abstract
After the first wave of SARS-CoV-2 infections in spring 2020, Europe experienced a resurgence of the virus starting in late summer 2020 that was deadlier and more difficult to contain. Relaxed intervention measures and summer travel have been implicated as drivers of the second wave. Here we build a phylogeographical model to evaluate how newly introduced lineages, as opposed to the rekindling of persistent lineages, contributed to the resurgence of COVID-19 in Europe. We inform this model using genomic, mobility and epidemiological data from 10 European countries and estimate that in many countries more than half of the lineages circulating in late summer resulted from new introductions since 15 June 2020. The success in onward transmission of newly introduced lineages was negatively associated with the local incidence of COVID-19 during this period. The pervasive spread of variants in summer 2020 highlights the threat of viral dissemination when restrictions are lifted, and this needs to be carefully considered in strategies to control the current spread of variants that are more transmissible and/or evade immunity. Our findings indicate that more effective and coordinated measures are required to contain the spread through cross-border travel even as vaccination is reducing disease burden.
Affiliation(s)
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Global Virus Network (GVN), Baltimore, MD, USA
- Nick Ruktanonchai
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton, UK
  - Population Health Sciences, Virginia Tech, Blacksburg, VA, USA
- Samuel L Hong
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Vittoria Colizza
  - INSERM, Sorbonne Université, Institut Pierre Louis d'Epidémiologie et de Santé Publique IPLESP, Paris, France
- Chiara Poletto
  - INSERM, Sorbonne Université, Institut Pierre Louis d'Epidémiologie et de Santé Publique IPLESP, Paris, France
- Frederik Van den Broeck
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
- Mandev S Gill
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Xiang Ji
  - Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, LA, USA
- Anthony Levasseur
  - UMR MEPHI (Microbes, Evolution, Phylogeny and Infections), Aix-Marseille Université (AMU) and Institut Universitaire de France (IUF), Marseille, France
- Bas B Oude Munnink
  - Department of Viroscience, WHO Collaborating Centre for Arbovirus and Viral Hemorrhagic Fever Reference and Research, Erasmus MC, Rotterdam, The Netherlands
- Marion Koopmans
  - Department of Viroscience, WHO Collaborating Centre for Arbovirus and Viral Hemorrhagic Fever Reference and Research, Erasmus MC, Rotterdam, The Netherlands
- Shengjie Lai
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton, UK
- Andrew J Tatem
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton, UK
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Simon Dellicour
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Bruxelles, Belgium
18
Didelot X, Siveroni I, Volz EM. Additive Uncorrelated Relaxed Clock Models for the Dating of Genomic Epidemiology Phylogenies. Mol Biol Evol 2021; 38:307-317. [PMID: 32722797 PMCID: PMC8480190 DOI: 10.1093/molbev/msaa193] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/20/2022]
Abstract
Phylogenetic dating is one of the most powerful and commonly used methods of drawing epidemiological interpretations from pathogen genomic data. Building such trees requires considering a molecular clock model which represents the rate at which substitutions accumulate on genomes. When the molecular clock rate is constant throughout the tree then the clock is said to be strict, but this is often not an acceptable assumption. Alternatively, relaxed clock models consider variations in the clock rate, often based on a distribution of rates for each branch. However, we show here that the distributions of rates across branches in commonly used relaxed clock models are incompatible with the biological expectation that the sum of the numbers of substitutions on two neighboring branches should be distributed as the substitution number on a single branch of equivalent length. We call this expectation the additivity property. We further show how assumptions of commonly used relaxed clock models can lead to estimates of evolutionary rates and dates with low precision and biased confidence intervals. We therefore propose a new additive relaxed clock model where the additivity property is satisfied. We illustrate the use of our new additive relaxed clock model on a range of simulated and real data sets, and we show that using this new model leads to more accurate estimates of mean evolutionary rates and ancestral dates.
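The additivity property described in this abstract can be written compactly as follows. This is a sketch under assumed notation: $X(\ell)$ for the number of substitutions on a branch of duration $\ell$, and $\mu$ for a constant clock rate, are symbols introduced here for illustration and are not taken from the paper.

```latex
% Additivity: substitution counts on two independent neighboring branches
% of durations \ell_1 and \ell_2 should sum to a count distributed like
% that of a single branch of duration \ell_1 + \ell_2.
\[
  X(\ell_1) + X(\ell_2) \;\overset{d}{=}\; X(\ell_1 + \ell_2)
\]
% Example: a strict clock with Poisson-distributed counts satisfies it,
% because independent Poisson variables sum to a Poisson variable whose
% mean is the sum of the means.
\[
  X(\ell) \sim \mathrm{Poisson}(\mu \ell)
  \;\Longrightarrow\;
  X(\ell_1) + X(\ell_2) \sim \mathrm{Poisson}\bigl(\mu(\ell_1 + \ell_2)\bigr)
\]
```

The strict-clock case is included only as a familiar model for which the property holds; the abstract's claim is that commonly used relaxed clock models do not share it.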
Affiliation(s)
- Xavier Didelot
  - School of Life Sciences, University of Warwick, Coventry, United Kingdom
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Igor Siveroni
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Erik M Volz
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
19
Yang J, Wang H, Zhang X, Yang S, Xu H, Zhang W. Viral metagenomic identification of a novel anellovirus in blood sample of a child with atopic dermatitis. J Med Virol 2021; 93:4038-4041. [PMID: 33058155 DOI: 10.1002/jmv.26603] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/24/2020] [Revised: 10/11/2020] [Accepted: 10/12/2020] [Indexed: 12/15/2022]
Abstract
Here, using viral metagenomics, a novel anellovirus with strain name HuAV-zj-ad1 was detected in a blood sample from a child with atopic dermatitis. The complete genome sequence of HuAV-zj-ad1 was determined and fully characterized. The circular genome of HuAV-zj-ad1 is 2841 nt in length and includes four polyprotein ORFs. Phylogenetic analysis and pairwise sequence comparisons based on the amino acid sequences of ORF1, ORF2, ORF3, and ORF4 indicated that HuAV-zj-ad1 belongs to a novel species within the genus Betatorquevirus. Polymerase chain reaction screening showed that this anellovirus was not present in 50 blood samples from normal children. Whether this novel species of anellovirus is associated with a particular disease needs further study.
Affiliation(s)
- Jie Yang
  - Department of Dermatology, The Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
  - Department of Microbiology, School of Medicine, Jiangsu University, Zhenjiang, Jiangsu, China
- Hao Wang
  - Department of Clinical Laboratory, Huai'an Hospital, Xuzhou Medical University, Huai'an, Jiangsu, China
- Xiaodan Zhang
  - Zhenjiang Center for Disease Prevention and Control, Zhenjiang, Jiangsu, China
- Shixing Yang
  - Department of Microbiology, School of Medicine, Jiangsu University, Zhenjiang, Jiangsu, China
- Hui Xu
  - Department of Dermatology, The Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu, China
- Wen Zhang
  - Department of Microbiology, School of Medicine, Jiangsu University, Zhenjiang, Jiangsu, China
20
Hong SL, Lemey P, Suchard MA, Baele G. Bayesian Phylogeographic Analysis Incorporating Predictors and Individual Travel Histories in BEAST. Curr Protoc 2021; 1:e98. [PMID: 33836121 DOI: 10.1002/cpz1.98] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]
Abstract
Advances in sequencing technologies have tremendously reduced the time and costs associated with sequence generation, making genomic data an important asset for routine public health practices. Within this context, phylogenetic and phylogeographic inference has become a popular method to study disease transmission. In a Bayesian context, these approaches have the benefit of accommodating phylogenetic uncertainty, and popular implementations provide the possibility to parameterize the transition rates between locations as a function of epidemiological and ecological data to reconstruct spatial spread while simultaneously identifying the main factors impacting the spatial spread dynamics. Recent developments enable researchers to make use of travel history data of infected individuals in the reconstruction of pathogen spread, offering increased inference accuracy and mitigating sampling bias. Here, we describe a detailed workflow to reconstruct the spatial spread of a pathogen through Bayesian phylogeographic analysis in discrete space using these novel approaches, implemented in BEAST. The individual protocols focus on how to incorporate molecular data, covariates of spread, and individual travel history data into the analysis. © 2021 Wiley Periodicals LLC.
- Basic Protocol 1: Creating a SARS-CoV-2 MSA using sequences from GISAID
- Basic Protocol 2: Setting up a discrete trait phylogeographic reconstruction in BEAUti
- Basic Protocol 3: Phylogeographic reconstruction incorporating travel history information
- Basic Protocol 4: Visualizing ancestral spatial trajectories for specific taxa
Affiliation(s)
- Samuel L Hong
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, California
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium
21
Lemey P, Ruktanonchai N, Hong SL, Colizza V, Poletto C, Van den Broeck F, Gill MS, Ji X, Levasseur A, Sadilek A, Lai S, Tatem AJ, Baele G, Suchard MA, Dellicour S. SARS-CoV-2 European resurgence foretold: interplay of introductions and persistence by leveraging genomic and mobility data. RESEARCH SQUARE 2021:rs.3.rs-208849. [PMID: 33594355 PMCID: PMC7885927 DOI: 10.21203/rs.3.rs-208849/v1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Following the first wave of SARS-CoV-2 infections in spring 2020, Europe experienced a resurgence of the virus starting late summer that was deadlier and more difficult to contain. Relaxed intervention measures and summer travel have been implicated as drivers of the second wave. Here, we build a phylogeographic model to evaluate how newly introduced lineages, as opposed to the rekindling of persistent lineages, contributed to the COVID-19 resurgence in Europe. We inform this model using genomic, mobility and epidemiological data from 10 West European countries and estimate that in many countries more than 50% of the lineages circulating in late summer resulted from new introductions since June 15th. The success in onwards transmission of these lineages is predicted by SARS-CoV-2 incidence during this period. Relatively early introductions from Spain into the United Kingdom contributed to the successful spread of the 20A.EU1/B.1.177 variant. The pervasive spread of variants that have not been associated with an advantage in transmissibility highlights the threat of novel variants of concern that emerged more recently and have been disseminated by holiday travel. Our findings indicate that more effective and coordinated measures are required to contain spread through cross-border travel.
Affiliation(s)
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Global Virus Network (GVN), Baltimore, MD, USA
- Nick Ruktanonchai
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton SO17 1BJ, UK
  - Population Health Sciences, Virginia Tech, Blacksburg, VA, USA
- Samuel L Hong
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Vittoria Colizza
  - INSERM, Sorbonne Université, Institut Pierre Louis d'Epidémiologie et de Santé Publique IPLESP, F75012 Paris, France
- Chiara Poletto
  - INSERM, Sorbonne Université, Institut Pierre Louis d'Epidémiologie et de Santé Publique IPLESP, F75012 Paris, France
- Frederik Van den Broeck
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
- Mandev S Gill
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Xiang Ji
  - Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, LA, USA
- Anthony Levasseur
  - Microbes, Evolution, Phylogeny and Infection, Aix-Marseille Université and Marseille Institut Universitaire de France, Marseille, France
- Shengjie Lai
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton SO17 1BJ, UK
- Andrew J Tatem
  - WorldPop, School of Geography and Environmental Science, University of Southampton, Southampton SO17 1BJ, UK
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Marc A Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Simon Dellicour
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, CP160/12, 50 av. FD Roosevelt, 1050 Bruxelles, Belgium
|
22
|
Vrancken B, Mehta SR, Ávila-Ríos S, García-Morales C, Tapia-Trejo D, Reyes-Terán G, Navarro-Álvarez S, Little SJ, Hoenigl M, Pines HA, Patterson T, Strathdee SA, Smith DM, Dellicour S, Chaillon A. Dynamics and Dispersal of Local HIV Epidemics Within San Diego and Across The San Diego-Tijuana Border. Clin Infect Dis 2020; 73:e2018-e2025. [PMID: 33079188 DOI: 10.1093/cid/ciaa1588] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/15/2020] [Indexed: 01/25/2023]
Abstract
BACKGROUND: Evolutionary analyses of well-annotated HIV sequence data can provide insights into viral transmission patterns and associated factors. Here, we explored the transmission dynamics of the HIV-1 subtype B epidemic across the San Diego (US) - Tijuana (Mexico) border region to identify factors that could help guide public health policy. METHODS: HIV pol sequences were collected from people with HIV in San Diego County and from Tijuana between 1996 and 2018. A multistep phylogenetic approach was used to characterize the dynamics of spread. The contributions of geospatial factors and HIV risk group to the local dynamics were evaluated. RESULTS: Phylogeographic analyses of the 2,034 sequences revealed an important contribution of local transmission in sustaining the epidemic, as well as a complex viral migration network across the region. Geospatial viral dispersal between San Diego communities occurred predominantly among men who have sex with men, with central San Diego being the main source (34.9%) and recipient (39.5%) of migration events. HIV migration was more frequent from San Diego County towards Tijuana than vice versa. Migrations were best explained by driving time between locations. CONCLUSION: The US-Mexico border may not be a major barrier to the spread of HIV, which may stimulate coordinated transnational intervention approaches. Whereas a focus on central San Diego has the potential to avert most spread, the substantial viral migration independent of central San Diego shows that county-wide efforts will be more effective. Combined, this work shows that epidemiological information gleaned from pathogen genomes can uncover mechanisms that underlie sustained spread and, in turn, can be a building block of public health decision making.
Affiliation(s)
- Bram Vrancken
- Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory for Computational and Evolutionary Virology, KU Leuven, Herestraat, Leuven, Belgium
- Sanjay R Mehta
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
- Santiago Ávila-Ríos
- Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Calzada de Tlalpan, Colonia Sección XVI, CP, Mexico City, Mexico
- Claudia García-Morales
- Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Calzada de Tlalpan, Colonia Sección XVI, CP, Mexico City, Mexico
- Daniela Tapia-Trejo
- Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Calzada de Tlalpan, Colonia Sección XVI, CP, Mexico City, Mexico
- Gustavo Reyes-Terán
- Coordinating Commission of the Mexican National Institutes of Health, Periférico Sur, Arenal Tepepan, Mexico City, Mexico
- Susan J Little
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
- Martin Hoenigl
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
- Heather A Pines
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
- Thomas Patterson
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
- Steffanie A Strathdee
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
- Davey M Smith
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
- Simon Dellicour
- Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory for Computational and Evolutionary Virology, KU Leuven, Herestraat, Leuven, Belgium
- Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, av. FD Roosevelt, Bruxelles, Belgium
- Antoine Chaillon
- Division of Infectious Diseases and Global Public Health, University of California San Diego, CA
|