151
Ye C, Thornlow B, Hinrichs A, Kramer A, Mirchandani C, Torvi D, Lanfear R, Corbett-Detig R, Turakhia Y. matOptimize: a parallel tree optimization method enables online phylogenetics for SARS-CoV-2. Bioinformatics 2022; 38:3734-3740. [PMID: 35731204 PMCID: PMC9344837 DOI: 10.1093/bioinformatics/btac401] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/06/2022] [Revised: 05/21/2022] [Accepted: 06/16/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic. RESULTS Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods. We have developed this method particularly to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping refine on a daily basis possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences. AVAILABILITY AND IMPLEMENTATION The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts we used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
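The abstract above describes matOptimize as a parsimony-based optimizer. As a minimal illustration of what a parsimony score measures, the sketch below applies Fitch's small-parsimony algorithm to a single alignment site on a fixed rooted binary tree. It is not matOptimize's implementation or data structure: the nested-tuple tree encoding, the function name `fitch_score`, and the toy data are all assumptions made for this example.

```python
def fitch_score(tree, states):
    """Return (state_set, substitutions) for one alignment site.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (hypothetical encoding)
    states -- dict mapping leaf name -> observed nucleotide at this site
    """
    if isinstance(tree, str):          # leaf: its state set is the observed base
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    common = left_set & right_set
    if common:                         # children share a state: no extra change
        return common, left_cost + right_cost
    # children disagree: keep the union and count one substitution
    return left_set | right_set, left_cost + right_cost + 1


# Toy example (illustrative data only): four samples, one site.
tree = (("s1", "s2"), ("s3", "s4"))
site = {"s1": "A", "s2": "A", "s3": "G", "s4": "A"}
_, score = fitch_score(tree, site)
print(score)
```

A tree optimizer in this spirit would search over tree rearrangements for topologies with a lower total score summed across sites; the sketch only scores one fixed tree.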
Affiliation(s)
- Cheng Ye
  - Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA 92093, USA
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Angie Hinrichs
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Alexander Kramer
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Cade Mirchandani
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Devika Torvi
  - Department of Bioengineering, University of California, San Diego, San Diego, CA 92093, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA 92093, USA

152
Hare D, Meaney C, Powell J, Slevin B, O' Brien B, Power L, O' Connell NH, De Gascun CF, Dunne CP, Stapleton PJ. Repeated transmission of SARS-CoV-2 in an overcrowded Irish emergency department elucidated by whole-genome sequencing. J Hosp Infect 2022; 126:1-9. [PMID: 35562074 PMCID: PMC9088210 DOI: 10.1016/j.jhin.2022.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Revised: 04/20/2022] [Accepted: 04/28/2022] [Indexed: 01/11/2023]
Abstract
AIM To provide a detailed genomic-epidemiological description of a complex multi-ward SARS-CoV-2 outbreak, which originated in the crowded emergency department (ED) in our hospital during the third wave of the COVID-19 pandemic, and was elucidated promptly by local whole-genome sequencing (WGS). METHODS SARS-CoV-2 was detected by reverse transcriptase real-time polymerase chain reaction on viral RNA extracted from nasopharyngeal swabs. WGS was performed using an Oxford MinION Mk1C instrument following the ARTIC v3 sequencing protocol. High-quality consensus genomes were assembled with the artic-ncov2019 bioinformatics pipeline and viral phylogenetic trees were built, inferred by maximum-likelihood. Clusters were defined using a threshold of 0-1 single nucleotide polymorphisms (SNPs) between epidemiologically linked sequences. RESULTS In April 2021, outbreaks of COVID-19 were declared on two wards at University Hospital Limerick after 4 healthcare-associated SARS-CoV-2 infections were detected by post-admission surveillance testing. Contact tracing identified 12 further connected cases; all with direct or indirect links to the ED 'COVID Zone'. All sequences were assigned to the Pangolin B.1.1.7 lineage by WGS, and SNP-level analysis revealed two distinct but simultaneous clusters of infections. Repeated transmission in the ED was demonstrated, involving patients accommodated on trolleys in crowded areas, resulting in multiple generations of infections across three inpatient hospital wards and subsequently to the local community. These findings informed mitigation efforts to prevent cross-transmission in the ED. CONCLUSION Cross-transmission of SARS-CoV-2 occurred repeatedly in an overcrowded emergency department. Viral WGS elucidated complex viral transmission networks in our hospital and informed infection, prevention and control practice.
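The methods above define clusters by a threshold of 0-1 SNPs between epidemiologically linked sequences. A rough sketch of that kind of threshold clustering follows. It assumes aligned consensus sequences of equal length, ignores ambiguous bases (`N`) and gaps, joins samples by single linkage, and leaves out the epidemiological-link criterion entirely. The function names and toy sequences are hypothetical and are not taken from the study's pipeline.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions between two aligned sequences, skipping N and gaps."""
    ignore = {"N", "-"}
    return sum(1 for x, y in zip(a, b)
               if x != y and x not in ignore and y not in ignore)


def cluster_by_snp(genomes, max_snps=1):
    """Single-linkage grouping: samples within max_snps of each other are joined."""
    parent = {name: name for name in genomes}

    def find(x):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if snp_distance(genomes[a], genomes[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters = {}
    for name in genomes:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy sequences (illustrative only, not real genomes).
genomes = {
    "P1": "ACGTACGT",
    "P2": "ACGTACGA",  # 1 SNP from P1
    "P3": "TCGAACGT",  # 2 SNPs from P1, 3 from P2
    "P4": "TCGAACGT",  # identical to P3
}
clusters = cluster_by_snp(genomes)
print(clusters)
```

With single linkage, a chain of samples that each differ by one SNP ends up in one cluster even if its endpoints differ by more, so a real analysis would need to decide whether that behaviour is appropriate.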
Affiliation(s)
- D Hare
  - Department of Clinical Microbiology, University Hospital Limerick, St Nessan's Road, Dooradoyle, Limerick, Ireland; School of Medicine, University of Limerick, Limerick, Ireland; UCD National Virus Reference Laboratory, University College Dublin, Dublin, Ireland
- C Meaney
  - Department of Clinical Microbiology, University Hospital Limerick, St Nessan's Road, Dooradoyle, Limerick, Ireland
- J Powell
  - Department of Clinical Microbiology, University Hospital Limerick, St Nessan's Road, Dooradoyle, Limerick, Ireland; Centre for Interventions in Infection, Inflammation & Immunity (4i), University of Limerick, Limerick, Ireland
- B Slevin
  - Department of Infection, Prevention and Control, University Hospital Limerick, Limerick, Ireland
- B O' Brien
  - Department of Infection, Prevention and Control, University Hospital Limerick, Limerick, Ireland
- L Power
  - Department of Clinical Microbiology, University Hospital Limerick, St Nessan's Road, Dooradoyle, Limerick, Ireland
- N H O' Connell
  - Department of Clinical Microbiology, University Hospital Limerick, St Nessan's Road, Dooradoyle, Limerick, Ireland; School of Medicine, University of Limerick, Limerick, Ireland; Centre for Interventions in Infection, Inflammation & Immunity (4i), University of Limerick, Limerick, Ireland
- C F De Gascun
  - UCD National Virus Reference Laboratory, University College Dublin, Dublin, Ireland
- C P Dunne
  - School of Medicine, University of Limerick, Limerick, Ireland; Centre for Interventions in Infection, Inflammation & Immunity (4i), University of Limerick, Limerick, Ireland
- P J Stapleton
  - Department of Clinical Microbiology, University Hospital Limerick, St Nessan's Road, Dooradoyle, Limerick, Ireland; School of Medicine, University of Limerick, Limerick, Ireland

153
Yang Y, Dufault-Thompson K, Salgado Fontenele R, Jiang X. Putative Host-Derived Insertions in the Genomes of Circulating SARS-CoV-2 Variants. mSystems 2022; 7:e0017922. [PMID: 35582907 PMCID: PMC9239191 DOI: 10.1128/msystems.00179-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 05/02/2022] [Indexed: 11/20/2022] Open
Abstract
Insertions in the SARS-CoV-2 genome have the potential to drive viral evolution, but the source of the insertions is often unknown. Recent proposals have suggested that human RNAs could be a source of some insertions, but the small size of many insertions makes this difficult to confirm. Through an analysis of available direct RNA sequencing data from SARS-CoV-2-infected cells, we show that viral-host chimeric RNAs are formed through what are likely stochastic RNA-dependent RNA polymerase template-switching events. Through an analysis of the publicly available GISAID SARS-CoV-2 genome collection, we identified two genomic insertions in circulating SARS-CoV-2 variants that are identical to regions of the human 18S and 28S rRNAs. These results provide direct evidence of the formation of viral-host chimeric sequences and the integration of host genetic material into the SARS-CoV-2 genome, highlighting the potential importance of host-derived insertions in viral evolution. IMPORTANCE Throughout the COVID-19 pandemic, the sequencing of SARS-CoV-2 genomes has revealed the presence of insertions in multiple globally circulating lineages of SARS-CoV-2, including the Omicron variant. The human genome has been suggested to be the source of some of the larger insertions, but evidence for this kind of event occurring is still lacking. Here, we leverage direct RNA sequencing data and SARS-CoV-2 genomes to show that host-viral chimeric RNAs are generated in infected cells and two large genomic insertions have likely been formed through the incorporation of host rRNA fragments into the SARS-CoV-2 genome. These host-derived insertions may increase the genetic diversity of SARS-CoV-2 and expand its strategies to acquire genetic material, potentially enhancing its adaptability, virulence, and spread.
Affiliation(s)
- Yiyan Yang
  - National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Xiaofang Jiang
  - National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA

154
McBroome J, Martin J, de Bernardi Schneider A, Turakhia Y, Corbett-Detig R. Identifying SARS-CoV-2 regional introductions and transmission clusters in real time. Virus Evol 2022; 8:veac048. [PMID: 35769891 PMCID: PMC9214145 DOI: 10.1093/ve/veac048] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/02/2022] [Revised: 04/04/2022] [Accepted: 06/13/2022] [Indexed: 12/31/2022] Open
Abstract
The unprecedented severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) global sequencing effort has suffered from an analytical bottleneck. Many existing methods for phylogenetic analysis are designed for sparse, static datasets and are too computationally expensive to apply to densely sampled, rapidly expanding datasets when results are needed immediately to inform public health action. For example, public health is often concerned with identifying clusters of closely related samples, but the sheer scale of the data prevents manual inspection and the current computational models are often too expensive in time and resources. Even when results are available, intuitive data exploration tools are of critical importance to effective public health interpretation and action. To help address this need, we present a phylogenetic heuristic that quickly and efficiently identifies newly introduced strains in a region, resulting in clusters of infected individuals, and their putative geographic origins. We show that this approach performs well on simulated data and yields results largely congruent with more sophisticated Bayesian phylogeographic modeling approaches. We also introduce Cluster-Tracker (https://clustertracker.gi.ucsc.edu/), a novel interactive web-based tool to facilitate effective and intuitive SARS-CoV-2 geographic data exploration and visualization across the USA. Cluster-Tracker is updated daily and automatically identifies and highlights groups of closely related SARS-CoV-2 infections resulting from the transmission of the virus between two geographic areas by travelers, streamlining public health tracking of local viral diversity and emerging infection clusters. The site is open-source and designed to be easily configured to analyze any chosen region, making it a useful resource globally. 
The combination of these open-source tools will empower detailed investigations of the geographic origins and spread of SARS-CoV-2 and other densely sampled pathogens.
Affiliation(s)
- Jakob McBroome
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz 1156 High St, Santa Cruz, CA 95064, USA
- Jennifer Martin
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz 1156 High St, Santa Cruz, CA 95064, USA
- Adriano de Bernardi Schneider
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz 1156 High St, Santa Cruz, CA 95064, USA
- Yatish Turakhia
  - Electrical and Computer Engineering, University of California, San Diego 9500 Gilman Dr, La Jolla, CA 92093, USA
- Russell Corbett-Detig
  - Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz 1156 High St, Santa Cruz, CA 95064, USA

155
Cuddihy T, Harris PNA, Permana B, Beatson SA, Forde BM. CATHAI: cluster analysis tool for healthcare-associated infections. BIOINFORMATICS ADVANCES 2022; 2:vbac040. [PMID: 36699387 PMCID: PMC9710666 DOI: 10.1093/bioadv/vbac040] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/14/2022] [Revised: 04/26/2022] [Accepted: 05/24/2022] [Indexed: 01/28/2023]
Abstract
Motivation Whole genome sequencing (WGS) is revolutionizing disease surveillance where it facilitates high-resolution clustering of related organisms and outbreak detection. However, visualizing and efficiently communicating genomic data back to clinical staff is crucial for the successful deployment of a targeted infection control response. Results CATHAI (cluster analysis tool for healthcare-associated infections) is an interactive web-based visualization platform that couples WGS informed clustering with associated metadata, thereby converting sequencing data into informative and accessible clinical information for the management of healthcare-associated infections (HAI) and nosocomial outbreaks. Availability and implementation All code associated with this application is freely available from https://github.com/FordeGenomics/cathai. A demonstration version of CATHAI is available online at https://cathai.fordelab.com.
Affiliation(s)
- Thom Cuddihy
  - University of Queensland, UQ Centre for Clinical Research, Brisbane, QLD 4029, Australia
- Patrick N A Harris
  - University of Queensland, UQ Centre for Clinical Research, Brisbane, QLD 4029, Australia
  - Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
  - Central Microbiology, Pathology Queensland, Royal Brisbane & Women’s Hospital, Herston, Brisbane, QLD 4029, Australia
- Budi Permana
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia
- Scott A Beatson
  - Australian Infectious Diseases Research Centre, University of Queensland, Brisbane, QLD 4072, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia

156
Tan CCS, Lam SD, Richard D, Owen CJ, Berchtold D, Orengo C, Nair MS, Kuchipudi SV, Kapur V, van Dorp L, Balloux F. Transmission of SARS-CoV-2 from humans to animals and potential host adaptation. Nat Commun 2022; 13:2988. [PMID: 35624123 PMCID: PMC9142586 DOI: 10.1038/s41467-022-30698-6] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Received: 02/02/2022] [Accepted: 05/13/2022] [Indexed: 12/16/2022] Open
Abstract
SARS-CoV-2, the causative agent of the COVID-19 pandemic, can infect a wide range of mammals. Since its spread in humans, secondary host jumps of SARS-CoV-2 from humans to multiple domestic and wild populations of mammals have been documented. Understanding the extent of adaptation to these animal hosts is critical for assessing the threat that the spillback of animal-adapted SARS-CoV-2 into humans poses. We compare the genomic landscapes of SARS-CoV-2 isolated from animal species to that in humans, profiling the mutational biases indicative of potentially different selective pressures in animals. We focus on viral genomes isolated from mink (Neovison vison) and white-tailed deer (Odocoileus virginianus) for which multiple independent outbreaks driven by onward animal-to-animal transmission have been reported. We identify five candidate mutations for animal-specific adaptation in mink (NSP9_G37E, Spike_F486L, Spike_N501T, Spike_Y453F, ORF3a_L219V), and one in deer (NSP3a_L1035F), though they appear to confer a minimal advantage for human-to-human transmission. No considerable changes to the mutation rate or evolutionary trajectory of SARS-CoV-2 have resulted from circulation in mink and deer thus far. Our findings suggest that minimal adaptation was required for onward transmission in mink and deer following human-to-animal spillover, highlighting the 'generalist' nature of SARS-CoV-2 as a mammalian pathogen.
Affiliation(s)
- Cedric C S Tan
  - UCL Genetics Institute, University College London, London, UK.
  - Genome Institute of Singapore, A*STAR, Singapore, Singapore.
- Su Datt Lam
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
  - Department of Structural and Molecular Biology, University College London, London, UK
- Damien Richard
  - UCL Genetics Institute, University College London, London, UK
  - Division of Infection and Immunity, University College London, London, UK
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, London, UK
- Meera Surendran Nair
  - Animal Diagnostic Laboratory, Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, PA, Pennsylvania, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, PA, Pennsylvania, USA
- Suresh V Kuchipudi
  - Animal Diagnostic Laboratory, Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, PA, Pennsylvania, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, PA, Pennsylvania, USA
- Vivek Kapur
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, PA, Pennsylvania, USA
  - Department of Animal Science, The Pennsylvania State University, PA, Pennsylvania, USA
- Lucy van Dorp
  - UCL Genetics Institute, University College London, London, UK

157
Czech L, Stamatakis A, Dunthorn M, Barbera P. Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade. FRONTIERS IN BIOINFORMATICS 2022; 2:871393. [PMID: 36304302 PMCID: PMC9580882 DOI: 10.3389/fbinf.2022.871393] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 12/20/2022] Open
Abstract
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e. g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. Thereby, one can increase the accuracy of metagenomic surveys and eliminate the requirement for having exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool per se, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview over the methods developed during the first 10 years. In particular, the goals of this review are 1) to motivate the usage of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, 5) to showcase typical placement-based analyses, and how they can help to analyze, visualize, and interpret phylogenetic placement data.
Affiliation(s)
- Lucas Czech
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Micah Dunthorn
  - Natural History Museum, University of Oslo, Oslo, Norway

158
Obermeyer F, Jankowiak M, Barkas N, Schaffner SF, Pyle JD, Yurkovetskiy L, Bosso M, Park DJ, Babadi M, MacInnis BL, Luban J, Sabeti PC, Lemieux JE. Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness. Science 2022; 376:1327-1332. [PMID: 35608456 PMCID: PMC9161372 DOI: 10.1126/science.abm1208] [Citation(s) in RCA: 122] [Impact Index Per Article: 61.0] [Indexed: 12/11/2022]
Abstract
Repeated emergence of SARS-CoV-2 variants with increased fitness underscores the value of rapid detection and characterization of new lineages. We have developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, ranks the fitness of lineages as new sequences become available, and prioritizes mutations of biological and public health concern for functional characterization.
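The abstract above describes PyR0 as a multinomial logistic regression over lineage prevalence. The sketch below shows only the basic functional form such a regression commonly uses: lineage proportions at time t are a softmax of an intercept plus a growth rate times t. It is not the PyR0 model, which the abstract describes as hierarchical and Bayesian, and it does not use any PyR0 API. The function name, parameter names, and numeric values are assumptions for illustration.

```python
import math


def lineage_proportions(intercepts, growth_rates, t):
    """Softmax of (intercept + growth_rate * t) across lineages."""
    logits = [a + b * t for a, b in zip(intercepts, growth_rates)]
    m = max(logits)                       # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical lineages: the second starts rare but has a higher growth rate.
intercepts = [0.0, -3.0]
growth_rates = [0.0, 0.1]   # per day; illustrative values only
early = lineage_proportions(intercepts, growth_rates, t=0)
late = lineage_proportions(intercepts, growth_rates, t=60)
print(early, late)
```

In this form only differences between growth rates matter, which is why relative rather than absolute fitness is what such a model can estimate.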

159
Thornlow B, Kramer A, Ye C, De Maio N, McBroome J, Hinrichs AS, Lanfear R, Turakhia Y, Corbett-Detig R. Online Phylogenetics using Parsimony Produces Slightly Better Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Approaches. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2021.12.02.471004. [PMID: 35611334 PMCID: PMC9128781 DOI: 10.1101/2021.12.02.471004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
Abstract
Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 datasets do not fit this mould. There are currently over 10 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) methods are more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger datasets. Here, we evaluate the performance of de novo and online phylogenetic approaches, and ML and MP frameworks, for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimizations produce more accurate SARS-CoV-2 phylogenies than do ML optimizations. 
Since MP is thousands of times faster than presently available implementations of ML and online phylogenetics is faster than de novo, we therefore propose that, in the context of comprehensive genomic epidemiology of SARS-CoV-2, MP online phylogenetics approaches should be favored.
Affiliation(s)
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Alexander Kramer
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Cheng Ye
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus; Cambridge CB10 1SD, UK
- Jakob McBroome
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Angie S. Hinrichs
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University; Canberra, ACT 2601, Australia
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA

160
Nilgiriwala K, Kadam P, Patel G, Shaikh A, Mestry T, Vaswani S, Sakthivel S, Poojary A, Gandhi B, Rohra S, Udwadia Z, Oswal V, Shah D, Gomare M, Sriraman K, Mistry N. Genomics of Post-Vaccination SARS-CoV-2 Infections During the Delta Dominated Second Wave of COVID-19 Pandemic, from Mumbai Metropolitan Region (MMR), India. J Med Virol 2022; 94:4206-4215. [PMID: 35578378 PMCID: PMC9348366 DOI: 10.1002/jmv.27861] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/06/2022] [Revised: 05/10/2022] [Accepted: 05/12/2022] [Indexed: 11/05/2022]
Abstract
The present study was initiated to understand the proportion of predominant variants of SARS-CoV-2 in post-vaccination infections during the Delta dominated second wave of COVID-19 in the Mumbai Metropolitan Region (MMR) in India and to understand any mutations selected in the post-vaccination infections or showing association with any patient demographics. Samples were collected (n=166) from severe/moderate/mild COVID-19 patients who were either vaccinated (COVISHIELD/COVAXIN - partial/fully vaccinated) or unvaccinated, from a city hospital and from home isolation patients in MMR. A total of 150 viral genomes were sequenced by Oxford Nanopore sequencing and the data of 136 viral genomes were analyzed for clade/lineage and for identifying mutations. The sequences belonged to three clades (21A, 21I and 21J) and their lineage was identified as either Delta (B.1.617.2) or Delta+ (B.1.617.2 + K417N) or sub-lineages of Delta variant (AY.120/AY.38/AY.99). A total of 620 mutations were identified of which 10 mutations showed an increase in trend with time (May-Oct 2021). Associations of 6 mutations (2 in spike, 3 in orf1a and 1 in nucleocapsid) were shown with milder forms of the disease and one mutation (in orf1a) with partial vaccination status. The results indicate a trend towards reduction in disease severity as the wave progressed. This article is protected by copyright. All rights reserved.
Collapse
Affiliation(s)
- Kayzad Nilgiriwala
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Pratibha Kadam
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Grishma Patel
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Ambreen Shaikh
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Tejal Mestry
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Smriti Vaswani
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Shalini Sakthivel
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Aruna Poojary
- Breach Candy Hospital (BCH) Trust, 60 A Bhulabhai Desai Road, Mumbai, 400 026
| | - Bhavesh Gandhi
- Breach Candy Hospital (BCH) Trust, 60 A Bhulabhai Desai Road, Mumbai, 400 026
| | - Seema Rohra
- Breach Candy Hospital (BCH) Trust, 60 A Bhulabhai Desai Road, Mumbai, 400 026
| | - Zarir Udwadia
- Breach Candy Hospital (BCH) Trust, 60 A Bhulabhai Desai Road, Mumbai, 400 026
| | | | - Daksha Shah
- Municipal Corporation of Greater Mumbai (MCGM), Mumbai
| | | | - Kalpana Sriraman
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
| | - Nerges Mistry
- The Foundation for Medical Research, Dr. Kantilal J. Sheth Memorial Building, 84-A, R. G. Thadani Marg, Worli, Mumbai, 400 018
|
161
|
Liu LT, Tsai JJ, Chang K, Chen CH, Lin PC, Tsai CY, Tsai YY, Hsu MC, Chuang WL, Chang JM, Hwang SJ, Chong IW. Identification and Analysis of SARS-CoV-2 Alpha Variants in the Largest Taiwan COVID-19 Outbreak in 2021. Front Med (Lausanne) 2022; 9:869818. [PMID: 35547225 PMCID: PMC9081839 DOI: 10.3389/fmed.2022.869818] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/05/2022] [Accepted: 03/23/2022] [Indexed: 12/23/2022] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is believed to have originated in Wuhan City, Hubei Province, China, in December 2019. Infection with this highly dangerous human-infecting coronavirus via inhalation of respiratory droplets from SARS-CoV-2 carriers results in coronavirus disease 2019 (COVID-19), which features clinical symptoms such as fever, dry cough, shortness of breath, and life-threatening pneumonia. Several COVID-19 waves arose in Taiwan from January 2020 to March 2021, with the largest outbreak ever having a high case fatality rate (CFR) (5.95%) between May and June 2021. In this study, we identified five 20I (alpha, V1)/B.1.1.7/GR SARS-CoV-2 (KMUH-3 to 7) lineage viruses from COVID-19 patients in this largest COVID-19 outbreak. Sequence placement analysis using the existing SARS-CoV-2 phylogenetic tree revealed that KMUH-3 originated from Japan and that KMUH-4 to KMUH-7 possibly originated via local transmission. Spike mutations M1237I and D614G were identified in KMUH-4 to KMUH-7 as well as in 43 other alpha/B.1.1.7 sequences of 48 alpha/B.1.1.7 sequences deposited in GISAID derived from clinical samples collected in Taiwan between 20 April and July. However, M1237I mutation was not observed in the other 12 alpha/B.1.1.7 sequences collected between 26 December 2020, and 12 April 2021. We conclude that the largest COVID-19 outbreak in Taiwan between May and June 2021 was initially caused by the alpha/B.1.1.7 variant harboring spike D614G + M1237I mutations, which was introduced to Taiwan by China Airlines cargo crew members. To our knowledge, this is the first documented COVID-19 outbreak caused by alpha/B.1.1.7 variant harboring spike M1237I mutation thus far. The largest COVID-19 outbreak in Taiwan resulted in 13,795 cases and 820 deaths, with a high CFR, at 5.95%, accounting for 80.90% of all cases and 96.47% of all deaths during the first 2 years. 
The high CFR caused by SARS-CoV-2 alpha variants in Taiwan can be attributed to comorbidities and low herd immunity. We also suggest that timely SARS-CoV-2 isolation and/or sequencing are important in real-time epidemiological investigations and in epidemic prevention. The impact of the D614G + M1237I mutations in the spike gene on the spread of SARS-CoV-2 and on the high CFR remains to be elucidated.
Affiliation(s)
- Li-Teh Liu
- Department of Medical Laboratory Science and Biotechnology, College of Medical Technology, Chung Hwa University of Medical Technology, Tainan, Taiwan
| | - Jih-Jin Tsai
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan.,Division of Infectious Diseases, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan.,School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Ko Chang
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan.,School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.,Department of Internal Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Chun-Hong Chen
- National Mosquito-Borne Diseases Control Research Center, National Health Research Institutes, Zhunan, Taiwan.,National Institute of Infectious Diseases and Vaccinology, National Health Research Institutes, Zhunan, Taiwan
| | - Ping-Chang Lin
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Ching-Yi Tsai
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Yan-Yi Tsai
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Miao-Chen Hsu
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Wan-Long Chuang
- School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.,Division of Hepatobiliary and Pancreatic, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Jer-Ming Chang
- School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.,Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan.,Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Shang-Jyh Hwang
- School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.,Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Inn-Wen Chong
- Department of Internal Medicine and Graduate Institute of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.,Department of Pulmonary Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
|
162
|
Lo CC, Shakya M, Connor R, Davenport K, Flynn M, Gutiérrez AMY, Hu B, Li PE, Jackson EP, Xu Y, Chain PSG. EDGE COVID-19: a web platform to generate submission-ready genomes from SARS-CoV-2 sequencing efforts. Bioinformatics 2022; 38:2700-2704. [PMID: 35561186 PMCID: PMC9113274 DOI: 10.1093/bioinformatics/btac176] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/22/2021] [Revised: 01/14/2022] [Accepted: 03/23/2022] [Indexed: 11/22/2022] Open
Abstract
SUMMARY Genomics has become an essential technology for surveilling emerging infectious disease outbreaks. A range of technologies and strategies for pathogen genome enrichment and sequencing are being used by laboratories worldwide, together with different and sometimes ad hoc, analytical procedures for generating genome sequences. A fully integrated analytical process for raw sequence to consensus genome determination, suited to outbreaks such as the ongoing COVID-19 pandemic, is critical to provide a solid genomic basis for epidemiological analyses and well-informed decision making. We have developed a web-based platform and integrated bioinformatic workflows that help to provide consistent high-quality analysis of SARS-CoV-2 sequencing data generated with either the Illumina or Oxford Nanopore Technologies (ONT). Using an intuitive web-based interface, this workflow automates data quality control, SARS-CoV-2 reference-based genome variant and consensus calling, lineage determination and provides the ability to submit the consensus sequence and necessary metadata to GenBank, GISAID and INSDC raw data repositories. We tested workflow usability using real world data and validated the accuracy of variant and lineage analysis using several test datasets, and further performed detailed comparisons with results from the COVID-19 Galaxy Project workflow. Our analyses indicate that EC-19 workflows generate high-quality SARS-CoV-2 genomes. Finally, we share a perspective on patterns and impact observed with Illumina versus ONT technologies on workflow congruence and differences. AVAILABILITY AND IMPLEMENTATION https://edge-covid19.edgebioinformatics.org, and https://github.com/LANL-Bioinformatics/EDGE/tree/SARS-CoV2. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Ryan Connor
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Karen Davenport
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Mark Flynn
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | | | - Bin Hu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | - Po-E Li
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
| | | | - Yan Xu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
|
163
|
Caraballo-Ortiz MA, Miura S, Sanderford M, Dolker T, Tao Q, Weaver S, Pond SLK, Kumar S. TopHap: rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity. Bioinformatics 2022; 38:2719-2726. [PMID: 35561179 PMCID: PMC9113349 DOI: 10.1093/bioinformatics/btac186] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/18/2021] [Revised: 03/15/2022] [Accepted: 03/23/2022] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites but millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate and fast phylogenetic inference of resolvable phylogenetic features. RESULTS We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. We develop a bootstrap strategy that resamples genomes spatiotemporally to assess topological robustness. The application of TopHap to build a phylogeny of 68 057 SARS-CoV-2 genomes (68KG) from the first year of the pandemic produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million SARS-CoV-2 genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major and recent variants of concern. 
AVAILABILITY AND IMPLEMENTATION TopHap is available at https://github.com/SayakaMiura/TopHap. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marcos A Caraballo-Ortiz
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Sayaka Miura
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Maxwell Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Tenzin Dolker
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Qiqing Tao
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Steven Weaver
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Sergei L K Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
164
|
Majumdar S, Sarkar R. Mutational and phylogenetic analyses of the two lineages of the Omicron variant. J Med Virol 2022; 94:1777-1779. [PMID: 34964502 PMCID: PMC9015627 DOI: 10.1002/jmv.27558] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 12/21/2021] [Revised: 12/25/2021] [Accepted: 12/27/2021] [Indexed: 11/16/2022]
Affiliation(s)
- Swagata Majumdar
- Center for Liver Research, School of Digestive and Liver Diseases, Institute of Post Graduate Medical Education and Research, Kolkata, West Bengal, India
| | - Rakesh Sarkar
- ICMR‐National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
|
165
|
Yang Y, Dufault-Thompson K, Fontenele RS, Jiang X. Putative host-derived insertions in the genomes of circulating SARS-CoV-2 variants. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.01.04.474799. [PMID: 35043112 PMCID: PMC8764720 DOI: 10.1101/2022.01.04.474799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
Insertions in the SARS-CoV-2 genome have the potential to drive viral evolution, but the source of the insertions is often unknown. Recent proposals have suggested that human RNAs could be a source of some insertions, but the small size of many insertions makes this difficult to confirm. Through an analysis of available direct RNA sequencing data from SARS-CoV-2 infected cells, we show that viral-host chimeric RNAs are formed through what are likely stochastic RNA-dependent RNA polymerase template switching events. Through an analysis of the publicly available GISAID SARS-CoV-2 genome collection, we identified two genomic insertions in circulating SARS-CoV-2 variants that are identical to regions of the human 18S and 28S rRNAs. These results provide direct evidence of the formation of viral-host chimeric sequences and the integration of host genetic material into the SARS-CoV-2 genome, highlighting the potential importance of host-derived insertions in viral evolution. IMPORTANCE Throughout the COVID-19 pandemic, the sequencing of SARS-CoV-2 genomes has revealed the presence of insertions in multiple globally circulating lineages of SARS-CoV-2, including the Omicron variant. The human genome has been suggested to be the source of some of the larger insertions, but evidence for this kind of event occurring is still lacking. Here, we leverage direct RNA sequencing data and SARS-CoV-2 genomes to show host-viral chimeric RNAs are generated in infected cells and two large genomic insertions have likely been formed through the incorporation of host rRNA fragments into the SARS-CoV-2 genome. These host-derived insertions may increase the genetic diversity of SARS-CoV-2 and expand its strategies to acquire genetic materials, potentially enhancing its adaptability, virulence, and spread.
Affiliation(s)
- Yiyan Yang
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | | | | | - Xiaofang Jiang
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
166
|
Karthikeyan S, Levy JI, De Hoff P, Humphrey G, Birmingham A, Jepsen K, Farmer S, Tubb HM, Valles T, Tribelhorn CE, Tsai R, Aigner S, Sathe S, Moshiri N, Henson B, Mark AM, Hakim A, Baer NA, Barber T, Belda-Ferre P, Chacón M, Cheung W, Cresini ES, Eisner ER, Lastrella AL, Lawrence ES, Marotz CA, Ngo TT, Ostrander T, Plascencia A, Salido RA, Seaver P, Smoot EW, McDonald D, Neuhard RM, Scioscia AL, Satterlund AM, Simmons EH, Abelman DB, Brenner D, Bruner JC, Buckley A, Ellison M, Gattas J, Gonias SL, Hale M, Hawkins F, Ikeda L, Jhaveri H, Johnson T, Kellen V, Kremer B, Matthews G, McLawhon RW, Ouillet P, Park D, Pradenas A, Reed S, Riggs L, Sanders A, Sollenberger B, Song A, White B, Winbush T, Aceves CM, Anderson C, Gangavarapu K, Hufbauer E, Kurzban E, Lee J, Matteson NL, Parker E, Perkins SA, Ramesh KS, Robles-Sikisaka R, Schwab MA, Spencer E, Wohl S, Nicholson L, Mchardy IH, Dimmock DP, Hobbs CA, Bakhtar O, Harding A, Mendoza A, Bolze A, Becker D, Cirulli ET, Isaksson M, Barrett KMS, Washington NL, Malone JD, Schafer AM, Gurfield N, Stous S, Fielding-Miller R, Garfein RS, Gaines T, Anderson C, Martin NK, Schooley R, Austin B, MacCannell DR, Kingsmore SF, Lee W, Shah S, McDonald E, Yu AT, Zeller M, Fisch KM, Longhurst C, Maysent P, Pride D, Khosla PK, Laurent LC, Yeo GW, Andersen KG, Knight R. Wastewater sequencing uncovers early, cryptic SARS-CoV-2 variant transmission. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2022. [PMID: 35411350 DOI: 10.1101/2022.01.27.22269965] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Indexed: 05/16/2023]
Abstract
As SARS-CoV-2 continues to spread and evolve, detecting emerging variants early is critical for public health interventions. Inferring lineage prevalence by clinical testing is infeasible at scale, especially in areas with limited resources, participation, or testing/sequencing capacity, which can also introduce biases. SARS-CoV-2 RNA concentration in wastewater successfully tracks regional infection dynamics and provides less biased abundance estimates than clinical testing. Tracking virus genomic sequences in wastewater would improve community prevalence estimates and detect emerging variants. However, two factors limit wastewater-based genomic surveillance: low-quality sequence data and inability to estimate relative lineage abundance in mixed samples. Here, we resolve these critical issues to perform a high-resolution, 295-day wastewater and clinical sequencing effort, in the controlled environment of a large university campus and the broader context of the surrounding county. We develop and deploy improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. We detect emerging variants of concern up to 14 days earlier in wastewater samples, and identify multiple instances of virus spread not captured by clinical genomic surveillance. Our study provides a scalable solution for wastewater genomic surveillance that allows early detection of SARS-CoV-2 variants and identification of cryptic transmission.
Affiliation(s)
- Smruthi Karthikeyan
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Joshua I Levy
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Peter De Hoff
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Greg Humphrey
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Amanda Birmingham
- Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
| | - Kristen Jepsen
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Sawyer Farmer
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Helena M Tubb
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Tommy Valles
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | | | - Rebecca Tsai
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Stefan Aigner
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Shashank Sathe
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Niema Moshiri
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Benjamin Henson
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Adam M Mark
- Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
| | - Abbas Hakim
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Nathan A Baer
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Tom Barber
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Pedro Belda-Ferre
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Marisol Chacón
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Willi Cheung
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Evelyn S Cresini
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Emily R Eisner
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Alma L Lastrella
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Elijah S Lawrence
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Clarisse A Marotz
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Toan T Ngo
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Tyler Ostrander
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Ashley Plascencia
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Rodolfo A Salido
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Phoebe Seaver
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Elizabeth W Smoot
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Robert M Neuhard
- Operational Strategic Initiatives, University of California San Diego, La Jolla, CA, USA
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Angela L Scioscia
- Student Health and Well-Being, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
| | | | | | - Dismas B Abelman
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - David Brenner
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Judith C Bruner
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Anne Buckley
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Michael Ellison
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Jeffrey Gattas
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Steven L Gonias
- Department of Pathology, University of California San Diego, La Jolla, CA, USA
| | - Matt Hale
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Faith Hawkins
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Lydia Ikeda
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Hemlata Jhaveri
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Ted Johnson
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Vince Kellen
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Brendan Kremer
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Gary Matthews
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Ronald W McLawhon
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Pierre Ouillet
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Daniel Park
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Allorah Pradenas
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Sharon Reed
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Lindsay Riggs
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Alison Sanders
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | | | - Angela Song
- Operational Strategic Initiatives, University of California San Diego, La Jolla, CA, USA
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Benjamin White
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Terri Winbush
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Christine M Aceves
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Catelyn Anderson
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Karthik Gangavarapu
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Emory Hufbauer
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ezra Kurzban
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Justin Lee
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Nathaniel L Matteson
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Edyth Parker
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Sarah A Perkins
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Karthik S Ramesh
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Refugio Robles-Sikisaka
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Madison A Schwab
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Emily Spencer
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Shirlee Wohl
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Laura Nicholson
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ian H Mchardy
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - David P Dimmock
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
| | | | | | | | | | | | | | | | | | | | | | - John D Malone
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | | | - Nikos Gurfield
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Sarah Stous
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Rebecca Fielding-Miller
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
- Division of Infectious Disease and Global Public Health, University of California San Diego, La Jolla, CA, USA
| | - Richard S Garfein
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | - Tommi Gaines
- Division of Infectious Disease and Global Public Health, University of California San Diego, La Jolla, CA, USA
| | - Cheryl Anderson
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | - Natasha K Martin
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | - Robert Schooley
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | | | - Duncan R MacCannell
- Office of Advanced Molecular Detection, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | | | | | - Seema Shah
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Eric McDonald
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Alexander T Yu
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Mark Zeller
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Kathleen M Fisch
- Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
| | - Christopher Longhurst
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA
| | - Patty Maysent
- Office of the UC San Diego Health CEO, University of California, San Diego
| | - David Pride
- Departments of Pathology and Medicine, University of California, San Diego, La Jolla, CA
| | - Pradeep K Khosla
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Louise C Laurent
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- Sanford Consortium of Regenerative Medicine, University of California San Diego, La Jolla, CA
| | - Gene W Yeo
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Sanford Consortium of Regenerative Medicine, University of California San Diego, La Jolla, CA
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA
| | - Kristian G Andersen
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
| |
|
167
|
Karthikeyan S, Levy JI, De Hoff P, Humphrey G, Birmingham A, Jepsen K, Farmer S, Tubb HM, Valles T, Tribelhorn CE, Tsai R, Aigner S, Sathe S, Moshiri N, Henson B, Mark AM, Hakim A, Baer NA, Barber T, Belda-Ferre P, Chacón M, Cheung W, Cresini ES, Eisner ER, Lastrella AL, Lawrence ES, Marotz CA, Ngo TT, Ostrander T, Plascencia A, Salido RA, Seaver P, Smoot EW, McDonald D, Neuhard RM, Scioscia AL, Satterlund AM, Simmons EH, Abelman DB, Brenner D, Bruner JC, Buckley A, Ellison M, Gattas J, Gonias SL, Hale M, Hawkins F, Ikeda L, Jhaveri H, Johnson T, Kellen V, Kremer B, Matthews G, McLawhon RW, Ouillet P, Park D, Pradenas A, Reed S, Riggs L, Sanders A, Sollenberger B, Song A, White B, Winbush T, Aceves CM, Anderson C, Gangavarapu K, Hufbauer E, Kurzban E, Lee J, Matteson NL, Parker E, Perkins SA, Ramesh KS, Robles-Sikisaka R, Schwab MA, Spencer E, Wohl S, Nicholson L, Mchardy IH, Dimmock DP, Hobbs CA, Bakhtar O, Harding A, Mendoza A, Bolze A, Becker D, Cirulli ET, Isaksson M, Barrett KMS, Washington NL, Malone JD, Schafer AM, Gurfield N, Stous S, Fielding-Miller R, Garfein RS, Gaines T, Anderson C, Martin NK, Schooley R, Austin B, MacCannell DR, Kingsmore SF, Lee W, Shah S, McDonald E, Yu AT, Zeller M, Fisch KM, Longhurst C, Maysent P, Pride D, Khosla PK, Laurent LC, Yeo GW, Andersen KG, Knight R. Wastewater sequencing uncovers early, cryptic SARS-CoV-2 variant transmission. medRxiv 2022:2021.12.21.21268143. [PMID: 35411350 PMCID: PMC8996633 DOI: 10.1101/2021.12.21.21268143] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 02/06/2023]
Abstract
As SARS-CoV-2 continues to spread and evolve, detecting emerging variants early is critical for public health interventions. Inferring lineage prevalence by clinical testing is infeasible at scale, especially in areas with limited resources, participation, or testing/sequencing capacity, which can also introduce biases. SARS-CoV-2 RNA concentration in wastewater successfully tracks regional infection dynamics and provides less biased abundance estimates than clinical testing. Tracking virus genomic sequences in wastewater would improve community prevalence estimates and detect emerging variants. However, two factors limit wastewater-based genomic surveillance: low-quality sequence data and inability to estimate relative lineage abundance in mixed samples. Here, we resolve these critical issues to perform a high-resolution, 295-day wastewater and clinical sequencing effort, in the controlled environment of a large university campus and the broader context of the surrounding county. We develop and deploy improved virus concentration protocols and deconvolution software that fully resolve multiple virus strains from wastewater. We detect emerging variants of concern up to 14 days earlier in wastewater samples, and identify multiple instances of virus spread not captured by clinical genomic surveillance. Our study provides a scalable solution for wastewater genomic surveillance that allows early detection of SARS-CoV-2 variants and identification of cryptic transmission.
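The abstract does not describe how the deconvolution software works, so the following is only a minimal sketch of the general idea of estimating relative lineage abundance in a mixed sample. It assumes each lineage is summarized by a 0/1 signature of defining mutations and that the sample provides an observed frequency for each mutation. The function name `deconvolve`, the least-squares objective, and the exponentiated-gradient update are illustrative choices, not the authors' method.

```python
import math

def deconvolve(signatures, observed, steps=2000, lr=0.2):
    """Estimate lineage abundances from mutation frequencies (illustrative only).

    signatures[i][j] is 1 if hypothetical lineage i carries mutation j, else 0.
    observed[j] is the frequency of mutation j measured in the mixed sample.
    Minimizes squared error with exponentiated-gradient steps, which keep the
    abundances positive and summing to 1.
    """
    k, m = len(signatures), len(observed)
    x = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(steps):
        # Predicted frequency of each mutation under the current mixture.
        pred = [sum(x[i] * signatures[i][j] for i in range(k)) for j in range(m)]
        resid = [pred[j] - observed[j] for j in range(m)]
        # Gradient of 0.5 * ||pred - observed||^2 with respect to each abundance.
        grad = [sum(signatures[i][j] * resid[j] for j in range(m)) for i in range(k)]
        # Multiplicative update followed by renormalization onto the simplex.
        w = [x[i] * math.exp(-lr * grad[i]) for i in range(k)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x

# Toy example: three made-up lineages, five made-up mutation sites.
signatures = [
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
]
observed = [0.6, 0.9, 0.3, 0.1, 0.7]  # frequencies implied by a 60/30/10 mixture
estimate = deconvolve(signatures, observed)
```

Real wastewater data are noisy and have uneven coverage across sites, so a practical tool would need weighting and a more robust loss than this toy objective.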
Affiliation(s)
- Smruthi Karthikeyan
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Joshua I Levy
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Peter De Hoff
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Greg Humphrey
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Amanda Birmingham
- Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
| | - Kristen Jepsen
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Sawyer Farmer
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Helena M. Tubb
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Tommy Valles
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | | | - Rebecca Tsai
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Stefan Aigner
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Shashank Sathe
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Niema Moshiri
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Benjamin Henson
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Adam M. Mark
- Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
| | - Abbas Hakim
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Nathan A Baer
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Tom Barber
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Pedro Belda-Ferre
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Marisol Chacón
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Willi Cheung
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Evelyn S Cresini
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Emily R Eisner
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Alma L Lastrella
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Elijah S Lawrence
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Clarisse A Marotz
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Toan T Ngo
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Tyler Ostrander
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Ashley Plascencia
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Rodolfo A Salido
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Phoebe Seaver
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Elizabeth W Smoot
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Robert M Neuhard
- Operational Strategic Initiatives, University of California San Diego, La Jolla, CA, USA
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Angela L Scioscia
- Student Health and Well-Being, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
| | | | | | - Dismas B. Abelman
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - David Brenner
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Judith C. Bruner
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Anne Buckley
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Michael Ellison
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Jeffrey Gattas
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Steven L. Gonias
- Department of Pathology, University of California San Diego, La Jolla, CA, USA
| | - Matt Hale
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Faith Hawkins
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Lydia Ikeda
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Hemlata Jhaveri
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Ted Johnson
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Vince Kellen
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Brendan Kremer
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Gary Matthews
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | | | - Pierre Ouillet
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Daniel Park
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Allorah Pradenas
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Sharon Reed
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Lindsay Riggs
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Alison Sanders
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | | | - Angela Song
- Operational Strategic Initiatives, University of California San Diego, La Jolla, CA, USA
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Benjamin White
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Terri Winbush
- Return to Learn, University of California San Diego, La Jolla, CA, USA
| | - Christine M Aceves
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Catelyn Anderson
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Karthik Gangavarapu
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Emory Hufbauer
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ezra Kurzban
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Justin Lee
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Nathaniel L Matteson
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Edyth Parker
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Sarah A Perkins
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Karthik S Ramesh
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Refugio Robles-Sikisaka
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Madison A Schwab
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Emily Spencer
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Shirlee Wohl
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Laura Nicholson
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ian H Mchardy
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - David P Dimmock
- Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
| | | | | | | | | | | | | | | | | | | | | | - John D Malone
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | | | - Nikos Gurfield
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Sarah Stous
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Rebecca Fielding-Miller
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
- Division of Infectious Disease and Global Public Health, University of California San Diego, La Jolla, CA, USA
| | - Richard S. Garfein
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | - Tommi Gaines
- Division of Infectious Disease and Global Public Health, University of California San Diego, La Jolla, CA, USA
| | - Cheryl Anderson
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | - Natasha K. Martin
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | - Robert Schooley
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA
| | | | - Duncan R. MacCannell
- Office of Advanced Molecular Detection, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | | | | | - Seema Shah
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Eric McDonald
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
| | - Alexander T. Yu
- COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
| | - Mark Zeller
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Kathleen M Fisch
- Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
| | - Christopher Longhurst
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA
| | - Patty Maysent
- Office of the UC San Diego Health CEO, University of California, San Diego
| | - David Pride
- Departments of Pathology and Medicine, University of California, San Diego, La Jolla, CA
| | - Pradeep K. Khosla
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Louise C. Laurent
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California San Diego, La Jolla, CA, USA
- Sanford Consortium of Regenerative Medicine, University of California San Diego, La Jolla, CA
| | - Gene W Yeo
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Sanford Consortium of Regenerative Medicine, University of California San Diego, La Jolla, CA
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA
| | - Kristian G Andersen
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
| |
|
168
|
De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets. PLoS Comput Biol 2022; 18:e1010056. [PMID: 35486906 PMCID: PMC9094560 DOI: 10.1371/journal.pcbi.1010056] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/24/2021] [Revised: 05/11/2022] [Accepted: 03/25/2022] [Indexed: 11/26/2022] Open
Abstract
Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
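To illustrate the Gillespie approach mentioned above, here is a minimal sketch of simulating substitutions along a single short branch. It is not phastSim's implementation: the function `evolve_branch` is hypothetical, the substitution model is an assumed equal-rates (Jukes-Cantor-like) model with fixed per-site rates, and the mutating site is chosen by flat weighted sampling rather than the multi-layered search tree the abstract describes.

```python
import random

NUCLEOTIDES = "ACGT"

def evolve_branch(genome, branch_length, site_rates, rng):
    """Gillespie-style simulation of substitutions along one branch (sketch).

    genome: ancestral sequence as a string over A, C, G, T.
    site_rates: assumed per-site substitution rates, kept fixed along the branch.
    Returns the descendant sequence and a list of (site, old, new, time) events.
    """
    seq = list(genome)
    events = []
    total_rate = sum(site_rates)
    if total_rate <= 0:
        return genome, events  # nothing can mutate
    t = 0.0
    while True:
        # Waiting time to the next substitution anywhere in the genome.
        t += rng.expovariate(total_rate)
        if t > branch_length:
            break
        # Pick the mutating site with probability proportional to its rate.
        site = rng.choices(range(len(seq)), weights=site_rates, k=1)[0]
        old = seq[site]
        new = rng.choice([n for n in NUCLEOTIDES if n != old])
        seq[site] = new
        events.append((site, old, new, t))
    return "".join(seq), events

# Toy run: a 100-base genome, uniform rates, and a short branch.
rng = random.Random(0)
ancestor = "ACGT" * 25
descendant, events = evolve_branch(ancestor, 0.05, [1.0] * len(ancestor), rng)
```

Because the expected number of events is the total rate times the branch length, short branches produce few events, so the loop does little work per branch; this is the property the abstract says the algorithm exploits.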
Affiliation(s)
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - William Boulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - Lukas Weilguny
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - Conor R. Walker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, California, United States of America
| | - Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California, United States of America
| | - Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| |
|
169
|
Knyazev S, Chhugani K, Sarwal V, Ayyala R, Singh H, Karthikeyan S, Deshpande D, Baykal PI, Comarova Z, Lu A, Porozov Y, Vasylyeva TI, Wertheim JO, Tierney BT, Chiu CY, Sun R, Wu A, Abedalthagafi MS, Pak VM, Nagaraj SH, Smith AL, Skums P, Pasaniuc B, Komissarov A, Mason CE, Bortz E, Lemey P, Kondrashov F, Beerenwinkel N, Lam TTY, Wu NC, Zelikovsky A, Knight R, Crandall KA, Mangul S. Unlocking capacities of genomics for the COVID-19 response and future pandemics. Nat Methods 2022; 19:374-380. [PMID: 35396471 PMCID: PMC9467803 DOI: 10.1038/s41592-022-01444-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 02/07/2023]
Abstract
During the COVID-19 pandemic, genomics and bioinformatics have emerged as essential public health tools. The genomic data acquired using these methods have supported the global health response, facilitated development of testing methods, and allowed timely tracking of novel SARS-CoV-2 variants. Yet the virtually unlimited potential for rapid generation and analysis of genomic data is also coupled with unique technical, scientific, and organizational challenges. Here, we discuss the application of genomic and computational methods for an efficient data-driven COVID-19 response, the advantages of democratizing viral sequencing around the world, and the challenges associated with viral genome data collection and processing.
Affiliation(s)
- Sergey Knyazev
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Karishma Chhugani
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, USA
| | - Varuni Sarwal
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
| | - Ram Ayyala
- Department of Translational Biomedical Informatics, University of Southern California, Los Angeles, CA, USA
| | - Harman Singh
- Department of Electrical Engineering, Indian Institute of Technology, Hauz Khas, New Delhi, India
| | - Smruthi Karthikeyan
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
| | - Dhrithi Deshpande
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, USA
| | - Pelin Icer Baykal
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Zoia Comarova
- Astani Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, USA
| | - Angela Lu
- Department of Pharmacology and Pharmaceutical Sciences, School of Pharmacy, University of Southern California, Los Angeles, CA, USA
| | - Yuri Porozov
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Department of Computational Biology, Sirius University of Science and Technology, Sochi, Russia
| | - Tetyana I Vasylyeva
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Joel O Wertheim
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Braden T Tierney
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Charles Y Chiu
- Department of Laboratory Medicine, University of California, San Francisco, San Francisco, CA, USA
- Department of Medicine, Division of Infectious Diseases, University of California, San Francisco, San Francisco, CA, USA
- UCSF-Abbott Viral Diagnostics and Discovery Center, University of California, San Francisco, San Francisco, CA, USA
| | - Ren Sun
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, CA, USA
- School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, P.R. China
| | - Aiping Wu
- Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
| | - Malak S Abedalthagafi
- Genomics Research Department, Saudi Human Genome Project, King Fahad Medical City and King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
- King Salman Center for Disability Research, Riyadh, Saudi Arabia
| | - Victoria M Pak
- Emory University, School of Nursing, Atlanta, GA, USA
- Emory University, Rollins School of Public Health, Department of Epidemiology, Atlanta, GA, USA
| | - Shivashankar H Nagaraj
- Centre for Genomics and Personalised Health, Queensland University of Technology, Brisbane, Queensland, Australia
- Translational Research Institute, Brisbane, Queensland, Australia
| | - Adam L Smith
- Astani Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, USA
| | - Pavel Skums
- Department of Computer Science, College of Art and Science, Georgia State University, Atlanta, GA, USA
| | - Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute of Precision Health, University of California, Los Angeles, Los Angeles, CA, USA
| | - Andrey Komissarov
- Smorodintsev Research Institute of Influenza, Saint Petersburg, Russia
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
- The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
| | - Eric Bortz
- Department of Biological Sciences, University of Alaska Anchorage, Anchorage, AK, USA
| | - Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven-University of Leuven, Leuven, Belgium
| | - Fyodor Kondrashov
- Institute of Science and Technology Austria, Klosterneuburg, Austria
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Tommy Tsan-Yuk Lam
- State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, P.R. China
- Laboratory of Data Discovery for Health Limited, Hong Kong SAR, P.R. China
- Centre for Immunology & Infection Limited, Hong Kong SAR, P.R. China
| | - Nicholas C Wu
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Alex Zelikovsky
- Department of Computer Science, College of Art and Science, Georgia State University, Atlanta, GA, USA
| | - Rob Knight
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
| | - Keith A Crandall
- Computational Biology Institute and Department of Biostatistics & Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, USA
| |
|
170
|
Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant. Commun Biol 2022; 5:285. [PMID: 35351970 PMCID: PMC8964801 DOI: 10.1038/s42003-022-03198-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/06/2021] [Accepted: 02/24/2022] [Indexed: 12/14/2022] Open
Abstract
We build statistical models to describe the substitution process in SARS-CoV-2 as a function of explanatory factors describing the sequence, its function, and more. These models serve two different purposes: first, to gain knowledge about the evolutionary biology of the virus; and second, to predict future mutations in the virus, in particular, non-synonymous amino acid substitutions creating new variants. We use tens of thousands of publicly available SARS-CoV-2 sequences and consider tens of thousands of candidate models. Through a careful validation process, we confirm that our chosen models are indeed able to predict new amino acid substitutions: candidates ranked high by our model are eight times more likely to occur than random amino acid changes. We also show that named variants were highly ranked by our models before their appearance, emphasizing the value of our models for identifying likely variants and potentially utilizing this knowledge in vaccine design and other aspects of the ongoing battle against COVID-19.
As the virus that causes COVID-19 continues to mutate and spread, new methods are needed to predict new potential variants. Here, the authors identify the best regression models for predicting likely mutation sites in the SARS-CoV-2 genome using a candidate set that considers sequence, gene location, and biological function.
|
171
|
Valieris R, Drummond RD, Defelicibus A, Dias-Neto E, Rosales RA, Tojal da Silva I. A mixture model for determining SARS-Cov-2 variant composition in pooled samples. Bioinformatics 2022; 38:1809-1815. [PMID: 35104309 DOI: 10.1093/bioinformatics/btac047] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 10/04/2021] [Revised: 12/14/2021] [Accepted: 01/26/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Despite the fast development of highly effective vaccines to control the current COVID-19 pandemic, the unequal distribution and availability of these vaccines worldwide and the number of people infected in the world lead to the continuous emergence of Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) variants of concern. Therefore, it is likely that real-time genomic surveillance will be continuously needed as an unceasing monitoring tool, necessary to follow the spread of the disease and the evolution of the virus. In this context, new genomic variants of SARS-CoV-2, including variants refractory to current vaccines, make genomic surveillance programs tools of utmost importance. Nevertheless, the lack of appropriate analytical tools to quickly and effectively assess the viral composition in meta-transcriptomic sequencing data, including environmental surveillance, represents a possible challenge that may impact the fast adoption of this approach to mitigate the spread and transmission of viruses. RESULTS We propose a statistical model for the estimation of the relative frequencies of SARS-CoV-2 variants in pooled samples. This model is built by considering a previously defined selection of genomic polymorphisms that characterize SARS-CoV-2 variants. The methods described here support both raw sequencing reads for polymorphisms-based markers calling and predefined markers in the variant call format. Results obtained using simulated data show that our method is quite effective in recovering the correct variant proportions. Further, results obtained by considering longitudinal data from wastewater samples of two locations in Switzerland agree well with those describing the epidemiological evolution of COVID-19 variants in clinical samples of these locations. Our results show that the described method can be a valuable tool for tracking the proportions of SARS-CoV-2 variants in complex mixtures such as wastewater and environmental samples.
AVAILABILITY AND IMPLEMENTATION http://github.com/rvalieris/LCS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
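The idea of recovering variant proportions from marker polymorphisms in a pooled sample can be illustrated with a small expectation-maximization sketch. This is not the LCS implementation and makes no claim about how that tool is written. The function name, the signature matrix, the 1% error rate, and the read counts are all hypothetical. The sketch assumes each read originates from variant k with probability pi[k] and then shows the alternate allele at site j with a known probability signatures[k][j].

```python
def estimate_proportions(signatures, alt_counts, depths, n_iter=5000):
    """EM estimate of variant proportions from per-site alt/total read counts.

    signatures[k][j] is the assumed probability that a read from variant k
    shows the alternate allele at marker site j (hypothetical inputs).
    """
    n_variants = len(signatures)
    n_sites = len(alt_counts)
    total_reads = sum(depths)
    pi = [1.0 / n_variants] * n_variants  # start from a uniform mixture

    for _ in range(n_iter):
        expected = [0.0] * n_variants  # expected number of reads per variant
        for j in range(n_sites):
            alt = alt_counts[j]
            ref = depths[j] - alt_counts[j]
            # E-step: split the alt and ref reads at site j among the variants.
            w_alt = [pi[k] * signatures[k][j] for k in range(n_variants)]
            w_ref = [pi[k] * (1.0 - signatures[k][j]) for k in range(n_variants)]
            s_alt, s_ref = sum(w_alt), sum(w_ref)
            for k in range(n_variants):
                expected[k] += alt * w_alt[k] / s_alt + ref * w_ref[k] / s_ref
        # M-step: proportions are the normalized expected read counts.
        pi = [e / total_reads for e in expected]
    return pi


ERR = 0.01  # assumed per-read error rate (illustrative only)


def signature(pattern):
    """Turn a 0/1 marker pattern into alt-allele probabilities."""
    return [1.0 - ERR if bit else ERR for bit in pattern]


# Three hypothetical variants defined over four hypothetical marker sites.
signatures = [signature([1, 1, 0, 0]),
              signature([0, 1, 1, 0]),
              signature([0, 0, 0, 1])]
# Noise-free counts generated from a 60% / 30% / 10% mixture at depth 1000.
alt_counts = [598, 892, 304, 108]
depths = [1000, 1000, 1000, 1000]
props = estimate_proportions(signatures, alt_counts, depths)
```

With noise-free counts, the estimate in props should approach the mixture used to generate them. Real pooled data would add sampling noise, uneven depth, and markers shared between many variants.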
Affiliation(s)
- Renan Valieris
- Laboratory of Computational Biology and Bioinformatics, CIPE/A.C. Camargo Cancer Center, São Paulo 01508-010, Brazil
| | - Rodrigo D Drummond
- Laboratory of Computational Biology and Bioinformatics, CIPE/A.C. Camargo Cancer Center, São Paulo 01508-010, Brazil
| | - Alexandre Defelicibus
- Laboratory of Computational Biology and Bioinformatics, CIPE/A.C. Camargo Cancer Center, São Paulo 01508-010, Brazil
| | - Emmanuel Dias-Neto
- Laboratory of Medical Genomics, CIPE/A.C. Camargo Cancer Center, São Paulo 01508-010, Brazil
| | - Rafael A Rosales
- Departamento de Computação e Matemática, Universidade de São Paulo, Ribeirão Preto, São Paulo 14040-901, Brazil
| | - Israel Tojal da Silva
- Laboratory of Computational Biology and Bioinformatics, CIPE/A.C. Camargo Cancer Center, São Paulo 01508-010, Brazil
| |
|
172
|
De Maio N, Kalaghatgi P, Turakhia Y, Corbett-Detig R, Minh BQ, Goldman N. Maximum likelihood pandemic-scale phylogenetics. [PMID: 35350209 PMCID: PMC8963701 DOI: 10.1101/2022.03.22.485312] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Genomic data plays an essential role in the study of transmissible disease, as exemplified by its current use in identifying and tracking the spread of novel SARS-CoV-2 variants. However, with the increase in size of genomic epidemiological datasets, their phylogenetic analyses become increasingly impractical due to high computational demand. In particular, while maximum likelihood methods are go-to tools for phylogenetic inference, the scale of datasets from the ongoing pandemic has made apparent the urgent need for more computationally efficient approaches. Here we propose a new likelihood-based phylogenetic framework that greatly reduces both the memory and time demand of popular maximum likelihood approaches when analysing many closely related genomes, as in the scenario of SARS-CoV-2 genome data and more generally throughout genomic epidemiology. To achieve this, we rewrite the classical Felsenstein pruning algorithm so that we can infer phylogenetic trees on at least 10 times larger datasets with higher accuracy than existing maximum likelihood methods. Our algorithms provide a powerful framework for maximum-likelihood genomic epidemiology and could facilitate similarly groundbreaking applications in Bayesian phylogenomic analyses as well.
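For orientation, the classical Felsenstein pruning algorithm that the abstract refers to can be sketched for a single alignment column as follows. This is the textbook recursion, not the rewritten version the authors propose. The JC69 substitution model, the tree encoding, and all names and branch lengths are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def jc69_prob(x, y, t):
    """JC69 probability that base x is replaced by base y over branch length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def partial_likelihoods(node, site):
    """Return {base: P(observed leaves below node | node carries base)}.

    A leaf is a sample name (str); an internal node is a list of
    (child, branch_length) pairs. `site` maps sample names to observed bases.
    """
    if isinstance(node, str):
        return {b: 1.0 if b == site[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, length in node:
        child_partial = partial_likelihoods(child, site)
        for x in BASES:
            # Sum over the unknown base y at the child end of the branch.
            result[x] *= sum(jc69_prob(x, y, length) * child_partial[y]
                             for y in BASES)
    return result


def site_likelihood(tree, site):
    """Likelihood of one alignment column, with uniform base frequencies at the root."""
    root = partial_likelihoods(tree, site)
    return sum(0.25 * root[b] for b in BASES)


# Hypothetical three-sample tree: ((s1:0.1, s2:0.2):0.05, s3:0.3)
tree = [([("s1", 0.1), ("s2", 0.2)], 0.05), ("s3", 0.3)]
lik = site_likelihood(tree, {"s1": "A", "s2": "A", "s3": "G"})
```

Each internal node stores four partial likelihoods per site. Across millions of nearly identical genomes that storage and arithmetic becomes the kind of cost the abstract describes as increasingly impractical.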
|
173
|
Klink GV, Safina KR, Nabieva E, Shvyrev N, Garushyants S, Alekseeva E, Komissarov AB, Danilenko DM, Pochtovyi AA, Divisenko EV, Vasilchenko LA, Shidlovskaya EV, Kuznetsova NA, Speranskaya AS, Samoilov AE, Neverov AD, Popova AV, Fedonin GG, Akimkin VG, Lioznov D, Gushchin VA, Shchur V, Bazykin GA. The rise and spread of the SARS-CoV-2 AY.122 lineage in Russia. Virus Evol 2022; 8:veac017. [PMID: 35371558 PMCID: PMC8966696 DOI: 10.1093/ve/veac017] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/08/2021] [Revised: 02/15/2022] [Accepted: 03/04/2022] [Indexed: 11/14/2022] Open
Abstract
Delta has outcompeted most preexisting variants of SARS-CoV-2, becoming the globally predominant lineage by mid-2021. Its subsequent evolution has led to the emergence of multiple sublineages, most of which are well-mixed between countries. By contrast, here we show that nearly the entire Delta epidemic in Russia has probably descended from a single import event, or from multiple closely timed imports from a single poorly sampled geographic location. Indeed, over 90 per cent of Delta samples in Russia are characterized by the nsp2:K81N + ORF7a:P45L pair of mutations which is rare outside Russia, putting them in the AY.122 sublineage. The AY.122 lineage was frequent in Russia among Delta samples from the start, and has not increased in frequency in other countries where it has been observed, suggesting that its high prevalence in Russia has probably resulted from a random founder effect rather than a transmission advantage. The apartness of the genetic composition of the Delta epidemic in Russia makes Russia somewhat unusual, although not exceptional, among other countries.
Affiliation(s)
- Galya V Klink
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bol’shoi Karetnyi per., 19, Moscow 127051, Russia
| | - Ksenia R Safina
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bol’shoi Karetnyi per., 19, Moscow 127051, Russia
- Skolkovo Institute of Science and Technology (Skoltech), Nobel st., Building 1, Moscow 121205, Russia
| | - Elena Nabieva
- Skolkovo Institute of Science and Technology (Skoltech), Nobel st., Building 1, Moscow 121205, Russia
| | - Nikita Shvyrev
- International Laboratory of Statistical and Computational Genomics, HSE University, Moscow, Russia
| | - Sofya Garushyants
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bol’shoi Karetnyi per., 19, Moscow 127051, Russia
| | - Evgeniia Alekseeva
- Skolkovo Institute of Science and Technology (Skoltech), Nobel st., Building 1, Moscow 121205, Russia
| | - Andrey B Komissarov
- Smorodintsev Research Institute of Influenza, Prof. Popov 15/17, Saint Petersburg 197376, Russia
| | - Daria M Danilenko
- Smorodintsev Research Institute of Influenza, Prof. Popov 15/17, Saint Petersburg 197376, Russia
| | - Andrei A Pochtovyi
- Federal State Budget Institution ‘National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya’ of the Ministry of Health of the Russian Federation, Gamaleya st., 18, Moscow 123098, Russia
- Department of Virology, Biological Faculty, Lomonosov Moscow State University, Kolmogorov st., 1, building 73, Moscow 119192, Russia
| | - Elizaveta V Divisenko
- Federal State Budget Institution ‘National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya’ of the Ministry of Health of the Russian Federation, Gamaleya st., 18, Moscow 123098, Russia
| | - Lyudmila A Vasilchenko
- Federal State Budget Institution ‘National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya’ of the Ministry of Health of the Russian Federation, Gamaleya st., 18, Moscow 123098, Russia
| | - Elena V Shidlovskaya
- Federal State Budget Institution ‘National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya’ of the Ministry of Health of the Russian Federation, Gamaleya st., 18, Moscow 123098, Russia
| | - Nadezhda A Kuznetsova
- Federal State Budget Institution ‘National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya’ of the Ministry of Health of the Russian Federation, Gamaleya st., 18, Moscow 123098, Russia
| | | | - Anna S Speranskaya
- Central Research Institute for Epidemiology, Novogireyevskaya st., 3a, Moscow 111123, Russia
| | - Andrei E Samoilov
- Central Research Institute for Epidemiology, Novogireyevskaya st., 3a, Moscow 111123, Russia
- Saint Petersburg Pasteur Institute, Mira st., 14, Saint Petersburg 197101, Russia
| | - Alexey D Neverov
- Central Research Institute for Epidemiology, Novogireyevskaya st., 3a, Moscow 111123, Russia
| | - Anfisa V Popova
- Central Research Institute for Epidemiology, Novogireyevskaya st., 3a, Moscow 111123, Russia
| | - Gennady G Fedonin
- Central Research Institute for Epidemiology, Novogireyevskaya st., 3a, Moscow 111123, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bol’shoi Karetnyi per., 19, Moscow 127051, Russia
- Moscow Institute of Physics and Technology, Institutskiy per., 9, Dolgoprudny, Moscow region 141701, Russia
| | | | - Vasiliy G Akimkin
- Central Research Institute for Epidemiology, Novogireyevskaya st., 3a, Moscow 111123, Russia
| | - Dmitry Lioznov
- Smorodintsev Research Institute of Influenza, Prof. Popov 15/17, Saint Petersburg 197376, Russia
- First Pavlov State Medical University, L’va Tolstogo st., 6-8, Saint Petersburg 197022, Russia
| | - Vladimir A Gushchin
- Federal State Budget Institution ‘National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya’ of the Ministry of Health of the Russian Federation, Gamaleya st., 18, Moscow 123098, Russia
- Department of Virology, Biological Faculty, Lomonosov Moscow State University, Kolmogorov st., 1, building 73, Moscow 119192, Russia
| | - Vladimir Shchur
- International Laboratory of Statistical and Computational Genomics, HSE University, Moscow, Russia
| | - Georgii A Bazykin
- Skolkovo Institute of Science and Technology (Skoltech), Nobel st., Building 1, Moscow 121205, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bol’shoi Karetnyi per., 19, Moscow 127051, Russia
| |
|
174
|
Baumdicker F, Bisschop G, Goldstein D, Gower G, Ragsdale AP, Tsambos G, Zhu S, Eldon B, Ellerman EC, Galloway JG, Gladstein AL, Gorjanc G, Guo B, Jeffery B, Kretzschumar WW, Lohse K, Matschiner M, Nelson D, Pope NS, Quinto-Cortés CD, Rodrigues MF, Saunack K, Sellinger T, Thornton K, van Kemenade H, Wohns AW, Wong Y, Gravel S, Kern AD, Koskela J, Ralph PL, Kelleher J. Efficient ancestry and mutation simulation with msprime 1.0. Genetics 2022; 220:iyab229. [PMID: 34897427 PMCID: PMC9176297 DOI: 10.1093/genetics/iyab229] [Citation(s) in RCA: 116] [Impact Index Per Article: 58.0] [Received: 09/22/2021] [Accepted: 12/03/2021] [Indexed: 11/13/2022] Open
Abstract
Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.
Affiliation(s)
- Franz Baumdicker
- Cluster of Excellence “Controlling Microbes to Fight Infections”, Mathematical and Computational Population Genetics, University of Tübingen, 72076 Tübingen, Germany
| | - Gertjan Bisschop
- Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
| | - Daniel Goldstein
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Graham Gower
- Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, 1350 Copenhagen K, Denmark
| | - Aaron P Ragsdale
- Department of Integrative Biology, University of Wisconsin–Madison, Madison, WI 53706, USA
| | - Georgia Tsambos
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia
| | - Sha Zhu
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| | - Bjarki Eldon
- Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin 10115, Germany
| | | | - Jared G Galloway
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98102, USA
| | - Ariella L Gladstein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7264, USA
- Embark Veterinary, Inc., Boston, MA 02111, USA
| | - Gregor Gorjanc
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh EH25 9RG, UK
| | - Bing Guo
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Ben Jeffery
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| | - Warren W Kretzschumar
- Center for Hematology and Regenerative Medicine, Karolinska Institute, 141 83 Huddinge, Sweden
| | - Konrad Lohse
- Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
| | | | - Dominic Nelson
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
| | - Nathaniel S Pope
- Department of Entomology, Pennsylvania State University, State College, PA 16802, USA
| | - Consuelo D Quinto-Cortés
- National Laboratory of Genomics for Biodiversity (LANGEBIO), Unit of Advanced Genomics, CINVESTAV, Irapuato, Mexico
| | - Murillo F Rodrigues
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
| | | | - Thibaut Sellinger
- Professorship for Population Genetics, Department of Life Science Systems, Technical University of Munich, 85354 Freising, Germany
| | - Kevin Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA
| | | | - Anthony W Wohns
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yan Wong
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| | - Simon Gravel
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
| | - Andrew D Kern
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
| | - Jere Koskela
- Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
| | - Peter L Ralph
- Department of Biology, Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA
- Department of Mathematics, University of Oregon, Eugene, OR 97403-5289, USA
| | - Jerome Kelleher
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| |
|
175
|
Lin L, Zhang J, Rogers J, Campbell A, Zhao J, Harding D, Sahr F, Liu Y, Wurie I. The dynamic change of SARS-CoV-2 variants in Sierra Leone. Infect Genet Evol 2022; 98:105208. [PMID: 34999288 PMCID: PMC8734169 DOI: 10.1016/j.meegid.2022.105208] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/19/2021] [Revised: 12/24/2021] [Accepted: 01/03/2022] [Indexed: 10/31/2022]
Abstract
Since the beginning of the SARS-CoV-2 pandemic, the emergence of multiple new variants posed an increased risk to global public health. The aim of this study is to investigate SARS-CoV-2 variants and possible transmission of variants of concern (VOCs) in Sierra Leone. A total of 65 nasal swab samples were collected from COVID-19 cases in Sierra Leone, among which 24 samples were collected during the second wave and 41 samples were collected during the third wave. Nanopore sequencing generated 54 SARS-CoV-2 whole genomes. The second COVID-19 wave was mainly caused by R.1 lineage while the third COVID-19 wave was dominated by B.1.617.2 lineage (Delta variant). The phylogenetic analysis suggested multiple introductions of SARS-CoV-2 Delta variant into Sierra Leone and subsequent local transmission in this country. Our findings highlight the importance of genomic surveillance of SARS-CoV-2 variants and the urgent need for implementation of strengthened public health and social measures (PHSM) to control the spread of virus variants.
Affiliation(s)
- Lei Lin
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
| | - Juling Zhang
- Department of Clinical Laboratory, the Fifth Medical Center of PLA General Hospital, Beijing, China
| | - James Rogers
- College of Medicine and Allied Health Science, University of Sierra Leone, Freetown, Sierra Leone
| | - Allan Campbell
- Central Public Health Reference Laboratories, Ministry of Health and Sanitation, Freetown, Sierra Leone
| | - Jianjun Zhao
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China
| | - Doris Harding
- Central Public Health Reference Laboratories, Ministry of Health and Sanitation, Freetown, Sierra Leone
| | - Foday Sahr
- College of Medicine and Allied Health Science, University of Sierra Leone, Freetown, Sierra Leone
| | - Yongjian Liu
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing, China.
| | - Isata Wurie
- College of Medicine and Allied Health Science, University of Sierra Leone, Freetown, Sierra Leone
| |
|
176
|
Nasimiyu C, Matoke-Muhia D, Rono GK, Osoro E, Obado DO, Mwangi JM, Mwikwabe N, Thiong’o K, Dawa J, Ngere I, Gachohi J, Kariuki S, Amukoye E, Mureithi M, Ngere P, Amoth P, Were I, Makayotto L, Nene V, Abworo EO, Njenga MK, Seifert SN, Oyola SO. Imported SARS-COV-2 Variants of Concern Drove Spread of Infections Across Kenya During the Second Year of the Pandemic. medRxiv 2022:2022.02.28.22271467. [PMID: 35262086 PMCID: PMC8902869 DOI: 10.1101/2022.02.28.22271467] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
Background Using classical and genomic epidemiology, we tracked the COVID-19 pandemic in Kenya over 23 months to determine the impact of SARS-CoV-2 variants on its progression. Methods SARS-CoV-2 surveillance and testing data were obtained from the Kenya Ministry of Health, collected daily from 306 health facilities. COVID-19-associated fatality data were also obtained from these health facilities and communities. Whole SARS-CoV-2 genome sequencing was carried out on 1241 specimens. Results Over the pandemic duration (March 2020 - January 2022) Kenya experienced five waves characterized by attack rates (AR) of between 65.4 and 137.6 per 100,000 persons, and intra-wave case fatality ratios (CFR) averaging 3.5%, two-fold higher than the national average COVID-19 associated CFR. The first two waves, which occurred before the emergence of global variants of concern (VoCs), had lower AR (65.4 and 118.2 per 100,000). Waves 3, 4, and 5, which occurred during the second year, were each dominated by multiple introductions of Alpha (74.9% genomes), Delta (98.7%), and Omicron (87.8%) VoCs, respectively. During this phase, government-imposed restrictions failed to alleviate pandemic progression, resulting in higher attack rates spread across the country. Conclusions The emergence of Alpha, Delta, and Omicron variants was a turning point that resulted in widespread and higher SARS-CoV-2 infections across the country.
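The two summary measures quoted in this abstract, attack rate per 100,000 persons and case fatality ratio, are simple ratios. The sketch below uses the standard definitions with invented counts. The case, death, and population figures are hypothetical and are not taken from the study.

```python
def attack_rate_per_100k(cases, population):
    """Cases in a wave per 100,000 persons in the population at risk."""
    return cases / population * 100_000


def case_fatality_ratio_pct(deaths, cases):
    """Deaths among confirmed cases, as a percentage."""
    return deaths / cases * 100


# Hypothetical wave: numbers invented purely to illustrate the arithmetic.
wave_cases = 50_000
wave_deaths = 1_750
population = 50_000_000

ar = attack_rate_per_100k(wave_cases, population)
cfr = case_fatality_ratio_pct(wave_deaths, wave_cases)
```

These invented inputs give an attack rate of about 100 per 100,000 and a case fatality ratio of about 3.5%, the same order of magnitude as the figures reported above.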
Affiliation(s)
- Carolyne Nasimiyu
- Washington State Global Health Program-Kenya, Washington State University, Nairobi, Kenya
- Paul G. Allen School for Global Health, Washington State University, Pullman, USA
| | | | | | - Eric Osoro
- Washington State Global Health Program-Kenya, Washington State University, Nairobi, Kenya
- Paul G. Allen School for Global Health, Washington State University, Pullman, USA
| | | | | | | | | | - Jeanette Dawa
- Washington State Global Health Program-Kenya, Washington State University, Nairobi, Kenya
| | - Isaac Ngere
- Washington State Global Health Program-Kenya, Washington State University, Nairobi, Kenya
| | - John Gachohi
- Washington State Global Health Program-Kenya, Washington State University, Nairobi, Kenya
| | | | | | - Marianne Mureithi
- Department of Medical Microbiology, University of Nairobi, Nairobi, Kenya
| | | | | | - Ian Were
- Kenya Ministry of Health, Nairobi, Kenya
| | | | | | | | - M. Kariuki Njenga
- Washington State Global Health Program-Kenya, Washington State University, Nairobi, Kenya
- Paul G. Allen School for Global Health, Washington State University, Pullman, USA
| | - Stephanie N. Seifert
- Paul G. Allen School for Global Health, Washington State University, Pullman, USA
| | | |
|
177
|
Obermeyer F, Jankowiak M, Barkas N, Schaffner SF, Pyle JD, Yurkovetskiy L, Bosso M, Park DJ, Babadi M, MacInnis BL, Luban J, Sabeti PC, Lemieux JE. Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness. medRxiv 2022:2021.09.07.21263228. [PMID: 35194619 PMCID: PMC8863165 DOI: 10.1101/2021.09.07.21263228] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 12/11/2022]
Abstract
Repeated emergence of SARS-CoV-2 variants with increased fitness necessitates rapid detection and characterization of new lineages. To address this need, we developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, identifies viral lineages of concern as they emerge, and prioritizes mutations of biological and public health concern for functional characterization. ONE SENTENCE SUMMARY A Bayesian hierarchical model of all SARS-CoV-2 viral genomes predicts lineage fitness and identifies associated mutations.
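A multinomial logistic model of lineage prevalence can be sketched in a few lines. This is only an illustration of the general technique, not the authors' code, and it omits the hierarchical Bayesian inference entirely. It assumes, beyond what the abstract states, that lineage logits grow linearly in time and that a lineage's growth rate is the sum of additive per-mutation effects. All lineage names, mutation labels, and coefficients are hypothetical.

```python
import math


def lineage_fitness(mutations, effects):
    """Sum per-mutation effects (assumed additive) into a lineage growth rate."""
    return sum(effects.get(m, 0.0) for m in mutations)


def forecast_prevalence(intercepts, growth_rates, t):
    """Multinomial-logistic (softmax) share of each lineage at time t."""
    logits = {name: intercepts[name] + growth_rates[name] * t
              for name in intercepts}
    top = max(logits.values())  # subtract the max for numerical stability
    weights = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical per-mutation effects and lineage mutational profiles.
effects = {"S:m1": 0.05, "N:m2": 0.03}
lineages = {"L1": [], "L2": ["S:m1"], "L3": ["S:m1", "N:m2"]}
rates = {name: lineage_fitness(muts, effects)
         for name, muts in lineages.items()}
intercepts = {"L1": 0.0, "L2": -3.0, "L3": -6.0}

now = forecast_prevalence(intercepts, rates, 0)
later = forecast_prevalence(intercepts, rates, 200)
```

In this toy setup the lineage carrying both hypothetical mutations starts rare and dominates by the later time point, which illustrates how a forecast can follow from a mutational profile alone.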
Affiliation(s)
- Fritz Obermeyer
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
- Pyro Committee, Linux AI & Data Foundation; 548 Market St San Francisco, California 94104
| | - Martin Jankowiak
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
- Pyro Committee, Linux AI & Data Foundation; 548 Market St San Francisco, California 94104
| | - Nikolaos Barkas
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
| | - Stephen F. Schaffner
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
- Department of Organismic and Evolutionary Biology, Harvard University; Cambridge, MA 02138, USA
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University; Boston, MA, USA
| | - Jesse D. Pyle
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
| | - Lonya Yurkovetskiy
- Program in Molecular Medicine, University of Massachusetts Medical School; Worcester, MA 01605, USA
| | - Matteo Bosso
- Program in Molecular Medicine, University of Massachusetts Medical School; Worcester, MA 01605, USA
| | - Daniel J. Park
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
| | - Mehrtash Babadi
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
| | - Bronwyn L. MacInnis
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University; Boston, MA, USA
- Massachusetts Consortium on Pathogen Readiness; Boston, MA 02115, USA
| | - Jeremy Luban
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
- Program in Molecular Medicine, University of Massachusetts Medical School; Worcester, MA 01605, USA
- Massachusetts Consortium on Pathogen Readiness; Boston, MA 02115, USA
- Ragon Institute of MGH, MIT, and Harvard; 400 Technology Square, Cambridge, MA 02139, USA
| | - Pardis C. Sabeti
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
- Department of Organismic and Evolutionary Biology, Harvard University; Cambridge, MA 02138, USA
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University; Boston, MA, USA
- Massachusetts Consortium on Pathogen Readiness; Boston, MA 02115, USA
- Howard Hughes Medical Institute; 4000 Jones Bridge Rd, Chevy Chase, MD 20815, USA
| | - Jacob E. Lemieux
- Broad Institute of MIT and Harvard; 415 Main Street, Cambridge, MA 02142, USA
- Division of Infectious Diseases, Massachusetts General Hospital; Boston, MA, USA
| |
|
178
|
Siddle KJ, Krasilnikova LA, Moreno GK, Schaffner SF, Vostok J, Fitzgerald NA, Lemieux JE, Barkas N, Loreth C, Specht I, Tomkins-Tinch CH, Paull JS, Schaeffer B, Taylor BP, Loftness B, Johnson H, Schubert PL, Shephard HM, Doucette M, Fink T, Lang AS, Baez S, Beauchamp J, Hennigan S, Buzby E, Ash S, Brown J, Clancy S, Cofsky S, Gagne L, Hall J, Harrington R, Gionet GL, DeRuff KC, Vodzak ME, Adams GC, Dobbins ST, Slack SD, Reilly SK, Anderson LM, Cipicchio MC, DeFelice MT, Grimsby JL, Anderson SE, Blumenstiel BS, Meldrim JC, Rooke HM, Vicente G, Smith NL, Messer KS, Reagan FL, Mandese ZM, Lee MD, Ray MC, Fisher ME, Ulcena MA, Nolet CM, English SE, Larkin KL, Vernest K, Chaluvadi S, Arvidson D, Melchiono M, Covell T, Harik V, Brock-Fisher T, Dunn M, Kearns A, Hanage WP, Bernard C, Philippakis A, Lennon NJ, Gabriel SB, Gallagher GR, Smole S, Madoff LC, Brown CM, Park DJ, MacInnis BL, Sabeti PC. Transmission from vaccinated individuals in a large SARS-CoV-2 Delta variant outbreak. Cell 2022; 185:485-492.e10. [PMID: 35051367 PMCID: PMC8695126 DOI: 10.1016/j.cell.2021.12.027] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 10/29/2021] [Revised: 11/24/2021] [Accepted: 12/17/2021] [Indexed: 02/08/2023]
Abstract
An outbreak of over 1,000 COVID-19 cases in Provincetown, Massachusetts (MA), in July 2021 (the first large outbreak mostly in vaccinated individuals in the US) prompted a comprehensive public health response, motivating changes to national masking recommendations and raising questions about infection and transmission among vaccinated individuals. To address these questions, we combined viral genomic and epidemiological data from 467 individuals, including 40% of outbreak-associated cases. The Delta variant accounted for 99% of cases in this dataset; it was introduced from at least 40 sources, but 83% of cases derived from a single source, likely through transmission across multiple settings over a short time rather than a single event. Genomic and epidemiological data supported multiple transmissions of Delta from and between fully vaccinated individuals. However, despite its magnitude, the outbreak had limited onward impact in MA and the US overall, likely due to high vaccination rates and a robust public health response.
Affiliation(s)
- Lydia A Krasilnikova
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Gage K Moreno
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Stephen F Schaffner
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Johanna Vostok
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Jacob E Lemieux
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA 02114, USA
- Nikolaos Barkas
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Ivan Specht
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Christopher H Tomkins-Tinch
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Jillian S Paull
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Beau Schaeffer
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Bradford P Taylor
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Bryn Loftness
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Hillary Johnson
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Petra L Schubert
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Hanna M Shephard
- Massachusetts Department of Public Health, Boston, MA 02199, USA; Applied Epidemiology Fellowship, Council of State and Territorial Epidemiologists, Atlanta, GA 30345, USA
- Matthew Doucette
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Timelia Fink
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Andrew S Lang
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Stephanie Baez
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- John Beauchamp
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Scott Hennigan
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Erika Buzby
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Stephanie Ash
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Jessica Brown
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Selina Clancy
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Seana Cofsky
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Luc Gagne
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Joshua Hall
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Megan E Vodzak
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Gordon C Adams
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Sarah D Slack
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Steven K Reilly
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Lisa M Anderson
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Jonna L Grimsby
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- James C Meldrim
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Heather M Rooke
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Gina Vicente
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Natasha L Smith
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Faye L Reagan
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Zoe M Mandese
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Matthew D Lee
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Marianne C Ray
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Maesha A Ulcena
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Corey M Nolet
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Sean E English
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Katie L Larkin
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Kyle Vernest
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Deirdre Arvidson
- Barnstable County Department of Health and the Environment, Barnstable, MA 02630, USA
- Maurice Melchiono
- Barnstable County Department of Health and the Environment, Barnstable, MA 02630, USA
- Theresa Covell
- Barnstable County Department of Health and the Environment, Barnstable, MA 02630, USA
- Vaira Harik
- Barnstable County Department of Human Services, Barnstable, MA 02630, USA
- Taylor Brock-Fisher
- Community Tracing Collaborative, Commonwealth of Massachusetts, Boston, MA 02199, USA
- Molly Dunn
- Community Tracing Collaborative, Commonwealth of Massachusetts, Boston, MA 02199, USA
- Amanda Kearns
- Community Tracing Collaborative, Commonwealth of Massachusetts, Boston, MA 02199, USA
- William P Hanage
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
- Clare Bernard
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Niall J Lennon
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Glen R Gallagher
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Sandra Smole
- Massachusetts Department of Public Health, Boston, MA 02199, USA
- Daniel J Park
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Bronwyn L MacInnis
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Massachusetts Consortium for Pathogen Readiness, Boston, MA 02115, USA
- Pardis C Sabeti
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA; Massachusetts Consortium for Pathogen Readiness, Boston, MA 02115, USA
179
Wienkes H, Vilen K, Lorentz A, Gerlach D, Wang X, Saupe A, Danila R, Lynfield R, Smith K, Medus C. Transmission of and Infection With COVID-19 Among Vaccinated and Unvaccinated Attendees of an Indoor Wedding Reception in Minnesota. JAMA Netw Open 2022; 5:e220536. [PMID: 35212747 PMCID: PMC8881767 DOI: 10.1001/jamanetworkopen.2022.0536] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022] Open
Abstract
IMPORTANCE Characterizing rates of SARS-CoV-2 infection among vaccinated and unvaccinated persons with the same exposure is critical to understanding the association of vaccination with the risk of infection with the Delta variant. Additionally, evidence of Delta variant transmission by children to vaccinated adults has important public health implications. OBJECTIVE To characterize transmission and infection of SARS-CoV-2 among vaccinated and unvaccinated attendees of an indoor wedding reception. DESIGN, SETTING, AND PARTICIPANTS This cohort study included attendees at an indoor wedding reception in Minnesota in July 2021. Data were collected from REDCap surveys and routine surveillance interviews. The full list of attendees and a partial list of emails were obtained. Fifty-seven attendees completed the emailed survey. Eighteen additional attendees were identified from the state health department COVID-19 surveillance database. EXPOSURES Attendance at an indoor event. MAIN OUTCOMES AND MEASURES Risk of SARS-CoV-2 infection among vaccinated and unvaccinated attendees, identification of an index case, whole genome sequencing (WGS) to identify the COVID-19 variant, understanding of transmission patterns, and assessment of secondary transmission. The primary case definition was an individual with a positive SARS-CoV-2 test who attended the wedding in the 14 days prior to their illness. RESULTS Data were gathered for 75 attendees (mean [SE] age, 37.5 [13.7] years; 57 [76%] female individuals), of whom 56 (75%) were fully vaccinated, 4 (5%) were partially vaccinated, and 15 (20%) were unvaccinated. Of 62 attendees who were tested, 29 (47%) tested positive, including 16 of 46 fully vaccinated attendees (35%), 2 of 4 partially vaccinated attendees (50%), and 11 of 12 unvaccinated attendees (92%). Being unvaccinated was associated with a higher risk of infection compared with being vaccinated (risk ratio, 2.64; 95% CI, 1.71-4.06; P = .001). 
One unvaccinated adult required hospitalization. An unvaccinated child who was symptomatic on the event date was identified as the index case. Eleven specimens were available for WGS. All sequenced specimens were closely related and were identified as the Delta variant. WGS supported secondary transmission from a vaccinated individual with SARS-CoV-2. CONCLUSIONS AND RELEVANCE This cohort study identified a COVID-19 Delta variant outbreak at an indoor event despite a high proportion of vaccinated attendees. It found that vaccination was associated with a reduced risk of infection.
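The reported risk ratio can be checked from the counts given in the abstract. The sketch below is an illustration only: it assumes the comparison is between unvaccinated (11 of 12 positive) and fully vaccinated (16 of 46 positive) tested attendees, and it uses a standard Wald interval on the log scale, which the abstract does not name as the authors' method. The function name `risk_ratio` is hypothetical.

```python
import math

def risk_ratio(cases_exposed, n_exposed, cases_ref, n_ref, z=1.96):
    """Risk ratio with a Wald confidence interval computed on the log scale.

    Assumption: this is a textbook method, not necessarily the one used
    in the study.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_ref = cases_ref / n_ref
    rr = risk_exposed / risk_ref
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Counts from the abstract: unvaccinated 11/12, fully vaccinated 16/46.
rr, lower, upper = risk_ratio(11, 12, 16, 46)
print(f"RR = {rr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

Under these assumptions the sketch gives a ratio of about 2.64 with an interval of roughly 1.71 to 4.06, in line with the values reported above.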
Affiliation(s)
- Alexandra Lorentz
- Minnesota Department of Health, St Paul
- Association of Public Health Laboratories, Silver Spring, Maryland
- Amy Saupe
- Minnesota Department of Health, St Paul
180
Liu LT, Tsai JJ, Chen CH, Lin PC, Tsai CY, Tsai YY, Hsu MC, Chuang WL, Chang JM, Hwang SJ, Chong IW. Isolation and Identification of a Rare Spike Gene Double-Deletion SARS-CoV-2 Variant From the Patient With High Cycle Threshold Value. Front Med (Lausanne) 2022; 8:822633. [PMID: 35071285 PMCID: PMC8770430 DOI: 10.3389/fmed.2021.822633] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/26/2021] [Accepted: 12/13/2021] [Indexed: 12/18/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19) is an emerging life-threatening pulmonary disease caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which originated in Wuhan, Hubei Province, China, in December 2019. COVID-19 develops after close contact via inhalation of respiratory droplets containing SARS-CoV-2 during talking, coughing, or sneezing by asymptomatic, presymptomatic, and symptomatic carriers. This virus evolved over time, and numerous genetic variants have been reported to have increased disease severity, mortality, and transmissibility. Variants have also developed resistance to antivirals and vaccination and can escape the immune response of humans. Reverse transcription polymerase chain reaction (RT–PCR) is the method of choice among diagnostic techniques, including nucleic acid amplification tests (NAATs), serological tests, and diagnostic imaging, such as computed tomography (CT). The limitation of RT–PCR is that it cannot distinguish fragmented RNA genomes from live transmissible viruses. Thus, SARS-CoV-2 isolation by using cell culture has been developed and makes important contributions in the field of diagnosis, development of antivirals, vaccines, and SARS-CoV-2 virology research. In this research, two SARS-CoV-2 strains were isolated from four RT–PCR-positive nasopharyngeal swabs using VERO E6 cell culture. One isolate was cultured successfully with a blind passage on day 3 post inoculation from a swab with a Ct > 35, while the cells did not develop cytopathic effects without a blind passage until day 14 post inoculation. Our results indicated that infectious SARS-CoV-2 virus particles existed, even with a Ct > 35. Cultivable viruses could provide additional consideration for releasing the patient from quarantine. The results of the whole genome sequencing and bioinformatic analysis suggested that these two isolates contain a spike 68-76del+spike 675-679del double-deletion variation. 
The double deletion was confirmed by amplification of the regions spanning the spike gene deletion using Sanger sequencing. Phylogenetic analysis revealed that this double-deletion variant was rare (one per million in public databases, including GenBank and GISAID). The impact of this double deletion in the spike gene on the SARS-CoV-2 virus itself as well as on cultured cells and/or humans remains to be further elucidated.
Affiliation(s)
- Li-Teh Liu
- Department of Medical Laboratory Science and Biotechnology, College of Medical Technology, Chung-Hwa University of Medical Technology, Tainan, Taiwan
- Jih-Jin Tsai
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan; Division of Infectious Diseases, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan; School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Chun-Hong Chen
- National Mosquito-Borne Diseases Control Research Center, National Health Research Institutes, Zhunan, Taiwan; National Institute of Infectious Diseases and Vaccinology, National Health Research Institutes, Zhunan, Taiwan
- Ping-Chang Lin
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Ching-Yi Tsai
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Yan-Yi Tsai
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Miao-Chen Hsu
- Tropical Medicine Center, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Wan-Long Chuang
- School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; Division of Hepatobiliary and Pancreatic, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Jer-Ming Chang
- School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan; Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Shang-Jyh Hwang
- School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; Division of Nephrology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Inn-Wen Chong
- Department of Internal Medicine and Graduate Institute of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan; Department of Pulmonary Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
181
Assessment of Inter-Laboratory Differences in SARS-CoV-2 Consensus Genome Assemblies between Public Health Laboratories in Australia. Viruses 2022; 14:v14020185. [PMID: 35215779 PMCID: PMC8875182 DOI: 10.3390/v14020185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/20/2021] [Revised: 01/10/2022] [Accepted: 01/10/2022] [Indexed: 12/12/2022] Open
Abstract
Whole-genome sequencing of viral isolates is critical for informing transmission patterns and for the ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. Identified differences were predominantly driven by ambiguous site content. Ignoring these produced differences in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons, and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if variant allele frequency thresholds for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practices approach to bioinformatics between laboratories working on a common outbreak problem.
182
Sarkar R, Saha R, Mallick P, Sharma R, Kaur A, Dutta S, Chawla-Sarkar M. Emergence of a novel SARS-CoV-2 Pango lineage B.1.1.526 in West Bengal, India. J Infect Public Health 2022; 15:42-50. [PMID: 34896696 PMCID: PMC8642833 DOI: 10.1016/j.jiph.2021.11.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2021] [Revised: 11/16/2021] [Accepted: 11/30/2021] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Since its emergence in late 2019, SARS-CoV-2 has evolved continuously by acquiring mutations, leading to the emergence of numerous variants and causing a second wave of the pandemic in many countries, including India, in 2021. To control this pandemic, continuous mutational surveillance and genomic epidemiology of circulating strains are essential to detect novel variants and to monitor the evolution of existing ones. METHODS SARS-CoV-2 sequences were retrieved from the GISAID database. Sequence alignment was performed with MAFFT version 7. A phylogenetic tree was constructed using MEGA (version X) and UShER. RESULTS In this study, we report the emergence of a novel variant of SARS-CoV-2, named B.1.1.526, in India. This novel variant encompasses 129 SARS-CoV-2 strains characterized by the presence of 11 coexisting mutations, including D614G, P681H, and V1230L in the S glycoprotein. Of these 129 sequences, 27 also harbored the E484K mutation in the S glycoprotein. Phylogenetic analysis revealed that strains of this novel variant emerged from the GR clade and formed a new cluster. Of the 129 sequences, 126 were found in seven different states of India and the remaining 3 were observed in the USA. Temporal analysis revealed that this novel variant was first collected from the Kolkata district of West Bengal, India. CONCLUSIONS The D614G, P681H, and E484K mutations have previously been reported to favor increased transmissibility, enhanced infectivity, and immune evasion, respectively. The transmembrane (TM) domain of the S2 subunit anchors the S glycoprotein to the virus envelope. The V1230L mutation, present within the TM domain of the S glycoprotein, might strengthen the interaction of the S glycoprotein with the viral envelope and increase S glycoprotein deposition on the virion, resulting in a more infectious virion. Therefore, the new variant carrying D614G, P681H, V1230L, and E484K may have higher infectivity, transmissibility, and immune evasion characteristics, and thus needs to be monitored closely.
Affiliation(s)
- Rakesh Sarkar
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Ritubrita Saha
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Pratik Mallick
- Department of Biotechnology, St. Xavier's College, Kolkata, West Bengal, India
- Ranjana Sharma
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Amandeep Kaur
- Guru Nanak Institute of Pharmaceutical Science and Technology, Kolkata, West Bengal, India
- Shanta Dutta
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Mamta Chawla-Sarkar
- ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
183
Chen C, Nadeau S, Yared M, Voinov P, Xie N, Roemer C, Stadler T. CoV-Spectrum: analysis of globally shared SARS-CoV-2 data to identify and characterize new variants. Bioinformatics 2021; 38:1735-1737. [PMID: 34954792 PMCID: PMC8896605 DOI: 10.1093/bioinformatics/btab856] [Citation(s) in RCA: 155] [Impact Index Per Article: 51.7] [Received: 06/11/2021] [Revised: 11/26/2021] [Accepted: 12/21/2021] [Indexed: 02/03/2023] Open
Abstract
SUMMARY The CoV-Spectrum website supports the identification of new SARS-CoV-2 variants of concern and the tracking of known variants. Its flexible amino acid and nucleotide mutation search allows querying of variants before they are designated by a lineage nomenclature system. The platform brings together SARS-CoV-2 data from different sources and applies analyses. Results include the proportion of different variants over time, their demographic and geographic distributions, common mutations, hospitalization and mortality probabilities, estimates for transmission fitness advantage and insights obtained from wastewater samples. AVAILABILITY AND IMPLEMENTATION CoV-Spectrum is available at https://cov-spectrum.org. The code is released under the GPL-3.0 license at https://github.com/cevo-public/cov-spectrum-website.
Affiliation(s)
- Sarah Nadeau
- Department of Biosystems Science and Engineering, ETH Zürich, CH-4058 Basel, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Michael Yared
- Department of Computer Science, ETH Zürich, CH-8092 Zürich, Switzerland
- Philippe Voinov
- Department of Computer Science, ETH Zürich, CH-8092 Zürich, Switzerland
- Ning Xie
- Department of Informatics, University of Zurich, CH-8050 Zürich, Switzerland
- Cornelius Roemer
- Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland; Biozentrum, University of Basel, CH-4056 Basel, Switzerland
184
Caraballo-Ortiz MA, Miura S, Sanderford M, Dolker T, Tao Q, Weaver S, Pond SLK, Kumar S. TopHap: Rapid inference of key phylogenetic structures from common haplotypes in large genome collections with limited diversity. bioRxiv 2021:2021.12.13.472454. [PMID: 34931186 PMCID: PMC8687460 DOI: 10.1101/2021.12.13.472454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
MOTIVATION Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts of sequencing genomes and reconstructing the phylogeny of SARS-CoV-2 strains exemplify these difficulties since there are only hundreds of phylogenetically informative sites and millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants to improve the signal-to-noise ratio for more accurate phylogenetic inference of resolvable phylogenetic features. RESULTS We present the TopHap approach that determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. To assess topological robustness, we develop a bootstrap resampling strategy that resamples genomes spatiotemporally. The application of TopHap to build a phylogeny of 68,057 genomes (68KG) produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely evolved by four mutations of the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided evolutionary origins of major variants of concern. AVAILABILITY TopHap is available on the web at https://github.com/SayakaMiura/TopHap . CONTACT s.kumar@temple.edu.
Affiliation(s)
- Marcos A. Caraballo-Ortiz
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sayaka Miura
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Maxwell Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Tenzin Dolker
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Qiqing Tao
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Steven Weaver
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sergei L. K. Pond
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Center of Excellence in Genome Medicine Research, King Abdulaziz University, Saudi Arabia
185
McBroome J, Thornlow B, Hinrichs AS, Kramer A, De Maio N, Goldman N, Haussler D, Corbett-Detig R, Turakhia Y. A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees. Mol Biol Evol 2021; 38:5819-5824. [PMID: 34469548 PMCID: PMC8662617 DOI: 10.1093/molbev/msab264] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Indexed: 12/17/2022] Open
Abstract
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
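The idea behind a mutation-annotated tree, as described in the abstract above, is that each branch stores only the mutations inferred on it, so a sample's full set of differences from the reference can be recovered by replaying mutations along the path from the root. The following is a minimal conceptual sketch, not the actual MAT file format or the matUtils implementation; the `Node` class, the `genotype` function, and the example positions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical simplification: a mutation is (genome position, new base).
Mutation = Tuple[int, str]

@dataclass
class Node:
    name: str
    mutations: List[Mutation] = field(default_factory=list)  # on the branch above this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str, mutations: List[Mutation]) -> "Node":
        child = Node(name, list(mutations), parent=self)
        self.children.append(child)
        return child

def genotype(node: Node) -> Dict[int, str]:
    """Replay branch mutations from the root down to `node`."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    variants: Dict[int, str] = {}
    for ancestor in reversed(path):  # root first
        for position, base in ancestor.mutations:
            variants[position] = base  # later mutations override earlier ones
    return variants

# Toy tree with illustrative positions only.
root = Node("root")
clade = root.add_child("internal_1", [(23403, "G")])
leaf = clade.add_child("sample_A", [(23604, "A")])
print(genotype(leaf))
```

Because shared mutations are stored once on an internal branch rather than once per sample, this kind of encoding stays compact even when many samples are nearly identical.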
Affiliation(s)
- Jakob McBroome
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Bryan Thornlow
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Alexander Kramer
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- David Haussler
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Yatish Turakhia
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
186
Jacob Machado D, White RA, Kofsky J, Janies DA. Fundamentals of genomic epidemiology, lessons learned from the coronavirus disease 2019 (COVID-19) pandemic, and new directions. Antimicrobial Stewardship & Healthcare Epidemiology : ASHE 2021; 1:e60. [PMID: 36168505 PMCID: PMC9495640 DOI: 10.1017/ash.2021.222] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/14/2021] [Accepted: 10/15/2021] [Indexed: 04/19/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic was one of the significant causes of death worldwide in 2020. The disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an RNA virus of the subfamily Orthocoronavirinae related to 2 other clinically relevant coronaviruses, SARS-CoV and MERS-CoV. Like other coronaviruses and several other viruses, SARS-CoV-2 originated in bats. However, unlike other coronaviruses, SARS-CoV-2 resulted in a devastating pandemic. The SARS-CoV-2 pandemic rages on due to viral evolution that leads to more transmissible and immune-evasive variants. Technology such as genomic sequencing has driven the shift from syndromic to molecular epidemiology and promises better understanding of variants. The COVID-19 pandemic has exposed critical impediments that must be addressed to develop the science of pandemics. Much of the progress is being applied in the developed world. However, barriers to the use of molecular epidemiology in low- and middle-income countries (LMICs) remain, including lack of logistics for equipment and reagents and lack of training in analysis. We review the molecular epidemiology literature to understand its origins from the SARS epidemic (2002-2003) through influenza events and the current COVID-19 pandemic. We advocate for improved genomic surveillance of SARS-CoV and understanding the pathogen diversity in potential zoonotic hosts. This work will require training in phylogenetic and high-performance computing to improve analyses of the origin and spread of pathogens. The overarching goals are to understand and abate zoonosis risk through interdisciplinary collaboration and lowering logistical barriers.
Affiliation(s)
- Denis Jacob Machado
  - University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, Charlotte, North Carolina
- Richard Allen White
  - University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, Charlotte, North Carolina
  - University of North Carolina at Charlotte, North Carolina Research Campus (NCRC), Kannapolis, North Carolina
- Janice Kofsky
  - University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, Charlotte, North Carolina
- Daniel A. Janies
  - University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, Charlotte, North Carolina
|
187
|
Ye C, Thornlow B, Kramer A, McBroome J, Hinrichs A, Corbett-Detig R, Turakhia Y. Pandemic-scale phylogenetics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021:2021.12.03.470766. [PMID: 34927180 PMCID: PMC8679213 DOI: 10.1101/2021.12.03.470766] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
Abstract
Phylogenetics has been central to the genomic surveillance, epidemiology and contact tracing efforts during the COVID-19 pandemic. But the massive scale of genomic sequencing has rendered the pre-pandemic tools inadequate for comprehensive phylogenetic analyses. Here, we discuss the phylogenetic package that we developed to address the needs imposed by this pandemic. The package incorporates several pandemic-specific optimization and parallelization techniques and comprises four programs: UShER, matOptimize, RIPPLES and matUtils. Using high-performance computing, UShER and matOptimize maintain and refine daily a massive mutation-annotated phylogenetic tree consisting of all SARS-CoV-2 sequences available in online repositories. With UShER and RIPPLES, individual labs - even with modest compute resources - incorporate newly-sequenced SARS-CoV-2 genomes on this phylogeny and discover evidence for recombination in real-time. With matUtils, they rapidly query and visualize massive SARS-CoV-2 phylogenies. These tools have empowered scientists worldwide to study the SARS-CoV-2 evolution and transmission at an unprecedented scale, resolution and speed.
Affiliation(s)
- Cheng Ye
  - University of California, San Diego; San Diego, CA 92093, USA
- Bryan Thornlow
  - University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Alexander Kramer
  - University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Jakob McBroome
  - University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Angie Hinrichs
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Russell Corbett-Detig
  - University of California, Santa Cruz; Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA 95064, USA
- Yatish Turakhia
  - University of California, San Diego; San Diego, CA 92093, USA
|
188
|
Klink GV, Safina K, Nabieva E, Shvyrev N, Garushyants S, Alekseeva E, Komissarov AB, Danilenko DM, Pochtovyi AA, Divisenko EV, Vasilchenko LA, Shidlovskaya EV, Kuznetsova NA, Samoilov AE, Neverov AD, Popova AV, Fedonin GG, Akimkin VG, Lioznov D, Gushchin VA, Shchur V, Bazykin GA. The rise and spread of the SARS-CoV-2 AY.122 lineage in Russia. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.12.02.21267168. [PMID: 34909799 PMCID: PMC8669866 DOI: 10.1101/2021.12.02.21267168] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
BACKGROUND Delta has outcompeted most preexisting variants of SARS-CoV-2, becoming the globally predominant lineage by mid-2021. Its subsequent evolution has led to emergence of multiple sublineages, many of which are well-mixed between countries. AIM Here, we aim to study the emergence and spread of the Delta lineage in Russia. METHODS We use a phylogeographic approach to infer imports of Delta sublineages into Russia, and phylodynamic models to assess the rate of their spread. RESULTS We show that nearly the entire Delta epidemic in Russia has probably descended from a single import event despite genetic evidence of multiple Delta imports. Indeed, over 90% of Delta samples in Russia are characterized by the nsp2:K81N+ORF7a:P45L pair of mutations which is rare outside Russia, putting them in the AY.122 sublineage. The AY.122 lineage was frequent in Russia among Delta samples from the start, and has not increased in frequency in other countries where it has been observed, suggesting that its high prevalence in Russia has probably resulted from a random founder effect. CONCLUSION The apartness of the genetic composition of the Delta epidemic in Russia makes Russia somewhat unusual, although not exceptional, among other countries.
Affiliation(s)
- Galya V. Klink
  - A.A. Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
- Ksenia Safina
  - Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
- Elena Nabieva
  - Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
- Nikita Shvyrev
  - National Research University Higher School of Economics, Moscow, Russia
- Sofya Garushyants
  - A.A. Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
  - Present address: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Andrei A. Pochtovyi
  - Federal State Budget Institution “National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya” of the Ministry of Health of the Russian Federation, Moscow, Russia
  - Department of Virology, Biological Faculty, Lomonosov Moscow State University, 119991 Moscow, Russia
- Elizaveta V. Divisenko
  - Federal State Budget Institution “National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya” of the Ministry of Health of the Russian Federation, Moscow, Russia
- Lyudmila A. Vasilchenko
  - Federal State Budget Institution “National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya” of the Ministry of Health of the Russian Federation, Moscow, Russia
- Elena V. Shidlovskaya
  - Federal State Budget Institution “National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya” of the Ministry of Health of the Russian Federation, Moscow, Russia
- Nadezhda A. Kuznetsova
  - Federal State Budget Institution “National Research Centre for Epidemiology and Microbiology Named after Honorary Academician N F Gamaleya” of the Ministry of Health of the Russian Federation, Moscow, Russia
- Andrei E. Samoilov
  - Federal Budget Institution of Science “Central Research Institute for Epidemiology” of the Federal Service for Supervision of Consumer Rights Protection and Human Welfare (Rospotrebnadzor), Moscow, Russia
- Alexey D. Neverov
  - Federal Budget Institution of Science “Central Research Institute for Epidemiology” of the Federal Service for Supervision of Consumer Rights Protection and Human Welfare (Rospotrebnadzor), Moscow, Russia
- Anfisa V. Popova
  - Federal Budget Institution of Science “Central Research Institute for Epidemiology” of the Federal Service for Supervision of Consumer Rights Protection and Human Welfare (Rospotrebnadzor), Moscow, Russia
- Gennady G. Fedonin
  - Federal Budget Institution of Science “Central Research Institute for Epidemiology” of the Federal Service for Supervision of Consumer Rights Protection and Human Welfare (Rospotrebnadzor), Moscow, Russia
- Vasiliy G. Akimkin
  - Federal Budget Institution of Science “Central Research Institute for Epidemiology” of the Federal Service for Supervision of Consumer Rights Protection and Human Welfare (Rospotrebnadzor), Moscow, Russia
- Dmitry Lioznov
  - Smorodintsev Research Institute of Influenza, Saint Petersburg, Russia
  - First Pavlov State Medical University, Saint Petersburg, Russia
- Vladimir A. Gushchin
  - Department of Virology, Biological Faculty, Lomonosov Moscow State University, 119991 Moscow, Russia
  - https://corgi.center/en/ (see the list of consortium members in Supplementary File 1)
- Vladimir Shchur
  - National Research University Higher School of Economics, Moscow, Russia
- Georgii A. Bazykin
  - A.A. Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
|
189
|
Vöhringer HS, Sanderson T, Sinnott M, De Maio N, Nguyen T, Goater R, Schwach F, Harrison I, Hellewell J, Ariani CV, Gonçalves S, Jackson DK, Johnston I, Jung AW, Saint C, Sillitoe J, Suciu M, Goldman N, Panovska-Griffiths J, Birney E, Volz E, Funk S, Kwiatkowski D, Chand M, Martincorena I, Barrett JC, Gerstung M. Genomic reconstruction of the SARS-CoV-2 epidemic in England. Nature 2021; 600:506-511. [PMID: 34649268 PMCID: PMC8674138 DOI: 10.1038/s41586-021-04069-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 18.7] [Received: 05/22/2021] [Accepted: 09/29/2021] [Indexed: 11/09/2022]
Abstract
The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June 2021. This analysis reveals a series of subepidemics that peaked in early autumn 2020, followed by a jump in transmissibility of the B.1.1.7/Alpha lineage. The Alpha variant grew when other lineages declined during the second national lockdown and regionally tiered restrictions between November and December 2020. A third more stringent national lockdown suppressed the Alpha variant and eliminated nearly all other lineages in early 2021. Yet a series of variants (most of which contained the spike E484K mutation) defied these trends and persisted at moderately increasing proportions. However, by accounting for sustained introductions, we found that the transmissibility of these variants is unlikely to have exceeded the transmissibility of the Alpha variant. Finally, B.1.617.2/Delta was repeatedly introduced in England and grew rapidly in early summer 2021, constituting approximately 98% of sampled SARS-CoV-2 genomes on 26 June 2021.
Affiliation(s)
- Harald S Vöhringer
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, UK
- Theo Sanderson
  - Wellcome Sanger Institute, Hinxton, UK
  - The Francis Crick Institute, London, UK
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, UK
- Frank Schwach
  - Wellcome Sanger Institute, Hinxton, UK
  - Public Health England, London, UK
- Joel Hellewell
  - London School of Hygiene & Tropical Medicine, London, UK
- Alexander W Jung
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, UK
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, UK
- Ewan Birney
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, UK
- Erik Volz
  - MRC Centre for Global Infectious Disease Analysis, Jameel Institute for Disease and Emergency Analytics, Imperial College London, London, UK
- Sebastian Funk
  - London School of Hygiene & Tropical Medicine, London, UK
- Meera Chand
  - Public Health England, London, UK
  - Guy's and St Thomas' NHS Foundation Trust, London, UK
- Moritz Gerstung
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, UK
  - Division for AI in Oncology, German Cancer Research Centre DKFZ, Heidelberg, Germany
|
190
|
Garushyants SK, Rogozin IB, Koonin EV. Template switching and duplications in SARS-CoV-2 genomes give rise to insertion variants that merit monitoring. Commun Biol 2021; 4:1343. [PMID: 34848826 PMCID: PMC8632935 DOI: 10.1038/s42003-021-02858-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/04/2021] [Accepted: 11/01/2021] [Indexed: 12/29/2022] Open
Abstract
The appearance of multiple new SARS-CoV-2 variants during the COVID-19 pandemic is a matter of grave concern. Some of these variants, such as B.1.617.2, B.1.1.7, and B.1.351, manifest higher infectivity and virulence than the earlier SARS-CoV-2 variants, with potential dramatic effects on the course of the pandemic. So far, analysis of new SARS-CoV-2 variants focused primarily on nucleotide substitutions and short deletions that are readily identifiable by comparison to consensus genome sequences. In contrast, insertions have largely escaped the attention of researchers although the furin site insert in the Spike (S) protein is thought to be a determinant of SARS-CoV-2 virulence. Here, we identify 346 unique inserts of different lengths in SARS-CoV-2 genomes and present evidence that these inserts reflect actual virus variance rather than sequencing artifacts. Two principal mechanisms appear to account for the inserts in the SARS-CoV-2 genomes, polymerase slippage and template switch that might be associated with the synthesis of subgenomic RNAs. At least three inserts in the N-terminal domain of the S protein are predicted to lead to escape from neutralizing antibodies, whereas other inserts might result in escape from T-cell immunity. Thus, inserts in the S protein can affect its antigenic properties and merit monitoring.
Affiliation(s)
- Sofya K Garushyants
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Igor B Rogozin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|
191
|
Gauthier NPG, Nelson C, Bonsall MB, Locher K, Charles M, MacDonald C, Krajden M, Chorlton SD, Manges AR. Nanopore metagenomic sequencing for detection and characterization of SARS-CoV-2 in clinical samples. PLoS One 2021; 16:e0259712. [PMID: 34793508 PMCID: PMC8601544 DOI: 10.1371/journal.pone.0259712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 10/25/2021] [Indexed: 11/18/2022] Open
Abstract
OBJECTIVES The COVID-19 pandemic has underscored the need for rapid novel diagnostic strategies. Metagenomic Next-Generation Sequencing (mNGS) may allow for the detection of pathogens that can be missed in targeted assays. The goal of this study was to assess the performance of nanopore-based Sequence-Independent Single Primer Amplification (SISPA) for the detection and characterization of SARS-CoV-2. METHODS We performed mNGS on clinical samples and designed a diagnostic classifier that corrects for barcode crosstalk between specimens. Phylogenetic analysis was performed on genome assemblies. RESULTS Our assay yielded 100% specificity overall and 95.2% sensitivity for specimens with a RT-PCR cycle threshold value less than 30. We assembled 10 complete, and one near-complete genomes from 20 specimens that were classified as positive by mNGS. Phylogenetic analysis revealed that 10/11 specimens from British Columbia had a closest relative to another British Columbian specimen. We found 100% concordance between phylogenetic lineage assignment and Variant of Concern (VOC) PCR results. Our assay was able to distinguish between the Alpha and Gamma variants, which was not possible with the current standard VOC PCR being used in British Columbia. CONCLUSIONS This study supports future work examining the broader feasibility of nanopore mNGS as a diagnostic strategy for the detection and characterization of viral pathogens.
Affiliation(s)
- Nick P G Gauthier
  - Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada
- Cassidy Nelson
  - Mathematical Ecology Research Group, Department of Zoology, University of Oxford, Oxford, United Kingdom
- Michael B Bonsall
  - Mathematical Ecology Research Group, Department of Zoology, University of Oxford, Oxford, United Kingdom
- Kerstin Locher
  - Division of Medical Microbiology, Vancouver General Hospital, Vancouver, British Columbia, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Marthe Charles
  - Division of Medical Microbiology, Vancouver General Hospital, Vancouver, British Columbia, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Clayton MacDonald
  - Division of Medical Microbiology, Vancouver General Hospital, Vancouver, British Columbia, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Mel Krajden
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
  - British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- Samuel D Chorlton
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
  - BugSeq Bioinformatics Inc, Vancouver, British Columbia, Canada
- Amee R Manges
  - British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
  - School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
|
192
|
Balaban M, Jiang Y, Roush D, Zhu Q, Mirarab S. Fast and accurate distance-based phylogenetic placement using divide and conquer. Mol Ecol Resour 2021; 22:1213-1227. [PMID: 34643995 DOI: 10.1111/1755-0998.13527] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/24/2021] [Accepted: 10/05/2021] [Indexed: 01/04/2023]
Abstract
Phylogenetic placement of query samples on an existing phylogeny is increasingly used in molecular ecology, including sample identification and microbiome environmental sampling. As the size of available reference trees used in these analyses continues to grow, there is a growing need for methods that place sequences on ultra-large trees with high accuracy. Distance-based placement methods have recently emerged as a path to provide such scalability while allowing flexibility to analyse both assembled and unassembled environmental samples. In this study, we introduce a distance-based phylogenetic placement method, APPLES-2, that is more accurate and scalable than existing distance-based methods and even some of the leading maximum-likelihood methods. This scalability is owed to a divide-and-conquer technique that limits distance calculation and phylogenetic placement to parts of the tree most relevant to each query. The increased scalability and accuracy enables us to study the effectiveness of APPLES-2 for placing microbial genomes on a data set of 10,575 microbial species using subsets of 381 marker genes. APPLES-2 has very high accuracy in this setting, placing 97% of query genomes within three branches of the optimal position in the species tree using 50 marker genes. Our proof-of-concept results show that APPLES-2 can quickly place metagenomic scaffolds on ultra-large backbone trees with high accuracy as long as a scaffold includes tens of marker genes. These results pave the path for a more scalable and widespread use of distance-based placement in various areas of molecular ecology.
Affiliation(s)
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Yueyu Jiang
  - Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA, USA
- Daniel Roush
  - Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Qiyun Zhu
  - Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA, USA
|
193
|
Blanke M, Morgenstern B. App-SpaM: phylogenetic placement of short reads without sequence alignment. BIOINFORMATICS ADVANCES 2021; 1:vbab027. [PMID: 36700102 PMCID: PMC9710606 DOI: 10.1093/bioadv/vbab027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/31/2021] [Revised: 09/27/2021] [Accepted: 10/11/2021] [Indexed: 01/28/2023]
Abstract
Motivation Phylogenetic placement is the task of placing a query sequence of unknown taxonomic origin into a given phylogenetic tree of a set of reference sequences. A major field of application of such methods is, for example, the taxonomic identification of reads in metabarcoding or metagenomic studies. Several approaches to phylogenetic placement have been proposed in recent years. The most accurate of them requires a multiple sequence alignment of the references as input. However, calculating multiple alignments is not only time-consuming but also limits the applicability of these approaches. Results Herein, we propose Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM), an efficient algorithm for the phylogenetic placement of short sequencing reads on a tree of a set of reference sequences. App-SpaM produces results of high quality that are on a par with the best available approaches to phylogenetic placement, while our software is two orders of magnitude faster than these existing methods. Our approach neither requires a multiple alignment of the reference sequences nor alignments of the queries to the references. This enables App-SpaM to perform phylogenetic placement on a broad variety of datasets. Availability and implementation The source code of App-SpaM is freely available on Github at https://github.com/matthiasblanke/App-SpaM together with detailed instructions for installation and settings. App-SpaM is furthermore available as a Conda-package on the Bioconda channel. Contact matthias.blanke@biologie.uni-goettingen.de. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Matthias Blanke
  - Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen 37077, Germany
  - International Max Planck Research School for Genome Science, Göttingen 37077, Germany
- Burkhard Morgenstern
  - Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen 37077, Germany
  - Campus-Institute Data Science (CIDAS), Göttingen 37077, Germany
|
194
|
Progress and challenges in virus genomic epidemiology. Trends Parasitol 2021; 37:1038-1049. [PMID: 34620561 DOI: 10.1016/j.pt.2021.08.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/21/2021] [Revised: 08/24/2021] [Accepted: 08/26/2021] [Indexed: 12/18/2022]
Abstract
Genomic epidemiology, which links pathogen genomes with associated metadata to understand disease transmission, has become a key component of outbreak response. Decreasing costs of genome sequencing and increasing computational power provide opportunities to generate and analyse large viral genomic datasets that aim to uncover the spatial scales of transmission, the demographics contributing to transmission patterns, and to forecast epidemic trends. Emerging sources of genomic data and associated metadata provide new opportunities to further unravel transmission patterns. Key challenges include how to integrate genomic data with metadata from multiple sources, how to generate efficient computational algorithms to cope with large datasets, and how to establish sampling frameworks to enable robust conclusions.
|
195
|
De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: efficient simulation of sequence evolution for pandemic-scale datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021:2021.03.15.435416. [PMID: 33758852 PMCID: PMC7987011 DOI: 10.1101/2021.03.15.435416] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. > 100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software is available from https://github.com/NicolaDM/phastSim and allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- William Boulton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lukas Weilguny
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Conor R. Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
|
196
|
Exploiting genomic surveillance to map the spatio-temporal dispersal of SARS-CoV-2 spike mutations in Belgium across 2020. Sci Rep 2021; 11:18580. [PMID: 34535691 PMCID: PMC8448849 DOI: 10.1038/s41598-021-97667-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/18/2021] [Accepted: 08/24/2021] [Indexed: 11/21/2022] Open
Abstract
At the end of 2020, several new variants of SARS-CoV-2—designated variants of concern—were detected and quickly suspected to be associated with a higher transmissibility and possible escape of vaccine-induced immunity. In Belgium, this discovery has motivated the initiation of a more ambitious genomic surveillance program, which is drastically increasing the number of SARS-CoV-2 genomes to analyse for monitoring the circulation of viral lineages and variants of concern. In order to efficiently analyse the massive collection of genomic data that are the result of such increased sequencing efforts, streamlined analytical strategies are crucial. In this study, we illustrate how to efficiently map the spatio-temporal dispersal of target mutations at a regional level. As a proof of concept, we focus on the Belgian province of Liège that has been consistently sampled throughout 2020, but was also one of the main epicenters of the second European epidemic wave. Specifically, we employ a recently developed phylogeographic workflow to infer the regional dispersal history of viral lineages associated with three specific mutations on the spike protein (S98F, A222V and S477N) and to quantify their relative importance through time. Our analytical pipeline enables analysing large data sets and has the potential to be quickly applied and updated to track target mutations in space and time throughout the course of an epidemic.
|
197
|
Lam-Hine T, McCurdy SA, Santora L, Duncan L, Corbett-Detig R, Kapusinszky B, Willis M. Outbreak Associated with SARS-CoV-2 B.1.617.2 (Delta) Variant in an Elementary School - Marin County, California, May-June 2021. MMWR. MORBIDITY AND MORTALITY WEEKLY REPORT 2021; 70:1214-1219. [PMID: 34473683 PMCID: PMC8422870 DOI: 10.15585/mmwr.mm7035e2] [Citation(s) in RCA: 77] [Impact Index Per Article: 25.7] [Indexed: 12/31/2022]
|
198
|
Yang J, Yan Y, Zhong W. Application of omics technology to combat the COVID-19 pandemic. MedComm (Beijing) 2021; 2:381-401. [PMID: 34766152 PMCID: PMC8554664 DOI: 10.1002/mco2.90] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/03/2021] [Revised: 08/22/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022] Open
Abstract
As of August 27, 2021, the ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread to over 220 countries, areas, and territories. Thus far, 214,468,601 confirmed cases, including 4,470,969 deaths, have been reported to the World Health Organization. To combat the COVID-19 pandemic, multiomics-based strategies, including genomics, transcriptomics, proteomics, and metabolomics, have been used to study the diagnosis methods, pathogenesis, prognosis, and potential drug targets of COVID-19. In order to help researchers and clinicians to keep up with the knowledge of COVID-19, we summarized the most recent progresses reported in omics-based research papers. This review discusses omics-based approaches for studying COVID-19, summarizing newly emerged SARS-CoV-2 variants as well as potential diagnostic methods, risk factors, and pathological features of COVID-19. This review can help researchers and clinicians gain insight into COVID-19 features, providing direction for future drug development and guidance for clinical treatment, so that patients can receive appropriate treatment as soon as possible to reduce the risk of disease progression.
Affiliation(s)
- Jingjing Yang
  - National Engineering Research Center for the Emergency Drug, Beijing Institute of Pharmacology and Toxicology, Beijing, China
  - School of Pharmaceutical Sciences, Hainan University, Haikou, Hainan, China
- Yunzheng Yan
  - National Engineering Research Center for the Emergency Drug, Beijing Institute of Pharmacology and Toxicology, Beijing, China
- Wu Zhong
  - National Engineering Research Center for the Emergency Drug, Beijing Institute of Pharmacology and Toxicology, Beijing, China
|
199
|
Bagci C, Bryant D, Cetinkaya B, Huson DH. Microbial Phylogenetic Context Using Phylogenetic Outlines. Genome Biol Evol 2021; 13:6370152. [PMID: 34519776 PMCID: PMC8462278 DOI: 10.1093/gbe/evab213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 09/06/2021] [Indexed: 12/30/2022] Open
Abstract
Microbial studies typically involve the sequencing and assembly of draft genomes for individual microbes or whole microbiomes. Given a draft genome, one first task is to determine its phylogenetic context, that is, to place it relative to the set of related reference genomes. We provide a new interactive graphical tool that addresses this task using Mash sketches to compare against all bacterial and archaeal representative genomes in the Genome Taxonomy Database taxonomy, all within the framework of SplitsTree5. The phylogenetic context of the query sequences is then displayed as a phylogenetic outline, a new type of phylogenetic network that is more general than a phylogenetic tree, but significantly less complex than other types of phylogenetic networks. We propose to use such networks, rather than trees, to represent phylogenetic context, because they can express uncertainty in the placement of taxa, whereas a tree must always commit to a specific branching pattern. We illustrate the new method using a number of draft genomes of different assembly quality.
Affiliation(s)
- Caner Bagci
  - Algorithms in Bioinformatics, University of Tübingen, Germany
- David Bryant
  - Department of Mathematics, University of Otago, Dunedin, New Zealand
- Banu Cetinkaya
  - Computer Science Program, Sabanci University, Tuzla/İstanbul, Turkey
- Daniel H Huson
  - Algorithms in Bioinformatics, University of Tübingen, Germany
  - Cluster of Excellence: Controlling Microbes to Fight Infection, University of Tübingen, Tübingen, Germany
|
200
|
Garushyants SK, Rogozin IB, Koonin EV. Insertions in SARS-CoV-2 genome caused by template switch and duplications give rise to new variants that merit monitoring. bioRxiv 2021:2021.04.23.441209. [PMID: 33907754 PMCID: PMC8077628 DOI: 10.1101/2021.04.23.441209] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 05/25/2023]
Abstract
The appearance of multiple new SARS-CoV-2 variants during the winter of 2020-2021 is a matter of grave concern. Some of these new variants, such as B.1.617.2, B.1.1.7, and B.1.351, manifest higher infectivity and virulence than the earlier SARS-CoV-2 variants, with potential dramatic effects on the course of the COVID-19 pandemic. So far, analysis of new SARS-CoV-2 variants has focused primarily on point nucleotide substitutions and short deletions that are readily identifiable by comparison to consensus genome sequences. In contrast, insertions have largely escaped the attention of researchers, although the furin site insert in the spike protein is thought to be a determinant of SARS-CoV-2 virulence and other inserts might have contributed to coronavirus pathogenicity as well. Here, we investigate insertions in SARS-CoV-2 genomes and identify 347 unique inserts of different lengths. We present evidence that these inserts reflect actual virus variance rather than sequencing errors. Two principal mechanisms appear to account for the inserts in the SARS-CoV-2 genomes: polymerase slippage and template switch that might be associated with the synthesis of subgenomic RNAs. We show that inserts in the Spike glycoprotein can affect its antigenic properties and thus merit monitoring. At least three inserts in the N-terminal domain of the Spike (ins245IME, ins246DSWG, and ins248SSLT) that were first detected in 2021 are predicted to lead to escape from neutralizing antibodies, whereas other inserts might result in escape from T-cell immunity.
Affiliation(s)
- Sofya K. Garushyants
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Igor B. Rogozin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
|