1
Ong'era EM, Mohammed KS, Makori TO, Bejon P, Ocholla-Oyier LI, Nokes DJ, Agoti CN, Githinji G. High-throughput sequencing approaches applied to SARS-CoV-2. Wellcome Open Res 2023. [DOI: 10.12688/wellcomeopenres.18701.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023] Open
Abstract
High-throughput sequencing is crucial for surveillance and control of viral outbreaks. During the ongoing coronavirus disease 2019 (COVID-19) pandemic, advances in the high-throughput sequencing technology resources have enhanced diagnosis, surveillance, and vaccine discovery. From the onset of the pandemic in December 2019, several genome-sequencing approaches have been developed and supported across the major sequencing platforms such as Illumina, Oxford Nanopore, PacBio, MGI DNBSEQ™ and Ion Torrent. Here, we share insights from the sequencing approaches developed for sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) between December 2019 and October 2022.
2
De Maio N, Walker CR, Turakhia Y, Lanfear R, Corbett-Detig R, Goldman N. Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2. Genome Biol Evol 2021; 13:evab087. [PMID: 33895815 PMCID: PMC8135539 DOI: 10.1093/gbe/evab087] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Accepted: 04/19/2021] [Indexed: 12/23/2022] Open
Abstract
The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
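As a rough illustration of why raw mutation counts can mislead when base composition is skewed, the sketch below divides the number of observed events of each mutation type by the number of reference sites that carry the source base. This is a toy calculation under stated assumptions, not the authors' inference procedure: the function name `mutation_rates_by_type`, the input format, and the example sequence are all hypothetical, and the code does not model recurrent mutations along a phylogeny.

```python
from collections import Counter


def mutation_rates_by_type(reference, mutations):
    """Return events per available site for each (ref_base, alt_base) type.

    reference: genome as a string over A, C, G, U (hypothetical input format).
    mutations: iterable of (position, alt_base) pairs, 0-based positions,
               one pair per inferred mutation event.
    """
    # Number of sites at which each source base is available to mutate.
    base_counts = Counter(reference)

    # Count observed events for each (source base, target base) pair.
    event_counts = Counter()
    for position, alt_base in mutations:
        ref_base = reference[position]
        if alt_base == ref_base:
            raise ValueError(f"position {position}: alt base equals reference")
        event_counts[(ref_base, alt_base)] += 1

    # Normalize by the abundance of the source base, so that a common base
    # does not appear more mutable merely because it offers more sites.
    return {
        pair: count / base_counts[pair[0]]
        for pair, count in event_counts.items()
    }


# Toy example: four C sites, two G sites, one A site, one U site.
toy_rates = mutation_rates_by_type(
    "CCCCGGAU", [(0, "U"), (1, "U"), (4, "U"), (6, "G")]
)
```

In the toy input, C→U is seen twice and G→U once, yet both have the same per-site rate because there are twice as many C sites as G sites.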
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
- Conor R Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
  - Department of Genetics, University of Cambridge, United Kingdom
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
3
Yang W, Jin G. Origin-independent analysis links SARS-CoV-2 local genomes with COVID-19 incidence and mortality. Brief Bioinform 2021; 22:905-913. [PMID: 32924062 PMCID: PMC7543285 DOI: 10.1093/bib/bbaa208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2020] [Revised: 07/24/2020] [Accepted: 08/12/2020] [Indexed: 12/24/2022] Open
Abstract
There is an urgent public health need to better understand Severe Acute Respiratory Syndrome (SARS)-CoV-2/COVID-19, particularly how sequences of the viruses could lead to diverse incidence and mortality of COVID-19 in different countries. However, because of its unknown ancestors and hosts, elucidating the genetic variations of the novel coronavirus, SARS-CoV-2, has been difficult. Without needing to know ancestors, we identified an uneven distribution of local genome similarities among the viruses categorized by geographic regions, and it was strongly correlated with incidence and mortality. To ensure unbiased and origin-independent analyses, we used a pairwise comparison of local genome sequences of virus genomes by Basic Local Alignment Search Tool (BLAST). We found a strong statistical correlation between dominance of the SARS-CoV-2 in distributions of uneven similarities and the incidence and mortality of illness. Genomic annotation of the BLAST hits also showed that viruses from geographic regions with severe infections tended to have more dynamic genomic regions in the SARS-CoV-2 receptor-binding domain (RBD) and receptor-binding motif (RBM) of the spike protein (S protein). Dynamic domains in the S protein were also confirmed by a canyon region of mismatches coincident with RBM and RBD, without hits of alignments of 100% matching. Thus, our origin-independent analysis suggests that the dynamic and unstable SARS-CoV-2-RBD could be the main reason for diverse incidence and mortality of COVID-19 infection.
Affiliation(s)
- Guangxu Jin
  - Corresponding author: Guangxu Jin, Department of Cancer Biology, Wake Forest School of Medicine Wake Forest Baptist Comprehensive Cancer Center Center for Precision Medicine, Wake Forest School of Medicine Medical Center Boulevard, Winston-Salem, NC 27157, USA. Tel: +(336) 713-7515; Fax: +(336) 713-7544. E-mail:
4
Chen X, Kang Y, Luo J, Pang K, Xu X, Wu J, Li X, Jin S. Next-Generation Sequencing Reveals the Progression of COVID-19. Front Cell Infect Microbiol 2021; 11:632490. [PMID: 33777844 PMCID: PMC7991797 DOI: 10.3389/fcimb.2021.632490] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 11/23/2020] [Accepted: 02/10/2021] [Indexed: 01/08/2023] Open
Abstract
The novel coronavirus SARS-CoV-2 (causing the disease COVID-19) has caused a highly transmissible and ongoing pandemic worldwide. Due to its rapid development, next-generation sequencing plays vital roles in many aspects. Here, we summarize the current knowledge on the origin and human transmission of SARS-CoV-2 based on NGS analysis. The ACE2 expression levels in various human tissues and relevant cells were compared to provide insights into the mechanism of SARS-CoV-2 infection. Gut microbiota dysbiosis observed by metagenome sequencing and the immunogenetics of COVID-19 patients according to single-cell sequencing analysis were also highlighted. Overall, the application of these sequencing techniques could be meaningful for finding novel intermediate SARS-CoV-2 hosts to block interspecies transmission. This information will further benefit SARS-CoV-2 diagnostic development and new therapeutic target discovery. The extensive application of NGS will provide powerful support for our fight against future public health emergencies.
Affiliation(s)
- Xiaomin Chen
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Yutong Kang
  - Wenzhou Key Laboratory of Sanitary Microbiology, Ministry of Education, Wenzhou, China
  - Key Laboratory of Laboratory Medicine, Ministry of Education, Wenzhou, China
  - School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, China
- Jing Luo
  - Rheumatology Department, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Kun Pang
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Xin Xu
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Jinyu Wu
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Xiaokun Li
  - Chemical Biology Research Center, School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Shengwei Jin
  - Department of Anesthesia and Critical Care, Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
5
De Maio N, Walker CR, Turakhia Y, Lanfear R, Corbett-Detig R, Goldman N. Mutation rates and selection on synonymous mutations in SARS-CoV-2. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021:2021.01.14.426705. [PMID: 33469589 PMCID: PMC7814826 DOI: 10.1101/2021.01.14.426705] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]
Abstract
The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. While previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Conor R Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
6
Zeng HL, Dichio V, Rodríguez Horta E, Thorell K, Aurell E. Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between eight viral genes. Proc Natl Acad Sci U S A 2020; 117:31519-31526. [PMID: 33203681 PMCID: PMC7733830 DOI: 10.1073/pnas.2012331117] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 12/19/2022] Open
Abstract
Genome-wide epistasis analysis is a powerful tool to infer gene interactions, which can guide drug and vaccine development and lead to deeper understanding of microbial pathogenesis. We have considered all complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes deposited in the Global Initiative on Sharing All Influenza Data (GISAID) repository until four different cutoff dates, and used direct coupling analysis together with an assumption of quasi-linkage equilibrium to infer epistatic contributions to fitness from polymorphic loci. We find eight interactions, of which three are between pairs where one locus lies in gene ORF3a, both loci holding nonsynonymous mutations. We also find interactions between two loci in gene nsp13, both holding nonsynonymous mutations, and four interactions involving one locus holding a synonymous mutation. Altogether, we infer interactions between loci in viral genes ORF3a and nsp2, nsp12, and nsp6, between ORF8 and nsp4, and between loci in genes nsp2, nsp13, and nsp14. The paper opens the prospect to use prominent epistatically linked pairs as a starting point to search for combinatorial weaknesses of recombinant viral pathogens.
Affiliation(s)
- Hong-Li Zeng
  - New Energy Technology Engineering Laboratory of Jiangsu Province, School of Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
  - Nordic Institute for Theoretical Physics, Royal Institute of Technology and Stockholm University, 10691 Stockholm, Sweden
- Vito Dichio
  - Nordic Institute for Theoretical Physics, Royal Institute of Technology and Stockholm University, 10691 Stockholm, Sweden
  - Department of Physics, University of Trieste, 34151 Trieste, Italy
  - Department of Computational Science and Technology, AlbaNova University Center, 10691 Stockholm, Sweden
- Edwin Rodríguez Horta
  - Group of Complex Systems and Statistical Physics, Department of Theoretical Physics, Physics Faculty, University of Havana, 10400 Havana, Cuba
- Kaisa Thorell
  - Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 40530 Gothenburg, Sweden
  - Center for Translational Microbiome Research, Department of Microbiology, Cell and Tumor Biology, Karolinska Institutet, 17177 Stockholm, Sweden
- Erik Aurell
  - Department of Computational Science and Technology, AlbaNova University Center, 10691 Stockholm, Sweden
7
Turakhia Y, De Maio N, Thornlow B, Gozashti L, Lanfear R, Walker CR, Hinrichs AS, Fernandes JD, Borges R, Slodkowicz G, Weilguny L, Haussler D, Goldman N, Corbett-Detig R. Stability of SARS-CoV-2 phylogenies. PLoS Genet 2020; 16:e1009175. [PMID: 33206635 PMCID: PMC7721162 DOI: 10.1371/journal.pgen.1009175] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 06/11/2020] [Revised: 12/07/2020] [Accepted: 10/06/2020] [Indexed: 12/23/2022] Open
Abstract
The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.
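The idea of removing problematic variants before phylogenetic inference can be illustrated with a minimal masking sketch: suspect alignment columns are overwritten with an ambiguity character so they contribute no signal. This is only an assumption about what such a step might look like, not the toolkit described in the abstract. The function names, the dictionary-based alignment format, and the example positions are hypothetical and are not taken from the masking lists linked above.

```python
def mask_sites(sequence, sites_1based, mask_char="N"):
    """Return a copy of `sequence` with the given 1-based sites masked."""
    chars = list(sequence)
    for site in sites_1based:
        if not 1 <= site <= len(chars):
            raise ValueError(f"site {site} is outside the sequence")
        # Convert the 1-based genome coordinate to a 0-based string index.
        chars[site - 1] = mask_char
    return "".join(chars)


def mask_alignment(alignment, sites_1based, mask_char="N"):
    """Apply the same mask to every sequence in a {name: sequence} mapping."""
    return {
        name: mask_sites(seq, sites_1based, mask_char)
        for name, seq in alignment.items()
    }


# Toy alignment; the masked columns (1 and 4) are arbitrary examples.
toy_alignment = {"sample_a": "ACGTACGT", "sample_b": "ACGAACGT"}
masked = mask_alignment(toy_alignment, [1, 4])
```

Masking every sequence at the same columns, rather than deleting them, keeps alignment coordinates unchanged, so downstream tools still see positions numbered against the reference.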
Affiliation(s)
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Landen Gozashti
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States of America
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Conor R. Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Angie S. Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Jason D. Fernandes
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Howard Hughes Medical Institute, University of California, Santa Cruz, CA, United States of America
- Rui Borges
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Greg Slodkowicz
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Lukas Weilguny
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- David Haussler
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Howard Hughes Medical Institute, University of California, Santa Cruz, CA, United States of America
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
8
Chen J, Hilt EE, Li F, Wu H, Jiang Z, Zhang Q, Wang J, Wang Y, Li Z, Tang J, Yang S. Epidemiological and Genomic Analysis of SARS-CoV-2 in 10 Patients From a Mid-Sized City Outside of Hubei, China in the Early Phase of the COVID-19 Outbreak. Front Public Health 2020; 8:567621. [PMID: 33072702 PMCID: PMC7531217 DOI: 10.3389/fpubh.2020.567621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/30/2020] [Accepted: 08/14/2020] [Indexed: 12/28/2022] Open
Abstract
A novel coronavirus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongoing Coronavirus Disease 2019 (COVID-19) pandemic. In this study, we performed a comprehensive epidemiological and genomic analysis of SARS-CoV-2 genomes from 10 patients in Shaoxing (Zhejiang Province), a mid-sized city outside of the epicenter Hubei province, China, during the early stage of the outbreak (late January to early February, 2020). We obtained viral genomes with >99% coverage and a mean depth of 296X demonstrating that viral genomic analysis is feasible via metagenomics sequencing directly on nasopharyngeal samples with SARS-CoV-2 Real-time PCR Ct values <28. We found that a cluster of four patients with travel history to Hubei shared the exact same virus with patients from Wuhan, Taiwan, Belgium, and Australia, highlighting how quickly this virus spread to the globe. The virus from another cluster of two family members living together without travel history but with a sick contact of a confirmed case from another city outside of Hubei accumulated significantly more mutations (9 SNPs vs. average 4 SNPs), suggesting a complex and dynamic nature of this outbreak. Our findings add to the growing knowledge of the epidemiological and genomic characteristics of SARS-CoV-2 and offer a glimpse into the early phase of this viral infection outside of Hubei, China.
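The two quality figures quoted in the abstract, genome coverage and mean depth, can be computed from per-position read depths along the lines of the sketch below. It is an illustration under assumptions rather than the study's pipeline: the function name, the list-of-integers input, and the `min_depth` threshold that defines a "covered" position are hypothetical, and the example depths are made up.

```python
def coverage_and_mean_depth(depths, min_depth=1):
    """Return (fraction of positions covered, mean depth).

    depths: one read-depth value per genome position (hypothetical input).
    min_depth: smallest depth at which a position counts as covered.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")

    # A position is covered when at least `min_depth` reads overlap it.
    covered = sum(1 for depth in depths if depth >= min_depth)

    coverage_fraction = covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return coverage_fraction, mean_depth


# Toy example: four positions, one of them with no reads.
toy_coverage, toy_mean = coverage_and_mean_depth([0, 10, 20, 30])
```

For the toy input, three of four positions are covered (0.75) and the mean depth is 15.0. Note that the mean here includes uncovered positions; some tools report the mean over covered positions only, so the convention should be stated alongside the number.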
Affiliation(s)
- Jinkun Chen
  - Shaoxing Center for Disease Control and Prevention, Shaoxing, China
- Evann E Hilt
  - Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Fan Li
  - Three Coin Analytics, Inc., Pleasanton, CA, United States
- Huan Wu
  - IngeniGen XunMinKang Biotechnology Inc., Shaoxing, China
- Zhuojing Jiang
  - Shaoxing Center for Disease Control and Prevention, Shaoxing, China
- Qinchao Zhang
  - Shaoxing Center for Disease Control and Prevention, Shaoxing, China
- Jiling Wang
  - Shaoxing Center for Disease Control and Prevention, Shaoxing, China
- Yifang Wang
  - IngeniGen XunMinKang Biotechnology Inc., Shaoxing, China
- Ziqin Li
  - Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou, China
- Jialiang Tang
  - Shaoxing Center for Disease Control and Prevention, Shaoxing, China
- Shangxin Yang
  - Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, United States
  - Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou, China
9
Lemey P, Hong S, Hill V, Baele G, Poletto C, Colizza V, O’Toole Á, McCrone JT, Andersen KG, Worobey M, Nelson MI, Rambaut A, Suchard MA. Accommodating individual travel history, global mobility, and unsampled diversity in phylogeography: a SARS-CoV-2 case study. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.06.22.165464. [PMID: 32596695 PMCID: PMC7315996 DOI: 10.1101/2020.06.22.165464] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/24/2023]
Abstract
Spatiotemporal bias in genome sequence sampling can severely confound phylogeographic inference based on discrete trait ancestral reconstruction. This has impeded our ability to accurately track the emergence and spread of SARS-CoV-2, which is the virus responsible for the COVID-19 pandemic. Despite the availability of staggering numbers of genomes on a global scale, evolutionary reconstructions of SARS-CoV-2 are hindered by the slow accumulation of sequence divergence over its relatively short transmission history. When confronted with these issues, incorporating additional contextual data may critically inform phylodynamic reconstructions. Here, we present a new approach to integrate individual travel history data in Bayesian phylogeographic inference and apply it to the early spread of SARS-CoV-2, while also including global air transportation data. We demonstrate that including travel history data for each SARS-CoV-2 genome yields more realistic reconstructions of virus spread, particularly when travelers from undersampled locations are included to mitigate sampling bias. We further explore the impact of sampling bias by incorporating unsampled sequences from undersampled locations in the analyses. Our reconstructions reinforce specific transmission hypotheses suggested by the inclusion of travel history data, but also suggest alternative routes of virus migration that are plausible within the epidemiological context but are not apparent with current sampling efforts. Although further research is needed to fully examine the performance of our new data integration approaches and to further improve them, they represent multiple new avenues for directly addressing the colossal issue of sample bias in phylogeographic inference.
Affiliation(s)
- Philippe Lemey
  - KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium
- Samuel Hong
  - KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium
- Verity Hill
  - Centre for Immunology, Infection and Evolution, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- Guy Baele
  - KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium
- Chiara Poletto
  - INSERM, Sorbonne Université, Institut Pierre Louis d’Epidémiologie et de Santé Publique IPLESP, F75012 Paris, France
- Vittoria Colizza
  - INSERM, Sorbonne Université, Institut Pierre Louis d’Epidémiologie et de Santé Publique IPLESP, F75012 Paris, France
- Áine O’Toole
  - Centre for Immunology, Infection and Evolution, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- John T. McCrone
  - Centre for Immunology, Infection and Evolution, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- Kristian G. Andersen
  - Department of Immunology and Microbiology, Scripps Research, La Jolla, CA 92037, USA
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Martha I. Nelson
  - Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, Maryland 20892, USA
- Andrew Rambaut
  - Centre for Immunology, Infection and Evolution, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- Marc A. Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
10
Worobey M, Pekar J, Larsen BB, Nelson MI, Hill V, Joy JB, Rambaut A, Suchard MA, Wertheim JO, Lemey P. The emergence of SARS-CoV-2 in Europe and the US. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.05.21.109322. [PMID: 32511416 PMCID: PMC7265688 DOI: 10.1101/2020.05.21.109322] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 01/04/2023]
Abstract
Accurate understanding of the global spread of emerging viruses is critically important for public health response and for anticipating and preventing future outbreaks. Here, we elucidate when, where and how the earliest sustained SARS-CoV-2 transmission networks became established in Europe and the United States (US). Our results refute prior findings erroneously linking cases in January 2020 with outbreaks that occurred weeks later. Instead, rapid interventions successfully prevented onward transmission of those early cases in Germany and Washington State. Other, later introductions of the virus from China to both Italy and Washington State founded the earliest sustained European and US transmission networks. Our analyses reveal an extended period of missed opportunity when intensive testing and contact tracing could have prevented SARS-CoV-2 from becoming established in the US and Europe.
Affiliation(s)
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Jonathan Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Brendan B. Larsen
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Martha I. Nelson
  - Fogarty International Center, National Institutes of Health, Bethesda, Maryland 20892, USA
- Verity Hill
  - Institute of Evolutionary Biology, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- Jeffrey B. Joy
  - Department of Medicine, University of British Columbia, Vancouver, BC, Canada
  - BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
  - Bioinformatics Programme, University of British Columbia, Vancouver, BC
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- Marc A. Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Joel O. Wertheim
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Philippe Lemey
  - KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium