51
Turakhia Y, De Maio N, Thornlow B, Gozashti L, Lanfear R, Walker CR, Hinrichs AS, Fernandes JD, Borges R, Slodkowicz G, Weilguny L, Haussler D, Goldman N, Corbett-Detig R. Stability of SARS-CoV-2 phylogenies. PLoS Genet 2020; 16:e1009175. [PMID: 33206635 PMCID: PMC7721162 DOI: 10.1371/journal.pgen.1009175] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 06/11/2020] [Revised: 12/07/2020] [Accepted: 10/06/2020] [Indexed: 12/23/2022]
Abstract
The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites, and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies, and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze, and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors, and establish a widely shared, stable clade structure for more accurate scientific inference and discourse.
Affiliation(s)
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Landen Gozashti
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States of America
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Conor R. Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Angie S. Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Jason D. Fernandes
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Howard Hughes Medical Institute, University of California, Santa Cruz, CA, United States of America
- Rui Borges
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Greg Slodkowicz
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Lukas Weilguny
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- David Haussler
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Howard Hughes Medical Institute, University of California, Santa Cruz, CA, United States of America
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
52
Dey A, Das R, Misra H, Uppal S. Coronavirus disease 2019: scientific overview of the global pandemic. New Microbes New Infect 2020; 38:100800. [PMID: 33133611 PMCID: PMC7591944 DOI: 10.1016/j.nmni.2020.100800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/29/2020] [Revised: 10/09/2020] [Accepted: 10/22/2020] [Indexed: 12/24/2022]
Abstract
Coronavirus disease 2019 (COVID-19) is the disease caused by the novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Genome sequencing of the virus revealed that it is a new zoonotic virus that might have evolved by jumping from bats to humans with one or more intermediate hosts. The immediate availability of the sequence information in the public domain has accelerated the development of quantitative RT-PCR-based diagnostics. Numerous clinical trials have been prioritized globally for testing new vaccines and treatments against this disease. This review provides a broad insight into different aspects of COVID-19, an introduction to SARS-CoV-2 mitigation strategies and the present status of diagnostics and therapeutics.
Affiliation(s)
- A. Dey
  - Molecular Genetics Section, Molecular Biology Division, Bhabha Atomic Research Centre, Trombay, Mumbai, India
- R. Das
  - Molecular Genetics Section, Molecular Biology Division, Bhabha Atomic Research Centre, Trombay, Mumbai, India
  - Homi Bhabha National Institute, Anushakti Nagar, Mumbai, India
- H.S. Misra
  - Molecular Genetics Section, Molecular Biology Division, Bhabha Atomic Research Centre, Trombay, Mumbai, India
  - Homi Bhabha National Institute, Anushakti Nagar, Mumbai, India
- S. Uppal
  - Molecular Genetics Section, Molecular Biology Division, Bhabha Atomic Research Centre, Trombay, Mumbai, India
  - Homi Bhabha National Institute, Anushakti Nagar, Mumbai, India
53
Banerjee A, Sarkar R, Mitra S, Lo M, Dutta S, Chawla-Sarkar M. The Novel Coronavirus Enigma: Phylogeny and Analyses of Coevolving Mutations Among the SARS-CoV-2 Viruses Circulating in India. JMIR Bioinformatics and Biotechnology 2020; 1:e20735. [PMID: 33496683 PMCID: PMC7720937 DOI: 10.2196/20735] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/27/2020] [Revised: 07/25/2020] [Accepted: 08/24/2020] [Indexed: 01/15/2023]
Abstract
BACKGROUND: The RNA genome of the emerging novel coronavirus is rapidly mutating, and its human-to-human transmission rate is increasing. Hence, temporal dissection of its evolutionary dynamics, characterization of the variations among different strains, and an understanding of the single nucleotide polymorphisms in endemic settings are crucial. Delineating the heterogeneous genomic constellations of this novel virus will help us understand its complex behavior in a particular geographical region.
OBJECTIVE: This is a comprehensive analysis of 95 Indian SARS-CoV-2 genome sequences available from the Global Initiative on Sharing All Influenza Data (GISAID) repository during the first 6 months of 2020 (January through June). Evolutionary dynamics, gene-specific phylogeny, and the emergence of novel coevolving mutations in 9 structural and nonstructural genes among circulating SARS-CoV-2 strains across 12 different Indian states were analyzed.
METHODS: A total of 95 SARS-CoV-2 nucleotide sequences submitted from India were downloaded from the GISAID database. Molecular Evolutionary Genetics Analysis (MEGA), version X, software was used to construct the 9 phylogenetic dendrograms based on nucleotide sequences of the SARS-CoV-2 genes. Analyses of the coevolving mutations were done in comparison to the prototype SARS-CoV-2 from Wuhan, China. The secondary structure of the RNA-dependent RNA polymerase/nonstructural protein NSP12 was predicted with respect to the novel A97V mutation.
RESULTS: Phylogenetic analyses revealed the evolution of "genome-type clusters" and adaptive selection of "L"-type SARS-CoV-2 strains with genetic closeness to the bat severe acute respiratory syndrome-like coronaviruses. These strains were distant from pangolin or Middle East respiratory syndrome-related coronavirus strains. With regard to the novel coevolving mutations, 2 groups have been seen circulating in India at present, the "major group" (66/95, 69.4%) and the "minor group" (21/95, 22.1%), harboring 4 and 5 coexisting mutations, respectively. The "major group" mutations fall in the A2a clade. All the minor group mutations, except 11083G>T (L37F, NSP6 gene), were unique to the Indian isolates.
CONCLUSIONS: This study highlights the rapidly evolving SARS-CoV-2 virus and the cocirculation of multiple clades and subclades. This comprehensive study is a potential resource for monitoring the novel mutations in the viral genome, interpreting changes in viral pathogenesis, and designing vaccines or other therapeutics.
Affiliation(s)
- Anindita Banerjee
  - Indian Council of Medical Research-National Institute of Cholera and Enteric Diseases, Kolkata, India
- Rakesh Sarkar
  - Indian Council of Medical Research-National Institute of Cholera and Enteric Diseases, Kolkata, India
- Suvrotoa Mitra
  - Indian Council of Medical Research-National Institute of Cholera and Enteric Diseases, Kolkata, India
- Mahadeb Lo
  - Indian Council of Medical Research-National Institute of Cholera and Enteric Diseases, Kolkata, India
- Shanta Dutta
  - Indian Council of Medical Research-National Institute of Cholera and Enteric Diseases, Kolkata, India
- Mamta Chawla-Sarkar
  - Indian Council of Medical Research-National Institute of Cholera and Enteric Diseases, Kolkata, India