1
Ly-Trong N, Bielow C, De Maio N, Minh BQ. CMAPLE: Efficient Phylogenetic Inference in the Pandemic Era. Mol Biol Evol 2024; 41:msae134. [PMID: 38934791 PMCID: PMC11232695 DOI: 10.1093/molbev/msae134] [Received: 02/09/2024] [Revised: 05/15/2024] [Accepted: 06/21/2024] [Indexed: 06/28/2024]
Abstract
We have recently introduced MAPLE (MAximum Parsimonious Likelihood Estimation), a new pandemic-scale phylogenetic inference method exclusively designed for genomic epidemiology. In response to the need for enhancing MAPLE's performance and scalability, here we present two key components: (i) CMAPLE software, a highly optimized C++ reimplementation of MAPLE with many new features and advancements, and (ii) CMAPLE library, a suite of application programming interfaces to facilitate the integration of the CMAPLE algorithm into existing phylogenetic inference packages. Notably, we have successfully integrated CMAPLE into the widely used IQ-TREE 2 software, enabling its rapid adoption in the scientific community. These advancements serve as a vital step toward better preparedness for future pandemics, offering researchers powerful tools for large-scale pathogen genomic analysis.
Affiliation(s)
- Nhan Ly-Trong
- School of Computing, College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2600, Australia
- Chris Bielow
- Bioinformatics Solution Center, Freie Universität Berlin, 14195 Berlin, Germany
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Bui Quang Minh
- School of Computing, College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2600, Australia
2
Rossier V, Train C, Nevers Y, Robinson-Rechavi M, Dessimoz C. Matreex: Compact and Interactive Visualization for Scalable Studies of Large Gene Families. Genome Biol Evol 2024; 16:evae100. [PMID: 38742690 PMCID: PMC11149776 DOI: 10.1093/gbe/evae100] [Received: 11/15/2023] [Revised: 04/17/2024] [Accepted: 05/03/2024] [Indexed: 05/16/2024]
Abstract
Studying gene family evolution strongly benefits from insightful visualizations. However, the ever-growing number of sequenced genomes is leading to increasingly large gene families, which challenges existing gene tree visualizations. Indeed, most of them present users with a dilemma: display complete but intractable gene trees, or collapse subtrees, thereby hiding their children's information. Here, we introduce Matreex, a new dynamic tool to scale up the visualization of gene families. Matreex's key idea is to use "phylogenetic" profiles, which are dense representations of gene repertoires, to minimize the information loss when collapsing subtrees. We illustrate Matreex's usefulness with three biological applications. First, we demonstrate on the MutS family the power of combining gene trees and phylogenetic profiles to delve into precise evolutionary analyses of large multicopy gene families. Second, by displaying 22 intraflagellar transport gene families across 622 species totalling 5,500 representatives, we show how Matreex can be used to automate large-scale analyses of gene presence-absence. Notably, we report for the first time the complete loss of intraflagellar transport in the myxozoan Thelohanellus kitauei. Finally, using the textbook example of visual opsins, we show Matreex's potential to create easily interpretable figures for teaching and outreach. Matreex is available from the Python Package Index (pip install Matreex) with the source code and documentation available at https://github.com/DessimozLab/matreex.
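The idea of collapsing a subtree into a "phylogenetic profile" can be illustrated with a small sketch. It assumes that each leaf of a collapsed subtree is labelled with its species, and that the profile is simply a per-species gene-copy count in a fixed species order. The function name `phylogenetic_profile`, the input format and the species labels are hypothetical and are not part of the Matreex API.

```python
from collections import Counter


def phylogenetic_profile(leaf_species, species_order):
    """Summarise a collapsed subtree as gene-copy counts per species.

    leaf_species: species label of each gene (leaf) in the collapsed subtree.
    species_order: species in the order used for the profile columns.
    Returns a list of counts aligned with species_order: 0 marks absence
    (for example a gene loss), values above 1 mark multicopy species.
    """
    counts = Counter(leaf_species)
    return [counts.get(sp, 0) for sp in species_order]


# Hypothetical species labels, for illustration only.
species = ["species_A", "species_B", "species_C", "species_D"]
subtree_leaves = ["species_A", "species_B", "species_C", "species_C"]
print(phylogenetic_profile(subtree_leaves, species))  # [1, 1, 2, 0]
```

Stacking one such row per collapsed subtree gives a matrix in which presence-absence and copy-number patterns stay visible even though the subtree itself is hidden.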
Affiliation(s)
- Victor Rossier
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Clement Train
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Comparative Genomics, Lausanne, Switzerland
3
Hunt M, Hinrichs AS, Anderson D, Karim L, Dearlove BL, Knaggs J, Constantinides B, Fowler PW, Rodger G, Street T, Lumley S, Webster H, Sanderson T, Ruis C, de Maio N, Amenga-Etego LN, Amuzu DSY, Avaro M, Awandare GA, Ayivor-Djanie R, Bashton M, Batty EM, Bediako Y, De Belder D, Benedetti E, Bergthaler A, Boers SA, Campos J, Carr RAA, Cuba F, Dattero ME, Dejnirattisai W, Dilthey A, Duedu KO, Endler L, Engelmann I, Francisco NM, Fuchs J, Gnimpieba EZ, Groc S, Gyamfi J, Heemskerk D, Houwaart T, Hsiao NY, Huska M, Hölzer M, Iranzadeh A, Jarva H, Jeewandara C, Jolly B, Joseph R, Kant R, Ki KKK, Kurkela S, Lappalainen M, Lataretu M, Liu C, Malavige GN, Mashe T, Mongkolsapaya J, Montes B, Molina Mora JA, Morang'a CM, Mvula B, Nagarajan N, Nelson A, Ngoi JM, da Paixão JP, Panning M, Poklepovich T, Quashie PK, Ranasinghe D, Russo M, San JE, Sanderson ND, Scaria V, Screaton G, Sironen T, Sisay A, Smith D, Smura T, Supasa P, Suphavilai C, Swann J, Tegally H, Tegomoh B, Vapalahti O, Walker A, Wilkinson RJ, Williamson C, de Oliveira T, Peto TE, Crook D, Corbett-Detig R, Iqbal Z. Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny. bioRxiv 2024:2024.04.29.591666. [PMID: 38746185 PMCID: PMC11092452 DOI: 10.1101/2024.04.29.591666] [Indexed: 05/16/2024]
Abstract
The SARS-CoV-2 genome occupies a unique place in infection biology: it is the most highly sequenced genome on earth (making up over 20% of public sequencing datasets), with fine-scale information on sampling date and geography, and it has been subject to unprecedentedly intense analysis. As a result, these phylogenetic data are an incredibly valuable resource for science and public health. However, the vast majority of the data were sequenced by tiling amplicons across the full genome, with amplicon schemes that changed over the pandemic as mutations in the viral genome interacted with primer binding sites. Combined with a disparate set of genome assembly workflows and a lack of consistent quality control (QC) processes, this has left the current genomes with many systematic errors that have evolved with the virus and the amplicon schemes. These errors have significant impacts on the phylogeny, and over the last few years many thousands of hours of researchers' time have been spent "eyeballing" trees, looking for artefacts, and then patching the tree. Given the huge value of this dataset, we set out to reprocess the complete set of public raw sequence data in a rigorous amplicon-aware manner and build a cleaner phylogeny. Here we provide a global tree of 3,960,704 samples, built from a consistently assembled set of high-quality consensus sequences from all available public data as of March 2023, viewable at https://viridian.taxonium.org. Each genome was constructed using a novel assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed specifically to process amplicon sequence data, eliminate artefactual errors and mask the genome at low-quality positions. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny. Phase 2 of our project will address the fact that the data in the public archives are heavily geographically biased towards the Global North.
We have therefore contributed new raw data to ENA/SRA from many countries including Ghana, Thailand, Laos, Sri Lanka, India, Argentina and Singapore. We will incorporate these, along with all public raw data submitted between March 2023 and the current day, into an updated set of assemblies and phylogeny. We hope the tree, consensus sequences and Viridian will be a valuable resource for researchers.
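The abstract says that Viridian masks the genome at low-quality positions but does not give its criteria. The following is therefore only a generic sketch of depth- and agreement-based consensus masking, not Viridian's implementation. It assumes that per-position read depth and the count of reads supporting the called base are available. The thresholds `min_depth=20` and `min_frac=0.9` are hypothetical defaults.

```python
def mask_consensus(consensus, depth, support, min_depth=20, min_frac=0.9):
    """Replace unreliable consensus bases with 'N'.

    consensus: consensus sequence as a string.
    depth: total read depth at each position.
    support: number of reads agreeing with the consensus base at each position.
    A position is masked when it has no coverage, coverage below min_depth,
    or fewer than min_frac of its reads agreeing with the called base.
    """
    if not (len(consensus) == len(depth) == len(support)):
        raise ValueError("consensus, depth and support must have equal length")
    masked = []
    for base, d, s in zip(consensus, depth, support):
        if d == 0 or d < min_depth or s / d < min_frac:
            masked.append("N")
        else:
            masked.append(base)
    return "".join(masked)


print(mask_consensus("ACGT", [30, 5, 30, 30], [30, 5, 20, 29]))  # ANNT
```

Writing `N` instead of guessing a base prevents uncertain positions from being read as mutations during tree building. Miscalled positions of that kind are the source of the artefacts the abstract describes.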
Affiliation(s)
- Martin Hunt
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK
- Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
- Angie S Hinrichs
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA
- Daniel Anderson
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Lily Karim
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA
- Bethany L Dearlove
- Institute for Hygiene and Applied Immunology, Center for Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna 1090, Austria
- Jeff Knaggs
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK
- Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
- Bede Constantinides
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
- Philip W Fowler
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK
- Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
- Gillian Rodger
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK
- Teresa Street
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK
- Sheila Lumley
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Department of Infectious Diseases and Microbiology, John Radcliffe Hospital, Oxford, UK
- Hermione Webster
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Christopher Ruis
- Victor Phillip Dahdaleh Heart & Lung Research Institute, University of Cambridge, Cambridge, UK
- Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Nicola de Maio
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Lucas N Amenga-Etego
- West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana
- Dominic S Y Amuzu
- West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana
- Martin Avaro
- Servicio de Virus Respiratorios, Instituto Nacional Enfermedades Infecciosas, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- Gordon A Awandare
- West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana
- Reuben Ayivor-Djanie
- Laboratory for Medical Biotechnology and Biomanufacturing, International Centre for Genetic Engineering and Biotechnology, Trieste, Italy
- Department of Biomedical Sciences, University of Health and Allied Sciences, Ho, Ghana
- Matthew Bashton
- The Hub for Biotechnology in the Built Environment, Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle upon Tyne, NE1 8ST, UK
- Elizabeth M Batty
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand
- Yaw Bediako
- West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana
- Denise De Belder
- Unidad Operativa Centro Nacional de Genómica y Bioinformática, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- Estefania Benedetti
- Servicio de Virus Respiratorios, Instituto Nacional Enfermedades Infecciosas, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- Andreas Bergthaler
- Institute for Hygiene and Applied Immunology, Center for Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna 1090, Austria
- Stefan A Boers
- Dept. Medical Microbiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Josefina Campos
- Unidad Operativa Centro Nacional de Genómica y Bioinformática, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- Rosina Afua Ampomah Carr
- Department of Biomedical Sciences, University of Health and Allied Sciences, Ho, Ghana
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Facundo Cuba
- Unidad Operativa Centro Nacional de Genómica y Bioinformática, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- Maria Elena Dattero
- Servicio de Virus Respiratorios, Instituto Nacional Enfermedades Infecciosas, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- Wanwisa Dejnirattisai
- Division of Emerging Infectious Disease, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkoknoi, Bangkok 10700, Thailand
- Alexander Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Kwabena Obeng Duedu
- Department of Biomedical Sciences, University of Health and Allied Sciences, Ho, Ghana
- College of Life Sciences, Birmingham City University, Birmingham, UK
- Lukas Endler
- Institute for Hygiene and Applied Immunology, Center for Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna 1090, Austria
- Ilka Engelmann
- Pathogenesis and Control of Chronic and Emerging Infections, Univ Montpellier, INSERM, Etablissement Français du Sang, Virology Laboratory, CHU Montpellier, Montpellier, France
- Ngiambudulu M Francisco
- Grupo de Investigação Microbiana e Imunológica, Instituto Nacional de Investigação em Saúde (National Institute for Health Research), Luanda, Angola
- Jonas Fuchs
- Institute of Virology, Freiburg University Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Etienne Z Gnimpieba
- Biomedical Engineering Department, University of South Dakota, Sioux Falls, SD 57107
- Soraya Groc
- Virology Laboratory, CHU Montpellier, Montpellier, France
- Jones Gyamfi
- Department of Biomedical Sciences, University of Health and Allied Sciences, Ho, Ghana
- School of Health and Life Sciences, Teesside University, Middlesbrough, UK
- Dennis Heemskerk
- Dept. Medical Microbiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Nei-Yuan Hsiao
- Division of Medical Virology, University of Cape Town and National Health Laboratory Service
- Matthew Huska
- Genome Competence Center (MF1), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Martin Hölzer
- Genome Competence Center (MF1), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Hanna Jarva
- HUS Diagnostic Center, Clinical Microbiology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Chandima Jeewandara
- Allergy Immunology and Cell Biology Unit, Department of Immunology and Molecular Medicine, University of Sri Jayewardenepura, Nugegoda, Sri Lanka
- Bani Jolly
- Karkinos Healthcare Private Limited (KHPL), Aurbis Business Parks, Bellandur, Bengaluru, Karnataka, 560103, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Ravi Kant
- Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
- Department of Virology, University of Helsinki, 00014 Helsinki, Finland
- Department of Tropical Parasitology, Institute of Maritime and Tropical Medicine, Medical University of Gdansk, 81-519 Gdynia, Poland
- Satu Kurkela
- HUS Diagnostic Center, Clinical Microbiology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Maija Lappalainen
- HUS Diagnostic Center, Clinical Microbiology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Marie Lataretu
- Genome Competence Center (MF1), Robert Koch Institute, Nordufer 20, 13353 Berlin, Germany
- Chang Liu
- Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Gathsaurie Neelika Malavige
- Allergy Immunology and Cell Biology Unit, Department of Immunology and Molecular Medicine, University of Sri Jayewardenepura, Nugegoda, Sri Lanka
- Tapfumanei Mashe
- Health System Strengthening Unit, World Health Organisation, Harare, Zimbabwe
- Juthathip Mongkolsapaya
- Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand
- Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Jose Arturo Molina Mora
- Centro de investigación en Enfermedades Tropicales & Facultad de Microbiología, Universidad de Costa Rica, Costa Rica
- Collins M Morang'a
- West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana
- Bernard Mvula
- Public Health Institute of Malawi, Ministry of Health, Malawi
- Niranjan Nagarajan
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Andrew Nelson
- Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle upon Tyne, NE1 8ST, UK
- Joyce M Ngoi
- West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana
- Joana Paula da Paixão
- Grupo de Investigação Microbiana e Imunológica, Instituto Nacional de Investigação em Saúde (National Institute for Health Research), Luanda, Angola
- Marcus Panning
- Institute of Virology, Freiburg University Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Tomas Poklepovich
- Unidad Operativa Centro Nacional de Genómica y Bioinformática, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- Peter K Quashie
- West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana
- Diyanath Ranasinghe
- Allergy Immunology and Cell Biology Unit, Department of Immunology and Molecular Medicine, University of Sri Jayewardenepura, Nugegoda, Sri Lanka
- Mara Russo
- Servicio de Virus Respiratorios, Instituto Nacional Enfermedades Infecciosas, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina
- James Emmanuel San
- Duke Human Vaccine Institute, Duke University, Durham, NC 27710
- University of KwaZulu Natal, Durban, South Africa, 4001
- Nicholas D Sanderson
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK
- Vinod Scaria
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Vishwanath Cancer Care Foundation (VCCF), Neelkanth Business Park Kirol Village, West Mumbai, Maharashtra, 400086, India
- Gavin Screaton
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Tarja Sironen
- Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
- Department of Virology, University of Helsinki, 00014 Helsinki, Finland
- Abay Sisay
- Department of Medical Laboratory Sciences, College of Health Sciences, Addis Ababa University, P.O.Box 1176, Addis Ababa, Ethiopia
- Darren Smith
- The Hub for Biotechnology in the Built Environment, Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle upon Tyne, NE1 8ST, UK
- Teemu Smura
- Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
- Department of Virology, University of Helsinki, 00014 Helsinki, Finland
- Piyada Supasa
- Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Chayaporn Suphavilai
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore
- Jeremy Swann
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Houriiyah Tegally
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, South Africa
- Bryan Tegomoh
- Centre de Coordination des Opérations d'Urgences de Santé Publique, Ministere de Sante Publique, Cameroun
- University of California, Berkeley, Berkeley, California, USA
- Nebraska Department of Health and Human Services, Lincoln, Nebraska, USA
- Olli Vapalahti
- Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
- Department of Virology, University of Helsinki, 00014 Helsinki, Finland
- Andreas Walker
- Institute of Virology, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Robert J Wilkinson
- Francis Crick Institute, London, UK
- Centre for Infectious Diseases Research in Africa, University of Cape Town
- Imperial College London, UK
- Tulio de Oliveira
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, South Africa
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), University of KwaZulu-Natal, South Africa
- Timothy Ea Peto
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Derrick Crook
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Russell Corbett-Detig
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA
- Zamin Iqbal
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Milner Centre for Evolution, University of Bath, UK
4
Jaya FR, Brito BP, Darling AE. Evaluation of recombination detection methods for viral sequencing. Virus Evol 2023; 9:vead066. [PMID: 38131005 PMCID: PMC10734630 DOI: 10.1093/ve/vead066] [Received: 04/24/2023] [Revised: 08/03/2023] [Accepted: 11/15/2023] [Indexed: 12/23/2023]
Abstract
Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can affect evolutionary estimates or complicate their interpretation. Therefore, identifying signals of recombination in sequencing data is a key prerequisite to further analyses. A repertoire of recombination detection methods (RDMs) has been developed over the past two decades; however, the prevalence of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed eight RDMs: PhiPack (Profile), 3SEQ, GENECONV, recombination detection program (RDP) (OpenRDP), MaxChi (OpenRDP), Chimaera (OpenRDP), UCHIME (VSEARCH), and gmos, to determine whether any are suitable for the analysis of bulk sequencing data. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Furthermore, we provide a practical example for the analysis and validation of empirical data. We find that RDMs need to be scalable, to use an analytical approach and resolution suitable for the intended research application, and to be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for the validation of recombination detection results, the benefits and shortcomings of each assessed method, and future considerations for recombination detection methods for the assessment of large-scale viral sequencing data.
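MaxChi is one of the eight methods assessed. The following simplified pairwise sketch shows how it localises a breakpoint. It assumes two aligned sequences and a fixed half-window, and it uses a plain Pearson chi-square on a 2×2 table of left/right versus match/mismatch. The published method and its OpenRDP implementation may differ in details such as site selection and significance testing, so treat this as a teaching aid only. The function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def maxchi_scan(seq1, seq2, half_window):
    """Return (best_breakpoint, best_chi2) for a simplified MaxChi scan.

    For each candidate breakpoint i, the half_window sites to its left and
    right are compared: rows are left/right, columns are match/mismatch
    between seq1 and seq2. A high chi-square means the two halves differ
    in divergence, as expected near a recombination breakpoint.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    mismatch = [x != y for x, y in zip(seq1, seq2)]
    best = (None, 0.0)
    for i in range(half_window, len(mismatch) - half_window + 1):
        left = mismatch[i - half_window:i]
        right = mismatch[i:i + half_window]
        a, b = left.count(False), left.count(True)    # left: matches, mismatches
        c, d = right.count(False), right.count(True)  # right: matches, mismatches
        chi2 = chi_square_2x2(a, b, c, d)
        if chi2 > best[1]:
            best = (i, chi2)
    return best


print(maxchi_scan("AAAAAAAAAA", "AAAAATTTTT", half_window=3))  # (5, 6.0)
```

Even this toy version scans every pair of sequences separately, and a dataset of n sequences has n(n-1)/2 pairs. That is one way to see the scalability concern the abstract raises for pandemic-scale data.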
Affiliation(s)
- Frederick R Jaya
- Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
- Ecology and Evolution, Research School of Biology, Australian National University, 134 Linnaeus Way, Acton, Australian Capital Territory 2600, Australia
- Barbara P Brito
- Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
- New South Wales Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Road, Menangle, New South Wales 2568, Australia
- Aaron E Darling
- Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
- Illumina Australia Pty Ltd, Ultimo, New South Wales 2007, Australia
5
Li X, Trovão NS, Wertheim JO, Baele G, de Bernardi Schneider A. Optimizing ancestral trait reconstruction of large HIV Subtype C datasets through multiple-trait subsampling. Virus Evol 2023; 9:vead069. [PMID: 38046219 PMCID: PMC10691791 DOI: 10.1093/ve/vead069] [Received: 06/26/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/05/2023]
Abstract
Large datasets along with sampling bias represent a challenge for phylodynamic reconstructions, particularly when the study data are obtained from various heterogeneous sources and/or through convenience sampling. In this study, we evaluate unbalanced sampling distributions by collection date, location, and risk group of human immunodeficiency virus Type 1 Subtype C using a comprehensive subsampling strategy, and assess their impact on the reconstruction of the viral spatial and risk group dynamics using phylogenetic comparative methods. Our study shows that the most suitable dataset for ancestral trait reconstruction can be obtained through subsampling by all available traits, particularly using multigene datasets. We also demonstrate that sampling bias is inflated when considerable information for a given trait is unavailable or of poor quality, as we observed for the risk group trait. In conclusion, we suggest that, even if traits are not well recorded, deliberately including them, rather than excluding them entirely, optimizes the representativeness of the original dataset. Therefore, we advise including as many traits as possible with the aid of subsampling approaches, in order to optimize the dataset for phylodynamic analysis while reducing the computational burden. This will benefit research communities investigating the evolutionary and spatio-temporal patterns of infectious diseases.
Affiliation(s)
- Nídia S Trovão
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, 31 Center Dr, Bethesda, MD 20892, USA
- Joel O Wertheim
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven BE-3000, Belgium
- Adriano de Bernardi Schneider
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ningbo No.2 Hospital, Ningbo 315010, China
- Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo 315000, China
6
Kramer AM, Thornlow B, Ye C, De Maio N, McBroome J, Hinrichs AS, Lanfear R, Turakhia Y, Corbett-Detig R. Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations. Syst Biol 2023; 72:1039-1051. [PMID: 37232476 PMCID: PMC10627557 DOI: 10.1093/sysbio/syad031] [Received: 11/29/2021] [Revised: 05/14/2023] [Accepted: 06/22/2023] [Indexed: 05/27/2023]
Abstract
Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing the emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.
Affiliation(s)
- Alexander M Kramer
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Bryan Thornlow
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cheng Ye
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Jakob McBroome
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert Lanfear
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
7
Bloom JD, Neher RA. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evol 2023; 9:vead055. [PMID: 37727875 PMCID: PMC10506532 DOI: 10.1093/ve/vead055] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/20/2023] [Revised: 08/08/2023] [Accepted: 08/22/2023] [Indexed: 09/21/2023]
Abstract
Knowledge of the fitness effects of mutations to SARS-CoV-2 can inform assessment of new variants, design of therapeutics resistant to escape, and understanding of the functions of viral proteins. However, experimentally measuring effects of mutations is challenging: we lack tractable lab assays for many SARS-CoV-2 proteins, and comprehensive deep mutational scanning has been applied to only two SARS-CoV-2 proteins. Here, we develop an approach that leverages millions of publicly available SARS-CoV-2 sequences to estimate effects of mutations. We first calculate how many independent occurrences of each mutation are expected to be observed along the SARS-CoV-2 phylogeny in the absence of selection. We then compare these expected observations to the actual observations to estimate the effect of each mutation. These estimates correlate well with deep mutational scanning measurements. For most genes, synonymous mutations are nearly neutral, stop-codon mutations are deleterious, and amino acid mutations have a range of effects. However, some viral accessory proteins are under little to no selection. We provide interactive visualizations of effects of mutations to all SARS-CoV-2 proteins (https://jbloomlab.github.io/SARS2-mut-fitness/). The framework we describe is applicable to any virus for which the number of available sequences is sufficiently large that many independent occurrences of each neutral mutation are observed.
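The two steps summarized in this abstract (an expected count of independent occurrences in the absence of selection, then a comparison with the observed count) can be illustrated with a short Python sketch. It is not the authors' code: it assumes the comparison is a log ratio of observed to expected counts with a small pseudocount (the value 0.5 is an assumption) so that never-observed mutations still receive a finite score, and all counts are hypothetical.

```python
import math


def fitness_effect(observed: float, expected: float, pseudocount: float = 0.5) -> float:
    """Score a mutation by comparing observed with expected independent occurrences.

    Assumed estimator: ln((observed + pseudocount) / (expected + pseudocount)).
    A value near 0 is consistent with neutrality; strongly negative values mean
    the mutation is seen far less often than a neutral model predicts.
    """
    if observed < 0 or expected < 0:
        raise ValueError("counts must be non-negative")
    return math.log((observed + pseudocount) / (expected + pseudocount))


# Hypothetical counts: (observed occurrences, expected occurrences without selection)
examples = {
    "synonymous": (40, 40),  # observed matches the neutral expectation
    "missense": (12, 40),    # observed less often than expected
    "stop_codon": (0, 40),   # never observed despite a high expectation
}

for label, (obs, exp) in examples.items():
    print(f"{label}: {fitness_effect(obs, exp):.2f}")
```

A log ratio places neutral mutations near zero regardless of how mutable a site is, because the underlying mutation rate is already absorbed into the expected count.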
Affiliation(s)
- Jesse D Bloom
- Basic Sciences and Computational Biology, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, 1100 Fairview Ave N, Seattle, WA 98109, USA
- Richard A Neher
- Biozentrum, University of Basel, Spitalstrasse 41, Basel 4056, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
8
Reichmuth ML, Hodcroft EB, Althaus CL. Importation of Alpha and Delta variants during the SARS-CoV-2 epidemic in Switzerland: Phylogenetic analysis and intervention scenarios. PLoS Pathog 2023; 19:e1011553. [PMID: 37561788 PMCID: PMC10443857 DOI: 10.1371/journal.ppat.1011553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 08/22/2023] [Accepted: 07/11/2023] [Indexed: 08/12/2023]
Abstract
The SARS-CoV-2 pandemic has led to the emergence of various variants of concern (VoCs) that are associated with increased transmissibility, immune evasion, or differences in disease severity. The emergence of VoCs fueled interest in understanding the potential of travel restrictions and surveillance strategies to prevent or delay the early spread of VoCs. We performed phylogenetic analyses and mathematical modeling to study the importation and spread of the VoCs Alpha and Delta in Switzerland in 2020 and 2021. Using a phylogenetic approach, we estimated between 383 and 1,038 imports of Alpha and between 455 and 1,347 imports of Delta into Switzerland. We then used the results from the phylogenetic analysis to parameterize a dynamic transmission model that accurately described the subsequent spread of Alpha and Delta. We modeled different counterfactual intervention scenarios to quantify the potential impact of border closures and surveillance of travelers on the spread of Alpha and Delta. We found that implementing border closures after the announcement of VoCs would have had limited impact on mitigating their spread. In contrast, increased surveillance of travelers could prove to be an effective measure for delaying the spread of VoCs in situations where their severity remains unclear. Our study shows how phylogenetic analysis in combination with dynamic transmission models can be used to estimate the number of imported SARS-CoV-2 variants and the potential impact of different intervention scenarios to inform the public health response during the pandemic.
Affiliation(s)
- Martina L. Reichmuth
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Emma B. Hodcroft
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Multidisciplinary Center for Infectious Diseases, University of Bern, Bern, Switzerland
- Christian L. Althaus
- Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland
- Multidisciplinary Center for Infectious Diseases, University of Bern, Bern, Switzerland
9
Gupta A, Basu R, Bashyam MD. Assessing the evolution of SARS-CoV-2 lineages and the dynamic associations between nucleotide variations. Access Microbiol 2023; 5:acmi000513.v3. [PMID: 37601437 PMCID: PMC10436015 DOI: 10.1099/acmi.0.000513.v3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Accepted: 02/20/2023] [Indexed: 08/22/2023]
Abstract
Despite seminal advances towards understanding the infection mechanism of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), it continues to cause significant morbidity and mortality worldwide. Although mass immunization programmes have been implemented in several countries, viral transmission has continued in multiple successive waves. Constant change in the frequencies of dominant viral lineages, arising from the accumulation of nucleotide variations (NVs) through favourable selection, is expected to be a major determinant of disease severity and possible vaccine escape. Indeed, worldwide efforts have been initiated to identify specific virus lineage(s) and/or NVs that may cause a severe clinical presentation or facilitate vaccination breakthrough. Since host genetics is expected to play a major role in shaping virus evolution, it is imperative to study the role of genome-wide SARS-CoV-2 NVs across various populations. In the current study, we analysed the whole-genome sequences of 3543 SARS-CoV-2-positive samples obtained from the state of Telangana, India (including 210 from our previous study), collected over an extended period from April 2020 to October 2021. We present a unique perspective on the evolution of prevalent virus lineages and NVs during this period. We also highlight the presence of specific NVs likely to be associated favourably with samples classified as vaccination breakthroughs. Finally, we report genome-wide intra-host variations at novel genomic positions. The results presented here provide critical insights into virus evolution over an extended period and pave the way to rigorously investigate the role of specific NVs in vaccination breakthroughs.
Affiliation(s)
- Asmita Gupta
- Laboratory of Molecular Oncology, Centre of DNA Fingerprinting and Diagnostics, Hyderabad, India
- Reelina Basu
- Laboratory of Molecular Oncology, Centre of DNA Fingerprinting and Diagnostics, Hyderabad, India
- Murali Dharan Bashyam
- Laboratory of Molecular Oncology, Centre of DNA Fingerprinting and Diagnostics, Hyderabad, India
10
Bloom JD, Neher RA. Fitness effects of mutations to SARS-CoV-2 proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.30.526314. [PMID: 36778462 PMCID: PMC9915511 DOI: 10.1101/2023.01.30.526314] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 02/04/2023]
Abstract
Knowledge of the fitness effects of mutations to SARS-CoV-2 can inform assessment of new variants, design of therapeutics resistant to escape, and understanding of the functions of viral proteins. However, experimentally measuring effects of mutations is challenging: we lack tractable lab assays for many SARS-CoV-2 proteins, and comprehensive deep mutational scanning has been applied to only two SARS-CoV-2 proteins. Here we develop an approach that leverages millions of publicly available SARS-CoV-2 sequences to estimate effects of mutations. We first calculate how many independent occurrences of each mutation are expected to be observed along the SARS-CoV-2 phylogeny in the absence of selection. We then compare these expected observations to the actual observations to estimate the effect of each mutation. These estimates correlate well with deep mutational scanning measurements. For most genes, synonymous mutations are nearly neutral, stop-codon mutations are deleterious, and amino-acid mutations have a range of effects. However, some viral accessory proteins are under little to no selection. We provide interactive visualizations of effects of mutations to all SARS-CoV-2 proteins (https://jbloomlab.github.io/SARS2-mut-fitness/). The framework we describe is applicable to any virus for which the number of available sequences is sufficiently large that many independent occurrences of each neutral mutation are observed.
Affiliation(s)
- Jesse D. Bloom
- Basic Sciences and Computational Biology, Fred Hutchinson Cancer Center
- Department of Genome Sciences, University of Washington
- Howard Hughes Medical Institute
- Richard A. Neher
- Biozentrum, University of Basel
- Swiss Institute of Bioinformatics
11
De Maio N, Kalaghatgi P, Turakhia Y, Corbett-Detig R, Minh BQ, Goldman N. Maximum likelihood pandemic-scale phylogenetics. Nat Genet 2023; 55:746-752. [PMID: 37038003 PMCID: PMC10181937 DOI: 10.1038/s41588-023-01368-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 08/19/2022] [Accepted: 03/07/2023] [Indexed: 04/12/2023]
Abstract
Phylogenetics has a crucial role in genomic epidemiology. Enabled by unparalleled volumes of genome sequence data generated to study and help contain the COVID-19 pandemic, phylogenetic analyses of SARS-CoV-2 genomes have shed light on the virus's origins, spread, and the emergence and reproductive success of new variants. However, most phylogenetic approaches, including maximum likelihood and Bayesian methods, cannot scale to the size of the datasets from the current pandemic. We present 'MAximum Parsimonious Likelihood Estimation' (MAPLE), an approach for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. MAPLE infers SARS-CoV-2 phylogenies more accurately than existing maximum likelihood approaches while running up to thousands of times faster, and requiring at least 100 times less memory on large datasets. This extends the reach of genomic epidemiology, allowing the continued use of accurate phylogenetic, phylogeographic and phylodynamic analyses on datasets of millions of genomes.
Affiliation(s)
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK.
- Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
- Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Bui Quang Minh
- School of Computing, College of Engineering, Computing and Cybernetics, Australian National University, Canberra, Australian Capital Territory, Australia
- Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
12
Bloom JD, Beichman AC, Neher RA, Harris K. Evolution of the SARS-CoV-2 Mutational Spectrum. Mol Biol Evol 2023; 40:msad085. [PMID: 37039557 PMCID: PMC10124870 DOI: 10.1093/molbev/msad085] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/19/2022] [Revised: 02/07/2023] [Accepted: 04/06/2023] [Indexed: 04/12/2023]
Abstract
SARS-CoV-2 evolves rapidly in part because of its high mutation rate. Here, we examine whether this mutational process itself has changed during viral evolution. To do this, we quantify the relative rates of different types of single-nucleotide mutations at 4-fold degenerate sites in the viral genome across millions of human SARS-CoV-2 sequences. We find clear shifts in the relative rates of several types of mutations during SARS-CoV-2 evolution. The most striking trend is a roughly 2-fold decrease in the relative rate of G→T mutations in Omicron versus early clades, as was recently noted by Ruis et al. (2022. Mutational spectra distinguish SARS-CoV-2 replication niches. bioRxiv, doi:10.1101/2022.09.27.509649). There is also a decrease in the relative rate of C→T mutations in Delta, and other subtle changes in the mutation spectrum along the phylogeny. We speculate that these changes in the mutation spectrum could arise from viral mutations that affect genome replication, packaging, and antagonization of host innate-immune factors, although environmental factors could also play a role. Interestingly, the mutation spectrum of Omicron is more similar than that of earlier SARS-CoV-2 clades to the spectrum that shaped the long-term evolution of sarbecoviruses. Overall, our work shows that the mutation process is itself a dynamic variable during SARS-CoV-2 evolution and suggests that human SARS-CoV-2 may be trending toward a mutation spectrum more similar to that of other animal sarbecoviruses.
Affiliation(s)
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA
- Department of Genome Sciences, University of Washington, Seattle, WA
- Howard Hughes Medical Institute, Seattle, WA
- Richard A Neher
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Kelley Harris
- Department of Genome Sciences, University of Washington, Seattle, WA
13
Dadonaite B, Crawford KHD, Radford CE, Farrell AG, Yu TC, Hannon WW, Zhou P, Andrabi R, Burton DR, Liu L, Ho DD, Chu HY, Neher RA, Bloom JD. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. Cell 2023; 186:1263-1278.e20. [PMID: 36868218 PMCID: PMC9922669 DOI: 10.1016/j.cell.2023.02.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 10/13/2022] [Revised: 01/11/2023] [Accepted: 01/31/2023] [Indexed: 02/15/2023]
Abstract
A major challenge in understanding SARS-CoV-2 evolution is interpreting the antigenic and functional effects of emerging mutations in the viral spike protein. Here, we describe a deep mutational scanning platform based on non-replicative pseudotyped lentiviruses that directly quantifies how large numbers of spike mutations impact antibody neutralization and pseudovirus infection. We apply this platform to produce libraries of the Omicron BA.1 and Delta spikes. These libraries each contain ∼7,000 distinct amino acid mutations in the context of up to ∼135,000 unique mutation combinations. We use these libraries to map escape mutations from neutralizing antibodies targeting the receptor-binding domain, N-terminal domain, and S2 subunit of spike. Overall, this work establishes a high-throughput and safe approach to measure how ∼10⁵ combinations of mutations affect antibody neutralization and spike-mediated infection. Notably, the platform described here can be extended to the entry proteins of many other viruses.
Affiliation(s)
- Bernadeta Dadonaite
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Katharine H D Crawford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, WA 98109, USA
- Caelan E Radford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA 98109, USA
- Ariana G Farrell
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Timothy C Yu
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA 98109, USA
- William W Hannon
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA 98109, USA
- Panpan Zhou
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA; Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Raiees Andrabi
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA; Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Dennis R Burton
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA; Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA; Ragon Institute of Massachusetts General Hospital, MIT & Harvard, Cambridge, MA 02139, USA
- Lihong Liu
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- David D Ho
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA; Department of Microbiology and Immunology, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA; Division of Infectious Diseases, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA
- Helen Y Chu
- University of Washington, Department of Medicine, Division of Allergy and Infectious Diseases, Seattle, WA, USA
- Richard A Neher
- Biozentrum, University of Basel, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98195, USA.
14
Intragenomic rearrangements involving 5'-untranslated region segments in SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses. Virol J 2023; 20:36. [PMID: 36829234 PMCID: PMC9957694 DOI: 10.1186/s12985-023-01998-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 02/21/2023] [Indexed: 02/26/2023]
Abstract
BACKGROUND Variation of the betacoronavirus SARS-CoV-2 has been the bane of COVID-19 control. Documented variation includes point mutations, deletions, insertions, and recombination among closely or distantly related coronaviruses. Here, we describe yet another aspect of genome variation in beta- and alphacoronaviruses. It was first documented in an infectious isolate of the betacoronavirus SARS-CoV-2, obtained from 3 patients in Hong Kong, that carried a 5'-untranslated region (5'-UTR) segment at the end of the ORF6 gene; in its new location, this segment was translated into an ORF6 protein with a predicted modified carboxyl terminus. While comparing the amino acid sequences of translated ORF8 genes in the GenBank database, we found a subsegment of the same 5'-UTR-derived amino acid sequence modifying the distal end of ORF8 in an isolate from the United States, and decided to carry out a systematic search. METHODS Using as query sequences the nucleotide sequences of the coronavirus genomic termini (and, for SARS-CoV-2, also their amino acid translations in three reading frames), we searched for 5'-UTR sequences outside the 5'-UTR in SARS-CoV-2 and in reference strains of alpha-, beta-, gamma-, and deltacoronaviruses. RESULTS We report numerous genomic insertions of 5'-UTR sequences into coding regions of SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses, but not of delta- or gammacoronaviruses. To our knowledge, this is the first systematic description of such insertions. In many cases, these insertions would change viral protein sequences and further foster genomic flexibility and viral adaptability by inserting transcription regulatory sequences at novel positions within the genome. Among human Embecovirus betacoronaviruses, for instance, between 65% and 100% of the surveyed sequences in publicly available databases contain inserted 5'-UTR sequences.
CONCLUSION The intragenomic rearrangements involving 5'-untranslated region sequences described here, which in several cases affect highly conserved genes with a low propensity for recombination, may underlie the generation of variants homotypic with those of concern or interest and with potentially differing pathogenic profiles. Intragenomic rearrangements thus add to our appreciation of how variants of SARS-CoV-2 and other beta- and alphacoronaviruses may arise.
15
Saldivar-Espinoza B, Macip G, Garcia-Segura P, Mestres-Truyol J, Puigbò P, Cereto-Massagué A, Pujadas G, Garcia-Vallve S. Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks. Int J Mol Sci 2022; 23:ijms232314683. [PMID: 36499005 PMCID: PMC9736107 DOI: 10.3390/ijms232314683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 11/18/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022]
Abstract
Predicting SARS-CoV-2 mutations is difficult, but predicting recurrent mutations driven by the host, such as those caused by host deaminases, is feasible. We used machine learning to predict which positions in the SARS-CoV-2 genome will carry a recurrent mutation and which mutations will be the most recurrent. We used data from April 2021 that we separated into three sets: a training set, a validation set, and an independent test set. For the test set, we obtained a specificity of 0.69, a sensitivity of 0.79, and an Area Under the Curve (AUC) of 0.8, showing that the prediction of recurrent SARS-CoV-2 mutations is feasible. Subsequently, we compared our predictions with updated data from January 2022, which showed that some of the false positives of our prediction model later became true positives. The most important variables identified by SHapley Additive exPlanations (SHAP) analysis of the model are the identity of the mutating nucleotide and RNA reactivity. This is consistent with the SARS-CoV-2 mutational bias pattern and the preference of some host deaminases for specific sequences and RNA secondary structures. We extended our investigation by analyzing the mutations of the variants of concern Alpha, Beta, Delta, Gamma, and Omicron. Finally, we analyzed amino acid changes by looking at the predicted recurrent mutations in the M-pro and spike proteins.
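Specificity, sensitivity, and AUC, as reported for the test set in this abstract, are standard binary-classification metrics. The Python sketch below shows how they are computed from true labels, predicted labels, and model scores. The toy labels, scores, and 0.5 decision threshold are hypothetical and unrelated to the study's data, and AUC is computed in its rank-based form (the probability that a randomly chosen positive outscores a randomly chosen negative) rather than with any particular library.

```python
from itertools import product


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = recurrent site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = position carries a recurrent mutation
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={roc_auc(y_true, scores):.2f}")
```

With these toy values the classifier misses one positive and flags one negative, giving a sensitivity of 2/3 and a specificity of 3/4, while the AUC (11/12) does not depend on the threshold because it only reflects how the scores rank positives against negatives.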
Affiliation(s)
- Bryan Saldivar-Espinoza
- Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Guillem Macip
- Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pol Garcia-Segura
- Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Júlia Mestres-Truyol
- Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pere Puigbò
- Department of Biology, University of Turku, 20500 Turku, Finland
- Department of Biochemistry and Biotechnology, Rovira i Virgili University, 43007 Tarragona, Spain
- Nutrition and Health Unit, Eurecat Technology Centre of Catalonia, 43204 Reus, Spain
- Adrià Cereto-Massagué
- EURECAT Centre Tecnològic de Catalunya, Centre for Omic Sciences (COS), Joint Unit Universitat Rovira i Virgili-EURECAT, Unique Scientific and Technical Infrastructures (ICTS), 43204 Reus, Spain
- Gerard Pujadas
- Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Santiago Garcia-Vallve
- Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
16
Bloom JD, Beichman AC, Neher RA, Harris K. Evolution of the SARS-CoV-2 mutational spectrum. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.11.19.517207. [PMID: 36451887 PMCID: PMC9709787 DOI: 10.1101/2022.11.19.517207] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/17/2023]
Abstract
SARS-CoV-2 evolves rapidly in part because of its high mutation rate. Here we examine whether this mutational process itself has changed during viral evolution. To do this, we quantify the relative rates of different types of single-nucleotide mutations at four-fold degenerate sites in the viral genome across millions of human SARS-CoV-2 sequences. We find clear shifts in the relative rates of several types of mutations during SARS-CoV-2 evolution. The most striking trend is a roughly two-fold decrease in the relative rate of G→T mutations in Omicron versus early clades, as was recently noted by Ruis et al. (2022). There is also a decrease in the relative rate of C→T mutations in Delta, and other subtle changes in the mutation spectrum along the phylogeny. We speculate that these changes in the mutation spectrum could arise from viral mutations that affect genome replication, packaging, and antagonization of host innate-immune factors, although environmental factors could also play a role. Interestingly, the mutation spectrum of Omicron is more similar than that of earlier SARS-CoV-2 clades to the spectrum that shaped the long-term evolution of sarbecoviruses. Overall, our work shows that the mutation process is itself a dynamic variable during SARS-CoV-2 evolution, and suggests that human SARS-CoV-2 may be trending towards a mutation spectrum more similar to that of other animal sarbecoviruses.
Affiliation(s)
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, Washington, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Annabel C Beichman
- Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, Washington, USA
- Richard A Neher
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
17
Dadonaite B, Crawford KHD, Radford CE, Farrell AG, Yu TC, Hannon WW, Zhou P, Andrabi R, Burton DR, Liu L, Ho DD, Neher RA, Bloom JD. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.10.13.512056. [PMID: 36263061 PMCID: PMC9580381 DOI: 10.1101/2022.10.13.512056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
A major challenge in understanding SARS-CoV-2 evolution is interpreting the antigenic and functional effects of emerging mutations in the viral spike protein. Here we describe a new deep mutational scanning platform based on non-replicative pseudotyped lentiviruses that directly quantifies how large numbers of spike mutations impact antibody neutralization and pseudovirus infection. We demonstrate this new platform by making libraries of the Omicron BA.1 and Delta spikes. These libraries each contain ~7000 distinct amino-acid mutations in the context of up to ~135,000 unique mutation combinations. We use these libraries to map escape mutations from neutralizing antibodies targeting the receptor binding domain, N-terminal domain, and S2 subunit of spike. Overall, this work establishes a high-throughput and safe approach to measure how ~10⁵ combinations of mutations affect antibody neutralization and spike-mediated infection. Notably, the platform described here can be extended to the entry proteins of many other viruses.
Affiliation(s)
- Bernadeta Dadonaite
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Katharine H D Crawford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Department of Genome Sciences & Medical Scientist Training Program, University of Washington, Seattle, Washington, 98109, USA
- Caelan E Radford
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- Ariana G Farrell
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Timothy C Yu
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- William W Hannon
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, and Basic Sciences Division, Fred Hutch Cancer Center, Seattle, Washington, 98109, USA
- Panpan Zhou
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Raiees Andrabi
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Dennis R Burton
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037, USA
- Consortium for HIV/AIDS Vaccine Development (CHAVD), The Scripps Research Institute, La Jolla, CA 92037, USA
- Ragon Institute of MGH, MIT & Harvard, Cambridge, MA 02139, USA
- Lihong Liu
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- David D. Ho
- Aaron Diamond AIDS Research Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Department of Microbiology and Immunology, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA
- Division of Infectious Diseases, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY 10032, USA
- Richard A. Neher
- Biozentrum, University of Basel, Basel, Switzerland, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, 98109, USA
- Howard Hughes Medical Institute, Seattle, WA, 98195, USA
18
Basile K, Rockett RJ, McPhie K, Fennell M, Johnson-Mackinnon J, Agius JE, Fong W, Rahman H, Ko D, Donavan L, Hueston L, Lam C, Arnott A, Chen SCA, Maddocks S, O’Sullivan MV, Dwyer DE, Sintchenko V, Kok J. Improved Neutralisation of the SARS-CoV-2 Omicron Variant following a Booster Dose of Pfizer-BioNTech (BNT162b2) COVID-19 Vaccine. Viruses 2022; 14:2023. [PMID: 36146829 PMCID: PMC9501619 DOI: 10.3390/v14092023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/08/2022] [Revised: 09/02/2022] [Accepted: 09/08/2022] [Indexed: 11/20/2022]
Abstract
In late November 2021, the World Health Organization declared the SARS-CoV-2 lineage B.1.1.529 the fifth variant of concern, Omicron. This variant has acquired over 30 mutations in the spike protein (with 15 in the receptor-binding domain), raising concerns that Omicron could evade naturally acquired and vaccine-derived immunity. We utilized an authentic-virus, multicycle neutralisation assay to demonstrate that sera collected one, three, and six months after two doses of Pfizer-BioNTech BNT162b2 had a limited ability to neutralise SARS-CoV-2. However, four weeks after a third dose, neutralising antibody titres were boosted. Despite this increase, neutralising antibody titres were reduced fourfold for Omicron compared to lineage A.2.2 SARS-CoV-2.
Affiliation(s)
- Kerri Basile
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Correspondence: (K.B.); (J.K.)
- Rebecca J. Rockett
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Sydney Institute for Infectious Diseases, The University of Sydney, Sydney, NSW 2006, Australia
- Kenneth McPhie
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- The Westmead Institute for Medical Research, Westmead, Sydney, NSW 2145, Australia
- Michael Fennell
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Jessica Johnson-Mackinnon
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Jessica E. Agius
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Winkie Fong
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Hossinur Rahman
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Danny Ko
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Linda Donavan
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Linda Hueston
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Menzies Health Institute Queensland, Griffith University, Brisbane, QLD 4222, Australia
- Connie Lam
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Alicia Arnott
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Sharon C.-A. Chen
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Sydney Institute for Infectious Diseases, The University of Sydney, Sydney, NSW 2006, Australia
- Susan Maddocks
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Matthew V. O’Sullivan
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Sydney Institute for Infectious Diseases, The University of Sydney, Sydney, NSW 2006, Australia
- Dominic E. Dwyer
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Sydney Institute for Infectious Diseases, The University of Sydney, Sydney, NSW 2006, Australia
- Vitali Sintchenko
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Sydney Institute for Infectious Diseases, The University of Sydney, Sydney, NSW 2006, Australia
- Jen Kok
- Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology, Institute of Clinical Pathology and Medical Research, Westmead Hospital, Westmead, Sydney, NSW 2145, Australia
- Centre for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW 2145, Australia
- Correspondence: (K.B.); (J.K.)
19
Turakhia Y, Thornlow B, Hinrichs A, McBroome J, Ayala N, Ye C, Smith K, De Maio N, Haussler D, Lanfear R, Corbett-Detig R. Pandemic-scale phylogenomics reveals the SARS-CoV-2 recombination landscape. Nature 2022; 609:994-997. [PMID: 35952714 PMCID: PMC9519458 DOI: 10.1038/s41586-022-05189-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Received: 12/13/2021] [Accepted: 08/03/2022] [Indexed: 11/29/2022]
Abstract
Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses [1-4]. During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral evolution [5]. Here, we use a new phylogenomic method to search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages. In a 1.6 million sample tree from May 2021, we identify 589 recombination events, which indicate that around 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. Recombination breakpoints are inferred to occur disproportionately in the 3' portion of the genome that contains the spike protein. Our results highlight the need for timely analyses of recombination for pinpointing the emergence of recombinant lineages with the potential to increase transmissibility or virulence of the virus. We anticipate that this approach will empower comprehensive real-time tracking of viral recombination during the SARS-CoV-2 pandemic and beyond.
Affiliation(s)
- Yatish Turakhia
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, USA
- Bryan Thornlow
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Angie Hinrichs
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jakob McBroome
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Nicolas Ayala
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Cheng Ye
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, USA
- Kyle Smith
- Department of Biological Sciences, University of California, San Diego, San Diego, CA, USA
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
- David Haussler
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Robert Lanfear
- Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
20
Attwood SW, Hill SC, Aanensen DM, Connor TR, Pybus OG. Phylogenetic and phylodynamic approaches to understanding and combating the early SARS-CoV-2 pandemic. Nat Rev Genet 2022; 23:547-562. [PMID: 35459859 PMCID: PMC9028907 DOI: 10.1038/s41576-022-00483-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Accepted: 03/23/2022] [Indexed: 01/05/2023]
Abstract
Determining the transmissibility, prevalence and patterns of movement of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections is central to our understanding of the impact of the pandemic and to the design of effective control strategies. Phylogenies (evolutionary trees) have provided key insights into the international spread of SARS-CoV-2 and enabled investigation of individual outbreaks and transmission chains in specific settings. Phylodynamic approaches combine evolutionary, demographic and epidemiological concepts and have helped track virus genetic changes, identify emerging variants and inform public health strategy. Here, we review and synthesize studies that illustrate how phylogenetic and phylodynamic techniques were applied during the first year of the pandemic, and summarize their contributions to our understanding of SARS-CoV-2 transmission and control.
Affiliation(s)
- Stephen W Attwood
- Department of Zoology, University of Oxford, Oxford, UK
- Pathogen Genomics Unit, Public Health Wales NHS Trust, Cardiff, UK
- Sarah C Hill
- Department of Pathobiology and Population Sciences, Royal Veterinary College, University of London, London, UK
- David M Aanensen
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Thomas R Connor
- Pathogen Genomics Unit, Public Health Wales NHS Trust, Cardiff, UK
- School of Biosciences, Cardiff University, Cardiff, UK
- Oliver G Pybus
- Department of Zoology, University of Oxford, Oxford, UK
- Department of Pathobiology and Population Sciences, Royal Veterinary College, University of London, London, UK
21
Merhi G, Trotter AJ, de Oliveira Martins L, Koweyes J, Le-Viet T, Abou Naja H, Al Buaini M, Prosolek SJ, Alikhan NF, Lott M, Tohmeh T, Badran B, Jupp OJ, Gardner S, Felgate MW, Makin KA, Wilkinson JM, Stanley R, Sesay AK, Webber MA, Davidson RK, Ghosn N, Pallen M, Hasan H, Page AJ, Tokajian S. Replacement of the Alpha variant of SARS-CoV-2 by the Delta variant in Lebanon between April and June 2021. Microb Genom 2022; 8. [PMID: 35876490 PMCID: PMC9455693 DOI: 10.1099/mgen.0.000838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/31/2023]
Abstract
The COVID-19 pandemic continues to expand globally, with case numbers rising in many areas of the world, including the Eastern Mediterranean Region. Lebanon experienced its largest wave of COVID-19 infections from January to April 2021. Limited genomic surveillance was undertaken, with just 26 SARS-CoV-2 genomes available for this period, nine of which were from travellers from Lebanon detected by other countries. Additional genome sequencing is thus needed to allow surveillance of variants in circulation. In total, 905 SARS-CoV-2 genomes were sequenced using the ARTIC protocol. The genomes were derived from SARS-CoV-2-positive samples, selected retrospectively from the sentinel COVID-19 surveillance network, to capture diversity of location, sampling time, sex, nationality and age. Although 16 PANGO lineages were circulating in Lebanon in January 2021, by February there were just four, with the Alpha variant accounting for 97% of samples. In the following 2 months, all samples contained the Alpha variant. However, this had changed dramatically by June and July 2021, when all samples belonged to the Delta variant. This study documents a ten-fold increase in the number of SARS-CoV-2 genomes available from Lebanon. The Alpha variant, first detected in the UK, rapidly swept through Lebanon, causing the country's largest wave to date, which peaked in January 2021. The Alpha variant was introduced to Lebanon multiple times despite travel restrictions, but the source of these introductions remains uncertain. The Delta variant was detected in the Gambia in travellers from Lebanon in mid-May, suggesting community transmission in Lebanon several weeks before this variant was detected in the country. Prospective sequencing in June/July 2021 showed that the Delta variant had completely replaced the Alpha variant in under 6 weeks.
Affiliation(s)
- Georgi Merhi
- Department of Natural Sciences, Lebanese American University, School of Arts and Sciences, Byblos, Lebanon
- Jad Koweyes
- Department of Natural Sciences, Lebanese American University, School of Arts and Sciences, Byblos, Lebanon
- Thanh Le-Viet
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Hala Abou Naja
- Ministry of Public Health, Epidemiological Surveillance Program, Museum Square, Beirut, Lebanon
- Mona Al Buaini
- National Influenza Centre Research Laboratory, Rafic Hariri University Hospital, Beirut, Lebanon
- Sophie J Prosolek
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Martin Lott
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Tatiana Tohmeh
- Ministry of Public Health, Epidemiological Surveillance Program, Museum Square, Beirut, Lebanon
- Bassam Badran
- Laboratory of Molecular Biology and Cancer Immunology, Faculty of Sciences, Lebanese University, Lebanon
- Orla J Jupp
- University of East Anglia, Norwich, Norfolk, UK
- Rachael Stanley
- Norfolk and Norwich University Hospital, Norwich, Norfolk, UK
- Mark A Webber
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- University of East Anglia, Norwich, Norfolk, UK
- Nada Ghosn
- Ministry of Public Health, Epidemiological Surveillance Program, Museum Square, Beirut, Lebanon
- Mark Pallen
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- University of East Anglia, Norwich, Norfolk, UK
- Andrew J Page
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Sima Tokajian
- Department of Natural Sciences, Lebanese American University, School of Arts and Sciences, Byblos, Lebanon
22
Alisoltani A, Jaroszewski L, Iyer M, Iranzadeh A, Godzik A. Increased Frequency of Indels in Hypervariable Regions of SARS-CoV-2 Proteins—A Possible Signature of Adaptive Selection. Front Genet 2022; 13:875406. [PMID: 35719386 PMCID: PMC9201826 DOI: 10.3389/fgene.2022.875406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Accepted: 04/08/2022] [Indexed: 11/29/2022]
Abstract
Most attention in the surveillance of the evolving SARS-CoV-2 genome has been centered on nucleotide substitutions in the spike glycoprotein. We show that, as the pandemic extends into its second year, the numbers and ratio of genomes with in-frame insertions and deletions (indels) increase significantly, especially among the variants of concern (VOCs). Monitoring of SARS-CoV-2 genome evolution shows that co-occurrence (i.e., highly correlated presence) of indels, especially deletions in the spike N-terminal domain and non-structural protein 6 (NSP6), is a shared feature of several VOCs, such as Alpha, Beta, Delta, and Omicron. The distribution of indels is correlated with spike mutations associated with immune escape, and growth in the number of genomes with indels coincides with increasing population resistance due to vaccination and previous infections. Indels occur most frequently in the spike, but also in other proteins, especially those involved in interactions with the host immune system. We also show that indels concentrate in regions of individual SARS-CoV-2 proteins known as hypervariable regions (HVRs) that are mostly located in specific loop regions. Structural analysis suggests that indels remodel viral proteins’ surfaces at common epitopes and interaction interfaces, affecting the virus’ interactions with host proteins. We hypothesize that the increased frequency of indels, their non-random distribution and their independent co-occurrence in several VOCs represent another mechanism of response to elevated global population immunity.
Affiliation(s)
- Arghavan Alisoltani
- Biosciences Division, School of Medicine, University of California, Riverside, Riverside, CA, United States
- Lukasz Jaroszewski
- Biosciences Division, School of Medicine, University of California, Riverside, Riverside, CA, United States
- Mallika Iyer
- Graduate School of Biomedical Sciences, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, United States
- Arash Iranzadeh
- Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Adam Godzik
- Biosciences Division, School of Medicine, University of California, Riverside, Riverside, CA, United States
- *Correspondence: Adam Godzik,
23
Rockett RJ, Draper J, Gall M, Sim EM, Arnott A, Agius JE, Johnson-Mackinnon J, Fong W, Martinez E, Drew AP, Lee C, Ngo C, Ramsperger M, Ginn AN, Wang Q, Fennell M, Ko D, Hueston L, Kairaitis L, Holmes EC, O'Sullivan MN, Chen SCA, Kok J, Dwyer DE, Sintchenko V. Co-infection with SARS-CoV-2 Omicron and Delta variants revealed by genomic surveillance. Nat Commun 2022; 13:2745. [PMID: 35585202 PMCID: PMC9117272 DOI: 10.1038/s41467-022-30518-x] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 02/08/2022] [Accepted: 04/21/2022] [Indexed: 12/02/2022]
Abstract
Co-infections with different variants of SARS-CoV-2 are a key precursor to recombination events that are likely to drive SARS-CoV-2 evolution. Rapid identification of such co-infections is required to determine their frequency in the community, particularly in populations at risk of severe COVID-19, which have already been identified as incubators for punctuated evolutionary events. However, limited data and tools are currently available to detect and characterise the SARS-CoV-2 co-infections associated with recognised variants of concern. Here we describe co-infection with the SARS-CoV-2 variants of concern Omicron and Delta in two epidemiologically unrelated adult patients with chronic kidney disease requiring maintenance haemodialysis. Both variants were co-circulating in the community at the time of detection. Genomic surveillance based on amplicon- and probe-based sequencing using short- and long-read technologies identified and quantified subpopulations of Delta and Omicron viruses in respiratory samples. These findings highlight the importance of integrated genomic surveillance in vulnerable populations and provide diagnostic pathways to recognise SARS-CoV-2 co-infection using genomic data. Here, using genomic approaches, Rockett et al. identify Omicron and Delta SARS-CoV-2 co-infections in two adults, highlighting the usefulness of genomic surveillance for the timely recognition of co-infections in situations when different variants of the virus are circulating in the community.
Affiliation(s)
- Rebecca J Rockett
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Jenny Draper
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Mailie Gall
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Eby M Sim
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Alicia Arnott
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Jessica E Agius
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Jessica Johnson-Mackinnon
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Winkie Fong
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Elena Martinez
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Alexander P Drew
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Clement Lee
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Christine Ngo
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Marc Ramsperger
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Andrew N Ginn
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Qinning Wang
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Michael Fennell
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Danny Ko
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Linda Hueston
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Lukas Kairaitis
- Renal Services Blacktown Hospital, Western Sydney Local Health District, Sydney, NSW, Australia
- Edward C Holmes
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia
- School of Medical Sciences, University of Sydney, Sydney, NSW, Australia
- Matthew N O'Sullivan
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Sharon C-A Chen
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Jen Kok
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Dominic E Dwyer
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- Vitali Sintchenko
- Sydney Institute for Infectious Diseases, University of Sydney, Sydney, NSW, Australia
- Centre for Infectious Diseases and Microbiology-Public Health, Westmead Hospital, Westmead, NSW, Australia
- Institute for Clinical Pathology and Medical Research, New South Wales Health Pathology, Westmead, NSW, Australia
- School of Medical Sciences, University of Sydney, Sydney, NSW, Australia
24
Lai A, Bergna A, Toppo S, Morganti M, Menzo S, Ghisetti V, Bruzzone B, Codeluppi M, Fiore V, Rullo EV, Antonelli G, Sarmati L, Brindicci G, Callegaro A, Sagnelli C, Francisci D, Vicenti I, Miola A, Tonon G, Cirillo D, Menozzi I, Caucci S, Cerutti F, Orsi A, Schiavo R, Babudieri S, Nunnari G, Mastroianni CM, Andreoni M, Monno L, Guarneri D, Coppola N, Crisanti A, Galli M, Zehender G. Phylogeography and genomic epidemiology of SARS-CoV-2 in Italy and Europe with newly characterized Italian genomes between February-June 2020. Sci Rep 2022; 12:5736. [PMID: 35388091 PMCID: PMC8986836 DOI: 10.1038/s41598-022-09738-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/29/2021] [Accepted: 03/25/2022] [Indexed: 12/29/2022]
Abstract
The aims of this study were to characterize new SARS-CoV-2 genomes sampled all over Italy and to reconstruct the origin and evolutionary dynamics of the virus in Italy and Europe between February and June 2020. The cluster analysis showed only small clusters including < 80 Italian isolates, while most of the Italian strains were intermixed throughout the whole tree. Pure Italian clusters were observed mainly after the lockdown and distancing measures were adopted. Lineages B and B.1 spread between late January and early February 2020 from China to Veneto and Lombardy, respectively. Lineage B.1.1 (20B) most probably evolved within Italy and spread from central to southern Italian regions and to other European countries. Lineage B.1.1.1 (20D) most probably developed in other European countries, entering Italy only in the second half of March and remaining localized in Piedmont until June 2020. In conclusion, within the limitations of phylogeographical reconstruction, the estimated ancestral scenario suggests an important role of China and Italy in the widespread diffusion of the D614G variant in Europe in the early phase of the pandemic, and more dispersed exchanges involving several European countries from the second half of March 2020.
Affiliation(s)
- Alessia Lai
- Department of Biomedical and Clinical Sciences Luigi Sacco, University of Milan, Milan, Italy
- Pediatric Clinical Research Center Fondazione Romeo ed Enrica Invernizzi, University of Milan, Milan, Italy
- Annalisa Bergna
- Department of Biomedical and Clinical Sciences Luigi Sacco, University of Milan, Milan, Italy
- Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padua, Italy
- CRIBI Biotech Center, University of Padova, Padua, Italy
- Marina Morganti
- Risk Analyses and Genomic Epidemiology Unit, Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna, Parma, Italy
- Stefano Menzo
- Department of Biomedical Sciences and Public Health, Virology Unit, Polytechnic University of Marche, Ancona, Italy
- Valeria Ghisetti
- Laboratory of Microbiology and Virology, Amedeo di Savoia, ASL Città di Torino, Torino, Italy
- Mauro Codeluppi
- UOC of Infectious Diseases, Department of Oncology and Hematology, Guglielmo da Saliceto Hospital, AUSL Piacenza, Piacenza, Italy
- Vito Fiore
- Infectious and Tropical Disease Clinic, Department of Medical, Surgical and Experimental Sciences, University of Sassari, Sassari, Italy
- Emmanuele Venanzi Rullo
- Unit of Infectious Diseases, Department of Experimental and Clinical Medicine, University of Messina, Messina, Italy
- Guido Antonelli
- Department of Molecular Medicine, University Hospital Policlinico Umberto I, Sapienza University of Rome, Rome, Italy
- Annapaola Callegaro
- Microbiology and Virology Laboratory, ASST Papa Giovanni XXIII, Bergamo, Italy
- Caterina Sagnelli
- Department of Mental Health and Public Medicine, University of Campania "Luigi Vanvitelli", Naples, Italy
- Daniela Francisci
- Department of Medicine and Surgery, Clinic of Infectious Diseases, "Santa Maria della Misericordia" Hospital, University of Perugia, Perugia, Italy
- Ilaria Vicenti
- Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Arianna Miola
- Intesa San Paolo Innovation Center-AI LAB, Turin, Italy
- Giovanni Tonon
- Center for Omics Sciences, IRCCS Ospedale San Raffaele, Milan, Italy
- Division of Experimental Oncology, IRCCS Ospedale San Raffaele, Milan, Italy
- Daniela Cirillo
- Division of Immunology, Transplantation and Infectious Disease, IRCCS Ospedale San Raffaele, Milan, Italy
- Ilaria Menozzi
- Risk Analyses and Genomic Epidemiology Unit, Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna, Parma, Italy
- Sara Caucci
- Department of Biomedical Sciences and Public Health, Virology Unit, Polytechnic University of Marche, Ancona, Italy
- Francesco Cerutti
- Laboratory of Microbiology and Virology, Amedeo di Savoia, ASL Città di Torino, Torino, Italy
- Andrea Orsi
- Department of Health Sciences (DISSAL), University of Genoa, Genoa, Italy
- Roberta Schiavo
- UOC of Microbiology, Department of Clinical Pathology, Guglielmo da Saliceto Hospital, AUSL Piacenza, Piacenza, Italy
- Sergio Babudieri
- Infectious and Tropical Disease Clinic, Department of Medical, Surgical and Experimental Sciences, University of Sassari, Sassari, Italy
- Giuseppe Nunnari
- Unit of Infectious Diseases, Department of Experimental and Clinical Medicine, University of Messina, Messina, Italy
- Claudio M Mastroianni
- Department of Public Health and Infectious Diseases, University Hospital Policlinico Umberto I, Sapienza University of Rome, Rome, Italy
- Laura Monno
- Infectious Diseases Unit, University of Bari, Bari, Italy
- Davide Guarneri
- Microbiology and Virology Laboratory, ASST Papa Giovanni XXIII, Bergamo, Italy
- Nicola Coppola
- Department of Mental Health and Public Medicine, University of Campania "Luigi Vanvitelli", Naples, Italy
- Andrea Crisanti
- Microbiology and Virology Diagnostic Unit, Padua University Hospital, Padua, Italy
- Department of Life Science, Imperial College London, South Kensington Campus Imperial College Road, London, SW7 AZ, UK
- Massimo Galli
- Department of Biomedical and Clinical Sciences Luigi Sacco, University of Milan, Milan, Italy
- Gianguglielmo Zehender
- Department of Biomedical and Clinical Sciences Luigi Sacco, University of Milan, Milan, Italy
- Pediatric Clinical Research Center Fondazione Romeo ed Enrica Invernizzi, University of Milan, Milan, Italy
- CRC-Coordinated Research Center "EpiSoMI", University of Milan, Milan, Italy
25
De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets. PLoS Comput Biol 2022; 18:e1010056. [PMID: 35486906 PMCID: PMC9094560 DOI: 10.1371/journal.pcbi.1010056] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/24/2021] [Revised: 05/11/2022] [Accepted: 03/25/2022] [Indexed: 11/26/2022]
Abstract
Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of available SARS-CoV-2 genomes, which are beyond the processing power of many methods; simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. > 100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open-source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to represent SARS-CoV-2 genome evolution more realistically.
Affiliation(s)
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- William Boulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Lukas Weilguny
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Conor R. Walker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, California, United States of America
- Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California, United States of America
- Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
26
De Maio N, Kalaghatgi P, Turakhia Y, Corbett-Detig R, Minh BQ, Goldman N. Maximum likelihood pandemic-scale phylogenetics. [PMID: 35350209 PMCID: PMC8963701 DOI: 10.1101/2022.03.22.485312] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Genomic data plays an essential role in the study of transmissible disease, as exemplified by its current use in identifying and tracking the spread of novel SARS-CoV-2 variants. However, with the increase in size of genomic epidemiological datasets, their phylogenetic analyses become increasingly impractical due to high computational demand. In particular, while maximum likelihood methods are go-to tools for phylogenetic inference, the scale of datasets from the ongoing pandemic has made apparent the urgent need for more computationally efficient approaches. Here we propose a new likelihood-based phylogenetic framework that greatly reduces both the memory and time demand of popular maximum likelihood approaches when analysing many closely related genomes, as in the scenario of SARS-CoV-2 genome data and more generally throughout genomic epidemiology. To achieve this, we rewrite the classical Felsenstein pruning algorithm so that we can infer phylogenetic trees on at least 10 times larger datasets with higher accuracy than existing maximum likelihood methods. Our algorithms provide a powerful framework for maximum-likelihood genomic epidemiology and could facilitate similarly groundbreaking applications in Bayesian phylogenomic analyses as well.
27
Pattabiraman C, Prasad P, George AK, Sreenivas D, Rasheed R, Reddy NVK, Desai A, Vasanthapuram R. Importation, circulation, and emergence of variants of SARS-CoV-2 in the South Indian state of Karnataka. Wellcome Open Res 2022; 6:110. [PMID: 35243004 PMCID: PMC8857524 DOI: 10.12688/wellcomeopenres.16768.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 01/31/2022] [Indexed: 12/18/2022]
Abstract
Background: As the coronavirus disease 2019 (COVID-19) pandemic continues, the selection of genomic variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) associated with higher transmission, more severe disease, re-infection, and immune escape is a cause for concern. Such variants have been reported from the UK (B.1.1.7), South Africa (B.1.351), and Brazil (P.1/B.1.1.28). We performed this study to track the importation, spread, and emergence of variants locally. Methods: We sequenced whole genomes of SARS-CoV-2 from international travellers (n=75) entering Karnataka, South India, between Dec 22, 2020 and Jan 31, 2021, and from positive cases in the city of Bengaluru (n=108) between Nov 22, 2020 and Jan 22, 2021, as well as from a local outbreak. We present the lineage distribution and analysis of these sequences. Results: Genomes from the study group into 34 lineages. Variant B.1.1.7 was introduced by international travel (24/73, 32.9%). Lineages B.1.36 and B.1 formed a major fraction of both imported (B.1.36: 20/73, 27.4%; B.1: 14/73, 19.2%) and circulating viruses (B.1.36: 45/103, 43.7%; B.1: 26/103, 25.2%). Lineage B.1.36 was also associated with a local outbreak. We detected nine amino acid changes, previously associated with immune escape, spread across multiple lineages. The N440K change was detected in 45/162 (27.7%) of the sequences, 37 of which were in the B.1.36 lineage (37/65, 56.92%). Conclusions: Our data support the idea that variants of concern spread by travel. Viruses with amino acid replacements associated with immune escape are already circulating. It is critical to check transmission and monitor changes in SARS-CoV-2 locally.
Affiliation(s)
- Chitra Pattabiraman
- Neurovirology, National Institute of Mental Health and Neurosciences, India, Bangalore, Karnataka, 560029, India
- Pramada Prasad
- Neurovirology, National Institute of Mental Health and Neurosciences, India, Bangalore, Karnataka, 560029, India
- Anson K. George
- Neurovirology, National Institute of Mental Health and Neurosciences, India, Bangalore, Karnataka, 560029, India
- Darshan Sreenivas
- Neurovirology, National Institute of Mental Health and Neurosciences, India, Bangalore, Karnataka, 560029, India
- Risha Rasheed
- Neurovirology, National Institute of Mental Health and Neurosciences, India, Bangalore, Karnataka, 560029, India
- Nakka Vijay Kiran Reddy
- Neurovirology, National Institute of Mental Health and Neurosciences, India, Bangalore, Karnataka, 560029, India
- Anita Desai
- Neurovirology, National Institute of Mental Health and Neurosciences, India, Bangalore, Karnataka, 560029, India
- Ravi Vasanthapuram
- Nodal Officer Genetic Confirmation of SARS-CoV-2, Government of Karnataka, Bengaluru, India
28
Mourier T, Shuaib M, Hala S, Mfarrej S, Alofi F, Naeem R, Alsomali A, Jorgensen D, Subudhi AK, Ben Rached F, Guan Q, Salunke RP, Ooi A, Esau L, Douvropoulou O, Nugmanova R, Perumal S, Zhang H, Rajan I, Al-Omari A, Salih S, Shamsan A, Al Mutair A, Taha J, Alahmadi A, Khotani N, Alhamss A, Mahmoud A, Alquthami K, Dageeg A, Khogeer A, Hashem AM, Moraga P, Volz E, Almontashiri N, Pain A. SARS-CoV-2 genomes from Saudi Arabia implicate nucleocapsid mutations in host response and increased viral load. Nat Commun 2022; 13:601. [PMID: 35105893 PMCID: PMC8807822 DOI: 10.1038/s41467-022-28287-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 05/10/2021] [Accepted: 01/12/2022] [Indexed: 02/06/2023]
Abstract
Monitoring SARS-CoV-2 spread and evolution through genome sequencing is essential in handling the COVID-19 pandemic. Here, we sequenced 892 SARS-CoV-2 genomes collected from patients in Saudi Arabia from March to August 2020. We show that two consecutive mutations (R203K/G204R) in the nucleocapsid (N) protein are associated with higher viral loads in COVID-19 patients. Our comparative biochemical analysis reveals that the mutant N protein displays enhanced viral RNA binding and differential interaction with key host proteins. We found increased interaction with GSK3A kinase, together with hyper-phosphorylation of the adjacent serine site (S206), in the mutant N protein. Furthermore, host cell transcriptome analysis suggests that the mutant N protein is associated with dysregulated interferon response genes. Here, we provide crucial information linking the R203K/G204R mutations in the N protein to modulations of host-virus interactions and underline the potential of the nucleocapsid protein as a drug target during infection.
Affiliation(s)
- Tobias Mourier
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Muhammad Shuaib
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Sharif Hala
- Infectious Disease Research Department, King Abdullah International Medical Research Centre, Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- King Saud bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia
- Sara Mfarrej
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Fadwa Alofi
- Infectious Diseases Department, King Fahad Hospital, Madinah, MOH, Saudi Arabia
- Raeece Naeem
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Afrah Alsomali
- Infectious Diseases Department, King Abdullah Medical Complex, Jeddah, MOH, Saudi Arabia
- David Jorgensen
- School of Public Health, Faculty of Medicine, Imperial College, Norfolk Place, St Mary's Campus, London, United Kingdom
- Amit Kumar Subudhi
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Fathia Ben Rached
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Qingtian Guan
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Rahul P Salunke
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Amanda Ooi
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Luke Esau
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Olga Douvropoulou
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Raushan Nugmanova
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Sadhasivam Perumal
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Huoming Zhang
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Issaac Rajan
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Awad Al-Omari
- Dr. Suliman Al-Habib Medical Group, Riyadh, Saudi Arabia
- Samer Salih
- Dr. Suliman Al-Habib Medical Group, Riyadh, Saudi Arabia
- Abbas Shamsan
- Dr. Suliman Al-Habib Medical Group, Riyadh, Saudi Arabia
- Jumana Taha
- Department of Neuroscience, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
- Abdulaziz Alahmadi
- Department of Preventive Medicine, Ministry of National Guard - Health Affairs, Riyadh, Saudi Arabia
- Nashwa Khotani
- Infectious Diseases Medical Department, Al Noor Specialist Hospital Makkah, Makkah, MOH, Saudi Arabia
- Abdelrahman Alhamss
- Gastroenterology Department, King Abdul Aziz Hospital Makkah, Makkah, MOH, Saudi Arabia
- Ahmed Mahmoud
- College of Applied Medical Sciences, Taibah University, Madinah, Saudi Arabia
- Khaled Alquthami
- Infectious Diseases Medical Department, Al Noor Specialist Hospital Makkah, Makkah, MOH, Saudi Arabia
- Abdullah Dageeg
- Department of Medicine, King Abdulaziz University Jeddah, Jeddah, Saudi Arabia
- Asim Khogeer
- Plan and Research Department, General Directorate of Health Affairs Makkah Region, Makkah, MOH, Saudi Arabia
- Anwar M Hashem
- Vaccines and Immunotherapy Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Medical Microbiology and Parasitology, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Paula Moraga
- King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Science and Engineering Division (CEMSE), Thuwal-Jeddah, 23955-6900, Saudi Arabia
- Eric Volz
- School of Public Health, Faculty of Medicine, Imperial College, Norfolk Place, St Mary's Campus, London, United Kingdom
- Naif Almontashiri
- College of Applied Medical Sciences, Taibah University, Madinah, Saudi Arabia
- Center for Genetics and Inherited Diseases, Taibah University, Almadinah Almunwarah, Saudi Arabia
- Arnab Pain
- King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia.
- Research Center for Zoonosis Control, Global Institution for Collaborative Research and Education (GI-CoRE), Hokkaido University, N20 W10 Kita-ku, Sapporo, 001-0020, Japan.
29
Santiago GA, Flores B, Gonzalez GL, Charriez KN, Cora-Huertas L, Volkman HR, Van Belleghem S, Rivera-Amill V, Adams LE, Marzan M, Hernandez L, Cardona I, O'Neill E, Paz-Bailey G, Papa R, Munoz-Jordan JL. Genomic surveillance of SARS-CoV-2 in Puerto Rico reveals emergence of an autochthonous lineage and early detection of variants. RESEARCH SQUARE 2022:rs.3.rs-1277781. [PMID: 35075454 PMCID: PMC8786232 DOI: 10.21203/rs.3.rs-1277781/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
Puerto Rico has experienced the full impact of the COVID-19 pandemic. Since SARS-CoV-2, the virus that causes COVID-19, was first detected on the island in March of 2020, it spread rapidly through the island’s population and became a critical threat to public health. We conducted a genomic surveillance study through a partnership with health agencies and academic institutions to understand the emergence and molecular epidemiology of the virus on the island. We sampled COVID-19 cases monthly over 19 months and sequenced a total of 753 SARS-CoV-2 genomes between March 2020 and September 2021 to reconstruct the local epidemic in a regional context using phylogenetic inference. Our analyses revealed that multiple importation events propelled the emergence and spread of the virus throughout the study period, including the introduction and spread of most SARS-CoV-2 variants detected world-wide. Lineage turnover cycles through various phases of the local epidemic were observed, where the predominant lineage was replaced by the next competing lineage or variant after approximately 4 months of circulation locally. We also identified the emergence of lineage B.1.588, an autochthonous lineage that predominated in Puerto Rico from September to December 2020 and subsequently spread to the United States. The results of this collaborative approach highlight the importance of timely collection and analysis of SARS-CoV-2 genomic surveillance data to inform public health responses.
30
Gallego-García P, Varela N, Estévez-Gómez N, De Chiara L, Fernández-Silva I, Valverde D, Sapoval N, Treangen TJ, Regueiro B, Cabrera-Alvargonzález JJ, del Campo V, Pérez S, Posada D. OUP accepted manuscript. Virus Evol 2022; 8:veac008. [PMID: 35242361 PMCID: PMC8889950 DOI: 10.1093/ve/veac008] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/05/2021] [Revised: 12/21/2021] [Accepted: 02/04/2022] [Indexed: 11/23/2022]
Abstract
A detailed understanding of how and when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission occurs is crucial for designing effective prevention measures. Besides contact tracing, genome sequencing provides information to help infer who infected whom. However, the effectiveness of the genomic approach in this context depends on both (high enough) mutation and (low enough) transmission rates. Today, the level of resolution that we can obtain when describing SARS-CoV-2 outbreaks using genomic information alone remains unclear. To answer this question, we sequenced forty-nine SARS-CoV-2 patient samples from ten local clusters in NW Spain for which partial epidemiological information was available, and inferred transmission history using genomic variants. Importantly, we obtained high-quality genomic data, sequencing each sample twice and using unique barcodes to exclude cross-sample contamination. Phylogenetic and cluster analyses showed that consensus genomes were generally sufficient to discriminate among independent transmission clusters. However, levels of intrahost variation were low, which in most cases prevented the unambiguous identification of direct transmission events. After filtering out recurrent variants across clusters, the genomic data were generally compatible with the epidemiological information but did not support specific transmission events over possible alternatives. We estimated the effective transmission bottleneck size to be one to two viral particles for sample pairs whose donor–recipient relationship was likely. Our analyses suggest that intrahost genomic variation in SARS-CoV-2 might be generally limited and that homoplasy and recurrent errors complicate the identification of shared intrahost variants. Reliable reconstruction of direct SARS-CoV-2 transmission based solely on genomic data seems hindered by a slow mutation rate, potential convergent events, and technical artifacts. Detailed contact tracing seems essential in most cases to study SARS-CoV-2 transmission at high resolution.
Affiliation(s)
- Nair Varela
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Nuria Estévez-Gómez
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Loretta De Chiara
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Iria Fernández-Silva
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, Vigo 36310, Spain
- Diana Valverde
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, Vigo 36310, Spain
- Benito Regueiro
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Microbiology, Complexo Hospitalario Universitario de Vigo (CHUVI), Sergas, Vigo 36213, Spain
- Microbiology and Parasitology Department, Medicine and Odontology, Universidade de Santiago, Santiago de Compostela 15782, Spain
- Jorge Julio Cabrera-Alvargonzález
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Microbiology, Complexo Hospitalario Universitario de Vigo (CHUVI), Sergas, Vigo 36213, Spain
- Víctor del Campo
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Preventive Medicine, Complexo Hospitalario Universitario de Vigo (CHUVI), Sergas, Vigo 36213, Spain
31
Santiago GA, Flores B, González GL, Charriez KN, Huertas LC, Volkman HR, Van Belleghem SM, Rivera-Amill V, Adams LE, Marzán M, Hernández L, Cardona I, O’Neill E, Paz-Bailey G, Papa R, Muñoz-Jordan JL. Genomic surveillance of SARS-CoV-2 in Puerto Rico enabled early detection and tracking of variants. COMMUNICATIONS MEDICINE 2022; 2:100. [PMID: 35968047 PMCID: PMC9366129 DOI: 10.1038/s43856-022-00168-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 07/28/2022] [Indexed: 11/20/2022]
Abstract
Background: Puerto Rico has experienced the full impact of the COVID-19 pandemic. Since SARS-CoV-2, the virus that causes COVID-19, was first detected on the island in March of 2020, it spread rapidly through the island's population and became a critical threat to public health. Methods: We conducted a genomic surveillance study through a partnership with health agencies and academic institutions to understand the emergence and molecular epidemiology of the virus on the island. We sampled COVID-19 cases monthly over 19 months and sequenced a total of 753 SARS-CoV-2 genomes between March 2020 and September 2021 to reconstruct the local epidemic in a regional context using phylogenetic inference. Results: Our analyses reveal that multiple importation events propelled the emergence and spread of the virus throughout the study period, including the introduction and spread of most SARS-CoV-2 variants detected world-wide. Lineage turnover cycles through various phases of the local epidemic were observed, where the predominant lineage was replaced by the next competing lineage or variant after ~4 months of circulation locally. We also identified the emergence of lineage B.1.588, an autochthonous lineage that predominated in Puerto Rico from September to December 2020 and subsequently spread to the United States. Conclusions: The results of this collaborative approach highlight the importance of timely collection and analysis of SARS-CoV-2 genomic surveillance data to inform public health responses.
Affiliation(s)
- Gilberto A. Santiago
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
- Betzabel Flores
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
- Glenda L. González
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
- Keyla N. Charriez
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
- Limari Cora Huertas
- University of Puerto Rico–Río Piedras, Department of Biology, Molecular Sciences and Research Center, San Juan, Puerto Rico
- Hannah R. Volkman
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
- Steven M. Van Belleghem
- University of Puerto Rico–Río Piedras, Department of Biology, Molecular Sciences and Research Center, San Juan, Puerto Rico
- Vanessa Rivera-Amill
- Ponce Health Sciences University, Ponce Research Institute, Department of Basic Sciences, Ponce, Puerto Rico
- Laura E. Adams
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
- Melissa Marzán
- Puerto Rico Department of Health, Epidemiology Office, San Juan, Puerto Rico
- Lorena Hernández
- Puerto Rico Department of Health, Epidemiology Office, San Juan, Puerto Rico
- Iris Cardona
- Puerto Rico Department of Health, Epidemiology Office, San Juan, Puerto Rico
- Eduardo O’Neill
- Centers for Disease Control and Prevention, Office of Island Affairs, Center for State, Tribal, Local, and Territorial Support, Atlanta, GA, USA
- Gabriela Paz-Bailey
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
- Riccardo Papa
- University of Puerto Rico–Río Piedras, Department of Biology, Molecular Sciences and Research Center, San Juan, Puerto Rico
- Jorge L. Muñoz-Jordan
- Centers for Disease Control and Prevention, National Centers for Emerging and Zoonotic Infectious Diseases, Division of Vector Borne Diseases, Dengue Branch, San Juan, Puerto Rico
32
McBroome J, Thornlow B, Hinrichs AS, Kramer A, De Maio N, Goldman N, Haussler D, Corbett-Detig R, Turakhia Y. A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees. Mol Biol Evol 2021; 38:5819-5824. [PMID: 34469548 PMCID: PMC8662617 DOI: 10.1093/molbev/msab264] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Indexed: 12/17/2022]
Abstract
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree, with branches labeled with parsimony-inferred mutations and with Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus's evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
Affiliation(s)
- Jakob McBroome
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Bryan Thornlow
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Alexander Kramer
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- David Haussler
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Yatish Turakhia
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
33
Abstract
The origin and early spread of SARS-CoV-2 remain shrouded in mystery. Here, I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH's Sequence Read Archive. I recover the deleted files from the Google Cloud and reconstruct partial sequences of 13 early epidemic viruses. Phylogenetic analysis of these sequences in the context of carefully annotated existing data further supports the idea that the Huanan Seafood Market sequences are not fully representative of the viruses in Wuhan early in the epidemic. Instead, the progenitor of currently known SARS-CoV-2 sequences likely contained three mutations relative to the market viruses that made it more similar to SARS-CoV-2's bat coronavirus relatives.
Affiliation(s)
- Jesse D Bloom
- Fred Hutchinson Cancer Research Center, Howard Hughes Medical Institute, Seattle, WA
34
Sarwar MB, Yasir M, Alikhan NF, Afzal N, de Oliveira Martins L, Le Viet T, Trotter AJ, Prosolek SJ, Kay GL, Foster-Nyarko E, Rudder S, Baker DJ, Muntaha ST, Roman M, Webber MA, Shafiq A, Shabbir B, Akram J, Page AJ, Jahan S. SARS-CoV-2 variants of concern dominate in Lahore, Pakistan in April 2021. Microb Genom 2021; 7. [PMID: 34846280 PMCID: PMC8743565 DOI: 10.1099/mgen.0.000693] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/17/2022]
Abstract
The SARS-CoV-2 pandemic continues to expand globally, with case numbers rising in many areas of the world, including the Indian subcontinent. Pakistan has one of the world's largest populations, at over 200 million people, and is experiencing a severe third wave of SARS-CoV-2 infections that began in March 2021. During this third wave, only 12 SARS-CoV-2 genomes have so far been collected in Pakistan, nine of them from Islamabad. This highlights the need for more genome sequencing to allow surveillance of the variants in circulation; in fact, more genomes are available from travellers arriving from Pakistan than from within the country itself. We therefore aimed to provide a snapshot assessment of circulating lineages in Lahore and surrounding areas, which have a combined population of 11.1 million. Within one week in April 2021, 102 samples were sequenced. The samples, all with a diagnostic PCR cycle threshold below 25, were randomly collected from two hospitals. Analysis of the lineages shows that the Alpha variant of concern (first identified in the UK) dominates, accounting for 97.9% (97/99) of cases, with the Beta variant of concern (first identified in South Africa) accounting for 2.0% (2/99) of cases. No other lineages were observed. In-depth analysis of the Alpha lineages indicated multiple separate introductions and subsequent establishment within the region. Eight samples were identical to genomes observed in Europe (seven UK, one Switzerland), indicating recent transmission. The genomes of other samples show evidence of further evolution, indicating sustained transmission over a period of time either within Pakistan or in other countries with low-density genome sequencing. Vaccines remain effective against Alpha; however, the low-level presence of Beta, against which some vaccines are less effective, demonstrates the need for continued prospective genomic surveillance.
Affiliation(s)
- Muhammad Yasir
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Nadeem Afzal
- Department of Immunology, University of Health Sciences, Lahore, Pakistan
- Thanh Le Viet
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Sophie J Prosolek
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Gemma L Kay
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Steven Rudder
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- David J Baker
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Sidra Tul Muntaha
- Department of Immunology, University of Health Sciences, Lahore, Pakistan; Central Diagnostic Facility, Mayo Hospital, Lahore, Pakistan
- Muhammad Roman
- Department of Immunology, University of Health Sciences, Lahore, Pakistan
- Mark A Webber
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK; University of East Anglia, Norwich, Norfolk, UK
- Almina Shafiq
- Department of Immunology, University of Health Sciences, Lahore, Pakistan
- Bilquis Shabbir
- Department of Medicine, East Medical Ward, King Edward Medical University Mayo Hospital, Lahore, Pakistan
- Javed Akram
- Department of Immunology, University of Health Sciences, Lahore, Pakistan
- Andrew J Page
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Shah Jahan
- Department of Immunology, University of Health Sciences, Lahore, Pakistan
35
Fontenele RS, Kraberger S, Hadfield J, Driver EM, Bowes D, Holland LA, Faleye TOC, Adhikari S, Kumar R, Inchausti R, Holmes WK, Deitrick S, Brown P, Duty D, Smith T, Bhatnagar A, Yeager RA, Holm RH, von Reitzenstein NH, Wheeler E, Dixon K, Constantine T, Wilson MA, Lim ES, Jiang X, Halden RU, Scotch M, Varsani A. High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants. Water Res 2021. [PMID: 34607084 DOI: 10.1101/2021.01.22.21250320] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/05/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) likely emerged from a zoonotic spill-over event and has led to a global pandemic. The public health response has been predominantly informed by surveillance of symptomatic individuals and contact tracing, with quarantine and other preventive measures then applied to mitigate further spread. Non-traditional methods of surveillance, such as genomic epidemiology and wastewater-based epidemiology (WBE), have also been leveraged during this pandemic. Genomic epidemiology uses high-throughput sequencing of SARS-CoV-2 genomes to inform on local and international transmission events, as well as on the diversity of circulating variants. WBE uses wastewater to analyse community spread, as SARS-CoV-2 is known to be shed through bodily excretions. Since both symptomatic and asymptomatic individuals contribute to wastewater inputs, we hypothesized that the resultant pooled sample of population-wide excreta can provide a more comprehensive picture of the SARS-CoV-2 genomic diversity circulating in a community than clinical testing and sequencing alone. In this study, we analysed 91 wastewater samples from 11 states in the USA, the majority of which represent Maricopa County, Arizona (USA). To assess viral diversity at a population scale, we undertook a single-nucleotide variant (SNV) analysis on data from 52 samples with >90% SARS-CoV-2 genome coverage of sequence reads, and compared these SNVs with those detected in genomes sequenced from clinical patients. We identified 7973 SNVs, of which 548 were "novel" SNVs that had not yet been identified in the global clinical-derived data as of 17th June 2020 (the day after our last wastewater sampling date). However, almost half of these novel SNVs were subsequently detected in clinical-derived data between 17th June 2020 and 20th November 2020.
Using the combination of SNVs present in each sample, we identified the most probable lineages present in that sample and compared them to lineages observed in North America prior to our sampling dates. The wastewater-derived SARS-CoV-2 sequence data indicate that more lineages were circulating across the sampled communities than were represented in the clinical-derived data. Principal coordinate analyses identified patterns in population structure based on genetic variation within the sequenced samples, with clear trends associated with increased diversity, likely due to a higher number of infected individuals relative to the sampling dates. We demonstrate that genetic correlation analysis combined with SNV analysis of wastewater samples can provide a comprehensive snapshot of the SARS-CoV-2 genetic population structure circulating within a community, which might not be observed if relying solely on clinical cases.
Affiliation(s)
- Rafaela S Fontenele
- The Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA; School of Life Sciences, Arizona State University, 427 East Tyler Mall, Tempe, AZ 85287, USA
- Simona Kraberger
- The Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA
- James Hadfield
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Erin M Driver
- Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA
- Devin Bowes
- Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA
- LaRinda A Holland
- The Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA
- Temitope O C Faleye
- Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA
- Sangeet Adhikari
- Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA; School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ, USA
- Rahul Kumar
- Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA
- Rosa Inchausti
- Strategic Management and Diversity Office, City of Tempe, 31 E Fifth Street, Tempe, AZ 85281, USA
- Wydale K Holmes
- Strategic Management and Diversity Office, City of Tempe, 31 E Fifth Street, Tempe, AZ 85281, USA
- Stephanie Deitrick
- Enterprise GIS & Data Analytics, Information Technology, 31 E Fifth Street, City of Tempe, Tempe, AZ 85281, USA
- Philip Brown
- Municipal Utilities, City of Tempe, 31 E Fifth Street, Tempe, AZ 85281, USA
- Darrell Duty
- Tempe Fire Medical Rescue, 31 E Fifth Street, City of Tempe, Tempe, AZ 85281, USA
- Ted Smith
- Christina Lee Brown Envirome Institute, University of Louisville, 302 E. Muhammad Ali Blvd., Louisville, KY 40202, USA
- Aruni Bhatnagar
- Christina Lee Brown Envirome Institute, University of Louisville, 302 E. Muhammad Ali Blvd., Louisville, KY 40202, USA
- Ray A Yeager
- Christina Lee Brown Envirome Institute, University of Louisville, 302 E. Muhammad Ali Blvd., Louisville, KY 40202, USA
- Rochelle H Holm
- Christina Lee Brown Envirome Institute, University of Louisville, 302 E. Muhammad Ali Blvd., Louisville, KY 40202, USA
- Elliott Wheeler
- Jacobs Engineering Group Inc., 1999 Bryan Street, Dallas, TX 75201, USA
- Kevin Dixon
- Jacobs Engineering Group Inc., 1999 Bryan Street, Dallas, TX 75201, USA
- Tim Constantine
- Jacobs Engineering Group Inc., 1999 Bryan Street, Dallas, TX 75201, USA
- Melissa A Wilson
- School of Life Sciences, Arizona State University, 427 East Tyler Mall, Tempe, AZ 85287, USA; Center for Evolution and Medicine, Arizona State University, 401 E. Tyler Mall, Tempe, AZ 85287, USA
- Efrem S Lim
- The Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA; School of Life Sciences, Arizona State University, 427 East Tyler Mall, Tempe, AZ 85287, USA
- Xiaofang Jiang
- National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Rolf U Halden
- Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA; OneWaterOneHealth, Nonprofit Project of the Arizona State University Foundation, 1001 S. McAllister Ave., Tempe, AZ 85281, USA
- Matthew Scotch
- Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA; College of Health Solutions, Arizona State University, 550 N. 3rd St, Phoenix, AZ 85004, USA
- Arvind Varsani
- The Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, 1001 S. McAllister Ave., Tempe, AZ 85281, USA; School of Life Sciences, Arizona State University, 427 East Tyler Mall, Tempe, AZ 85287, USA; Center for Evolution and Medicine, Arizona State University, 401 E. Tyler Mall, Tempe, AZ 85287, USA
36
Fontenele RS, Kraberger S, Hadfield J, Driver EM, Bowes D, Holland LA, Faleye TOC, Adhikari S, Kumar R, Inchausti R, Holmes WK, Deitrick S, Brown P, Duty D, Smith T, Bhatnagar A, Yeager RA, Holm RH, von Reitzenstein NH, Wheeler E, Dixon K, Constantine T, Wilson MA, Lim ES, Jiang X, Halden RU, Scotch M, Varsani A. High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants. Water Res 2021; 205:117710. [PMID: 34607084 PMCID: PMC8464352 DOI: 10.1016/j.watres.2021.117710] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Received: 01/25/2021] [Revised: 09/15/2021] [Accepted: 09/22/2021] [Indexed: 05/18/2023]
37
Smith MR, Trofimova M, Weber A, Duport Y, Kühnert D, von Kleist M. Rapid incidence estimation from SARS-CoV-2 genomes reveals decreased case detection in Europe during summer 2020. Nat Commun 2021; 12:6009. [PMID: 34650062 PMCID: PMC8517019 DOI: 10.1038/s41467-021-26267-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/25/2021] [Accepted: 09/24/2021] [Indexed: 12/24/2022]
Abstract
By October 2021, 230 million SARS-CoV-2 diagnoses had been reported. Yet a considerable proportion of cases remains undetected. Here, we propose GInPipe, a method that rapidly reconstructs SARS-CoV-2 incidence profiles solely from publicly available, time-stamped viral genomes. We validate GInPipe against simulated outbreaks and elaborate phylodynamic analyses. Using available sequence data, we reconstruct incidence histories for Denmark, Scotland, Switzerland, and Victoria (Australia) and demonstrate how to use the method to investigate the effects of changing testing policies on case ascertainment. Specifically, we find that under-reporting was highest during summer 2020 in Europe, coinciding with more liberal testing policies at times of low testing capacity. With the increasing use of real-time sequencing, GInPipe could complement established surveillance tools for monitoring the SARS-CoV-2 pandemic. In post-pandemic times, when diagnostic efforts are decreasing, GInPipe may facilitate the detection of hidden infection dynamics.
Affiliation(s)
- Maureen Rebecca Smith
- Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
- Bioinformatics (MF1), Robert Koch Institute, Berlin, Germany
- Maria Trofimova
- Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
- Bioinformatics (MF1), Robert Koch Institute, Berlin, Germany
- Ariane Weber
- Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, Jena, Germany
- Yannick Duport
- Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
- Bioinformatics (MF1), Robert Koch Institute, Berlin, Germany
- Denise Kühnert
- Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, Jena, Germany
- German COVID Omics Initiative (deCOI), Bonn, Germany
- Max von Kleist
- Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
- Bioinformatics (MF1), Robert Koch Institute, Berlin, Germany
- German COVID Omics Initiative (deCOI), Bonn, Germany
38
Yamasaki L, Moi ML. Complexities in Case Definition of SARS-CoV-2 Reinfection: Clinical Evidence and Implications in COVID-19 Surveillance and Diagnosis. Pathogens 2021; 10:1262. [PMID: 34684211 PMCID: PMC8540172 DOI: 10.3390/pathogens10101262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Revised: 09/04/2021] [Accepted: 09/23/2021] [Indexed: 11/16/2022]
Abstract
Reinfection cases have been reported in some countries, with clinical symptoms ranging from mild to severe. In addition to clinical diagnosis, confirmation of SARS-CoV-2 reinfection requires that the virus genome sequences from the first and second infections either belong to separate clades or differ by significant mutations. While phylogenetic analysis of paired specimens offers the strongest evidence for reinfection, concerns remain about the definition of SARS-CoV-2 reinfection, for reasons including limited access to paired samples and technical challenges in phylogenetic analysis. In light of the emergence of new SARS-CoV-2 variants associated with increased transmissibility and immune escape, further understanding of COVID-19 protective immunity, together with real-time surveillance aimed at identifying COVID-19 transmission patterns, the transmissibility of emerging variants, and the clinical implications of reinfection, will be important for addressing the challenges in defining COVID-19 reinfection and understanding the true disease burden.
Affiliation(s)
- Lisa Yamasaki
- WHO Collaborating Center for Reference and Research on Tropical and Emerging Virus Diseases, WHO Global Reference Laboratory for COVID-19, Institute of Tropical Medicine, Nagasaki University, Nagasaki 852-8521, Japan
- School of International Health/Global Health Science, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan
- Meng Ling Moi
- WHO Collaborating Center for Reference and Research on Tropical and Emerging Virus Diseases, WHO Global Reference Laboratory for COVID-19, Institute of Tropical Medicine, Nagasaki University, Nagasaki 852-8521, Japan
- School of International Health/Global Health Science, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan
39
De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: efficient simulation of sequence evolution for pandemic-scale datasets. bioRxiv 2021:2021.03.15.435416. [PMID: 33758852 PMCID: PMC7987011 DOI: 10.1101/2021.03.15.435416] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software is available from https://github.com/NicolaDM/phastSim and allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- William Boulton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lukas Weilguny
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Conor R. Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
40
Kumar S, Tao Q, Weaver S, Sanderford M, Caraballo-Ortiz MA, Sharma S, Pond SLK, Miura S. An Evolutionary Portrait of the Progenitor SARS-CoV-2 and Its Dominant Offshoots in COVID-19 Pandemic. Mol Biol Evol 2021; 38:3046-3059. [PMID: 33942847 PMCID: PMC8135569 DOI: 10.1093/molbev/msab118] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 12/14/2022]
Abstract
Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).
Affiliation(s)
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Steven Weaver
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Marcos A Caraballo-Ortiz
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sergei L K Pond
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
41
McBroome J, Thornlow B, Hinrichs AS, De Maio N, Goldman N, Haussler D, Corbett-Detig R, Turakhia Y. A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees. bioRxiv 2021. [PMID: 33821270 PMCID: PMC8020970 DOI: 10.1101/2021.04.03.438321] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/05/2023]
Abstract
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently-proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus’ evolutionary history using public data. We also present matUtils – a command-line utility for rapidly querying, interpreting and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
Affiliation(s)
- Jakob McBroome
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- David Haussler
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
42
Marcolungo L, Beltrami C, Degli Esposti C, Lopatriello G, Piubelli C, Mori A, Pomari E, Deiana M, Scarso S, Bisoffi Z, Grosso V, Cosentino E, Maestri S, Lavezzari D, Iadarola B, Paterno M, Segala E, Giovannone B, Gallinaro M, Rossato M, Delledonne M. ACoRE: Accurate SARS-CoV-2 genome reconstruction for the characterization of intra-host and inter-host viral diversity in clinical samples and for the evaluation of re-infections. Genomics 2021; 113:1628-1638. [PMID: 33839270 PMCID: PMC8028595 DOI: 10.1016/j.ygeno.2021.04.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/05/2021] [Revised: 03/26/2021] [Accepted: 04/06/2021] [Indexed: 01/04/2023]
Abstract
Sequencing the SARS-CoV-2 genome from clinical samples can be challenging, especially in specimens with low viral titer. Here we report Accurate SARS-CoV-2 genome Reconstruction (ACoRE), an amplicon-based viral genome sequencing workflow for the complete and accurate reconstruction of SARS-CoV-2 sequences from clinical samples, including suboptimal ones that would usually be excluded even if unique and irreplaceable. The protocol was optimized to improve flexibility, and the combination of technical replicates was established as the central strategy to achieve accurate analysis of low-titer/suboptimal samples. We demonstrated the utility of the approach by achieving complete genome reconstruction and the identification of false-positive variants in >170 clinical samples, thus avoiding the generation of inaccurate and/or incomplete sequences. Most importantly, ACoRE was crucial in identifying the correct viral strain responsible for a relapse case that would otherwise have been misclassified as a reinfection because of missing or incorrect variant identification by a standard workflow.
Affiliation(s)
- Luca Marcolungo
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Cristina Beltrami
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Chiara Degli Esposti
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Giulia Lopatriello
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Chiara Piubelli
  - Department of Infectious and Tropical Diseases and Microbiology, IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, 37024 Verona, Italy
- Antonio Mori
  - Department of Infectious and Tropical Diseases and Microbiology, IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, 37024 Verona, Italy
- Elena Pomari
  - Department of Infectious and Tropical Diseases and Microbiology, IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, 37024 Verona, Italy
- Michela Deiana
  - Department of Infectious and Tropical Diseases and Microbiology, IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, 37024 Verona, Italy
- Salvatore Scarso
  - Department of Infectious and Tropical Diseases and Microbiology, IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, 37024 Verona, Italy
- Zeno Bisoffi
  - Department of Infectious and Tropical Diseases and Microbiology, IRCCS Sacro Cuore Don Calabria Hospital, Negrar di Valpolicella, 37024 Verona, Italy
  - Department of Diagnostics and Public Health, University of Verona, 37134 Verona, Italy
- Valentina Grosso
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Emanuela Cosentino
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Simone Maestri
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Denise Lavezzari
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Barbara Iadarola
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Marta Paterno
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Elena Segala
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Barbara Giovannone
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Martina Gallinaro
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
- Marzia Rossato
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy
  - Genartis srl, via IV Novembre 24, 37126 Verona, Italy
- Massimo Delledonne
  - Department of Biotechnology, University of Verona, Strada le Grazie 15, 37134 Verona, Italy (corresponding author)
  - Genartis srl, via IV Novembre 24, 37126 Verona, Italy
43
Kwon SB, Ernst J. Single-nucleotide conservation state annotation of the SARS-CoV-2 genome. Commun Biol 2021; 4:698. [PMID: 34083758 PMCID: PMC8175581 DOI: 10.1038/s42003-021-02231-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2020] [Accepted: 05/14/2021] [Indexed: 11/09/2022]
Abstract
Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and mutations.
Affiliation(s)
- Soo Bin Kwon
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
  - Department of Biological Chemistry, University of California, Los Angeles, CA, USA
- Jason Ernst
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
  - Department of Biological Chemistry, University of California, Los Angeles, CA, USA
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, CA, USA
  - Computer Science Department, University of California, Los Angeles, CA, USA
  - Department of Computational Medicine, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Molecular Biology Institute, University of California, Los Angeles, CA, USA
44
Qutob N, Salah Z, Richard D, Darwish H, Sallam H, Shtayeh I, Najjar O, Ruzayqat M, Najjar D, Balloux F, van Dorp L. Genomic epidemiology of the first epidemic wave of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Palestine. Microb Genom 2021; 7:000584. [PMID: 34156923 PMCID: PMC8461465 DOI: 10.1099/mgen.0.000584] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/02/2020] [Accepted: 04/16/2021] [Indexed: 11/26/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the novel coronavirus responsible for the COVID-19 pandemic, continues to cause a significant public-health burden and disruption globally. Genomic epidemiology approaches point to most countries in the world having experienced many independent introductions of SARS-CoV-2 during the early stages of the pandemic. However, this situation may change with local lockdown policies and restrictions on travel, leading to the emergence of more geographically structured viral populations and lineages transmitting locally. Here, we report the first SARS-CoV-2 genomes from Palestine, sampled from early March 2020, when the first cases were observed, through to August 2020. SARS-CoV-2 genomes from Palestine fall across the diversity of the global phylogeny, consistent with at least nine independent introductions into the region. We identify one locally predominant lineage in circulation, represented by 50 Palestinian SARS-CoV-2 genomes, grouping with genomes generated from Israel and the UK. We date the introduction of this lineage to 5 February 2020 (16 January 2020 to 19 February 2020), suggesting that SARS-CoV-2 was already circulating in Palestine before its first detection in Bethlehem in early March. Our work highlights the value of ongoing genomic surveillance and monitoring to reconstruct the epidemiology of COVID-19 at both local and global scales.
Affiliation(s)
- Nouar Qutob
  - Department of Health Sciences, Faculty of Graduate Studies, Arab American University, Ramallah, Palestine
- Zaidoun Salah
  - Department of Health Sciences, Faculty of Graduate Studies, Arab American University, Ramallah, Palestine
  - Present address: Al Quds Bard College, Al-Quds University, East Jerusalem, Palestine
- Damien Richard
  - Institute of Child Health, University College London, London, UK
  - UCL Genetics Institute, University College London, London, UK
- Hisham Darwish
  - Department of Health Sciences, Faculty of Graduate Studies, Arab American University, Ramallah, Palestine
- Husam Sallam
  - Department of Health Sciences, Faculty of Graduate Studies, Arab American University, Ramallah, Palestine
- Issa Shtayeh
  - Palestinian Ministry of Health, Ramallah, Palestine
- Osama Najjar
  - Palestinian Ministry of Health, Ramallah, Palestine
- Dana Najjar
  - Department of Health Sciences, Faculty of Graduate Studies, Arab American University, Ramallah, Palestine
  - Palestinian Ministry of Health, Ramallah, Palestine
- Lucy van Dorp
  - UCL Genetics Institute, University College London, London, UK
45
Page AJ, Mather AE, Le-Viet T, Meader EJ, Alikhan NF, Kay GL, de Oliveira Martins L, Aydin A, Baker DJ, Trotter AJ, Rudder S, Tedim AP, Kolyva A, Stanley R, Yasir M, Diaz M, Potter W, Stuart C, Meadows L, Bell A, Gutierrez AV, Thomson NM, Adriaenssens EM, Swingler T, Gilroy RAJ, Griffith L, Sethi DK, Aggarwal D, Brown CS, Davidson RK, Kingsley RA, Bedford L, Coupland LJ, Charles IG, Elumogo N, Wain J, Prakash R, Webber MA, Smith SJL, Chand M, Dervisevic S, O’Grady J. Large-scale sequencing of SARS-CoV-2 genomes from one region allows detailed epidemiology and enables local outbreak management. Microb Genom 2021; 7:000589. [PMID: 34184982 PMCID: PMC8461472 DOI: 10.1099/mgen.0.000589] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/16/2020] [Accepted: 04/19/2021] [Indexed: 01/28/2023]
Abstract
The COVID-19 pandemic has spread rapidly throughout the world. In the UK, the initial peak was in April 2020; in the county of Norfolk (UK) and surrounding areas, which has a stable, low-density population, over 3200 cases were reported between March and August 2020. As part of the activities of the national COVID-19 Genomics Consortium (COG-UK) we undertook whole genome sequencing of the SARS-CoV-2 genomes present in positive clinical samples from the Norfolk region. These samples were collected by four major hospitals, multiple minor hospitals, care facilities and community organizations within Norfolk and surrounding areas. We combined clinical metadata with the sequencing data from regional SARS-CoV-2 genomes to understand the origins, genetic variation, transmission and expansion (spread) of the virus within the region and provide context nationally. Data were fed back into the national effort for pandemic management, whilst simultaneously being used to assist local outbreak analyses. Overall, 1565 positive samples (172 per 100 000 population) from 1376 cases were evaluated; for 140 cases between two and six samples were available providing longitudinal data. This represented 42.6 % of all positive samples identified by hospital testing in the region and encompassed those with clinical need, and health and care workers and their families. In total, 1035 cases had genome sequences of sufficient quality to provide phylogenetic lineages. These genomes belonged to 26 distinct global lineages, indicating that there were multiple separate introductions into the region. Furthermore, 100 genetically distinct UK lineages were detected demonstrating local evolution, at a rate of ~2 SNPs per month, and multiple co-occurring lineages as the pandemic progressed. 
Our analysis identified a discrete sublineage associated with six care facilities; found no evidence of reinfection in longitudinal samples; ruled out a nosocomial outbreak; identified 16 lineages in key workers that were not present in patients, indicating that infection control measures were effective; found that the D614G spike protein mutation, which is linked to increased transmissibility, dominates the samples; and rapidly confirmed the relatedness of cases in an outbreak at a food processing facility. The large-scale genome sequencing of SARS-CoV-2-positive samples has provided valuable additional data for public health epidemiology in the Norfolk region, and will continue to help identify and untangle hidden transmission chains as the pandemic evolves.
Affiliation(s)
- Andrew J. Page
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Alison E. Mather
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Thanh Le-Viet
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Emma J. Meader
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Gemma L. Kay
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Alp Aydin
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- David J. Baker
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Alexander J. Trotter
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Steven Rudder
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Ana P. Tedim
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - Grupo de Investigación Biomédica en Sepsis - BioSepsis, Hospital Universitario Rio Hortega/Instituto de Investigación Biomédica de Salamanca (IBSAL), Valladolid/Salamanca, Spain
- Anastasia Kolyva
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Rachael Stanley
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Muhammad Yasir
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Maria Diaz
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Will Potter
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Claire Stuart
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Lizzie Meadows
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Andrew Bell
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
- Tracey Swingler
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Luke Griffith
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Dheeraj K. Sethi
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Dinesh Aggarwal
  - Public Health England, 61 Colindale Ave., London, NW9 5EQ, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Cambridge University Hospital NHS Foundation Trust, Cambridge, UK
  - Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Colin S. Brown
  - Public Health England, 61 Colindale Ave., London, NW9 5EQ, UK
- Rose K. Davidson
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Robert A. Kingsley
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Luke Bedford
  - Ipswich Hospital, Heath Road, Ipswich, IP4 5PD, UK
- Ian G. Charles
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Ngozi Elumogo
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- John Wain
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Reenesh Prakash
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Mark A. Webber
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Meera Chand
  - Public Health England, 61 Colindale Ave., London, NW9 5EQ, UK
- Samir Dervisevic
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
- Justin O’Grady
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- The COVID-19 Genomics UK (COG-UK) Consortium
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7UQ, UK
  - University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
  - Norfolk and Norwich University Hospital, Colney Lane, Norwich, NR4 7UY, UK
  - Grupo de Investigación Biomédica en Sepsis - BioSepsis, Hospital Universitario Rio Hortega/Instituto de Investigación Biomédica de Salamanca (IBSAL), Valladolid/Salamanca, Spain
  - Public Health England, 61 Colindale Ave., London, NW9 5EQ, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Cambridge University Hospital NHS Foundation Trust, Cambridge, UK
  - Wellcome Sanger Institute, Hinxton, Cambridge, UK
  - Ipswich Hospital, Heath Road, Ipswich, IP4 5PD, UK
  - Public Health, County Hall, Martineau Lane, Norwich, NR1 2DH, UK
46
Turakhia Y, Thornlow B, Hinrichs AS, De Maio N, Gozashti L, Lanfear R, Haussler D, Corbett-Detig R. Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat Genet 2021; 53:809-816. [PMID: 33972780 PMCID: PMC9248294 DOI: 10.1038/s41588-021-00862-7] [Citation(s) in RCA: 183] [Impact Index Per Article: 61.0] [Received: 09/30/2020] [Accepted: 03/31/2021] [Indexed: 02/03/2023]
Abstract
As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large, and will undoubtedly grow manyfold, placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 specifically for laboratories worldwide.
Affiliation(s)
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Landen Gozashti
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- David Haussler
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - National Research University Higher School of Economics, Moscow, Russian Federation
47
Gozashti L, Corbett-Detig R. Shortcomings of SARS-CoV-2 genomic metadata. BMC Res Notes 2021; 14:189. [PMID: 34001211 PMCID: PMC8128092 DOI: 10.1186/s13104-021-05605-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/19/2021] [Accepted: 05/06/2021] [Indexed: 11/10/2022]
Abstract
OBJECTIVE The SARS-CoV-2 pandemic has prompted one of the most extensive and rapid genomic sequencing efforts in history. Each viral genome is accompanied by a set of metadata that supplies important information, such as the geographic origin of the sample, the age of the host, and the lab at which the sample was sequenced, and that is integral to epidemiological efforts and public health direction. Here, we examine some shortcomings of metadata within the GISAID database to raise awareness of common errors and inconsistencies that may affect data-driven analyses, and we suggest possible avenues for resolving them. RESULTS Our analysis reveals a startling prevalence of spelling errors and inconsistent naming conventions, which together occur in an estimated ~9.8% and ~11.6% of "originating lab" and "submitting lab" GISAID metadata entries, respectively. We also find numerous ambiguous entries that provide very little information about the actual source of a sample and could easily be associated with multiple sources worldwide. Importantly, all of these issues can impair the ability and accuracy of association studies by misleadingly causing a group of samples to be attributed to multiple sources when they truly all originate from one source, or vice versa.
Affiliation(s)
- Landen Gozashti
  - Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, MA, 02138, USA
  - Department of Biomolecular Engineering and Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering and Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
48
Pattabiraman C, Prasad P, George AK, Sreenivas D, Rasheed R, Reddy NVK, Desai A, Vasanthapuram R. Importation, circulation, and emergence of variants of SARS-CoV-2 in the South Indian state of Karnataka. Wellcome Open Res 2021; 6:110. [PMID: 35243004 PMCID: PMC8857524 DOI: 10.12688/wellcomeopenres.16768.1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 05/04/2021] [Indexed: 10/07/2023]
Abstract
Background: As the coronavirus disease 2019 (COVID-19) pandemic continues, the selection of genomic variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) associated with higher transmission, more severe disease, re-infection, and immune escape is a cause for concern. Such variants have been reported from the UK (B.1.1.7), South Africa (B.1.351), and Brazil (P.1/B.1.1.28). We performed this study to track the importation, spread, and local emergence of variants. Methods: We sequenced whole genomes of SARS-CoV-2 from international travellers (n=75) entering Karnataka, South India, between Dec 22, 2020 and Jan 31, 2021, from positive cases in the city of Bengaluru (n=108) between Nov 22, 2020 and Jan 22, 2021, and from a local outbreak. We present the lineage distribution and analysis of these sequences. Results: The genomes from this study fall into 34 lineages. Variant B.1.1.7 was introduced by international travel (24/73, 32.9%). Lineages B.1.36 and B.1 formed a major fraction of both imported (B.1.36: 20/73, 27.4%; B.1: 14/73, 19.2%) and circulating viruses (B.1.36: 45/103, 43.7%; B.1: 26/103, 25.2%). Lineage B.1.36 was also associated with a local outbreak. We detected nine amino acid changes previously associated with immune escape, spread across multiple lineages. The N440K change was detected in 45/162 (27.7%) of the sequences. Conclusions: Our data support the idea that variants of concern spread through travel. Viruses with amino acid replacements associated with immune escape are already circulating. It is critical to check transmission and monitor changes in SARS-CoV-2 locally.
Affiliation(s)
- Chitra Pattabiraman
  - Neurovirology, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Pramada Prasad
  - Neurovirology, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Anson K. George
  - Neurovirology, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Darshan Sreenivas
  - Neurovirology, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Risha Rasheed
  - Neurovirology, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Nakka Vijay Kiran Reddy
  - Neurovirology, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Anita Desai
  - Neurovirology, National Institute of Mental Health and Neurosciences, Bangalore, Karnataka, 560029, India
- Ravi Vasanthapuram
  - Nodal Officer Genetic Confirmation of SARS-CoV-2, Government of Karnataka, Bengaluru, India
49
De Maio N, Walker CR, Turakhia Y, Lanfear R, Corbett-Detig R, Goldman N. Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2. Genome Biol Evol 2021; 13:evab087. [PMID: 33895815 PMCID: PMC8135539 DOI: 10.1093/gbe/evab087] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Accepted: 04/19/2021] [Indexed: 12/23/2022]
Abstract
The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
- Conor R Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
  - Department of Genetics, University of Cambridge, United Kingdom
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
50
Chan WM, Ip JD, Chu AWH, Tse H, Tam AR, Li X, Kwan MYW, Yau YS, Leung WS, Chik TSH, To WK, Ng ACK, Yip CCY, Poon RWS, Chan KH, Wong SCY, Choi GKY, Lung DC, Cheng VCC, Hung IFN, Yuen KY, To KKW. Phylogenomic analysis of COVID-19 summer and winter outbreaks in Hong Kong: An observational study. Lancet Reg Health West Pac 2021; 10:100130. [PMID: 33778795 PMCID: PMC7985010 DOI: 10.1016/j.lanwpc.2021.100130] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/21/2020] [Revised: 03/01/2021] [Accepted: 03/02/2021] [Indexed: 01/04/2023]
Abstract
BACKGROUND Viral genomic surveillance is vital for understanding the transmission of COVID-19. In Hong Kong, breakthrough outbreaks occurred in July (third wave) and November (fourth wave) 2020. We used whole viral genome analysis to study the characteristics of these waves. METHODS We analyzed 509 SARS-CoV-2 genomes collected from Hong Kong patients between 22 January and 29 November 2020. Phylogenetic and phylodynamic analyses were performed and interpreted together with epidemiological information. FINDINGS During the third and fourth waves, diverse SARS-CoV-2 genomes were identified among imported infections. Conversely, local infections were dominated by a single lineage during each wave, with 96.6% (259/268) in the third wave and 100% (73/73) in the fourth wave belonging to the B.1.1.63 and B.1.36.27 lineages, respectively. Whereas the B.1.1.63 lineage was imported 2 weeks before the beginning of the third wave, the B.1.36.27 lineage had circulated in Hong Kong for 2 months prior to the fourth wave. During the fourth wave, 50.7% (37/73) of local infections in November were identical to the viral genome from an imported case in September. Within either the B.1.1.63 or the B.1.36.27 lineage in our cohort, the most common non-synonymous mutations occurred in the helicase (nsp13) gene. INTERPRETATION Although stringent measures prevented most imported cases from spreading in Hong Kong, a single lineage with low-level local transmission in October and early November was responsible for the fourth wave. A superspreading event or lower temperatures in November may have facilitated the spread of the B.1.36.27 lineage.
Affiliation(s)
- Wan-Mui Chan
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
- Jonathan Daniel Ip
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
- Allen Wing-Ho Chu
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
- Herman Tse
  - Department of Pathology, Hong Kong Children's Hospital, Kowloon, Hong Kong Special Administrative Region, China
- Anthony Raymond Tam
  - Department of Medicine, Queen Mary Hospital, Hong Kong Special Administrative Region, China
- Xin Li
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
  - Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China
- Mike Yat-Wah Kwan
  - Department of Paediatrics and Adolescent Medicine, Princess Margaret Hospital, Hong Kong Special Administrative Region, China
- Yat-Sun Yau
  - Department of Paediatrics, Queen Elizabeth Hospital, Kowloon, Hong Kong Special Administrative Region, China
- Wai-Shing Leung
  - Department of Medicine and Geriatrics, Princess Margaret Hospital, Hong Kong Special Administrative Region, China
- Thomas Shiu-Hong Chik
  - Department of Medicine and Geriatrics, Princess Margaret Hospital, Hong Kong Special Administrative Region, China
- Wing-Kin To
  - Department of Pathology, Princess Margaret Hospital, Hong Kong Special Administrative Region, China
- Anthony Chin-Ki Ng
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
- Cyril Chik-Yan Yip
  - Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China
- Rosana Wing-Shan Poon
  - Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China
- Kwok-Hung Chan
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
- Sally Cheuk-Ying Wong
  - Department of Pathology, Hong Kong Children's Hospital, Kowloon, Hong Kong Special Administrative Region, China
- Garnet Kwan-Yue Choi
  - Department of Pathology, Hong Kong Children's Hospital, Kowloon, Hong Kong Special Administrative Region, China
  - Department of Pathology, Queen Elizabeth Hospital, Kowloon, Hong Kong Special Administrative Region, China
- David Christopher Lung
  - Department of Pathology, Hong Kong Children's Hospital, Kowloon, Hong Kong Special Administrative Region, China
  - Department of Pathology, Queen Elizabeth Hospital, Kowloon, Hong Kong Special Administrative Region, China
- Vincent Chi-Chung Cheng
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
  - Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China
- Ivan Fan-Ngai Hung
  - Department of Medicine, Queen Mary Hospital, Hong Kong Special Administrative Region, China
  - Department of Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
- Kwok-Yung Yuen
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
  - Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China
- Kelvin Kai-Wang To
  - State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China
  - Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China