1. Shiryev SA, Agarwala R. Indexing and searching petabase-scale nucleotide resources. Nat Methods 2024; 21:994-1002. [PMID: 38755321 PMCID: PMC11166510 DOI: 10.1038/s41592-024-02280-z] [Received: 07/18/2023] [Accepted: 04/08/2024] [Indexed: 05/18/2024]
Abstract
Searching the vast and rapidly growing nucleotide content of resources such as runs in the Sequence Read Archive and assemblies for whole-genome shotgun sequencing projects in GenBank is currently impractical for most researchers. Here we present Pebblescout, a tool that navigates such content by providing indexing and search capabilities. Indexing uses dense sampling of the sequences in the resource. Search finds subjects (runs or assemblies) that have short sequence matches to a user query, with well-defined guarantees, and ranks them by the informativeness of the matches. We illustrate the functionality of Pebblescout by creating eight databases that index over 3.7 petabases. The Pebblescout web service is available at https://pebblescout.ncbi.nlm.nih.gov. We show that for a wide range of query lengths, Pebblescout provides a data-driven way to find relevant subsets of large nucleotide resources, substantially reducing the effort required for downstream analysis. We also show that Pebblescout results compare favorably with those of MetaGraph and Sourmash.
Affiliation(s)
- Sergey A Shiryev
- Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Richa Agarwala
- Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
2. Ferreira RC, Alves GV, Ramon M, Antoneli F, Briones MRS. Reconstructing Prehistoric Viral Genomes from Neanderthal Sequencing Data. Viruses 2024; 16:856. [PMID: 38932149 DOI: 10.3390/v16060856] [Received: 03/13/2024] [Revised: 05/20/2024] [Accepted: 05/24/2024] [Indexed: 06/28/2024]
Abstract
DNA viruses that produce persistent infections have been proposed as potential causes of the extinction of Neanderthals; therefore, the identification of viral genome remnants in Neanderthal sequence reads is an initial step toward addressing this hypothesis. Here, as a proof of concept, we searched for viral remnants in sequence reads of Neanderthal genome data by mapping them to adenovirus, herpesvirus and papillomavirus, which are double-stranded DNA viruses that may establish lifelong latency and can produce persistent infections. The reconstructed ancient viral genomes of adenovirus, herpesvirus and papillomavirus revealed conserved segments with nucleotide identity to extant viral genomes, and variable regions within coding regions with substantial divergence from extant close relatives. Sequence reads mapped to extant viral genomes showed the deamination patterns of ancient DNA, and these ancient viral genomes showed divergence consistent with the age of the samples (≈50,000 years) and with viral evolutionary rates (10⁻⁵ to 10⁻⁸ substitutions/site/year). Analysis of random effects showed that the mapping of Neanderthal reads to genomes of extant persistent viruses is above what is expected from random similarities of short reads. In addition, a negative control with a nonpersistent DNA virus did not yield statistically significant assemblies. This work demonstrates the feasibility of identifying viral genome remnants in archaeological samples with signal-to-noise assessment.
Affiliation(s)
- Renata C Ferreira
- Center for Medical Bioinformatics, Escola Paulista de Medicina, Federal University of São Paulo (UNIFESP), São Paulo, SP 04039-032, Brazil
- Epigene LLC, São Paulo, SP 04537-080, Brazil
- Gustavo V Alves
- Center for Medical Bioinformatics, Escola Paulista de Medicina, Federal University of São Paulo (UNIFESP), São Paulo, SP 04039-032, Brazil
- Fernando Antoneli
- Center for Medical Bioinformatics, Escola Paulista de Medicina, Federal University of São Paulo (UNIFESP), São Paulo, SP 04039-032, Brazil
- Marcelo R S Briones
- Center for Medical Bioinformatics, Escola Paulista de Medicina, Federal University of São Paulo (UNIFESP), São Paulo, SP 04039-032, Brazil
3. Gupta P, Hiller A, Chowdhury J, Lim D, Lim DY, Saeij JPJ, Babaian A, Rodriguez F, Pereira L, Morales-Tapia A. A parasite odyssey: An RNA virus concealed in Toxoplasma gondii. Virus Evol 2024; 10:veae040. [PMID: 38817668 PMCID: PMC11137675 DOI: 10.1093/ve/veae040] [Received: 09/15/2023] [Revised: 03/05/2024] [Accepted: 05/10/2024] [Indexed: 06/01/2024]
Abstract
We are entering a 'Platinum Age of Virus Discovery', an era marked by exponential growth in the discovery of virus biodiversity and driven by advances in metagenomics and computational analysis. In the ecosystem of a human (or any animal) there are more species of viruses than simply those directly infecting the animal cells. Viruses can infect all organisms constituting the microbiome, including bacteria, fungi, and unicellular parasites. Thus, the complexity of possible interactions between host, microbe, and viruses is unfathomable. To understand this interaction network, we must employ computationally assisted virology as a means of analyzing and interpreting the millions of available samples to make inferences about the ways in which viruses may intersect with human health. From a computational viral screen of human neuronal datasets, we identified a novel narnavirus, Apocryptovirus odysseus (Ao), which likely infects the neurotropic parasite Toxoplasma gondii. Several parasitic protozoan viruses (PPVs) have previously been mechanistically established as triggers of host innate responses, and here we present in silico evidence that Ao is a plausible pro-inflammatory factor in human and mouse cells infected by T. gondii. T. gondii infects billions of people worldwide, yet the prognosis of toxoplasmosis is highly variable, and PPVs like Ao could function as a hitherto undescribed hypervirulence factor. In a broader screen of over 7.6 million samples, we explored viruses phylogenetically proximal to Ao and discovered nineteen Apocryptovirus species, all found in libraries annotated as vertebrate transcriptomes or metatranscriptomes. Although the samples containing this genus of narnaviruses are derived from sheep, goat, bat, rabbit, chicken, and pigeon, the presence of the virus is strongly predictive of co-occurring nucleic acid from parasitic Apicomplexa, supporting the conclusion that Apocryptovirus is a genus of parasite-infecting viruses.
This is a computational proof-of-concept study in which we rapidly analyzed millions of datasets and distilled a mechanistically, ecologically, and phylogenetically refined hypothesis. We predict that this highly diverged RNA virus, Ao, biologically infects T. gondii, and that Ao, and other viruses like it, will modulate toxoplasmosis, a disease that afflicts billions worldwide.
Affiliation(s)
- Purav Gupta
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
- The Donnelly Centre for Cellular + Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Aiden Hiller
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
- The Donnelly Centre for Cellular + Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Jawad Chowdhury
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
- The Donnelly Centre for Cellular + Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Declan Lim
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
- The Donnelly Centre for Cellular + Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Dillon Yee Lim
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Department of Physiology, Anatomy and Genetics, University of Oxford, Sherrington Building, Sherrington Road, Oxford, Oxfordshire, OX1 3PT, UK
- Jeroen P J Saeij
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, 1 Shields Ave, Davis, CA 95616, USA
- Artem Babaian
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
- The Donnelly Centre for Cellular + Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Felipe Rodriguez
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, 1 Shields Ave, Davis, CA 95616, USA
- Luke Pereira
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
- The Donnelly Centre for Cellular + Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
- Alejandro Morales-Tapia
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Toronto, ON M5S 1A8, Canada
- The Donnelly Centre for Cellular + Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- The Woodlands Secondary School, 3225 Erindale Station Rd, Mississauga, ON L5C 1Y5, Canada
4. Luescher AM, Gimpel AL, Stark WJ, Heckel R, Grass RN. Chemical unclonable functions based on operable random DNA pools. Nat Commun 2024; 15:2955. [PMID: 38580696 PMCID: PMC10997750 DOI: 10.1038/s41467-024-47187-7] [Received: 09/19/2023] [Accepted: 03/25/2024] [Indexed: 04/07/2024]
Abstract
Physical unclonable functions (PUFs) based on unique tokens generated by random manufacturing processes have been proposed as an alternative to mathematical one-way algorithms. However, these tokens are not distributable, which is a disadvantage for decentralized applications. Finding unclonable yet distributable functions would help bridge this gap and expand the applications of object-bound cryptography. Here we show that large random DNA pools with a segmented structure of alternating constant and randomly generated portions are able to calculate distinct outputs from millions of inputs in a specific and reproducible manner, in analogy to physical unclonable functions. Our experimental data, with pools comprising up to >10¹⁰ unique sequences and encompassing >750 comparisons of resulting outputs, demonstrate that the proposed chemical unclonable function (CUF) system is robust, distributable, and scalable. Based on this proof of concept, CUF-based anti-counterfeiting systems, non-fungible objects and decentralized multi-user authentication are conceivable.
Affiliation(s)
- Anne M Luescher
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Andreas L Gimpel
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Wendelin J Stark
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Reinhard Heckel
- Department of Computer Engineering, Technical University of Munich, Arcisstrasse 21, 80333, Munich, Germany
- Robert N Grass
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
5. Fitak RR. The magneto-microbiome: A dataset of the metagenomic distribution of magnetotactic bacteria. Data Brief 2024; 53:110073. [PMID: 38317726 PMCID: PMC10838685 DOI: 10.1016/j.dib.2024.110073] [Received: 07/03/2023] [Revised: 12/09/2023] [Accepted: 01/15/2024] [Indexed: 02/07/2024]
Abstract
Magnetotactic bacteria (MTB) are diverse prokaryotes characterized by their ability to generate biogenic magnetic iron crystals. MTB are ubiquitous across aquatic environments, and growing evidence indicates that they may be present in association with animal microbiomes. Unfortunately, they are difficult to culture in vitro, and more studies of their biogeographical distribution and ecological roles are needed. To provide data on the patterns of diversity and distribution of MTB, we screened the entire Sequence Read Archive (SRA) of the National Center for Biotechnology Information for DNA sequencing reads matching known MTB taxa. The dataset summarizes the count of reads assigned to MTB from more than 26 million SRA accessions comprising approximately 80 petabases (7.98 × 10¹⁶ bases) of DNA. More than 396 million DNA sequencing reads were assigned to 214 MTB taxa in 691,086 (2.65%) SRA accessions. The final dataset can help researchers focus their examination of the environmental and ecological roles of specific MTB or identify potential host organisms. These data will be instrumental in further elucidating the importance and utility of these enigmatic bacteria.
Affiliation(s)
- Robert R. Fitak
- Department of Biology, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
6. Connor R, Shakya M, Yarmosh DA, Maier W, Martin R, Bradford R, Brister JR, Chain PSG, Copeland CA, di Iulio J, Hu B, Ebert P, Gunti J, Jin Y, Katz KS, Kochergin A, LaRosa T, Li J, Li PE, Lo CC, Rashid S, Maiorova ES, Xiao C, Zalunin V, Purcell L, Pruitt KD. Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows. Viruses 2024; 16:430. [PMID: 38543795 PMCID: PMC10975397 DOI: 10.3390/v16030430] [Received: 12/05/2023] [Revised: 02/12/2024] [Accepted: 02/16/2024] [Indexed: 04/01/2024]
Abstract
Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants. Here, we show that for both Illumina and Oxford Nanopore sequencing platforms, downstream bioinformatic protocols used by industry, government, and academic groups resulted in different virus sequences from the same sample. These bioinformatic workflows produced consensus genomes with differences in single nucleotide polymorphisms and in the inclusion or exclusion of insertions and/or deletions, despite using the same raw sequence data as input. We compared and characterized these discrepancies and propose a specific suite of parameters and protocols that should be adopted across the field. Consistent results from bioinformatic workflows are fundamental to SARS-CoV-2 and future pathogen surveillance efforts, including pandemic preparation, to allow for a data-driven and timely public health response.
Affiliation(s)
- Ryan Connor
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Migun Shakya
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- David A. Yarmosh
- American Type Culture Collection, Manassas, VA 20110, USA
- BEI Resources, Manassas, VA 20110, USA
- Wolfgang Maier
- Galaxy Europe Team, University of Freiburg, 79085 Freiburg, Germany
- Ross Martin
- Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA
- Rebecca Bradford
- American Type Culture Collection, Manassas, VA 20110, USA
- BEI Resources, Manassas, VA 20110, USA
- J. Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Patrick S. G. Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Julia di Iulio
- Vir Biotechnology Inc., San Francisco, CA 94158, USA
- Bin Hu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Philip Ebert
- Eli Lilly and Company, Indianapolis, IN 46225, USA
- Jonathan Gunti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Yumi Jin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Kenneth S. Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Andrey Kochergin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Tré LaRosa
- Deloitte Consulting LLP, Rosslyn, VA 22209, USA
- Jiani Li
- Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA
- Po-E Li
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Chien-Chi Lo
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Sujatha Rashid
- American Type Culture Collection, Manassas, VA 20110, USA
- Evguenia S. Maiorova
- Clinical Virology Department, Gilead Sciences, Foster City, CA 94404, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Vadim Zalunin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Lisa Purcell
- Vir Biotechnology Inc., San Francisco, CA 94158, USA
- Kim D. Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
7. Guinet B, Leobold M, Herniou EA, Bloin P, Burlet N, Bredlau J, Navratil V, Ravallec M, Uzbekov R, Kester K, Gundersen Rindal D, Drezen JM, Varaldi J, Bézier A. A novel and diverse family of filamentous DNA viruses associated with parasitic wasps. Virus Evol 2024; 10:veae022. [PMID: 38617843 PMCID: PMC11013392 DOI: 10.1093/ve/veae022] [Received: 10/30/2023] [Revised: 12/20/2023] [Accepted: 02/23/2024] [Indexed: 04/16/2024]
Abstract
The class Naldaviricetes of large dsDNA viruses currently comprises four viral families infecting insects and/or crustaceans. Since the 1970s, particles described as filamentous viruses (FVs) have been observed by electron microscopy in several species of Hymenoptera parasitoids, but until recently no genomic data were available. This study provides the first comparative morphological and genomic analysis of these FVs. We analyzed the genomes of seven FVs, six of which were newly obtained, to gain a better understanding of their evolutionary history. We show that these FVs share all genomic features of the Naldaviricetes while encoding five specific core genes that distinguish them from their closest relatives, the Hytrosaviruses. By mining public databases, we show that FVs preferentially infect Hymenoptera with a parasitoid lifestyle and that these viruses have been repeatedly integrated into the genomes of many insects, particularly Hymenoptera parasitoids, overall suggesting a long-standing specialization of these viruses to parasitic wasps. Finally, we propose a taxonomic revision of the class Naldaviricetes in which FVs related to the Leptopilina boulardi FV constitute a fifth family, which we propose to name Filamentoviridae.
Affiliation(s)
- Benjamin Guinet
- LBBE, UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 bd du 11 novembre 1918, Villeurbanne CEDEX F-69622, France
- Matthieu Leobold
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS-Université de Tours, 20 Avenue Monge, Parc de Grandmont, Tours 37200, France
- Elisabeth A Herniou
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS-Université de Tours, 20 Avenue Monge, Parc de Grandmont, Tours 37200, France
- Pierrick Bloin
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS-Université de Tours, 20 Avenue Monge, Parc de Grandmont, Tours 37200, France
- Nelly Burlet
- LBBE, UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 bd du 11 novembre 1918, Villeurbanne CEDEX F-69622, France
- Justin Bredlau
- Department of Biology, Virginia Commonwealth University, 1000 W. Cary Street, Room 126, Richmond, VA 23284-9067, USA
- Vincent Navratil
- PRABI, Rhône-Alpes Bioinformatics Center, Université Lyon 1, 43 bd du 11 novembre 1918, Villeurbanne CEDEX 69622, France
- UMS 3601, Institut Français de Bioinformatique, IFB-Core, 2 rue Gaston Crémieu, Évry CEDEX 91057, France
- European Virus Bioinformatics Center, Leutragraben 1, Jena 07743, Germany
- Marc Ravallec
- Diversité, génomes et interactions microorganismes insectes (DGIMI), UMR 1333 INRA, Université de Montpellier 2, 2 Place Eugène Bataillon cc101, Montpellier CEDEX 5 34095, France
- Rustem Uzbekov
- Laboratory of Cell Biology and Electron Microscopy, Faculty of Medicine, Université de Tours, 10 bd Tonnelle, BP 3223, Tours CEDEX 37032, France
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskye Gory 73, Moscow 119992, Russia
- Karen Kester
- Department of Biology, Virginia Commonwealth University, 1000 W. Cary Street, Room 126, Richmond, VA 23284-9067, USA
- Dawn Gundersen Rindal
- USDA-ARS Invasive Insect Biocontrol and Behavior Laboratory, Beltsville, MD 20705, USA
- Jean-Michel Drezen
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS-Université de Tours, 20 Avenue Monge, Parc de Grandmont, Tours 37200, France
- Julien Varaldi
- LBBE, UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 bd du 11 novembre 1918, Villeurbanne CEDEX F-69622, France
- Annie Bézier
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS-Université de Tours, 20 Avenue Monge, Parc de Grandmont, Tours 37200, France
8. Alvarez RV, Landsman D. GTax: improving de novo transcriptome assembly by removing foreign RNA contamination. Genome Biol 2024; 25:12. [PMID: 38191464 PMCID: PMC10773103 DOI: 10.1186/s13059-023-03141-2] [Received: 06/08/2022] [Accepted: 12/08/2023] [Indexed: 01/10/2024]
Abstract
The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
Affiliation(s)
- Roberto Vera Alvarez
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
- David Landsman
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
9. Sayers E, Beck J, Bolton E, Brister J, Chan J, Comeau D, Connor R, DiCuccio M, Farrell C, Feldgarden M, Fine A, Funk K, Hatcher E, Hoeppner M, Kane M, Kannan S, Katz K, Kelly C, Klimke W, Kim S, Kimchi A, Landrum M, Lathrop S, Lu Z, Malheiro A, Marchler-Bauer A, Murphy T, Phan L, Prasad A, Pujar S, Sawyer A, Schmieder E, Schneider V, Schoch C, Sharma S, Thibaud-Nissen F, Trawick B, Venkatapathi T, Wang J, Pruitt K, Sherry S. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2024; 52:D33-D43. [PMID: 37994677 PMCID: PMC10767890 DOI: 10.1093/nar/gkad1044] [Received: 09/19/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/24/2023]
Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jeff Beck
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jessica Chan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Donald C Comeau
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Ryan Connor
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Michael DiCuccio
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Michael Feldgarden
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Anna M Fine
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kathryn Funk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Eneida Hatcher
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Marilu Hoeppner
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Megan Kane
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Sivakumar Kannan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kenneth S Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Christopher Kelly
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- William Klimke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Avi Kimchi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Melissa Landrum
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Stacy Lathrop
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Adriana Malheiro
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Lon Phan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Arjun B Prasad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Amanda Sawyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Erin Schmieder
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Conrad L Schoch
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Shobha Sharma
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Barton W Trawick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Thilakam Venkatapathi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jiyao Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
10
Hall MB, Coin LJM. Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data. Gigascience 2024; 13:giae010. [PMID: 38573185 PMCID: PMC10993716 DOI: 10.1093/gigascience/giae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 01/10/2024] [Accepted: 02/27/2024] [Indexed: 04/05/2024]
Abstract
BACKGROUND Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. RESULTS We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device. CONCLUSIONS Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.
Affiliation(s)
- Michael B Hall
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, 3000 Victoria, Australia
- Lachlan J M Coin
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, 3000 Victoria, Australia
11
Power JF, Carere CR, Welford HE, Hudson DT, Lee KC, Moreau JW, Ettema TJG, Reysenbach AL, Lee CK, Colman DR, Boyd ES, Morgan XC, McDonald IR, Craig Cary S, Stott MB. A genus in the bacterial phylum Aquificota appears to be endemic to Aotearoa-New Zealand. Nat Commun 2024; 15:179. [PMID: 38167814 PMCID: PMC10762115 DOI: 10.1038/s41467-023-43960-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 11/24/2023] [Indexed: 01/05/2024]
Abstract
Allopatric speciation has been difficult to examine among microorganisms, with prior reports of endemism restricted to sub-genus-level taxa. Previous microbial community analysis via 16S rRNA gene sequencing of 925 geothermal springs from the Taupō Volcanic Zone (TVZ), Aotearoa-New Zealand, revealed widespread distribution and abundance of a single bacterial genus across 686 of these ecosystems (pH 1.2-9.6 and 17.4-99.8 °C). Here, we present evidence to suggest that this genus, Venenivibrio (phylum Aquificota), is endemic to Aotearoa-New Zealand. A specific environmental niche that increases habitat isolation was identified, with maximal read abundance of Venenivibrio occurring at pH 4-6, 50-70 °C, and low oxidation-reduction potentials. This was further highlighted by genomic and culture-based analyses of the only characterised species of the genus, Venenivibrio stagnispumantis CP.B2T, which confirmed a chemolithoautotrophic metabolism dependent on hydrogen oxidation. While similarity between Venenivibrio populations illustrated that dispersal is not limited across the TVZ, extensive amplicon, metagenomic, and phylogenomic analyses of global microbial communities from DNA sequence databases indicate that Venenivibrio is geographically restricted to the Aotearoa-New Zealand archipelago. We conclude that geographic isolation, complemented by physicochemical constraints, has resulted in the establishment of an endemic bacterial genus.
Affiliation(s)
- Jean F Power
- Thermophile Research Unit, Te Aka Mātuatua | School of Science, Te Whare Wānanga o Waikato | University of Waikato, Hamilton, 3240, Aotearoa New Zealand
- Carlo R Carere
- Te Tari Pūhanga Tukanga Matū | Department of Chemical and Process Engineering, Te Whare Wānanga o Waitaha | University of Canterbury, Christchurch, 8140, Aotearoa New Zealand
- Holly E Welford
- Te Kura Pūtaiao Koiora | School of Biological Sciences, Te Whare Wānanga o Waitaha | University of Canterbury, Christchurch, 8140, Aotearoa New Zealand
- Daniel T Hudson
- Te Tari Moromoroiti me te Ārai Mate | Department of Microbiology and Immunology, Te Whare Wānanga o Ōtākou | University of Otago, Dunedin, 9054, Aotearoa New Zealand
- Kevin C Lee
- Te Kura Pūtaiao | School of Science, Te Wānanga Aronui o Tāmaki Makau Rau | Auckland University of Technology, Auckland, 1010, Aotearoa New Zealand
- John W Moreau
- School of Geographical & Earth Sciences, University of Glasgow, Glasgow, G12 8RZ, UK
- Thijs J G Ettema
- Laboratory of Microbiology, Wageningen University & Research, 6708 WE Wageningen, the Netherlands
- Charles K Lee
- Thermophile Research Unit, Te Aka Mātuatua | School of Science, Te Whare Wānanga o Waikato | University of Waikato, Hamilton, 3240, Aotearoa New Zealand
- Daniel R Colman
- Department of Microbiology and Cell Biology, Montana State University, Bozeman, MT, 59717, USA
- Eric S Boyd
- Department of Microbiology and Cell Biology, Montana State University, Bozeman, MT, 59717, USA
- Xochitl C Morgan
- Te Tari Moromoroiti me te Ārai Mate | Department of Microbiology and Immunology, Te Whare Wānanga o Ōtākou | University of Otago, Dunedin, 9054, Aotearoa New Zealand
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, 02115, USA
- Ian R McDonald
- Thermophile Research Unit, Te Aka Mātuatua | School of Science, Te Whare Wānanga o Waikato | University of Waikato, Hamilton, 3240, Aotearoa New Zealand
- S Craig Cary
- Thermophile Research Unit, Te Aka Mātuatua | School of Science, Te Whare Wānanga o Waikato | University of Waikato, Hamilton, 3240, Aotearoa New Zealand.
- Matthew B Stott
- Te Kura Pūtaiao Koiora | School of Biological Sciences, Te Whare Wānanga o Waitaha | University of Canterbury, Christchurch, 8140, Aotearoa New Zealand.
12
Curd EE, Gal L, Gallego R, Silliman K, Nielsen S, Gold Z. rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R. ENVIRONMENTAL DNA (HOBOKEN, N.J.) 2024; 6:e489. [PMID: 38370872 PMCID: PMC10871694 DOI: 10.1002/edn3.489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 10/19/2023] [Indexed: 02/20/2024]
Abstract
The sequencing revolution requires accurate taxonomic classification of DNA sequences. Key to making accurate taxonomic assignments are curated, comprehensive reference barcode databases. However, the generation and curation of such databases has remained challenging given the large and continuously growing volumes of both DNA sequence data and novel reference barcode targets. Monitoring and research applications require a greater diversity of specialized gene regions and targeted taxa than are currently curated by professional staff. Thus, there is a growing need for an easy-to-implement computational tool that can generate comprehensive metabarcoding reference libraries for any bespoke locus. We address this need by reimagining CRUX from the Anacapa Toolkit and present the rCRUX package in R which, like its predecessor, relies on sequence homology and PCR primer compatibility instead of keyword searches to avoid the limitations of user-defined metadata. The typical workflow begins by searching for plausible seed amplicons (get_seeds_local() or get_seeds_remote()), simulating in silico PCR to acquire a set of sequences analogous to PCR products containing a user-defined set of primer sequences. Next, these seed sequences are iteratively BLAST-searched against a local copy of the National Center for Biotechnology Information (NCBI)-formatted nt database using a taxonomic-rank-based stratified random sampling approach (blast_seeds()), yielding a comprehensive set of sequence matches. The resulting database is dereplicated and cleaned (derep_and_clean_db()) by identifying identical reference sequences and collapsing their taxonomic path to the lowest level of taxonomic agreement across all matching reads. This produces a curated, comprehensive database of primer-specific reference barcode sequences from NCBI. Databases can then be compared (compare_db()) to determine read and taxonomic overlap.
We demonstrate that rCRUX provides more comprehensive reference databases for the MiFish Universal Teleost 12S, Taberlet trnL, fungal ITS, and Leray CO1 loci than CRABS, MetaCurator, RESCRIPt, and ecoPCR reference databases. We then further demonstrate the utility of rCRUX by generating 24 reference databases for 20 metabarcoding loci, many of which lack dedicated reference database curation efforts. The rCRUX package provides a simple-to-use tool for the generation of curated, comprehensive reference databases for user-defined loci, facilitating accurate and effective taxonomic classification of metabarcoding and DNA sequencing efforts broadly.
Affiliation(s)
- Emily E. Curd
- Vermont Biomedical Research Network, University of Vermont, VT, USA
- Luna Gal
- Landmark College, VT, USA
- California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
- Ramon Gallego
- Departamento de Biología, Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain
- Katherine Silliman
- Northern Gulf Institute, Mississippi State University, Starkville, MS, USA
- NOAA Atlantic Oceanographic and Meteorological Laboratory, Miami, FL, USA
- Zachary Gold
- California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
- NOAA Pacific Marine Environmental Laboratory, Seattle, WA, USA
13
Ospino MC, Engel K, Ruiz-Navas S, Binns WJ, Doxey AC, Neufeld JD. Evaluation of multiple displacement amplification for metagenomic analysis of low biomass samples. ISME COMMUNICATIONS 2024; 4:ycae024. [PMID: 38500705 PMCID: PMC10945365 DOI: 10.1093/ismeco/ycae024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 02/05/2024] [Accepted: 02/12/2024] [Indexed: 03/20/2024]
Abstract
Combining multiple displacement amplification (MDA) with metagenomics enables the analysis of samples with extremely low DNA concentrations, making them suitable for high-throughput sequencing. Although amplification bias and nonspecific amplification have been reported from MDA-amplified samples, the impact of MDA on metagenomic datasets is not well understood. We compared three MDA methods (i.e. bulk MDA, emulsion MDA, and primase MDA) for metagenomic analysis of two DNA template concentrations (approx. 1 and 100 pg) derived from a microbial community standard "mock community" and two low-biomass environmental samples (i.e. borehole fluid and groundwater). We assessed the impact of MDA on metagenome-based community composition, assembly quality, functional profiles, and binning. We found amplification bias against high-GC-content genomes but relatively little nonspecific amplification, such as chimeras, artifacts, or contamination, for all MDA methods. We observed MDA-associated representational bias in microbial community profiles, especially for low-input DNA and with the primase MDA method. Nevertheless, MDA-amplified libraries represented taxa similar to those of unamplified samples. The MDA libraries were highly fragmented, but functional profiles similar to those of the unamplified libraries were obtained for bulk MDA and emulsion MDA at higher DNA input, and across these MDA libraries for the groundwater sample. Medium- to low-quality bins could be obtained from the high-input bulk MDA metagenomes of the simplest microbial communities, the borehole fluid and the mock community. Although MDA-based amplification should be avoided, it can still reveal meaningful taxonomic and functional information from samples with extremely low DNA concentration where direct metagenomics is otherwise impossible.
Affiliation(s)
- Katja Engel
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Santiago Ruiz-Navas
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- W Jeffrey Binns
- Safety and Technical Research, Nuclear Waste Management Organization of Canada, Toronto, Ontario M4T 2S3, Canada
- Andrew C Doxey
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Josh D Neufeld
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
14
Wu-Woods NJ, Barlow JT, Trigodet F, Shaw DG, Romano AE, Jabri B, Eren AM, Ismagilov RF. Microbial-enrichment method enables high-throughput metagenomic characterization from host-rich samples. Nat Methods 2023; 20:1672-1682. [PMID: 37828152 PMCID: PMC10885704 DOI: 10.1038/s41592-023-02025-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/29/2022] [Accepted: 08/27/2023] [Indexed: 10/14/2023]
Abstract
Host-microbe interactions have been linked to health and disease states through the use of microbial taxonomic profiling, mostly via 16S ribosomal RNA gene sequencing. However, many mechanistic insights remain elusive, in part because studying the genomes of microbes associated with mammalian tissue is difficult due to the high ratio of host to microbial DNA in such samples. Here we describe a microbial-enrichment method (MEM), which we demonstrate on a wide range of sample types, including saliva, stool, intestinal scrapings, and intestinal mucosal biopsies. MEM enabled high-throughput characterization of microbial metagenomes from human intestinal biopsies by reducing host DNA more than 1,000-fold with minimal microbial community changes (roughly 90% of taxa had no significant differences between MEM-treated and untreated control groups). Shotgun sequencing of MEM-treated human intestinal biopsies enabled characterization of both high- and low-abundance microbial taxa, pathways and genes longitudinally along the gastrointestinal tract. We report the construction of metagenome-assembled genomes directly from human intestinal biopsies for bacteria and archaea at relative abundances as low as 1%. Analysis of metagenome-assembled genomes reveals distinct subpopulation structures between the small and large intestine for some taxa. MEM opens a path for the microbiome field to acquire deeper insights into host-microbe interactions by enabling in-depth characterization of host-tissue-associated microbial communities.
Affiliation(s)
- Natalie J Wu-Woods
- Biology and Bioengineering, California Institute of Technology, Pasadena, CA, USA
- Jacob T Barlow
- Biology and Bioengineering, California Institute of Technology, Pasadena, CA, USA
- Florian Trigodet
- Department of Medicine, The University of Chicago, Chicago, IL, USA
- Dustin G Shaw
- Department of Medicine, The University of Chicago, Chicago, IL, USA
- Committee on Immunology, The University of Chicago, Chicago, IL, USA
- Department of Pathology, The University of Chicago, Chicago, IL, USA
- Anna E Romano
- Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, USA
- Bana Jabri
- Department of Medicine, The University of Chicago, Chicago, IL, USA
- Committee on Immunology, The University of Chicago, Chicago, IL, USA
- Department of Pathology, The University of Chicago, Chicago, IL, USA
- A Murat Eren
- Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA, USA
- Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, Oldenburg, Germany
- Alfred-Wegener-Institute for Marine and Polar Research, Bremerhaven, Germany
- Helmholtz Institute for Functional Marine Biodiversity, Oldenburg, Germany
- Rustem F Ismagilov
- Biology and Bioengineering, California Institute of Technology, Pasadena, CA, USA.
- Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, USA.
15
Wiegand S, Sobol M, Schnepp-Pesch LK, Yan G, Iqbal S, Vollmers J, Müller JA, Kaster AK. Taxonomic Re-Classification and Expansion of the Phylum Chloroflexota Based on over 5000 Genomes and Metagenome-Assembled Genomes. Microorganisms 2023; 11:2612. [PMID: 37894270 PMCID: PMC10608941 DOI: 10.3390/microorganisms11102612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/20/2023] [Accepted: 10/21/2023] [Indexed: 10/29/2023]
Abstract
The phylum Chloroflexota (formerly Chloroflexi) encompasses metabolically diverse bacteria that often have high prevalence in terrestrial and aquatic habitats, some even with biotechnological applications. However, there is substantial disagreement among public databases over which lineages should be considered members of the phylum and at what taxonomic level. Here, we addressed these issues through extensive phylogenomic analyses. The analyses were based on a collection of >5000 Chloroflexota genomes and metagenome-assembled genomes (MAGs) from public databases and novel environmental sites, as well as MAGs newly generated from publicly available sequence reads via an improved binning approach incorporating covariance information. Based on calculated relative evolutionary divergence, we propose that Candidatus Dormibacterota should be listed as a class (i.e., Ca. Dormibacteria) within Chloroflexota together with the classes Anaerolineae, Chloroflexia, Dehalococcoidia, Ktedonobacteria, Ca. Limnocylindria, Thermomicrobia, and two other classes containing only uncultured members. All other Chloroflexota lineages previously listed at the class rank appear instead to be orders or families within the Anaerolineae and Dehalococcoidia, which contain the vast majority of genomes and exhibited the strongest phylogenetic radiation within the phylum. Furthermore, the study suggests that a common ecophysiological capability of members of the phylum is the ability to cope successfully with low energy fluxes.
Affiliation(s)
- Anne-Kristin Kaster
- Institute for Biological Interfaces (IBG 5), Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany; (S.W.); (M.S.); (L.K.S.-P.); (G.Y.); (S.I.); (J.V.); (J.A.M.)
16
Lim HGM, Fann YC, Lee YCG. COWID: an efficient cloud-based genomics workflow for scalable identification of SARS-COV-2. Brief Bioinform 2023; 24:bbad280. [PMID: 37738400 PMCID: PMC10516370 DOI: 10.1093/bib/bbad280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 07/15/2023] [Accepted: 07/19/2023] [Indexed: 09/24/2023]
Abstract
Implementing a specific cloud resource to analyze extensive genomic data on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a challenge when resources are limited. To overcome this, we repurposed a cloud platform initially designed for use in research on cancer genomics (https://cgc.sbgenomics.com) to enable its use in research on SARS-CoV-2 to build Cloud Workflow for Viral and Variant Identification (COWID). COWID is a workflow based on the Common Workflow Language that realizes the full potential of sequencing technology for use in reliable SARS-CoV-2 identification and leverages cloud computing to achieve efficient parallelization. COWID outperformed other contemporary methods for identification by offering scalable identification and reliable variant findings with no false-positive results. COWID typically processed each sample of raw sequencing data within 5 min at a cost of only US$0.01. The COWID source code is publicly available (https://github.com/hendrick0403/COWID) and can be accessed on any computer with Internet access. COWID is designed to be user-friendly; it can be implemented without prior programming knowledge. Therefore, COWID is a time-efficient tool that can be used during a pandemic.
Affiliation(s)
- Hendrick Gao-Min Lim
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan 11031
- Department of Medical Research, Tzu Chi Hospital Indonesia, Pantai Indah Kapuk, Greater Jakarta, Indonesia 14470
- Yang C Fann
- IT and Bioinformatics Program, Division of Intramural, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA 20892
- Yuan-Chii Gladys Lee
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan 11031
17
Hodgins HP, Chen P, Lobb B, Wei X, Tremblay BJM, Mansfield MJ, Lee VCY, Lee PG, Coffin J, Duggan AT, Dolphin AE, Renaud G, Dong M, Doxey AC. Ancient Clostridium DNA and variants of tetanus neurotoxins associated with human archaeological remains. Nat Commun 2023; 14:5475. [PMID: 37673908 PMCID: PMC10482840 DOI: 10.1038/s41467-023-41174-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023]
Abstract
The analysis of microbial genomes from human archaeological samples offers a historic snapshot of ancient pathogens and provides insights into the origins of modern infectious diseases. Here, we analyze metagenomic datasets from 38 human archaeological samples and identify bacterial genomic sequences related to modern-day Clostridium tetani, which produces the tetanus neurotoxin (TeNT) and causes the disease tetanus. These genomic assemblies had varying levels of completeness, and a subset of them displayed hallmarks of ancient DNA damage. Phylogenetic analyses revealed known C. tetani clades as well as potentially new Clostridium lineages closely related to C. tetani. The genomic assemblies encode 13 TeNT variants with unique substitution profiles, including a subgroup of TeNT variants found exclusively in ancient samples from South America. We experimentally tested a TeNT variant selected from an ancient Chilean mummy sample and found that it induced tetanus muscle paralysis in mice, with potency comparable to modern TeNT. Thus, our ancient DNA analysis identifies DNA from neurotoxigenic C. tetani in archaeological human samples, and a novel variant of TeNT that can cause disease in mammals.
Affiliation(s)
- Harold P Hodgins
- Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Pengsheng Chen
- Department of Urology, Boston Children's Hospital, Boston, MA, USA
- Department of Surgery and Department of Microbiology, Harvard Medical School, Boston, MA, USA
- Briallen Lobb
- Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Xin Wei
- Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Benjamin J M Tremblay
- Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Michael J Mansfield
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan
- Victoria C Y Lee
- Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Pyung-Gang Lee
- Department of Urology, Boston Children's Hospital, Boston, MA, USA
- Department of Surgery and Department of Microbiology, Harvard Medical School, Boston, MA, USA
- Jeffrey Coffin
- Department of Anthropology, University of Waterloo, Waterloo, ON, Canada
- Ana T Duggan
- McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, Hamilton, ON, Canada
- Alexis E Dolphin
- Department of Anthropology, University of Waterloo, Waterloo, ON, Canada
- Gabriel Renaud
- Department of Health Technology, Section of Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark.
- Min Dong
- Department of Urology, Boston Children's Hospital, Boston, MA, USA.
- Department of Surgery and Department of Microbiology, Harvard Medical School, Boston, MA, USA.
- Andrew C Doxey
- Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada.
18
Baker JL. Illuminating the oral microbiome and its host interactions: recent advancements in omics and bioinformatics technologies in the context of oral microbiome research. FEMS Microbiol Rev 2023; 47:fuad051. [PMID: 37667515 PMCID: PMC10503653 DOI: 10.1093/femsre/fuad051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 08/02/2023] [Accepted: 09/01/2023] [Indexed: 09/06/2023]
Abstract
The oral microbiota has an enormous impact on human health, with oral dysbiosis now linked to many oral and systemic diseases. Recent advancements in sequencing, mass spectrometry, bioinformatics, computational biology, and machine learning are revolutionizing oral microbiome research, enabling analysis at an unprecedented scale and level of resolution using omics approaches. This review contains a comprehensive perspective of the current state-of-the-art tools available to perform genomics, metagenomics, phylogenomics, pangenomics, transcriptomics, proteomics, metabolomics, lipidomics, and multi-omics analysis on (all) microbiomes, and then provides examples of how the techniques have been applied to research of the oral microbiome, specifically. Key findings of these studies and remaining challenges for the field are highlighted. Although the methods discussed here are placed in the context of their contributions to oral microbiome research specifically, they are pertinent to the study of any microbiome, and the intended audience of this includes researchers would simply like to get an introduction to microbial omics and/or an update on the latest omics methods. Continued research of the oral microbiota using omics approaches is crucial and will lead to dramatic improvements in human health, longevity, and quality of life.
Affiliation(s)
- Jonathon L Baker
- Department of Oral Rehabilitation & Biosciences, School of Dentistry, Oregon Health & Science University, 3181 Sam Jackson Park Road, Portland, OR 97202, United States
- Genomic Medicine Group, J. Craig Venter Institute, La Jolla, CA 92037, United States
- Department of Pediatrics, UC San Diego School of Medicine, La Jolla, CA 92093, United States
19
Yang S, Multani A, Garrigues JM, Oh MS, Hemarajata P, Burleson T, Green NM, Oliai C, Gaynor PT, Beaird OE, Winston DJ, Seet CS, Schaenman JM. Transient SARS-CoV-2 RNA-Dependent RNA Polymerase Mutations after Remdesivir Treatment for Chronic COVID-19 in Two Transplant Recipients: Case Report and Intra-Host Viral Genomic Investigation. Microorganisms 2023; 11:2096. [PMID: 37630656 PMCID: PMC10460003 DOI: 10.3390/microorganisms11082096] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/29/2023] [Revised: 08/10/2023] [Accepted: 08/14/2023] [Indexed: 08/27/2023]
Abstract
Remdesivir is the first FDA-approved drug for treating severe SARS-CoV-2 infection and targets RNA-dependent RNA polymerase (RdRp) that is required for viral replication. To monitor for the development of mutations that may result in remdesivir resistance during prolonged treatment, we sequenced SARS-CoV-2 specimens collected at different treatment time points in two transplant patients with severe COVID-19. In the first patient, an allogeneic hematopoietic stem cell transplant recipient, a transient RdRp catalytic subunit mutation (nsp12:A449V) was observed that has not previously been associated with remdesivir resistance. As no in vitro study had been conducted to elucidate the phenotypic effect of nsp12:A449V, its clinical significance is unclear. In the second patient, two other transient RdRp mutations were detected: one in the catalytic subunit (nsp12:V166A) and the other in an accessory subunit important for processivity (nsp7:D67N). This is the first case report for a potential link between the nsp12:V166A mutation and remdesivir resistance in vivo, which had only been previously described by in vitro studies. The nsp7:D67N mutation has not previously been associated with remdesivir resistance, and whether it has a phenotypic effect is unknown. Our study revealed SARS-CoV-2 genetic dynamics during remdesivir treatment in transplant recipients that involved mutations in the RdRp complex (nsp7 and nsp12), which may be the result of selective pressure. These results suggest that close monitoring for potential resistance during the course of remdesivir treatment in highly vulnerable patient populations may be beneficial. Development and utilization of diagnostic RdRp genotyping tests may be a future direction for improving the management of chronic COVID-19.
Affiliation(s)
- Shangxin Yang
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Ashrit Multani
- Division of Infectious Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Jacob M. Garrigues
- Public Health Laboratories, Los Angeles County Department of Public Health, Downey, CA 90242, USA
- Michael S. Oh
- Division of Hematology-Oncology, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Peera Hemarajata
- Public Health Laboratories, Los Angeles County Department of Public Health, Downey, CA 90242, USA
- Taylor Burleson
- Public Health Laboratories, Los Angeles County Department of Public Health, Downey, CA 90242, USA
- Nicole M. Green
- Public Health Laboratories, Los Angeles County Department of Public Health, Downey, CA 90242, USA
- Caspian Oliai
- Division of Hematology-Oncology, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Pryce T. Gaynor
- Division of Infectious Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Omer E. Beaird
- Division of Infectious Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Drew J. Winston
- Division of Hematology-Oncology, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Christopher S. Seet
- Division of Hematology-Oncology, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Joanna M. Schaenman
- Division of Infectious Diseases, Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
20
Libuit KG, Doughty EL, Otieno JR, Ambrosio F, Kapsak CJ, Smith EA, Wright SM, Scribner MR, Petit III RA, Mendes CI, Huergo M, Legacki G, Loreth C, Park DJ, Sevinsky JR. Accelerating bioinformatics implementation in public health. Microb Genom 2023; 9:mgen001051. [PMID: 37428142 PMCID: PMC10438813 DOI: 10.1099/mgen.0.001051] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/10/2023] [Accepted: 05/24/2023] [Indexed: 07/11/2023]
Abstract
We have adopted an open bioinformatics ecosystem to address the challenges of bioinformatics implementation in public health laboratories (PHLs). Bioinformatics implementation for public health requires practitioners to undertake standardized bioinformatic analyses and generate reproducible, validated and auditable results. It is essential that data storage and analysis are scalable, portable and secure, and that implementation of bioinformatics fits within the operational constraints of the laboratory. We address these requirements using Terra, a web-based data analysis platform with a graphical user interface connecting users to bioinformatics analyses without the use of code. We have developed bioinformatics workflows for use with Terra that specifically meet the needs of public health practitioners. These Theiagen workflows perform genome assembly, quality control, and characterization, as well as construction of phylogeny for insights into genomic epidemiology. Additionally, these workflows use open-source containerized software and the WDL workflow language to ensure standardization and interoperability with other bioinformatics solutions, whilst being adaptable by the user. They are all open source and publicly available in Dockstore with the version-controlled code available in public GitHub repositories. They have been written to generate outputs in standardized file formats to allow for further downstream analysis and visualization with separate genomic epidemiology software. Testament to this solution meeting the requirements for bioinformatic implementation in public health, Theiagen workflows have collectively been used for over 5 million sample analyses in the last 2 years by over 90 public health laboratories in at least 40 different countries. Continued adoption of technological innovations and development of further workflows will ensure that this ecosystem continues to benefit PHLs.
Affiliation(s)
- Kevin G. Libuit
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Emma L. Doughty
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- James R. Otieno
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Frank Ambrosio
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Curtis J. Kapsak
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Emily A. Smith
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Sage M. Wright
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Michelle R. Scribner
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Robert A. Petit III
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Wyoming Public Health Laboratory, 208 S College Dr, Cheyenne, WY 82007, USA
- Catarina Inês Mendes
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Marcela Huergo
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Gregory Legacki
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
- Christine Loreth
- Broad Institute of Harvard and MIT, 415 Main St, Cambridge, MA 02142, USA
- Daniel J. Park
- Broad Institute of Harvard and MIT, 415 Main St, Cambridge, MA 02142, USA
- Joel R. Sevinsky
- Theiagen Genomics, Suite 400, 1745 Shea Center Drive, Highlands Ranch, CO, 80129, USA
21
Papatheodorou EM, Papakostas S, Stamou GP. Fire and Rhizosphere Effects on Bacterial Co-Occurrence Patterns. Microorganisms 2023; 11:790. [PMID: 36985363 PMCID: PMC10052084 DOI: 10.3390/microorganisms11030790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/10/2023] [Accepted: 03/16/2023] [Indexed: 03/30/2023]
Abstract
Fires are common in Mediterranean soils and constitute an important driver of their evolution. Although fire effects on vegetation dynamics are widely studied, their influence on the assembly rules of soil prokaryotes in a small-scale environment has attracted limited attention. In the present study, we reanalyzed the data from Aponte et al. (2022) to test whether the direct and/or indirect effects of fire are reflected in the network of relationships among soil prokaryotes in a Chilean sclerophyllous ecosystem. We focused on bacterial (genus and species level) co-occurrence patterns in the rhizospheres and bulk soils in burned and unburned plots. Four soils were considered: bulk-burnt (BB), bulk-unburnt (BU), rhizosphere-burnt (RB), and rhizosphere-unburnt (RU). The largest differences in network parameters were recorded between RU and BB soils, while RB and BU networks exhibited similar values. The network in the BB soil was the most compact and centralized, while the RU network was the least connected, with no central nodes. The robustness of bacterial communities was enhanced in burnt soils, but this was more pronounced in BB soil. The mechanisms mainly responsible for bacterial community structure were stochastic in all soils, whether burnt or unburnt; however, communities in RB were much more stochastic than in RU.
Affiliation(s)
- Spiros Papakostas
- Department of Science and Technology, School of Science and Technology, University Center of International Programmes of Studies, International Hellenic University, 57001 Thessaloniki, Greece
- George P Stamou
- Department of Ecology, School of Biology, AUTH, 54124 Thessaloniki, Greece
22
Whitten MMA, Xue Q, Taning CNT, James R, Smagghe G, del Sol R, Hitchings M, Dyson P. A narrow host-range and lack of persistence in two non-target insect species of a bacterial symbiont exploited to deliver insecticidal RNAi in Western Flower Thrips. Front Insect Sci 2023; 3:1093970. [PMID: 38469480 PMCID: PMC10926499 DOI: 10.3389/finsc.2023.1093970] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Accepted: 01/31/2023] [Indexed: 03/13/2024]
Abstract
Introduction Insecticidal RNAi is a targeted pest insect population control measure. The specificity of insecticidal RNAi can theoretically be enhanced by using symbiotic bacteria with a narrow host range to deliver RNAi, an approach termed symbiont-mediated RNAi (SMR), a technology we have previously demonstrated in the globally-invasive pest species Western Flower Thrips (WFT). Methods Here we examine distribution of the two predominant bacterial symbionts of WFT, BFo1 and BFo2, among genome-sequenced insects. Moreover, we have challenged two non-target insect species with both bacterial species, namely the pollinating European bumblebee, Bombus terrestris, and an insect predator of WFT, the pirate bug Orius laevigatus. Results Our data indicate a very limited distribution of either symbiont among insects other than WFT. Moreover, whereas BFo1 could establish itself in both bees and pirate bugs, albeit with no significant effects on insect fitness, BFo2 was unable to persist in either species. Discussion In terms of biosafety, these data, together with its more specific growth requirements, vindicate the choice of BFo2 for delivery of RNAi and precision pest management of WFT.
Affiliation(s)
- Miranda M. A. Whitten
- Institute of Life Science, Swansea University Medical School, Singleton Park, Swansea, United Kingdom
- Qi Xue
- Department of Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Clauvis Nji Tizi Taning
- Department of Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Reuben James
- Institute of Life Science, Swansea University Medical School, Singleton Park, Swansea, United Kingdom
- Guy Smagghe
- Department of Plants and Crops, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Ricardo del Sol
- Institute of Life Science, Swansea University Medical School, Singleton Park, Swansea, United Kingdom
- Matthew Hitchings
- Institute of Life Science, Swansea University Medical School, Singleton Park, Swansea, United Kingdom
- Paul Dyson
- Institute of Life Science, Swansea University Medical School, Singleton Park, Swansea, United Kingdom
23
Jones A, Zhang D, Massey SE, Deigin Y, Nemzer LR, Quay SC. Discovery of a novel merbecovirus DNA clone contaminating agricultural rice sequencing datasets from Wuhan, China. bioRxiv 2023:2023.02.12.528210. [PMID: 36865340 PMCID: PMC9979991 DOI: 10.1101/2023.02.12.528210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2023]
Abstract
HKU4-related coronaviruses are a group of betacoronaviruses belonging to the same merbecovirus subgenus as Middle East respiratory syndrome coronavirus (MERS-CoV), which causes severe respiratory illness in humans with a mortality rate of over 30%. The high genetic similarity between HKU4-related coronaviruses and MERS-CoV makes them an attractive subject of research for modeling potential zoonotic spillover scenarios. In this study, we identify a novel coronavirus contaminating agricultural rice RNA sequencing datasets from Wuhan, China. The datasets were generated by the Huazhong Agricultural University in early 2020. We were able to assemble the complete viral genome sequence, which revealed that it is a novel HKU4-related merbecovirus. The assembled genome is 98.38% identical to the closest known full genome sequence, Tylonycteris pachypus bat isolate BtTp-GX2012. Using in silico modeling, we identified that the novel HKU4-related coronavirus spike protein likely binds to human dipeptidyl peptidase 4 (DPP4), the receptor used by MERS-CoV. We further identified that the novel HKU4-related coronavirus genome has been inserted into a bacterial artificial chromosome in a format consistent with previously published coronavirus infectious clones. Additionally, we found near-complete read coverage of the spike gene of the MERS-CoV reference strain HCoV-EMC/2012 and identified the likely presence of an HKU4-related-MERS chimera in the datasets. Our findings contribute to the knowledge of HKU4-related coronaviruses and document the use of a previously unpublished HKU4 reverse genetics system in apparent MERS-CoV related gain-of-function research. Our study also emphasizes the importance of improved biosafety protocols in sequencing centers and coronavirus research facilities.
24
Siriarchawatana P, Pumkaeo P, Harnpicharnchai P, Likhitrattanapisal S, Mayteeworakoon S, Boonsin W, Zhou X, Liang J, Cai L, Ingsriswang S. Temporal, compositional, and functional differences in the microbiome of Bangkok subway air environment. Environ Res 2023; 219:115065. [PMID: 36535389 DOI: 10.1016/j.envres.2022.115065] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/21/2022] [Revised: 12/09/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]
Abstract
As urban populations grow, an increasing number of commuters rely on subway systems for rapid daily transportation. Analyzing the temporal distribution of air microbiomes in subway environments is crucial for the assessment and monitoring of air quality in the subway system, especially with regard to public health. This study employed culture-independent metabarcode sequencing to analyze bacterial diversity and variations in bacterial compositions associated with bioaerosols collected from a subway station in Bangkok over a four-month period. The bacteria obtained were found to consist primarily of Proteobacteria, Firmicutes, and Actinobacteria, with variations at the family, genus, and species levels among samples obtained in different months. The vast majority of these bacteria are most likely derived from outside environments and human body sources. Many of the bacteria found in Bangkok subway station were also identified as "core microorganisms" of subway environments around the world, as suggested by the MetaSUB Consortium. The diversity of bacterial communities was shown to be influenced by several air quality variables, especially ambient temperature and the quantity of particulate matter, which showed positive correlations with several bacterial species such as Acinetobacter lwoffii, Staphylococcus spp., and Moraxella osloensis. In addition, metabolic profiles inferred from metabarcode-derived bacterial diversity showed significant variations across different sampling times and sites and can be used as a starting point to further explore the functional roles of specific groups of bacteria in the subway environment. This study thus provides the information required for surveillance of microbiological impacts and their contributions to the well-being of subway commuters in Bangkok.
Affiliation(s)
- Paopit Siriarchawatana
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road Khlong Nueng, Khlong Luang, Pathum Thani, 12120, Thailand
- Panyapon Pumkaeo
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road Khlong Nueng, Khlong Luang, Pathum Thani, 12120, Thailand
- Piyanun Harnpicharnchai
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road Khlong Nueng, Khlong Luang, Pathum Thani, 12120, Thailand
- Somsak Likhitrattanapisal
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road Khlong Nueng, Khlong Luang, Pathum Thani, 12120, Thailand
- Sermsiri Mayteeworakoon
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road Khlong Nueng, Khlong Luang, Pathum Thani, 12120, Thailand
- Worawongsin Boonsin
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road Khlong Nueng, Khlong Luang, Pathum Thani, 12120, Thailand
- Xin Zhou
- Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Junmin Liang
- Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Lei Cai
- Institute of Microbiology, Chinese Academy of Sciences, No.1 Beichen West Road, Chaoyang District, Beijing, 100101, China
- Supawadee Ingsriswang
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, 113 Thailand Science Park, Phahonyothin Road Khlong Nueng, Khlong Luang, Pathum Thani, 12120, Thailand.
25
Genomic Analysis of the Suspicious SARS-CoV-2 Sequences in the Public Sequencing Database. Microbiol Spectr 2023; 11:e0342622. [PMID: 36622170 PMCID: PMC9927258 DOI: 10.1128/spectrum.03426-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
Abstract
SARS-CoV-2 has infected more than 600 million people. However, the origin of the virus is still unclear; knowing where the virus came from could help us prevent future zoonotic epidemics. Sequencing data, particularly metagenomic data, can profile the genomes of all species in the sample, including those not recognized at the time, thus allowing for the identification of the progenitor of SARS-CoV-2 in samples collected before the pandemic. We analyzed the data from 5,196 SARS-CoV-2-positive sequencing runs in the NCBI's SRA database with collection dates prior to 2020 or unknown. We found that the mutation patterns obtained from these suspicious SARS-CoV-2 reads did not match the genome characteristics of an unknown progenitor of the virus, suggesting that they may derive from circulating SARS-CoV-2 variants or other coronaviruses. Despite a negative result for tracking the progenitor of SARS-CoV-2, the methods developed in the study could assist in pinpointing the origin of various pathogens in the future. IMPORTANCE Sequences that are homologous to the SARS-CoV-2 genome were found in numerous sequencing runs that were not associated with the SARS-CoV-2 studies in the public database. Because information on the collection, library preparation, and sequencing processes is lacking, it is unclear whether they derive from a possible progenitor of SARS-CoV-2 or from contamination with more recent SARS-CoV-2 variants circulating in the population. We have developed a computational framework to infer the evolutionary relationship between sequences based on the comparison of mutations, which enabled us to rule out the possibility that these suspicious sequences originate from unknown progenitors of SARS-CoV-2.
26
Kawasaki J, Tomonaga K, Horie M. Large-scale investigation of zoonotic viruses in the era of high-throughput sequencing. Microbiol Immunol 2023; 67:1-13. [PMID: 36259224 DOI: 10.1111/1348-0421.13033] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/18/2022] [Revised: 09/28/2022] [Accepted: 10/16/2022] [Indexed: 01/10/2023]
Abstract
Zoonotic diseases considerably impact public health and socioeconomics. RNA viruses reportedly caused approximately 94% of zoonotic diseases documented from 1990 to 2010, emphasizing the importance of investigating RNA viruses in animals. Furthermore, it has been estimated that hundreds of thousands of animal viruses capable of infecting humans are yet to be discovered, warning against the inadequacy of our understanding of viral diversity. High-throughput sequencing (HTS) has enabled the identification of viral infections with relatively little bias. Viral searches using both symptomatic and asymptomatic animal samples by HTS have revealed hidden viral infections. This review introduces the history of viral searches using HTS, current analytical limitations, and future potentials. We primarily summarize recent research on large-scale investigations on viral infections reusing HTS data from public databases. Furthermore, considering the accumulation of uncultivated viruses, we discuss current studies and challenges for connecting viral sequences to their phenotypes using various approaches: performing data analysis, developing predictive modeling, or implementing high-throughput platforms of virological experiments. We believe that this article provides a future direction in large-scale investigations of potential zoonotic viruses using the HTS technology.
Affiliation(s)
- Junna Kawasaki
- Laboratory of RNA Viruses, Department of Virus Research, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Laboratory of RNA Viruses, Department of Mammalian Regulatory Network, Graduate School of Biostudies, Kyoto University, Kyoto, Japan
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Keizo Tomonaga
- Laboratory of RNA Viruses, Department of Virus Research, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Laboratory of RNA Viruses, Department of Mammalian Regulatory Network, Graduate School of Biostudies, Kyoto University, Kyoto, Japan
- Department of Molecular Virology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Masayuki Horie
- Division of Veterinary Sciences, Graduate School of Life and Environmental Sciences, Osaka Prefecture University, Osaka, Japan
- Osaka International Research Center for Infectious Diseases, Osaka Prefecture University, Osaka, Japan
27
Babaian A, Edgar R. Ribovirus classification by a polymerase barcode sequence. PeerJ 2022; 10:e14055. [PMID: 36258794 PMCID: PMC9573346 DOI: 10.7717/peerj.14055] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/26/2022] [Accepted: 08/24/2022] [Indexed: 01/19/2023]
Abstract
RNA viruses encoding a polymerase gene (riboviruses) dominate the known eukaryotic virome. High-throughput sequencing is revealing a wealth of new riboviruses known only from sequence, precluding classification by traditional taxonomic methods. Sequence classification is often based on polymerase sequences, but standardised methods to support this approach are currently lacking. To address this need, we describe the polymerase palmprint, a segment of the palm sub-domain robustly delineated by well-conserved catalytic motifs. We present an algorithm, Palmscan, which identifies palmprints in nucleotide and amino acid sequences; PALMdb, a collection of palmprints derived from public sequence databases; and palmID, a public website implementing palmprint identification, search, and annotation. Together, these methods demonstrate a proof-of-concept workflow for high-throughput characterisation of RNA viruses, paving the path for the continued rapid growth in RNA virus discovery anticipated in the coming decade.
Affiliation(s)
- Artem Babaian
- St Edmunds College, Cambridge, United Kingdom
- Department of Haematology, University of Cambridge, Cambridge, United Kingdom
- Robert Edgar
- Corte Madera, California, United States of America
28
Bartoszewicz JM, Nasri F, Nowicka M, Renard BY. Detecting DNA of novel fungal pathogens using ResNets and a curated fungi-hosts data collection. Bioinformatics 2022; 38:ii168-ii174. [PMID: 36124807 DOI: 10.1093/bioinformatics/btac495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/08/2022] [Indexed: 12/25/2022]
Abstract
BACKGROUND Emerging pathogens are a growing threat, but large data collections and approaches for predicting the risk associated with novel agents are limited to bacteria and viruses. Pathogenic fungi, which also pose a constant threat to public health, remain understudied. Relevant data remain comparatively scarce and scattered among many different sources, hindering the development of sequencing-based detection workflows for novel fungal pathogens. No prediction method working for agents across all three groups is available, even though the cause of an infection is often difficult to identify from symptoms alone. RESULTS We present a curated collection of fungal host range data, comprising records on human, animal and plant pathogens, as well as other plant-associated fungi, linked to publicly available genomes. We show that it can be used to predict the pathogenic potential of novel fungal species directly from DNA sequences with either sequence homology or deep learning. We develop learned, numerical representations of the collected genomes and visualize the landscape of fungal pathogenicity. Finally, we train multi-class models predicting if next-generation sequencing reads originate from novel fungal, bacterial or viral threats. CONCLUSIONS The neural networks trained using our data collection enable accurate detection of novel fungal pathogens. A curated set of over 1400 genomes with host and pathogenicity metadata supports training of machine-learning models and sequence comparison, not limited to the pathogen detection task. AVAILABILITY AND IMPLEMENTATION The data, models and code are hosted at https://zenodo.org/record/5846345, https://zenodo.org/record/5711877 and https://gitlab.com/dacs-hpi/deepac. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jakub M Bartoszewicz
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Ferdous Nasri
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Melania Nowicka
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, Berlin 14195, Germany
- Bernhard Y Renard
- Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
29
Xiaoli L, Hagey JV, Park DJ, Gulvik CA, Young EL, Alikhan NF, Lawsin A, Hassell N, Knipe K, Oakeson KF, Retchless AC, Shakya M, Lo CC, Chain P, Page AJ, Metcalf BJ, Su M, Rowell J, Vidyaprakash E, Paden CR, Huang AD, Roellig D, Patel K, Winglee K, Weigand MR, Katz LS. Benchmark datasets for SARS-CoV-2 surveillance bioinformatics. PeerJ 2022; 10:e13821. [PMID: 36093336 PMCID: PMC9454940 DOI: 10.7717/peerj.13821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 07/08/2022] [Indexed: 01/18/2023]
Abstract
Background Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatics tools serve major actionable goals: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to rapidly reach high-throughput proficiency in sequencing library preparation and downstream data analysis. However, both processes can be limited by the lack of a standardized sequence dataset. Methods We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we used a previously published dataset format, which records accession information and whole-dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study. Results The benchmark datasets focus on the two most widely used sequencing platforms: long-read sequencing data from the Oxford Nanopore Technologies platform and short-read sequencing data from the Illumina platform.
There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. Discussion The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines.
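Verifying a download usually means comparing each file's cryptographic digest with a value recorded alongside the dataset. The sketch below assumes, hypothetically, that an expected SHA-256 hex digest is available for each file. `sha256_of` and `verify_files` are illustrative names, not the interface of the script in the repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(expected: dict[str, str], root: Path = Path(".")) -> list[str]:
    """Return the names of files that are missing or whose digest does not match.

    `expected` maps a file name (relative to `root`) to a hypothetical expected
    SHA-256 hex digest, e.g. as recorded in a dataset table.
    """
    failures = []
    for name, digest in expected.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != digest.strip().lower():
            failures.append(name)
    return failures
```

Reading in chunks keeps memory use flat even for multi-gigabyte FASTQ files, and an empty return list means every file was present and matched.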
Affiliation(s)
- Lingzi Xiaoli
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Jill V. Hagey
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Daniel J. Park
- Broad Institute of MIT and Harvard, Cambridge, MA, United States of America
- Christopher A. Gulvik
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Erin L. Young
- Utah Public Health Laboratory, Salt Lake City, UT, United States of America
- Adrian Lawsin
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Norman Hassell
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Kristen Knipe
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Kelly F. Oakeson
- Utah Public Health Laboratory, Salt Lake City, UT, United States of America
- Adam C. Retchless
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Migun Shakya
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Chien-Chi Lo
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Patrick Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, United States of America
- Andrew J. Page
- Quadram Institute Bioscience, Norwich Research Park, Norwich, United Kingdom
- Benjamin J. Metcalf
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Michelle Su
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Jessica Rowell
- SARS-CoV-2 Emerging Variant Sequencing Project Dry Lab Group Laboratory and Testing Task Force COVID-19 Emergency Response, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Eshaw Vidyaprakash
- SARS-CoV-2 Emerging Variant Sequencing Project Dry Lab Group Laboratory and Testing Task Force COVID-19 Emergency Response, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Clinton R. Paden
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Andrew D. Huang
- SARS-CoV-2 Emerging Variant Sequencing Project Dry Lab Group Laboratory and Testing Task Force COVID-19 Emergency Response, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Dawn Roellig
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Ketan Patel
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Kathryn Winglee
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Michael R. Weigand
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Lee S. Katz
- Strain Surveillance and Emerging Variant Team, Centers for Disease Control and Prevention, Atlanta, GA, United States of America

30
Assessment of Rapid MinION Nanopore DNA Virus Meta-Genomics Using Calves Experimentally Infected with Bovine Herpes Virus-1. Viruses 2022; 14:1859. [PMID: 36146668 PMCID: PMC9501177 DOI: 10.3390/v14091859] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/29/2022] [Revised: 08/19/2022] [Accepted: 08/20/2022] [Indexed: 11/16/2022]
Abstract
Bovine respiratory disease (BRD), which is the leading cause of morbidity and mortality in cattle, is caused by numerous known and unknown viruses and is responsible for the widespread use of broad-spectrum antibiotics despite the use of polymicrobial BRD vaccines. Viral metagenomic sequencing on the portable, inexpensive Oxford Nanopore Technologies MinION sequencer, combined with sequence analysis in its associated user-friendly, point-and-click, cloud-based Epi2ME pathogen identification software, has the potential to deliver point-of-care/same-day/sample-to-result metagenomic diagnostics of known and unknown BRD pathogens to inform a rapid response and vaccine design. We assessed this potential using in vitro viral cell cultures and nasal swabs taken from calves that were experimentally challenged with a single known BRD-associated DNA virus, namely, bovine herpesvirus 1 (BoHV-1). Extensive optimisation of the standard Oxford Nanopore library preparation protocols, particularly a reduction in the PCR bias of library amplification, was required before BoHV-1 could be identified as the main virus in the in vitro cell cultures and nasal swab samples within approximately 7 h from sample to result. In addition, we observed incorrect assignment of bovine sequences to bacterial and viral taxa due to the presence of poor-quality bacterial and viral genome assemblies in the RefSeq database used by the Epi2ME Fastq WIMP pathogen identification software.

31
Complete Genome Sequences of Mycobacteriophages SynergyX, Abinghost, Bananafish, and Delton. Microbiol Resour Announc 2022; 11:e0028622. [PMID: 35863046 PMCID: PMC9387249 DOI: 10.1128/mra.00286-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Four lytic mycobacteriophages, namely, SynergyX, Abinghost, Bananafish, and Delton, were isolated from soil in Washington, DC, using the bacterial host Mycobacterium smegmatis mc²155. Analysis of the genomes revealed that they belong to two subclusters of actinobacteriophage cluster B (subclusters B2 and B3) and to subcluster D1 of cluster D.

32
Rosani U. Tracing RNA viruses associated with Nudibranchia gastropods. PeerJ 2022; 10:e13410. [PMID: 35586129 PMCID: PMC9109684 DOI: 10.7717/peerj.13410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/10/2022] [Accepted: 04/19/2022] [Indexed: 01/14/2023]
Abstract
Background Nudibranchia is an understudied taxonomic group of gastropods, including more than 3,000 species with colourful and extravagant body shapes and peculiar predatory and defensive strategies. Although symbiosis with bacteria has been reported, no data are available on the nudibranch microbiome or on viruses possibly associated with these geographically widespread species. Methods Based on 47 available RNA sequencing datasets including more than two billion reads of 35 nudibranch species, a meta-transcriptome assembly was constructed. Taxonomic searches with DIAMOND, RNA-dependent RNA polymerase identification with palmscan, and viral hallmark gene identification by VirSorter2 in combination with CheckV were applied to identify genuine viral genomes, which were then annotated using CAT. Results A total of 20 viral genomes were identified as bona fide viruses among 552 putative viral contigs, which resembled both RNA viruses of the Negarnaviricota, Pisuviricota and Kitrinoviricota phyla and actively transcribing DNA viruses of the Cossaviricota and Nucleocytoviricota phyla. The 20 commonly identified viruses showed similarity with RNA viruses identified in other RNA-seq experiments and can be putatively associated with bacterial, plant and arthropod hosts by co-occurrence analysis. The RNA samples having the highest viral abundances showed a heterogeneous and mostly sample-specific distribution of the identified viruses, suggesting that nudibranchs possess diversified and mostly unknown viral communities.
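The abstract does not say which statistic the co-occurrence analysis used. As an illustration only, the sketch below ranks candidate host taxa by the Pearson correlation between a virus's abundance and each taxon's abundance across the same samples. The function names, taxon names and abundance values in the usage note are made up.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length abundance vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_hosts(virus: list[float], hosts: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate host taxa by how strongly their abundance tracks the virus across samples."""
    scores = [(name, pearson(virus, abundance)) for name, abundance in hosts.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, `rank_hosts([0, 5, 12, 0, 7], {"arthropod_A": [1, 6, 11, 0, 8], "plant_B": [9, 2, 0, 8, 1]})` places the positively correlated `arthropod_A` first. Correlation only suggests a putative association, which matches the cautious wording of the abstract.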

33
Edgar RC, Taylor B, Lin V, Altman T, Barbera P, Meleshko D, Lohr D, Novakovsky G, Buchfink B, Al-Shayeb B, Banfield JF, de la Peña M, Korobeynikov A, Chikhi R, Babaian A. Petabase-scale sequence alignment catalyses viral discovery. Nature 2022; 602:142-147. [PMID: 35082445 DOI: 10.1038/s41586-021-04332-2] [Citation(s) in RCA: 148] [Impact Index Per Article: 74.0] [Received: 08/10/2020] [Accepted: 12/10/2021] [Indexed: 01/20/2023]
Abstract
Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially¹. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10⁵ novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.
Affiliation(s)
- Brie Taylor
- Independent researcher, Vancouver, British Columbia, Canada
- Victor Lin
- Independent researcher, Seattle, WA, USA
- Pierre Barbera
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Dmitry Meleshko
- Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY, USA
- Gherman Novakovsky
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
- Benjamin Buchfink
- Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany
- Basem Al-Shayeb
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, USA
- Jillian F Banfield
- Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, CA, USA
- Marcos de la Peña
- Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
- Anton Korobeynikov
- Center for Algorithmic Biotechnology, St Petersburg State University, St Petersburg, Russia
- Department of Statistical Modelling, St Petersburg State University, St Petersburg, Russia
- Rayan Chikhi
- G5 Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
- Artem Babaian
- Independent researcher, Vancouver, British Columbia, Canada

34
|
McKenzie R, Maarsingh JD, Łaniewski P, Herbst-Kralovetz MM. Immunometabolic Analysis of Mobiluncus mulieris and Eggerthella sp. Reveals Novel Insights Into Their Pathogenic Contributions to the Hallmarks of Bacterial Vaginosis. Front Cell Infect Microbiol 2022; 11:759697. [PMID: 35004344 PMCID: PMC8733642 DOI: 10.3389/fcimb.2021.759697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/16/2021] [Accepted: 12/06/2021] [Indexed: 01/11/2023]
Abstract
The cervicovaginal microbiome plays an important role in protecting women from dysbiosis and infection caused by pathogenic microorganisms. In healthy reproductive-age women, the cervicovaginal microbiome is predominantly colonized by protective Lactobacillus spp. The loss of these protective bacteria leads to colonization of the cervicovaginal microenvironment by pathogenic microorganisms, resulting in dysbiosis and bacterial vaginosis (BV). Mobiluncus mulieris and Eggerthella sp. are two of the many anaerobes that can contribute to BV, a condition associated with multiple adverse obstetric and gynecological outcomes. M. mulieris has been linked to high Nugent scores (relating to BV morphotypes) and preterm birth (PTB), whilst some bacterial members of the Eggerthellaceae family are highly prevalent in BV, being identified in ~85-95% of cases. The functional impact of M. mulieris and Eggerthella sp. in BV is still poorly understood. To determine the individual immunometabolic contributions of Eggerthella sp. and M. mulieris within the cervicovaginal microenvironment, we used our well-characterized human three-dimensional (3-D) cervical epithelial cell model in combination with multiplex immunoassays and global untargeted metabolomics approaches to identify key immune mediators and metabolites related to M. mulieris and Eggerthella sp. infections. We found that infection with M. mulieris significantly elevated multiple proinflammatory markers (IL-6, IL-8, TNF-α and MCP-1) and altered metabolites related to energy metabolism (nicotinamide and succinate) and oxidative stress (cysteinylglycine, cysteinylglycine disulfide and 2-hydroxyglutarate). Eggerthella sp. infection significantly elevated multiple sphingolipids and glycerolipids related to epithelial barrier function, and biogenic amines (putrescine and cadaverine) associated with elevated vaginal pH, vaginal amine odor and vaginal discharge. Our study showed that M. mulieris elevated multiple proinflammatory markers relating to PTB and STI acquisition and altered energy metabolism and oxidative stress, whilst Eggerthella sp. upregulated multiple biogenic amines associated with the clinical diagnostic criteria of BV. Future studies are needed to evaluate how these bacteria interact with other BV-associated bacteria within the cervicovaginal microenvironment.
Affiliation(s)
- Ross McKenzie
- Department of Obstetrics and Gynecology, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, United States; Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Jason D Maarsingh
- Department of Obstetrics and Gynecology, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, United States
- Paweł Łaniewski
- Department of Basic Medical Sciences, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, United States
- Melissa M Herbst-Kralovetz
- Department of Obstetrics and Gynecology, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, United States; Department of Basic Medical Sciences, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, United States

35
Schmartz GP, Hirsch P, Amand J, Dastbaz J, Fehlmann T, Kern F, Müller R, Keller A. OUP accepted manuscript. Nucleic Acids Res 2022; 50:W132-W137. [PMID: 35489067 PMCID: PMC9252796 DOI: 10.1093/nar/gkac298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Revised: 04/07/2022] [Accepted: 04/14/2022] [Indexed: 11/13/2022]
Abstract
Despite recent improvements in methodology and reference databases for taxonomic profiling tools, metagenomic assembly and genomic binning remain important pillars of metagenomic analysis workflows. When reference information is lacking, genomic binning is considered a state-of-the-art method in mixed-culture metagenomic data analysis. In this light, our previously published tool BusyBee Web implements a composition-based binning method efficient enough to function as a rapid online utility. Handling assembled contigs and long nanopore-generated reads alike, the webserver provides a wide range of supplementary annotations and visualizations. Half a decade after the initial publication, we revisited existing functionality, added comprehensive visualizations, and increased the number of data analysis customization options for further experimentation. The webserver now allows for visualization-supported differential analysis of samples, which is computationally expensive and typically only performed in coverage-based binning methods. Further, users may now optionally check their uploaded samples for plasmid sequences using PLSDB as a reference database. Lastly, a new application programming interface with a supporting Python package was implemented to give power users fully automated access to the resource and integration into existing workflows. The webserver is freely available at https://www.ccb.uni-saarland.de/busybee.
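Composition-based binning groups sequences by their oligonucleotide frequency profiles. The sketch below computes a normalized k-mer frequency vector per contig and then groups contigs with a deliberately simple greedy distance threshold. The choice of k = 4, the Euclidean distance, the 0.1 threshold and the function names are all assumptions for illustration. This is not BusyBee Web's actual binning procedure.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Relative frequencies of all 4**k A/C/G/T k-mers; k-mers containing other symbols are ignored."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    valid = set(kmers)
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(n for kmer, n in counts.items() if kmer in valid)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def euclidean(p: list[float], q: list[float]) -> float:
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def greedy_bins(contigs: list[str], k: int = 4, max_dist: float = 0.1) -> list[list[int]]:
    """Put each contig into the first bin whose seed profile lies within max_dist, else open a new bin."""
    bins: list[tuple[list[float], list[int]]] = []
    for index, seq in enumerate(contigs):
        profile = kmer_profile(seq, k)
        for seed, members in bins:
            if euclidean(seed, profile) <= max_dist:
                members.append(index)
                break
        else:  # no existing bin was close enough
            bins.append((profile, [index]))
    return [members for _, members in bins]
```

Because the profile depends only on sequence composition, no reference database or coverage information is needed. This is the property that makes composition-based binning usable when reference information is lacking.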
Affiliation(s)
- Georges P Schmartz
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Pascal Hirsch
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Clinical Bioinformatics (CLIB), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Jérémy Amand
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Clinical Bioinformatics (CLIB), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Jan Dastbaz
- Microbial Natural Products (MINS), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Deutsches Zentrum für Infektionsforschung (DZIF), Standort Hannover-Braunschweig, 38124 Braunschweig, Germany
- Tobias Fehlmann
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Fabian Kern
- Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Clinical Bioinformatics (CLIB), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Rolf Müller
- Microbial Natural Products (MINS), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Deutsches Zentrum für Infektionsforschung (DZIF), Standort Hannover-Braunschweig, 38124 Braunschweig, Germany
- Andreas Keller
- To whom correspondence should be addressed. Tel: +49 681 30268611; Fax: +49 681 30268610

36
Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2021; 50:D20-D26. [PMID: 34850941 DOI: 10.1093/nar/gkab1112] [Citation(s) in RCA: 773] [Impact Index Per Article: 257.7] [Received: 09/18/2021] [Revised: 10/20/2021] [Accepted: 11/18/2021] [Indexed: 11/14/2022]
Abstract
The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, RefSeq, SRA, Virus, dbSNP, dbVar, ClinicalTrials.gov, MMDB, iCn3D and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
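The E-utilities are plain HTTPS endpoints. The sketch below builds an ESearch request and, optionally, fetches the matching record IDs with the Python standard library. The helper names `esearch_url` and `esearch_ids` are hypothetical. NCBI's usage limits should be respected: roughly three requests per second without an API key.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20, api_key: Optional[str] = None) -> str:
    """Build an ESearch URL that asks for JSON output."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key  # an API key raises NCBI's per-second request limit
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def esearch_ids(db: str, term: str, retmax: int = 20) -> list[str]:
    """Run the search (network access required) and return the matching UIDs."""
    with urlopen(esearch_url(db, term, retmax), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

For instance, `esearch_ids("pubmed", "Sequence Read Archive[Title]")` returns PubMed IDs that could then be passed to EFetch or ESummary. Only URL construction is exercised offline.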
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kathi Canese
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jessica Chan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Donald C Comeau
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Ryan Connor
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kathryn Funk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Chris Kelly
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Tom Madej
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Christopher Lanczycki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Stacy Lathrop
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Francoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Terence Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Lon Phan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Yuri Skripchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Tony Tse
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jiyao Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Rebecca Williams
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Barton W Trawick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA

37
Katz K, Shutov O, Lapoint R, Kimelman M, Brister JR, O'Sullivan C. The Sequence Read Archive: a decade more of explosive growth. Nucleic Acids Res 2021; 50:D387-D390. [PMID: 34850094 DOI: 10.1093/nar/gkab1053] [Citation(s) in RCA: 107] [Impact Index Per Article: 35.7] [Received: 09/21/2021] [Revised: 10/14/2021] [Accepted: 10/18/2021] [Indexed: 11/13/2022]
Abstract
The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis. Here we note changes in storage designed to increase access and highlight analyses that augment metadata with taxonomic insight to help users select data. In addition, we present three unanticipated applications of taxonomic analysis.
Affiliation(s)
- Kenneth Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Oleg Shutov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Richard Lapoint
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Michael Kimelman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Christopher O'Sullivan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA