1. MacDonald ML, Polson SW, Lee KH. k-mer-Based Metagenomics Tools Provide a Fast and Sensitive Approach for the Detection of Viral Contaminants in Biopharmaceutical and Vaccine Manufacturing Applications Using Next-Generation Sequencing. mSphere 2021; 6:e01336-20. [PMID: 33883263; PMCID: PMC8546726; DOI: 10.1128/msphere.01336-20] [Received: 12/23/2020] [Accepted: 03/30/2021]
Abstract
Adventitious agent detection during the production of vaccines and biotechnology-based medicines is of critical importance to ensure the final product is free from any possible viral contamination. Increasing the speed and accuracy of viral detection is beneficial as a means to accelerate development timelines and to ensure patient safety. Here, several rapid viral metagenomics approaches were tested on simulated next-generation sequencing (NGS) data sets and existing data sets from virus spike-in studies done in CHO-K1 and HeLa cell lines. It was observed that these rapid methods had comparable sensitivity to full-read alignment methods used for NGS viral detection for these data sets, but their specificity could be improved. A method that first filters host reads using KrakenUniq and then selects the virus classification tool based on the number of remaining reads is suggested as the preferred approach among those tested to detect nonlatent and nonendogenous viruses. Such an approach shows reasonable sensitivity and specificity for the data sets examined and requires less time and memory than full-read alignment methods. IMPORTANCE: Next-generation sequencing (NGS) has been proposed as a complementary method to detect adventitious viruses in the production of biotherapeutics and vaccines to current in vivo and in vitro methods. Before NGS can be established in industry as a main viral detection technology, further investigation into the various aspects of bioinformatics analyses required to identify and classify viral NGS reads is needed. In this study, the ability of rapid metagenomics tools to detect viruses in biopharmaceutically relevant samples is tested and compared to recommend an efficient approach. The results showed that KrakenUniq can quickly and accurately filter host sequences and classify viral reads and had comparable sensitivity and specificity to slower full-read alignment approaches, such as BLASTn, for the data sets examined.
Affiliation(s)
- Madolyn L MacDonald
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware, USA
  - Ammon Pinizzotto Biopharmaceutical Innovation Center, Newark, Delaware, USA
- Shawn W Polson
  - Ammon Pinizzotto Biopharmaceutical Innovation Center, Newark, Delaware, USA
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, USA
  - Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, USA
- Kelvin H Lee
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware, USA
  - Ammon Pinizzotto Biopharmaceutical Innovation Center, Newark, Delaware, USA

2. Mpangase PT, Frost J, Ramsay M, Hazelhurst S. nf-rnaSeqMetagen: A nextflow metagenomics pipeline for identifying and characterizing microbial sequences from RNA-seq data. Medicine in Microecology 2020. [DOI: 10.1016/j.medmic.2020.100011]

3. Detecting viral sequences in NGS data. Curr Opin Virol 2019; 39:41-48. [DOI: 10.1016/j.coviro.2019.07.010] [Received: 06/05/2019] [Revised: 07/29/2019] [Accepted: 07/30/2019]

4. Farina R, Severi M, Carrieri A, Miotto E, Sabbioni S, Trombelli L, Scapoli C. Whole metagenomic shotgun sequencing of the subgingival microbiome of diabetics and non-diabetics with different periodontal conditions. Arch Oral Biol 2019; 104:13-23. [PMID: 31153098; DOI: 10.1016/j.archoralbio.2019.05.025] [Received: 04/11/2019] [Revised: 05/22/2019] [Accepted: 05/23/2019]
Abstract
OBJECTIVE: The aim of this study was to use high-resolution whole metagenomic shotgun sequencing to characterize the subgingival microbiome of patients with/without type 2 Diabetes Mellitus and with/without periodontitis. DESIGN: Twelve subjects, falling into one of the four study groups based on the presence/absence of poorly controlled type 2 Diabetes Mellitus and moderate-severe periodontitis, were selected. For each eligible subject, subgingival plaque samples were collected at 4 sites, all representative of the periodontal condition of the individual (i.e., non-bleeding sulci in subjects without a history of periodontitis, bleeding pockets in patients with moderate-severe periodontitis). The subgingival microbiome was evaluated using high-resolution whole metagenomic shotgun sequencing. RESULTS: The results showed that: (i) the presence of type 2 Diabetes Mellitus and/or periodontitis was associated with a tendency of the subgingival microbiome to decrease in richness and diversity; (ii) the presence of type 2 Diabetes Mellitus was not associated with significant differences in the relative abundance of one or more species in patients either with or without periodontitis; (iii) the presence of periodontitis was associated with a significantly higher relative abundance of Anaerolineaceae bacterium oral taxon 439 in type 2 Diabetes Mellitus patients. CONCLUSIONS: Whole metagenomic shotgun sequencing of the subgingival microbiome was extremely effective in the detection of low-abundance taxa. Our results point out a significantly higher relative abundance of Anaerolineaceae bacterium oral taxon 439 in patients with moderate to severe periodontitis vs patients without history of periodontitis, which was maintained when the comparison was restricted to type 2 diabetics.
Affiliation(s)
- Roberto Farina
  - Research Centre for the Study of Periodontal and Peri-Implant Diseases, University of Ferrara, Italy; Operative Unit of Dentistry, University-Hospital of Ferrara, Italy
- Mattia Severi
  - Research Centre for the Study of Periodontal and Peri-Implant Diseases, University of Ferrara, Italy
- Alberto Carrieri
  - Department of Life Sciences and Biotechnology - Section of Biology and Evolution, University of Ferrara, Italy
- Elena Miotto
  - Department of Life Sciences and Biotechnology - Section of Pathology and Applied Microbiology, University of Ferrara, Italy
- Silvia Sabbioni
  - Department of Life Sciences and Biotechnology - Section of Pathology and Applied Microbiology, University of Ferrara, Italy
- Leonardo Trombelli
  - Research Centre for the Study of Periodontal and Peri-Implant Diseases, University of Ferrara, Italy; Operative Unit of Dentistry, University-Hospital of Ferrara, Italy
- Chiara Scapoli
  - Research Centre for the Study of Periodontal and Peri-Implant Diseases, University of Ferrara, Italy; Department of Life Sciences and Biotechnology - Section of Biology and Evolution, University of Ferrara, Italy

5. Johnson ME, Franks JM, Cai G, Mehta BK, Wood TA, Archambault K, Pioli PA, Simms RW, Orzechowski N, Arron S, Whitfield ML. Microbiome dysbiosis is associated with disease duration and increased inflammatory gene expression in systemic sclerosis skin. Arthritis Res Ther 2019; 21:49. [PMID: 30728065; PMCID: PMC6366065; DOI: 10.1186/s13075-019-1816-z] [Received: 07/18/2018] [Accepted: 01/08/2019]
Abstract
BACKGROUND: Infectious agents have long been postulated to be disease triggers for systemic sclerosis (SSc), but a definitive link has not been found. Metagenomic analyses of high-throughput data allow for the unbiased identification of potential microbiome pathogens in skin biopsies of SSc patients and allow insight into the relationship with host gene expression. METHODS: We examined skin biopsies from a diverse cohort of 23 SSc patients (including lesional forearm and non-lesional back samples) by RNA-seq. Metagenomic filtering and annotation were performed using the Integrated Metagenomic Sequencing Analysis (IMSA). Associations between microbiome composition and gene expression were analyzed using single-sample gene set enrichment analysis (ssGSEA). RESULTS: We find the skin of SSc patients exhibits substantial changes in microbial composition relative to controls, characterized by sharp decreases in lipophilic taxa, such as Propionibacterium, combined with increases in a wide range of gram-negative taxa, including Burkholderia, Citrobacter, and Vibrio. CONCLUSIONS: Microbiome dysbiosis is associated with disease duration and increased inflammatory gene expression. These data provide a comprehensive portrait of the SSc skin microbiome and its association with local gene expression, which mirrors the molecular changes in lesional skin.
Affiliation(s)
- Michael E Johnson
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Jennifer M Franks
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA; Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Guoshuai Cai
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA; Department of Environmental Health Science, University of South Carolina Arnold School of Public Health, Columbia, SC, USA
- Bhaven K Mehta
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Tammara A Wood
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Kimberly Archambault
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Patricia A Pioli
  - Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Robert W Simms
  - Division of Rheumatology, Arthritis Center, Boston University Medical Center, Boston, MA, USA
- Nicole Orzechowski
  - Division of Rheumatology, Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA
- Sarah Arron
  - Division of Dermatology, University of California, San Francisco, USA
- Michael L Whitfield
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA; Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, NH, USA; Department of Biomedical Data Science, Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth, Hanover, NH, USA

6. Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407; PMCID: PMC5924777; DOI: 10.3389/fmicb.2018.00749] [Received: 12/08/2017] [Accepted: 04/03/2018]
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Dennis Schmitz
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Harry Vennema
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Annelies Kroneman
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Marion P G Koopmans
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands

7. Cox JW, Ballweg RA, Taft DH, Velayutham P, Haslam DB, Porollo A. A fast and robust protocol for metataxonomic analysis using RNAseq data. Microbiome 2017; 5:7. [PMID: 28103917; PMCID: PMC5244565; DOI: 10.1186/s40168-016-0219-5] [Received: 07/21/2016] [Accepted: 12/05/2016]
Abstract
BACKGROUND: Metagenomics is a rapidly emerging field that aims to analyze microbial diversity and dynamics by studying the genomic content of the microbiota. Metataxonomics tools analyze high-throughput sequencing data, primarily from 16S rRNA gene sequencing and DNAseq, to identify microorganisms and viruses within a complex mixture. With the growing demand for analysis of the functional microbiome, metatranscriptome studies attract more interest. To make metatranscriptomic data sufficient for metataxonomics, new analytical workflows are needed to deal with sparse and taxonomically less informative sequencing data. RESULTS: We present a new protocol, IMSA+A, for accurate taxonomy classification based on metatranscriptome data of any read length that can efficiently and robustly identify bacteria, fungi, and viruses in the same sample. The new protocol improves accuracy by using a conservative reference database, employing a new counting scheme, and by assembling shotgun reads. Assembly also reduces analysis runtime. Simulated data were utilized to evaluate the protocol by permuting common experimental variables. When applied to the real metatranscriptome data for mouse intestines colonized by ASF, the protocol showed superior performance in detection of the microorganisms compared to the existing metataxonomics tools. IMSA+A is available at https://github.com/JeremyCoxBMI/IMSA-A. CONCLUSIONS: The developed protocol addresses the need for taxonomy classification from RNAseq data. Previously not utilized, i.e., unmapped to a reference genome, RNAseq reads can now be used to gather taxonomic information about the microbiota present in a biological sample without conducting additional sequencing. Any metatranscriptome pipeline that includes assembly of reads can add this analysis with minimal additional cost of compute time. The new protocol also creates an opportunity to revisit old metatranscriptome data, where taxonomic content may be important but was not analyzed.
Affiliation(s)
- Jeremy W Cox
  - Department of Electrical Engineering and Computing Systems, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45221, USA
  - The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Richard A Ballweg
  - The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Diana H Taft
  - The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Prakash Velayutham
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- David B Haslam
  - Division of Infectious Diseases, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- Aleksey Porollo
  - The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA

8. Zhang C, Cleveland K, Schnoll-Sussman F, McClure B, Bigg M, Thakkar P, Schultz N, Shah MA, Betel D. Identification of low abundance microbiome in clinical samples using whole genome sequencing. Genome Biol 2015; 16:265. [PMID: 26614063; PMCID: PMC4661937; DOI: 10.1186/s13059-015-0821-z] [Received: 08/31/2015] [Accepted: 11/02/2015]
Abstract
Identifying the microbiome composition from primary tissues directly affords an opportunity to study the causative relationships between the host microbiome and disease. However, this is challenging due to the low abundance of microbial DNA relative to the host. We present a systematic evaluation of microbiome profiling directly from endoscopic biopsies by whole genome sequencing. We compared our methods with other approaches on datasets with previously identified microbial composition. We applied this approach to identify the microbiome from 27 stomach biopsies, and validated the presence of Helicobacter pylori by quantitative PCR. Finally, we profiled the microbial composition in The Cancer Genome Atlas gastric adenocarcinoma cohort.
Affiliation(s)
- Chao Zhang
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA; Department of Medicine, Division of Hematology and Medical Oncology, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA
- Kyle Cleveland
  - Department of Medicine, Division of Hematology and Medical Oncology, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA
- Felice Schnoll-Sussman
  - Department of Medicine, Division of Hematology and Medical Oncology, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA; The Jay Monahan Center for Gastrointestinal Health, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA
- Bridget McClure
  - Center for Advanced Digestive Care, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA
- Michelle Bigg
  - The Jay Monahan Center for Gastrointestinal Health, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA
- Prashant Thakkar
  - Department of Medicine, Division of Hematology and Medical Oncology, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA
- Nikolaus Schultz
  - Kravis Center for Molecular Oncology, Memorial Sloan-Kettering Cancer Center, New York, NY, 10065, USA; Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, 10065, USA
- Manish A Shah
  - Department of Medicine, Division of Hematology and Medical Oncology, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA; Center for Advanced Digestive Care, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA
- Doron Betel
  - Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA; Department of Medicine, Division of Hematology and Medical Oncology, New York-Presbyterian Hospital/Weill Cornell Medicine, New York, NY, 10021, USA

9. Rawat A, Engelthaler DM, Driebe EM, Keim P, Foster JT. MetaGeniE: characterizing human clinical samples using deep metagenomic sequencing. PLoS One 2014; 9:e110915. [PMID: 25365329; PMCID: PMC4218713; DOI: 10.1371/journal.pone.0110915] [Received: 07/30/2014] [Accepted: 09/19/2014]
Abstract
With the decreasing cost of next-generation sequencing, deep sequencing of clinical samples provides unique opportunities to understand host-associated microbial communities. Among the primary challenges of clinical metagenomic sequencing is the rapid filtering of human reads to survey for pathogens with high specificity and sensitivity. Metagenomes are inherently variable due to different microbes in the samples and their relative abundance, the size and architecture of genomes, and factors such as target DNA amounts in tissue samples (i.e. human DNA versus pathogen DNA concentration). This variation in metagenomes typically manifests in sequencing datasets as low pathogen abundance, a high number of host reads, and the presence of close relatives and complex microbial communities. In addition to these challenges posed by the composition of metagenomes, high numbers of reads generated from high-throughput deep sequencing pose immense computational challenges. Accurate identification of pathogens is confounded by individual reads mapping to multiple different reference genomes due to gene similarity in different taxa present in the community or close relatives in the reference database. Available global and local sequence aligners also vary in sensitivity, specificity, and speed of detection. The efficiency of detection of pathogens in clinical samples is largely dependent on the desired taxonomic resolution of the organisms. We have developed an efficient strategy that identifies “all against all” relationships between sequencing reads and reference genomes. Our approach allows for scaling to large reference databases and then genome reconstruction by aggregating global and local alignments, thus allowing genetic characterization of pathogens at higher taxonomic resolution. These results were consistent with strain level SNP genotyping and bacterial identification from laboratory culture.
Affiliation(s)
- Arun Rawat
  - Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
- David M. Engelthaler
  - Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
- Elizabeth M. Driebe
  - Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
- Paul Keim
  - Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona, United States of America
  - Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, United States of America
- Jeffrey T. Foster
  - Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, United States of America
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, United States of America

10. Byrd AL, Perez-Rogers JF, Manimaran S, Castro-Nallar E, Toma I, McCaffrey T, Siegel M, Benson G, Crandall KA, Johnson WE. Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data. BMC Bioinformatics 2014; 15:262. [PMID: 25091138; PMCID: PMC4131054; DOI: 10.1186/1471-2105-15-262] [Received: 11/07/2013] [Accepted: 07/31/2014]
Abstract
Background: The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens. Results: Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity. Conclusions: Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at http://sourceforge.net/projects/pathoscope/.
Affiliation(s)
- Keith A Crandall
  - Department of Bioinformatics, Boston University, Boston, MA, USA

11. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk KC, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martínez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J, Miller S, Chiu CY. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res 2014; 24:1180-92. [PMID: 24899342; PMCID: PMC4079973; DOI: 10.1101/gr.171934.113]
Abstract
Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI (“sequence-based ultrarapid pathogen identification”), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7–500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
Affiliation(s)
- Samia N Naccache
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Scot Federman
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Narayanan Veeraraghavan
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Matei Zaharia
  - Department of Computer Science, University of California, Berkeley, California 94720, USA
- Deanna Lee
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Erik Samayoa
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Jerome Bouquet
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Ka-Cheung Luk
  - Abbott Diagnostics, Abbott Park, Illinois 60064, USA
- Barryett Enge
  - Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
- Debra A Wadford
  - Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
- Sharon L Messenger
  - Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
- Gillian L Genrich
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA
- Kristen Pellegrino
  - Department of Family and Community Medicine, UCSF, San Francisco, California 94143, USA
- Gilda Grard
  - Viral Emergent Diseases Unit, Centre International de Recherches Médicales de Franceville, Franceville, BP 769, Gabon
- Eric Leroy
  - Viral Emergent Diseases Unit, Centre International de Recherches Médicales de Franceville, Franceville, BP 769, Gabon
- Joseph N Fair
  - Metabiota, Inc., San Francisco, California 94104, USA
- Miguel A Martínez
  - Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, 62260, Mexico
- Pavel Isa
  - Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, 62260, Mexico
- John A Crump
  - Division of Infectious Diseases and International Health and the Duke Global Health Institute, Duke University Medical Center, Durham, North Carolina 27708, USA; Kilimanjaro Christian Medical Centre, Moshi, Kilimanjaro, 7393, Tanzania; Centre for International Health, University of Otago, Dunedin, 9054, New Zealand
- Joseph L DeRisi
  - Department of Biochemistry, UCSF, San Francisco, California 94107, USA
- Taylor Sittler
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA
- John Hackett
  - Abbott Diagnostics, Abbott Park, Illinois 60064, USA
- Steve Miller
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Charles Y Chiu
  - Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA; Department of Medicine, Division of Infectious Diseases, UCSF, San Francisco, California 94143, USA
| |
|
12
|
High Rhodotorula sequences in skin transcriptome of patients with diffuse systemic sclerosis. J Invest Dermatol 2014; 134:2138-2145. [PMID: 24608988 PMCID: PMC4102619 DOI: 10.1038/jid.2014.127] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 07/18/2013] [Revised: 02/03/2014] [Accepted: 02/19/2014] [Indexed: 11/08/2022]
Abstract
Previous studies have suggested a role for pathogens as a trigger of systemic sclerosis (SSc), although neither a pathogen nor a mechanism of pathogenesis is known. Here we show enrichment of Rhodotorula sequences in the skin of patients with early, diffuse SSc compared with that in normal controls. RNA-seq was performed on four SSc patients and four controls, to a depth of 200 million reads per patient. Data were analyzed to quantify the nonhuman sequence reads in each sample. We found little difference between bacterial microbiome and viral read counts, but found a significant difference between the read counts for a mycobiome component, R. glutinis. Normal samples contained almost no detected R. glutinis or other Rhodotorula sequence reads (mean score 0.021 for R. glutinis, 0.024 for all Rhodotorula). In contrast, SSc samples had a mean score of 5.039 for R. glutinis (5.232 for Rhodotorula). We were able to assemble the D1-D2 hypervariable region of the 28S ribosomal RNA (rRNA) of R. glutinis from each of the SSc samples. Taken together, these results suggest that R. glutinis may be present in the skin of early SSc patients at higher levels than in normal skin, raising the possibility that it may be triggering the inflammatory response found in SSc.
|
13
|
Martín R, Miquel S, Langella P, Bermúdez-Humarán LG. The role of metagenomics in understanding the human microbiome in health and disease. Virulence 2014; 5:413-23. [PMID: 24429972 DOI: 10.4161/viru.27864] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Indexed: 02/07/2023] Open
Abstract
The term microbiome refers to the genetic material of the catalog of microbial taxa associated with humans. As in all ecosystems, the microbiota reaches a dynamic equilibrium in the human body, which can be altered by environmental factors and external stimuli. Metagenomics is a relatively new field of study of microbial genomes within diverse environmental samples, which is of increasing importance in microbiology. The introduction of this ecological perception of microbiology is the key to achieving real knowledge about the influence of the microbiota in human health and disease. The aim of this review is to summarize the link between the human microbiota (focusing on the intestinal, vaginal, skin, and airway body sites) and health from this ecological point of view, highlighting the contribution of metagenomics in the advance of this field.
Affiliation(s)
- Rebeca Martín
  - INRA; UMR1319 Micalis; Jouy-en-Josas, France; AgroParisTech; UMR Micalis; Jouy-en-Josas, France
- Sylvie Miquel
  - INRA; UMR1319 Micalis; Jouy-en-Josas, France; AgroParisTech; UMR Micalis; Jouy-en-Josas, France
- Philippe Langella
  - INRA; UMR1319 Micalis; Jouy-en-Josas, France; AgroParisTech; UMR Micalis; Jouy-en-Josas, France
- Luis G Bermúdez-Humarán
  - INRA; UMR1319 Micalis; Jouy-en-Josas, France; AgroParisTech; UMR Micalis; Jouy-en-Josas, France
|
14
|
Dimon MT, Wood HM, Rabbitts PH, Liao W, Cho RJ, Arron ST. No evidence for integrated viral DNA in the genome sequence of cutaneous squamous cell carcinoma. J Invest Dermatol 2014; 134:2055-2057. [PMID: 24480882 PMCID: PMC4057961 DOI: 10.1038/jid.2014.52] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]
Affiliation(s)
- Michelle T Dimon
  - Department of Dermatology, University of California, San Francisco, San Francisco, California, USA
- Henry M Wood
  - Pre-Cancer Genomics, Leeds Institute of Cancer Studies and Pathology, Leeds, UK
- Pamela H Rabbitts
  - Pre-Cancer Genomics, Leeds Institute of Cancer Studies and Pathology, Leeds, UK
- Wilson Liao
  - Department of Dermatology, University of California, San Francisco, San Francisco, California, USA
- Raymond J Cho
  - Department of Dermatology, University of California, San Francisco, San Francisco, California, USA
- Sarah T Arron
  - Department of Dermatology, University of California, San Francisco, San Francisco, California, USA
|