1
Faith JJ. Assessing live microbial therapeutic transmission. Gut Microbes 2025; 17:2447836. [PMID: 39746875 DOI: 10.1080/19490976.2024.2447836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 12/09/2024] [Accepted: 12/23/2024] [Indexed: 01/04/2025]
Abstract
The development of fecal microbiota transplantation and defined live biotherapeutic products for the treatment of human disease has been an empirically driven process yielding a notable success of approved drugs for the treatment of recurrent Clostridioides difficile infection. Assessing the potential of this therapeutic modality in other indications with mixed clinical results would benefit from consistent quantitative frameworks to characterize drug potency and composition and to assess the impact of dose and composition on the frequency and duration of strain engraftment. Monitoring these drug properties and engraftment outcomes would help identify minimally sufficient sets of microbial strains to treat disease and provide insights into the intersection between microbial function and host physiology. Broad and correct usage of strain detection methods is essential to this advancement. This article describes strain detection approaches, where they are best applied, what data they require, and clinical trial designs that are best suited to their application.
Affiliation(s)
- Jeremiah J Faith
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2
Yang M, Wang Z, Yan Z, Wang W, Zhu Q, Jin C. DNASimCLR: a contrastive learning-based deep learning approach for gene sequence data classification. BMC Bioinformatics 2024; 25:328. [PMID: 39402441 PMCID: PMC11476100 DOI: 10.1186/s12859-024-05955-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 10/09/2024] [Indexed: 10/19/2024]
Abstract
BACKGROUND The rapid advancements in deep neural network models have significantly enhanced the ability to extract features from microbial sequence data, which is critical for addressing biological challenges. However, the scarcity and complexity of labeled microbial data pose substantial difficulties for supervised learning approaches. To address these issues, we propose DNASimCLR, an unsupervised framework designed for efficient gene sequence data feature extraction. RESULTS DNASimCLR leverages convolutional neural networks and the SimCLR framework, based on contrastive learning, to extract intricate features from diverse microbial gene sequences. Pre-training was conducted on two classic large scale unlabelled datasets encompassing metagenomes and viral gene sequences. Subsequent classification tasks were performed by fine-tuning the pretrained model using the previously acquired model. Our experiments demonstrate that DNASimCLR is at least comparable to state-of-the-art techniques for gene sequence classification. For convolutional neural network-based approaches, DNASimCLR surpasses the latest existing methods, clearly establishing its superiority over the state-of-the-art CNN-based feature extraction techniques. Furthermore, the model exhibits superior performance across diverse tasks in analyzing biological sequence data, showcasing its robust adaptability. CONCLUSIONS DNASimCLR represents a robust and database-agnostic solution for gene sequence classification. Its versatility allows it to perform well in scenarios involving novel or previously unseen gene sequences, making it a valuable tool for diverse applications in genomics.
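The abstract above names the SimCLR framework. As a rough illustration of the contrastive objective that SimCLR is known for (the normalized temperature-scaled cross-entropy, or NT-Xent, loss), the sketch below computes it in plain Python for two lists of embedding vectors, where `views_a[i]` and `views_b[i]` are assumed to be two augmented views of the same sequence. The function names, the temperature value, and the toy vectors are illustrative assumptions; the abstract does not state DNASimCLR's exact loss or hyperparameters.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(views_a, views_b, temperature=0.5):
    """Mean NT-Xent loss over 2N embeddings (hypothetical helper).

    views_a[i] and views_b[i] are assumed to be embeddings of two
    augmentations of the same input; every other embedding in the
    batch acts as a negative.
    """
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - positive
    return total / (2 * n)


# Toy check: matching views should give a lower loss than mismatched ones.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled = nt_xent(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a real pipeline the embeddings would come from the encoder network rather than being hand-written, and the loss would be computed with an autodiff framework so it can be minimized during pre-training.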
Affiliation(s)
- Minghao Yang
- Shandong University, Weihai, People's Republic of China
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
- Zehua Wang
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
- Zizhuo Yan
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
- Wenxiang Wang
- Beijing Research Institute of Automation for Machinery Industry, Beijing, People's Republic of China
- Qian Zhu
- Shandong University, Weihai, People's Republic of China
- Changlong Jin
- Shandong University, Weihai, People's Republic of China
3
Odom AR, Faits T, Castro-Nallar E, Crandall KA, Johnson WE. Metagenomic profiling pipelines improve taxonomic classification for 16S amplicon sequencing data. Sci Rep 2023; 13:13957. [PMID: 37633998 PMCID: PMC10460424 DOI: 10.1038/s41598-023-40799-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 09/14/2022] [Accepted: 08/16/2023] [Indexed: 08/28/2023]
Abstract
Most experiments studying bacterial microbiomes rely on the PCR amplification of all or part of the gene for the 16S rRNA subunit, which serves as a biomarker for identifying and quantifying the various taxa present in a microbiome sample. Several computational methods exist for analyzing 16S amplicon sequencing. However, the most-used bioinformatics tools cannot produce high quality genus-level or species-level taxonomic calls and may underestimate the potential accuracy of these calls. We used 16S sequencing data from mock bacterial communities to evaluate the sensitivity and specificity of several bioinformatics pipelines and genomic reference libraries used for microbiome analyses, concentrating on measuring the accuracy of species-level taxonomic assignments of 16S amplicon reads. We evaluated the tools DADA2, QIIME 2, Mothur, PathoScope 2, and Kraken 2 in conjunction with reference libraries from Greengenes, SILVA, Kraken 2, and RefSeq. Profiling tools were compared using publicly available mock community data from several sources, comprising 136 samples with varied species richness and evenness, several different amplified regions within the 16S rRNA gene, and both DNA spike-ins and cDNA from collections of plated cells. PathoScope 2 and Kraken 2, both tools designed for whole-genome metagenomics, outperformed DADA2, QIIME 2 using the DADA2 plugin, and Mothur, which are theoretically specialized for 16S analyses. Evaluations of reference libraries identified the SILVA and RefSeq/Kraken 2 Standard libraries as superior in accuracy compared to Greengenes. These findings support PathoScope and Kraken 2 as fully capable, competitive options for genus- and species-level 16S amplicon sequencing data analysis, whole genome sequencing, and metagenomics data tools.
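The mock-community evaluation described in this abstract compares the taxa a pipeline calls against a known list of community members. A minimal sketch of that kind of scoring is shown below, assuming taxa are compared as plain name strings. The function name `score_calls`, the `universe` parameter (the set of candidate taxa needed to count true negatives for specificity), and the placeholder labels are assumptions for illustration, not the authors' evaluation code.

```python
def score_calls(expected, called, universe=None):
    """Score taxonomic calls against a known mock community (hypothetical helper).

    expected: taxa truly present in the mock community
    called:   taxa reported by the profiling tool
    universe: optional set of all candidate taxa; needed for specificity
    """
    expected, called = set(expected), set(called)
    tp = len(expected & called)   # mock members correctly reported
    fp = len(called - expected)   # reported taxa that are not in the mock
    fn = len(expected - called)   # mock members that were missed
    scores = {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }
    if universe is not None:
        # true negatives: candidate taxa that are absent and were not called
        tn = len(set(universe) - expected - called)
        scores["specificity"] = tn / (tn + fp) if (tn + fp) else 0.0
    return scores


# Placeholder labels, not real benchmark data.
mock = {"sp_A", "sp_B", "sp_C", "sp_D"}
calls = {"sp_A", "sp_B", "sp_E"}
print(score_calls(mock, calls, universe=mock | {"sp_E", "sp_F"}))
```

Specificity depends heavily on how the candidate universe is defined (for example, every taxon in the reference library), so it is worth reporting that choice alongside the score.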
Affiliation(s)
- Aubrey R Odom
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
- Tyler Faits
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
- Eduardo Castro-Nallar
- Departamento de Microbiología, Facultad de Ciencias de la Salud, Universidad de Talca, Campus Talca, Avda. Lircay S/N, Talca, Chile
- Centro de Ecología Integrativa, Universidad de Talca, Campus Talca, Avda. Lircay S/N, Talca, Chile
- Keith A Crandall
- Department of Biostatistics & Bioinformatics, Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
- W Evan Johnson
- Division of Infectious Disease, Center for Data Science, Rutgers University - New Jersey Medical School, Newark, NJ, USA
4
McClintock J, Odom-Mabey AR, Kebere N, Ismail A, Mwananyanda L, Gill CJ, MacLeod WB, Pieciak RC, Lapidot R, Johnson WE. Postmortem Nasopharyngeal Microbiome Analysis of Zambian Infants With and Without Respiratory Syncytial Virus Disease: A Nested Case Control Study. Pediatr Infect Dis J 2023; 42:637-643. [PMID: 37093853 PMCID: PMC10348642 DOI: 10.1097/inf.0000000000003941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/31/2023] [Indexed: 04/25/2023]
Abstract
BACKGROUND Respiratory syncytial virus (RSV) is the most common cause of bronchiolitis and lower respiratory tract infections in children in their first year of life, disproportionately affecting infants in developing countries. Previous studies have found that the nasopharyngeal (NP) microbiome of infants with RSV infection has specific characteristics that correlate with disease severity, including lower biodiversity, perturbations of the microbiota and differences in relative abundance. These studies have focused on infants seen in clinical or hospital settings, predominantly in developed countries. METHODS We conducted a nested case control study within a random sample of 50 deceased RSV+ infants with age at death ranging from 4 days to 6 months and 50 matched deceased RSV- infants who were all previously enrolled in the Zambia Pertussis and RSV Infant Mortality Estimation (ZPRIME) study. All infants died within the community or within 48 hours of facility admittance. As part of the ZPRIME study procedures, all decedents underwent one-time, postmortem NP sampling. The current analysis explored the differences between the NP microbiome profiles of RSV+ and RSV- decedents using the 16S ribosomal DNA sequencing. RESULTS We found that Moraxella was more abundant in the NP microbiome of RSV+ decedents than in the RSV- decedents. Additionally, Gemella and Staphylococcus were less abundant in RSV+ decedents than in the RSV- decedents. CONCLUSIONS These results support previously reported findings of the association between the NP microbiome and RSV and suggest that changes in the abundance of these microbes are likely specific to RSV and may correlate with mortality associated with the disease.
Affiliation(s)
- Jessica McClintock
- Division of Infectious Disease, Center for Data Science, Rutgers New Jersey Medical School, Newark, New Jersey
- Nitsueh Kebere
- Bioinformatics Program, Boston University, Boston, Massachusetts
- Arshad Ismail
- Sequencing Core Facility, National Institute for Communicable Diseases of the National Health Laboratory Service, Johannesburg, South Africa
- Department of Biochemistry and Microbiology, University of Venda, Thohoyandou, South Africa
- Lawrence Mwananyanda
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts
- Christopher J. Gill
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts
- William B. MacLeod
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts
- Rachel C. Pieciak
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts
- Rotem Lapidot
- Pediatric Infectious Diseases, Boston Medical Center, Boston, Massachusetts
- Pediatrics, Boston University School of Medicine, Boston, Massachusetts
- W. Evan Johnson
- Division of Infectious Disease, Center for Data Science, Rutgers New Jersey Medical School, Newark, New Jersey
- Bioinformatics Program, Boston University, Boston, Massachusetts
5
PathoLive—Real-Time Pathogen Identification from Metagenomic Illumina Datasets. Life (Basel) 2022; 12:life12091345. [PMID: 36143382 PMCID: PMC9505849 DOI: 10.3390/life12091345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/24/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022]
Abstract
Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. The interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable to obtain fast and accurate NGS-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance: PathoLive is open source and available on GitLab and BioConda.
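The abstract mentions scoring mappings with respect to low-entropy areas. One common way to flag low-complexity sequence is the Shannon entropy of its base composition, sketched below. This is a generic illustration and not PathoLive's actual scoring scheme; the function names and the 1.0-bit threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per base) of the base composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq.upper())
    # Summing the per-base terms -p*log2(p) keeps the result at 0.0
    # (not -0.0) for a homopolymer.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def is_low_complexity(seq, threshold=1.0):
    """Flag a sequence whose entropy falls below a hypothetical threshold."""
    return shannon_entropy(seq) < threshold


print(shannon_entropy("ACGTACGT"))    # evenly mixed bases give the maximum of 2 bits
print(is_low_complexity("AAAAAAAT"))  # near-homopolymer is flagged
```

Base-composition entropy misses repeats such as "ACACACAC", so production filters often compute entropy over k-mers or use a dedicated low-complexity masker instead.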
6
Balaji A, Kille B, Kappell AD, Godbold GD, Diep M, Elworth RAL, Qian Z, Albin D, Nasko DJ, Shah N, Pop M, Segarra S, Ternus KL, Treangen TJ. SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning. Genome Biol 2022; 23:133. [PMID: 35725628 PMCID: PMC9208262 DOI: 10.1186/s13059-022-02695-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/29/2021] [Accepted: 05/25/2022] [Indexed: 11/10/2022]
Abstract
The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, available for download at www.gitlab.com/treangenlab/seqscreen.
Affiliation(s)
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
- Anthony D Kappell
- Signature Science, LLC, 8329 North Mopac Expressway, Austin, TX, USA
- Gene D Godbold
- Signature Science, LLC, 1670 Discovery Drive, Charlottesville, VA, USA
- Madeline Diep
- Fraunhofer USA Center Mid-Atlantic CMA, Riverdale, MD, USA
- R A Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
- Zhiqin Qian
- Department of Computer Science, Rice University, Houston, TX, USA
- Dreycey Albin
- Department of Computer Science, Rice University, Houston, TX, USA
- Daniel J Nasko
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Nidhi Shah
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Mihai Pop
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Santiago Segarra
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Krista L Ternus
- Signature Science, LLC, 8329 North Mopac Expressway, Austin, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
7
Abstract
Candida auris is a human fungal pathogen classified as an urgent threat to the delivery of health care due to its extensive antimicrobial resistance and the high mortality rates associated with invasive infections. Global outbreaks have occurred in health care facilities, particularly long-term care hospitals and nursing homes. Skin is the primary site of colonization for C. auris. To accelerate research studies, we developed microbiome sequencing protocols, including amplicon and metagenomic sequencing, directly from patient samples at health care facilities with ongoing C. auris outbreaks. We characterized the skin mycobiome with a database optimized to classify Candida species and C. auris to the clade level. While Malassezia species were the predominant skin-associated fungi, nursing home residents also harbored Candida species, including C. albicans and C. parapsilosis. Amplicon sequencing was concordant with culturing studies to identify C. auris-colonized patients and provided further resolution that distinct clades of C. auris are colonizing facilities in New York and Illinois. Shotgun metagenomic sequencing from a clinical sample with a high fungal bioburden generated a skin-associated profile of the C. auris genome. Future larger scale clinical studies are warranted to more systematically investigate the effects of commensal microbes and patient risk factors on the colonization and transmission of C. auris. IMPORTANCE Candida auris is a human pathogen of high concern due to its extensive antifungal drug resistance and high mortality rates associated with invasive infections. Candida auris skin colonization and persistence on environmental surfaces make this pathogen difficult to control once it enters a health care facility. Residents in long-term care hospitals and nursing homes are especially vulnerable. In this study, we developed microbiome sequencing protocols directly from surveillance samples, including amplicon and metagenomic sequencing, demonstrating concordance between sequencing results and culturing.
8
Raita Y, Pérez-Losada M, Freishtat RJ, Harmon B, Mansbach JM, Piedra PA, Zhu Z, Camargo CA, Hasegawa K. Integrated omics endotyping of infants with respiratory syncytial virus bronchiolitis and risk of childhood asthma. Nat Commun 2021; 12:3601. [PMID: 34127671 PMCID: PMC8203688 DOI: 10.1038/s41467-021-23859-6] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 12/08/2020] [Accepted: 05/17/2021] [Indexed: 02/04/2023]
Abstract
Respiratory syncytial virus (RSV) bronchiolitis is not only the leading cause of hospitalization in U.S. infants, but also a major risk factor for asthma development. While emerging evidence suggests clinical heterogeneity within RSV bronchiolitis, little is known about its biologically distinct endotypes. Here, we integrated clinical, virus, airway microbiome (species-level), transcriptome, and metabolome data of 221 infants hospitalized with RSV bronchiolitis in a multicentre prospective cohort study. We identified four biologically and clinically meaningful endotypes: A) clinical (classic), microbiome (M. nonliquefaciens), inflammation (IFN-intermediate); B) clinical (atopic), microbiome (S. pneumoniae/M. catarrhalis), inflammation (IFN-high); C) clinical (severe), microbiome (mixed), inflammation (IFN-low); and D) clinical (non-atopic), microbiome (M. catarrhalis), inflammation (IL-6). Particularly, compared with endotype A infants, endotype B infants (characterized by a high proportion of IgE sensitization and rhinovirus coinfection, S. pneumoniae/M. catarrhalis codominance, and high IFN-α and -γ response) had a significantly higher risk for developing asthma (9% vs. 38%; OR, 6.00; 95% CI, 2.08-21.9; P = 0.002). Our findings provide an evidence base for the early identification of high-risk children during a critical period of airway development.
Affiliation(s)
- Yoshihiko Raita
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Marcos Pérez-Losada
- Computational Biology Institute, Department of Biostatistics and Bioinformatics, The George Washington University, Washington, DC, USA
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Robert J Freishtat
- Center for Genetic Medicine Research, Children's National Hospital, Washington, DC, USA
- Division of Emergency Medicine, Children's National Hospital, Washington, DC, USA
- Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC, USA
- Brennan Harmon
- Center for Genetic Medicine Research, Children's National Hospital, Washington, DC, USA
- Jonathan M Mansbach
- Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Pedro A Piedra
- Departments of Molecular Virology and Microbiology and Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Zhaozhong Zhu
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Carlos A Camargo
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Kohei Hasegawa
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
9
Karagöz MA, Nalbantoglu OU. Taxonomic classification of metagenomic sequences from Relative Abundance Index profiles using deep learning. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102539] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022]
10
Integrative Transkingdom Analysis of the Gut Microbiome in Antibiotic Perturbation and Critical Illness. mSystems 2021; 6:6/2/e01148-20. [PMID: 33727397 PMCID: PMC8546997 DOI: 10.1128/msystems.01148-20] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 12/16/2022]
Abstract
Bacterial microbiota play a critical role in mediating local and systemic immunity, and shifts in these microbial communities have been linked to impaired outcomes in critical illness. Emerging data indicate that other intestinal organisms, including bacteriophages, viruses of eukaryotes, fungi, and protozoa, are closely interlinked with the bacterial microbiota and their host, yet their collective role during antibiotic perturbation and critical illness remains to be elucidated. We employed multi-omics factor analysis (MOFA) to systematically integrate the bacterial (16S rRNA), fungal (intergenic transcribed spacer 1 rRNA), and viral (virus discovery next-generation sequencing) components of the intestinal microbiota of 33 critically ill patients with and without sepsis and 13 healthy volunteers. In addition, we quantified the absolute abundances of bacteria and fungi using 16S and 18S rRNA PCRs and characterized the short-chain fatty acids (SCFAs) butyrate, acetate, and propionate using nuclear magnetic resonance spectroscopy. We observed that a loss of the anaerobic intestinal environment is directly correlated with an overgrowth of aerobic pathobionts and their corresponding bacteriophages as well as an absolute enrichment of opportunistic yeasts capable of causing invasive disease. We also observed a strong depletion of SCFAs in both disease states, which was associated with an increased absolute abundance of fungi with respect to bacteria. Therefore, these findings illustrate the complexity of transkingdom changes following disruption of the intestinal bacterial microbiome. IMPORTANCE While numerous studies have characterized antibiotic-induced disruptions of the bacterial microbiome, few studies describe how these disruptions impact the composition of other kingdoms such as viruses, fungi, and protozoa. To address this knowledge gap, we employed MOFA to systematically integrate viral, fungal, and bacterial sequence data from critically ill patients (with and without sepsis) and healthy volunteers, both prior to and following exposure to broad-spectrum antibiotics. In doing so, we show that modulation of the bacterial component of the microbiome has implications extending beyond this kingdom alone, enabling the overgrowth of potentially invasive fungi and viruses. While numerous preclinical studies have described similar findings in vitro, we confirm these observations in humans using an integrative analytic approach. These findings underscore the potential value of multi-omics data integration tools in interrogating how different components of the microbiota contribute to disease states. In addition, our findings suggest that there is value in further studying potential adjunctive therapies using anaerobic bacteria or SCFAs to reduce fungal expansion after antibiotic exposure, which could ultimately lead to improved outcomes in the intensive care unit (ICU).
11
Anyansi C, Straub TJ, Manson AL, Earl AM, Abeel T. Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data. Front Microbiol 2020; 11:1925. [PMID: 33013732 PMCID: PMC7507117 DOI: 10.3389/fmicb.2020.01925] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 02/27/2020] [Accepted: 07/22/2020] [Indexed: 01/17/2023]
Abstract
Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. Most widely used tools for taxonomic profiling of metagenomic sequence data allow for a species-level overview of the composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence and growth rate. Therefore, the ability to resolve microbial communities down to the level of individual strains within a species is critical to interpreting metagenomic data for clinical and environmental applications, where identifying a particular strain, or tracking a particular strain across a set of samples, can help aid in clinical diagnosis and treatment, or in characterizing yet unstudied strains across novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species presents a bigger challenge, due to the diversity of databases, genetic relatedness, and goals when conducting these analyses. Selection of which metagenomic tool to employ for a specific application should be performed on a case-by case basis as these tools have strengths and weaknesses that affect their performance on specific tasks. A comprehensive benchmark across different use case scenarios is vital to validate performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, development of more fine-grained, high-resolution algorithms will continue to be in demand for the future.
Affiliation(s)
- Christine Anyansi
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Timothy J. Straub
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Abigail L. Manson
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Ashlee M. Earl
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Thomas Abeel
- Delft Bioinformatics Lab, Delft University of Technology, Delft, Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States
12
Fujiogi M, Camargo CA, Bernot JP, Freishtat RJ, Harmon B, Mansbach JM, Castro-Nallar E, Perez-Losada M, Hasegawa K. In infants with severe bronchiolitis: dual-transcriptomic profiling of nasopharyngeal microbiome and host response. Pediatr Res 2020; 88:144-146. [PMID: 31905367 PMCID: PMC7335686 DOI: 10.1038/s41390-019-0742-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/21/2019] [Revised: 12/11/2019] [Accepted: 12/15/2019] [Indexed: 01/28/2023]
Affiliation(s)
- Michimasa Fujiogi
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Carlos A. Camargo
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- James P. Bernot
- Department of Biostatistics and Bioinformatics, Computational Biology Institute, George Washington University, Washington, DC
- Robert J Freishtat
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC
- Division of Emergency Medicine, Children’s National Hospital, Washington, DC
- Departments of Pediatrics and Integrative Systems Biology and Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC
- Brennan Harmon
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC
- Jonathan M. Mansbach
- Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, MA
- Eduardo Castro-Nallar
- Department of Biostatistics and Bioinformatics, Computational Biology Institute, George Washington University, Washington, DC
- Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile
- Marcos Perez-Losada
- Department of Biostatistics and Bioinformatics, Computational Biology Institute, George Washington University, Washington, DC
- Department of Pediatrics, George Washington University School of Medicine and Health Sciences and the Division of Emergency Medicine, Children’s National Hospital, Washington, DC
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Kohei Hasegawa
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
13
|
Hu Y, Fang L, Nicholson C, Wang K. Implications of Error-Prone Long-Read Whole-Genome Shotgun Sequencing on Characterizing Reference Microbiomes. iScience 2020; 23:101223. [PMID: 32563152 PMCID: PMC7305381 DOI: 10.1016/j.isci.2020.101223] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/04/2020] [Revised: 05/09/2020] [Accepted: 05/28/2020] [Indexed: 01/16/2023] Open
Abstract
Long-read sequencing techniques, such as the Oxford Nanopore Technology, can generate reads that are tens of kilobases in length and are therefore particularly relevant for microbiome studies. However, owing to the higher per-base error rates than typical short-read sequencing, the application of long-read sequencing on microbiomes remains largely unexplored. Here we deeply sequenced two human microbiota mock community samples (HM-276D and HM-277D) from the Human Microbiome Project. We showed that assembly programs consistently achieved high accuracy (∼99%) and completeness (∼99%) for bacterial strains with adequate coverage. We also found that long-read sequencing provides accurate estimates of species-level abundance (R = 0.94 for 20 bacteria with abundance ranging from 0.005% to 64%). Our results not only demonstrate the feasibility of characterizing complete microbial genomes and populations from error-prone Nanopore sequencing data but also highlight necessary bioinformatics improvements for future metagenomics tool development.
Affiliation(s)
- Yu Hu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Christopher Nicholson
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
14
Barash E, Sal-Man N, Sabato S, Ziv-Ukelson M. BacPaCS-Bacterial Pathogenicity Classification via Sparse-SVM. Bioinformatics 2019; 35:2001-2008. [PMID: 30407484 DOI: 10.1093/bioinformatics/bty928] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/30/2018] [Revised: 08/30/2018] [Accepted: 11/07/2018] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION: Bacterial infections are a major cause of illness worldwide. However, most bacterial strains pose no threat to human health and may even be beneficial. Thus, developing powerful diagnostic bioinformatic tools that differentiate pathogenic from commensal bacteria is critical for effective treatment of bacterial infections. RESULTS: We propose a machine-learning approach for classifying human-hosted bacteria as pathogenic or non-pathogenic based on their genome-derived proteomes. Our approach is based on sparse Support Vector Machines (SVM), which autonomously select a small set of genes that are related to bacterial pathogenicity. We implement our approach as a tool, 'Bacterial Pathogenicity Classification via sparse-SVM' (BacPaCS), which is fully automated and handles datasets significantly larger than those previously used. BacPaCS shows high accuracy in distinguishing pathogenic from non-pathogenic bacteria in a clinically relevant dataset comprising only human-hosted bacteria. Among the genes that received the highest positive weight in the resulting classifier, we found genes that are known to be related to bacterial pathogenicity, in addition to novel candidates whose involvement in bacterial virulence has never been reported. AVAILABILITY AND IMPLEMENTATION: The code and the resulting model are available at: https://github.com/barashe/bacpacs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eran Barash
  - Department of Computer Science, Faculty of Natural Sciences
- Neta Sal-Man
  - The Shraga Segal Department of Microbiology Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, BeerSheva, Israel
- Sivan Sabato
  - Department of Computer Science, Faculty of Natural Sciences
15
Uelze L, Grützke J, Borowiak M, Hammerl JA, Juraschek K, Deneke C, Tausch SH, Malorny B. Typing methods based on whole genome sequencing data. One Health Outlook 2020; 2:3. [PMID: 33829127 PMCID: PMC7993478 DOI: 10.1186/s42522-020-0010-1] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Received: 10/04/2019] [Accepted: 01/08/2020] [Indexed: 05/12/2023]
Abstract
Whole genome sequencing (WGS) of foodborne pathogens has become an effective method for investigating the information contained in the genome sequence of bacterial pathogens. In addition, its highly discriminative power enables the comparison of genetic relatedness between bacteria even on a sub-species level. For this reason, WGS is being implemented worldwide and across sectors (human, veterinary, food, and environment) for the investigation of disease outbreaks, source attribution, and improved risk characterization models. In order to extract relevant information from the large quantity and complex data produced by WGS, a host of bioinformatics tools has been developed, allowing users to analyze and interpret sequencing data, starting from simple gene-searches to complex phylogenetic studies. Depending on the research question, the complexity of the dataset and their bioinformatics skill set, users can choose between a great variety of tools for the analysis of WGS data. In this review, we describe the relevant approaches for phylogenomic studies for outbreak studies and give an overview of selected tools for the characterization of foodborne pathogens based on WGS data. Despite the efforts of the last years, harmonization and standardization of typing tools are still urgently needed to allow for an easy comparison of data between laboratories, moving towards a one health worldwide surveillance system for foodborne pathogens.
Affiliation(s)
- Laura Uelze
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Josephine Grützke
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Maria Borowiak
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Jens Andre Hammerl
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Katharina Juraschek
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Carlus Deneke
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Simon H. Tausch
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Burkhard Malorny
  - Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
16
Conlan S, Lau AF, Deming C, Spalding CD, Lee-Lin S, Thomas PJ, Park M, Dekker JP, Frank KM, Palmore TN, Segre JA. Plasmid Dissemination and Selection of a Multidrug-Resistant Klebsiella pneumoniae Strain during Transplant-Associated Antibiotic Therapy. mBio 2019; 10:e00652-19. [PMID: 31594809 PMCID: PMC6786864 DOI: 10.1128/mbio.00652-19] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/13/2019] [Accepted: 09/03/2019] [Indexed: 12/14/2022] Open
Abstract
Antibiotics, which are used both to prevent and to treat infections, are a mainstay therapy for lifesaving procedures such as transplantation. For this reason, and many others, increased antibiotic resistance among human-associated pathogens, such as the carbapenem-resistant Enterobacteriaceae species, is of grave concern. In this study, we report on a hematopoietic stem cell transplant recipient in whom cultures detected the emergence of carbapenem resistance and spread across five strains of bacteria that persisted for over a year. Carbapenem resistance in Citrobacter freundii, Enterobacter cloacae, Klebsiella aerogenes, and Klebsiella pneumoniae was linked to a pair of plasmids, each carrying the Klebsiella pneumoniae carbapenemase gene (blaKPC). Surveillance cultures identified a carbapenem-susceptible strain of Citrobacter freundii that may have become resistant through horizontal gene transfer of these plasmids. Selection of a multidrug-resistant Klebsiella pneumoniae strain was also detected following combination antibiotic therapy. Here we report a plasmid carrying the blaKPC gene with broad host range that poses the additional threat of spreading to endogenous members of the human gut microbiome. IMPORTANCE: Antibiotic-resistant bacteria are a serious threat to medically fragile patient populations. The spread of antibiotic resistance through plasmid-mediated mechanisms is of grave concern as it can lead to the conversion of endogenous patient-associated strains to difficult-to-treat pathogens.
Affiliation(s)
- Sean Conlan
  - National Human Genome Research Institute, Bethesda, Maryland, USA
- Anna F Lau
  - National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Clay Deming
  - National Human Genome Research Institute, Bethesda, Maryland, USA
- Pamela J Thomas
  - National Institutes of Health Intramural Sequencing Center (NISC), Rockville, Maryland, USA
- Morgan Park
  - National Institutes of Health Intramural Sequencing Center (NISC), Rockville, Maryland, USA
- John P Dekker
  - National Institutes of Health Clinical Center, Bethesda, Maryland, USA
  - National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, USA
- Karen M Frank
  - National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Tara N Palmore
  - National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Julia A Segre
  - National Human Genome Research Institute, Bethesda, Maryland, USA
17
Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019; 15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022] Open
Abstract
Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, especially bacteria are able to take a shortcut via HGT that enables them to pass on genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping that provides complementary evidence compared to existing methods. However, Daisy relies on the acceptor and donor organism involved in the HGT being known. We introduce DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor reference candidates of an HGT event based on sequencing reads. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties such that these methods cannot be directly applied. DaisyGPS uses MicrobeGPS, a metagenomic profiling tool tailored towards estimating the genomic distance between organisms in the sample and the reference database. We enhance the underlying scoring system of MicrobeGPS to account for the sequence patterns in terms of mapping coverage of an acceptor and donor involved in an HGT event, and report a ranked list of reference candidates. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in an investigation of an outbreak involving Methicillin-resistant Staphylococcus aureus data.
Affiliation(s)
- Enrico Seiler
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
  - Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, and Algorithmic Bioinformatics, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
- Kathrin Trappe
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
18
Pereira De Martinis EC, Almeida OGGD. Relating next-generation sequencing and bioinformatics concepts to routine microbiological testing. Electronic Journal of General Medicine 2019. [DOI: 10.29333/ejgm/108690] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/23/2023]
19
Yang W, Huang L, Shi C, Wang L, Yu R. UltraStrain: An NGS-Based Ultra Sensitive Strain Typing Method for Salmonella enterica. Front Genet 2019; 10:276. [PMID: 31001322 PMCID: PMC6456706 DOI: 10.3389/fgene.2019.00276] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/21/2018] [Accepted: 03/12/2019] [Indexed: 11/13/2022] Open
Abstract
In the last few years, advances in next-generation sequencing (NGS) technology for whole genome sequencing (WGS) of foodborne pathogens have provided drastic improvements in food pathogen outbreak surveillance. WGS of foodborne pathogens enables identification of pathogens from food or environmental samples, including difficult-to-detect pathogens in culture-negative infections. Compared to traditional low-resolution methods such as pulsed-field gel electrophoresis (PFGE), WGS can differentiate even closely related strains of the same species, thus enabling rapid identification of the food source associated with pathogen outbreak events for a fast mitigation plan. In this paper, we present UltraStrain, a fast and ultra sensitive pathogen detection and strain typing method for Salmonella enterica (S. enterica) based on WGS data analysis. In the proposed method, a noise filtering step is first performed in which the raw sequencing data are mapped to a synthetic species-specific reference genome generated from S. enterica specific marker sequences, to avoid potential interference from closely related species in low-spike samples. After that, a statistical learning based method is used to identify candidate strains, from a database of known S. enterica strains, that best explain the retained S. enterica specific reads. Finally, a refinement step is performed by mapping all the reads before filtering onto the identified top candidate strains and recalculating the probability of presence for each candidate strain. Experimental results using both synthetic and real sequencing data show that the proposed method is able to identify the correct S. enterica strains from low-spike samples, and outperforms several existing strain-typing methods in terms of sensitivity and accuracy.
Affiliation(s)
- Wenxian Yang
  - Aginome-XMU Joint Lab, Xiamen University, Xiamen, China
- Lihong Huang
  - School of Information Science and Engineering, Xiamen University, Xiamen, China
- Chong Shi
  - School of Information Science and Engineering, Xiamen University, Xiamen, China
- Liansheng Wang
  - School of Information Science and Engineering, Xiamen University, Xiamen, China
- Rongshan Yu
  - Aginome-XMU Joint Lab, Xiamen University, Xiamen, China
  - School of Information Science and Engineering, Xiamen University, Xiamen, China
20
Tirosh O, Conlan S, Deming C, Lee-Lin SQ, Huang X, Su HC, Freeman AF, Segre JA, Kong HH. Expanded skin virome in DOCK8-deficient patients. Nat Med 2018; 24:1815-1821. [PMID: 30397357 PMCID: PMC6286253 DOI: 10.1038/s41591-018-0211-7] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 06/06/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022]
Abstract
Human microbiome studies have revealed the intricate interplay of host immunity and bacterial communities to achieve homeostatic balance. Healthy skin microbial communities are dominated by bacteria with low viral representation, mainly bacteriophage. Specific eukaryotic viruses have been implicated in both common and rare skin diseases, but cataloging skin viral communities has been limited. Alterations in host immunity provide an opportunity to expand our understanding of microbial-host interactions. Primary immunodeficient patients manifest with various viral, bacterial, fungal, and parasitic infections, including skin infections. Dedicator of cytokinesis 8 (DOCK8) deficiency is a rare primary human immunodeficiency characterized by recurrent cutaneous and systemic infections, as well as atopy and cancer susceptibility. DOCK8, encoding a guanine nucleotide exchange factor highly expressed in lymphocytes, regulates actin cytoskeleton, which is critical for migration through collagen-dense tissues such as skin. Analyzing deep metagenomic sequencing data from DOCK8-deficient skin samples demonstrated a notable increase in eukaryotic viral representation and diversity compared with healthy volunteers. De novo assembly approaches identified hundreds of novel human papillomavirus genomes, illuminating microbial dark matter. Expansion of the skin virome in DOCK8-deficient patients underscores the importance of immune surveillance in controlling eukaryotic viral colonization and infection.
Affiliation(s)
- Osnat Tirosh
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Sean Conlan
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Clay Deming
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Shih-Queen Lee-Lin
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Xin Huang
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Helen C Su
  - Laboratory of Clinical Immunology and Microbiology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- Alexandra F Freeman
  - Laboratory of Clinical Immunology and Microbiology, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, USA
- Julia A Segre
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Heidi H Kong
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Dermatology Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, NIH, Bethesda, MD, USA
21
LVQ-KNN: Composition-based DNA/RNA binning of short nucleotide sequences utilizing a prototype-based k-nearest neighbor approach. Virus Res 2018; 258:55-63. [PMID: 30291874 DOI: 10.1016/j.virusres.2018.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/29/2018] [Revised: 09/25/2018] [Accepted: 10/02/2018] [Indexed: 11/22/2022]
Abstract
Unbiased sequencing is an emerging method for gaining information about the microbiome in a sample and for detecting unrecognized pathogens. Many software tools are available for the taxonomic classification of such metagenomics datasets. Many of them have satisfactory sensitivity and specificity for known organisms, but they fail if the sample contains unknown organisms, which cannot be detected by similarity-based classification employing available databases. However, recognition of unknowns is especially important for the detection of newly emerging pathogens, which are often RNA viruses. Here we present the composition-based analysis tool LVQ-KNN for binning unclassified nucleotide sequence reads into their provenance classes, DNA or RNA. With 5-fold cross-validation, LVQ-KNN reached correct classification rates (CCR) of up to 99.9% for the classification into DNA/RNA. On real datasets, CCRs of up to 94.5% were reached. Compared with another composition-based analysis tool, LVQ-KNN reached similar or better classification results. LVQ-KNN is a new tool for DNA/RNA classification of sequence reads from unbiased sequencing approaches that could be applicable to the detection of as yet unknown RNA viruses in metagenomic samples. The source code, training data, and test data for LVQ-KNN are available on GitHub (https://github.com/ab1989/LVQ-KNN).
22
Metagenomics for Clinical Infectious Disease Diagnostics Steps Closer to Reality. J Clin Microbiol 2018; 56:JCM.00850-18. [PMID: 29976592 DOI: 10.1128/jcm.00850-18] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 12/15/2022] Open
Abstract
Metagenomics approaches based on shotgun next-generation sequencing hold promise for infectious disease diagnostics. Despite substantial challenges that remain, work done over the past few years justifies excitement about the potential for these approaches to transform how clinical pathogen identification and analysis are performed. In an article in this issue of the Journal of Clinical Microbiology, M. I. Ivy et al. (J Clin Microbiol 56:e00402-18, 2018, https://doi.org/10.1128/JCM.00402-18) have applied a shotgun metagenomics approach to the diagnosis of prosthetic joint infections directly from synovial fluid. The results from this work demonstrate both the potentials and challenges of this approach applied in the clinical microbiology laboratory.
23
Greathouse KL, White JR, Vargas AJ, Bliskovsky VV, Beck JA, von Muhlinen N, Polley EC, Bowman ED, Khan MA, Robles AI, Cooks T, Ryan BM, Padgett N, Dzutsev AH, Trinchieri G, Pineda MA, Bilke S, Meltzer PS, Hokenstad AN, Stickrod TM, Walther-Antonio MR, Earl JP, Mell JC, Krol JE, Balashov SV, Bhat AS, Ehrlich GD, Valm A, Deming C, Conlan S, Oh J, Segre JA, Harris CC. Interaction between the microbiome and TP53 in human lung cancer. Genome Biol 2018; 19:123. [PMID: 30143034 PMCID: PMC6109311 DOI: 10.1186/s13059-018-1501-6] [Citation(s) in RCA: 264] [Impact Index Per Article: 37.7] [Received: 02/19/2018] [Accepted: 08/02/2018] [Indexed: 12/19/2022] Open
Abstract
Background: Lung cancer is the leading cancer diagnosis worldwide and the number one cause of cancer deaths. Exposure to cigarette smoke, the primary risk factor in lung cancer, reduces epithelial barrier integrity and increases susceptibility to infections. Herein, we hypothesize that somatic mutations together with cigarette smoke generate a dysbiotic microbiota that is associated with lung carcinogenesis. Using lung tissue from 33 controls and 143 cancer cases, we conduct 16S ribosomal RNA (rRNA) bacterial gene sequencing, with RNA-sequencing data from lung cancer cases in The Cancer Genome Atlas serving as the validation cohort. Results: Overall, we demonstrate a lower alpha diversity in normal lung as compared to non-tumor adjacent or tumor tissue. In squamous cell carcinoma specifically, a separate group of taxa are identified, in which Acidovorax is enriched in smokers. Acidovorax temporans is identified within tumor sections by fluorescent in situ hybridization and confirmed by two separate 16S rRNA strategies. Further, these taxa, including Acidovorax, exhibit higher abundance among the subset of squamous cell carcinoma cases with TP53 mutations, an association not seen in adenocarcinomas. Conclusions: The results of this comprehensive study show both microbiome-gene and microbiome-exposure interactions in squamous cell carcinoma lung cancer tissue. Specifically, tumors harboring TP53 mutations, which can impair epithelial function, have a unique bacterial consortium that is higher in relative abundance in smoking-associated tumors of this type. Given the significant need for clinical diagnostic tools in lung cancer, this study may provide novel biomarkers for early detection.
Affiliation(s)
- K Leigh Greathouse
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
  - Present Address: Nutrition Sciences, Baylor University, Waco, TX, 97346, USA
- Ashley J Vargas
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Valery V Bliskovsky
  - Center for Cancer Research Genomics Core, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Jessica A Beck
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Natalia von Muhlinen
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Eric C Polley
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA
- Elise D Bowman
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Mohammed A Khan
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Ana I Robles
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Tomer Cooks
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Bríd M Ryan
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
- Noah Padgett
  - Department of Educational Psychology, Baylor University, Waco, TX, 97346, USA
- Amiran H Dzutsev
  - Laboratory of Experimental Immunology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Giorgio Trinchieri
  - Laboratory of Experimental Immunology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Marbin A Pineda
  - Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Sven Bilke
  - Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Paul S Meltzer
  - Genetics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Alexis N Hokenstad
  - Department of Obstetrics and Gynecology, Mayo Clinic, Rochester, MN, USA
- Marina R Walther-Antonio
  - Department of Obstetrics and Gynecology, Mayo Clinic, Rochester, MN, USA
  - Department of Surgery, Mayo Clinic, Rochester, MN, 55905, USA
- Joshua P Earl
  - Department of Microbiology and Immunology, Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA, 19129, USA
- Joshua C Mell
  - Department of Microbiology and Immunology, Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA, 19129, USA
- Jaroslaw E Krol
  - Department of Microbiology and Immunology, Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA, 19129, USA
- Sergey V Balashov
  - Department of Microbiology and Immunology, Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA, 19129, USA
- Archana S Bhat
  - Department of Microbiology and Immunology, Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA, 19129, USA
- Garth D Ehrlich
  - Department of Microbiology and Immunology, Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA, 19129, USA
- Alex Valm
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Clayton Deming
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Sean Conlan
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Julia Oh
  - Jackson Laboratory, Farmington, CT, 06032, USA
- Julia A Segre
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Curtis C Harris
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 37 Convent Dr., Rm 3068A, MSC 4258, Bethesda, MD, 20892-4258, USA
24
Alves G, Wang G, Ogurtsov AY, Drake SK, Gucek M, Sacks DB, Yu YK. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. Journal of the American Society for Mass Spectrometry 2018; 29:1721-1737. [PMID: 29873019 PMCID: PMC6061032 DOI: 10.1007/s13361-018-1986-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/13/2017] [Revised: 03/30/2018] [Accepted: 04/25/2018] [Indexed: 05/30/2023]
Abstract
Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Guanghui Wang
- Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Steven K Drake
- Critical Care Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Marjan Gucek
- Proteomics Core, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- David B Sacks
- Department of Laboratory Medicine, Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
|
25
|
Impact of an Interdisciplinary Computational Research Section in a Department of Medicine: An 8-Year Perspective. Am J Med 2018; 131:846-851. [PMID: 29601802 DOI: 10.1016/j.amjmed.2018.03.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/06/2017] [Accepted: 03/22/2018] [Indexed: 12/20/2022]
|
26
|
Wu JH, Wang CH, Ma YD, Lee GB. A nitrocellulose membrane-based integrated microfluidic system for bacterial detection utilizing magnetic-composite membrane microdevices and bacteria-specific aptamers. Lab on a Chip 2018; 18:1633-1640. [PMID: 29766180 DOI: 10.1039/c8lc00251g] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 05/24/2023]
Abstract
Bacteria such as Acinetobacter baumannii (AB) can cause serious infections, resulting in high mortality if not diagnosed early and treated properly; there is consequently a need for rapid and accurate detection of this bacterial species. Therefore, we developed a new, nitrocellulose-based microfluidic system featuring AB-specific aptamers capable of automating the bacterial detection process via the activity of microfluidic devices composed of magnetic-composite membranes. Electromagnets were used to actuate these microfluidic devices such that the entire diagnostic process could be conducted in the integrated microfluidic system within 40 minutes with a limit of detection as low as 450 CFU per reaction for AB. Aptamers were used to capture AB in complex samples on nitrocellulose membranes, and a simple colorimetric assay was used to estimate bacterial loads. Given the ease of use, portability, and sensitivity of this aptamer-based microfluidic system, it may hold great promise for point-of-care diagnostics.
Affiliation(s)
- Jia-Han Wu
- Department of Power Mechanical Engineering, National Tsing Hua University, Hsinchu, 30013, Taiwan.
|
27
|
Hahn A, Bendall ML, Gibson KM, Chaney H, Sami I, Perez GF, Koumbourlis AC, McCaffrey TA, Freishtat RJ, Crandall KA. Benchmark Evaluation of True Single Molecular Sequencing to Determine Cystic Fibrosis Airway Microbiome Diversity. Front Microbiol 2018; 9:1069. [PMID: 29887843 PMCID: PMC5980964 DOI: 10.3389/fmicb.2018.01069] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/19/2017] [Accepted: 05/04/2018] [Indexed: 11/30/2022] Open
Abstract
Cystic fibrosis (CF) is an autosomal recessive disease associated with recurrent lung infections that can lead to morbidity and mortality. The impact of antibiotics for treatment of acute pulmonary exacerbations on the CF airway microbiome remains unclear with prior studies giving conflicting results and being limited by their use of 16S ribosomal RNA sequencing. Our primary objective was to validate the use of true single molecular sequencing (tSMS) and PathoScope in the analysis of the CF airway microbiome. Three control samples were created with differing amounts of Burkholderia cepacia, Pseudomonas aeruginosa, and Prevotella melaninogenica, three common bacteria found in cystic fibrosis lungs. Paired sputa were also obtained from three study participants with CF before and >6 days after initiation of antibiotics. Antibiotic resistant B. cepacia and P. aeruginosa were identified in concurrently obtained respiratory cultures. Direct sequencing was performed using tSMS, and filtered reads were aligned to reference genomes from NCBI using PathoScope and Kraken and unique clade-specific marker genes using MetaPhlAn. A total of 180–518 K of 6–12 million filtered reads were aligned for each sample. Detection of known pathogens in control samples was most successful using PathoScope. In the CF sputa, alpha diversity measures varied based on the alignment method used, but similar trends were found between pre- and post-antibiotic samples. PathoScope outperformed Kraken and MetaPhlAn in our validation study of artificial bacterial community controls and also has advantages over Kraken and MetaPhlAn of being able to determine bacterial strains and the presence of fungal organisms. PathoScope can be confidently used when evaluating metagenomic data to determine CF airway microbiome diversity.
Affiliation(s)
- Andrea Hahn
- Division of Infectious Diseases, Children's National Health System, Washington, DC, United States; Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC, United States
- Matthew L Bendall
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States; Department of Microbiology, Immunology and Tropical Medicine, George Washington University School of Medicine and Health Sciences, Washington, DC, United States
- Keylie M Gibson
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States
- Hollis Chaney
- Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC, United States; Division of Pulmonary and Sleep Medicine, Children's National Health System, Washington, DC, United States
- Iman Sami
- Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC, United States; Division of Pulmonary and Sleep Medicine, Children's National Health System, Washington, DC, United States
- Geovanny F Perez
- Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC, United States; Division of Pulmonary and Sleep Medicine, Children's National Health System, Washington, DC, United States
- Anastassios C Koumbourlis
- Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC, United States; Division of Pulmonary and Sleep Medicine, Children's National Health System, Washington, DC, United States
- Timothy A McCaffrey
- Division of Genomic Medicine, The George Washington University, Washington, DC, United States; Department of Medicine, George Washington University School of Medicine and Health Sciences, Washington, DC, United States
- Robert J Freishtat
- Department of Pediatrics, George Washington University School of Medicine and Health Sciences, Washington, DC, United States; Division of Emergency Medicine, Children's National Health System, Washington, DC, United States
- Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, United States
|
28
|
Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 12/08/2017] [Accepted: 04/03/2018] [Indexed: 12/20/2022] Open
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Dennis Schmitz
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Harry Vennema
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Annelies Kroneman
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Marion P G Koopmans
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands; Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
|
29
|
Quainoo S, Coolen JPM, van Hijum SAFT, Huynen MA, Melchers WJG, van Schaik W, Wertheim HFL. Whole-Genome Sequencing of Bacterial Pathogens: the Future of Nosocomial Outbreak Analysis. Clin Microbiol Rev 2017; 30:1015-1063. [PMID: 28855266 PMCID: PMC5608882 DOI: 10.1128/cmr.00016-17] [Citation(s) in RCA: 247] [Impact Index Per Article: 30.9] [Indexed: 12/14/2022] Open
Abstract
Outbreaks of multidrug-resistant bacteria present a frequent threat to vulnerable patient populations in hospitals around the world. Intensive care unit (ICU) patients are particularly susceptible to nosocomial infections due to indwelling devices such as intravascular catheters, drains, and intratracheal tubes for mechanical ventilation. The increased vulnerability of infected ICU patients demonstrates the importance of effective outbreak management protocols to be in place. Understanding the transmission of pathogens via genotyping methods is an important tool for outbreak management. Recently, whole-genome sequencing (WGS) of pathogens has become more accessible and affordable as a tool for genotyping. Analysis of the entire pathogen genome via WGS could provide unprecedented resolution in discriminating even highly related lineages of bacteria and revolutionize outbreak analysis in hospitals. Nevertheless, clinicians have long been hesitant to implement WGS in outbreak analyses due to the expensive and cumbersome nature of early sequencing platforms. Recent improvements in sequencing technologies and analysis tools have rapidly increased the output and analysis speed as well as reduced the overall costs of WGS. In this review, we assess the feasibility of WGS technologies and bioinformatics analysis tools for nosocomial outbreak analyses and provide a comparison to conventional outbreak analysis workflows. Moreover, we review advantages and limitations of sequencing technologies and analysis tools and present a real-world example of the implementation of WGS for antimicrobial resistance analysis. We aimed to provide health care professionals with a guide to WGS outbreak analysis that highlights its benefits for hospitals and assists in the transition from conventional to WGS-based outbreak analysis.
Affiliation(s)
- Scott Quainoo
- Department of Microbiology, Radboud University, Nijmegen, The Netherlands
- Jordy P M Coolen
- Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
- Sacha A F T van Hijum
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- NIZO, Ede, The Netherlands
- Martijn A Huynen
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- Willem J G Melchers
- Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
- Willem van Schaik
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Heiman F L Wertheim
- Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlands
|
30
|
Abstract
A new world of possibilities for “virus discovery” was opened up with high-throughput sequencing becoming available in the last decade. While scientifically metagenomic analysis was established before the start of the era of high-throughput sequencing, the availability of the first second-generation sequencers was the kick-off for diagnosticians to use sequencing for the detection of novel pathogens. Today, diagnostic metagenomics is becoming the standard procedure for the detection and genetic characterization of new viruses or novel virus variants. Here, we provide an overview about technical considerations of high-throughput sequencing-based diagnostic metagenomics together with selected examples of “virus discovery” for animal diseases or zoonoses and metagenomics for food safety or basic veterinary research.
Affiliation(s)
- Dirk Höper
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany.
- Claudia Wylezich
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany
- Martin Beer
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany
|
31
|
Doggett NA, Mukundan H, Lefkowitz EJ, Slezak TR, Chain PS, Morse S, Anderson K, Hodge DR, Pillai S. Culture-Independent Diagnostics for Health Security. Health Secur 2017; 14:122-42. [PMID: 27314653 DOI: 10.1089/hs.2015.0074] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 12/20/2022] Open
Abstract
The past decade has seen considerable development in the diagnostic application of nonculture methods, including nucleic acid amplification-based methods and mass spectrometry, for the diagnosis of infectious diseases. The implications of these new culture-independent diagnostic tests (CIDTs) include bypassing the need to culture organisms, thus potentially affecting public health surveillance systems, which continue to use isolates as the basis of their surveillance programs and to assess phenotypic resistance to antimicrobial agents. CIDTs may also affect the way public health practitioners detect and respond to a bioterrorism event. In response to a request from the Department of Homeland Security, Los Alamos National Laboratory and the Centers for Disease Control and Prevention cosponsored a workshop to review the impact of CIDTs on the rapid detection and identification of biothreat agents. Four panel discussions were held that covered nucleic acid amplification-based diagnostics, mass spectrometry, antibody-based diagnostics, and next-generation sequencing. Exploiting the extensive expertise available at this workshop, we identified the key features, benefits, and limitations of the various CIDT methods for providing rapid pathogen identification that are critical to the response and mitigation of a bioterrorism event. After the workshop we conducted a thorough review of the literature, investigating the current state of these 4 culture-independent diagnostic methods. This article combines information from the literature review and the insights obtained at the workshop.
|
32
|
Cox JW, Ballweg RA, Taft DH, Velayutham P, Haslam DB, Porollo A. A fast and robust protocol for metataxonomic analysis using RNAseq data. Microbiome 2017; 5:7. [PMID: 28103917 PMCID: PMC5244565 DOI: 10.1186/s40168-016-0219-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/21/2016] [Accepted: 12/05/2016] [Indexed: 05/03/2023]
Abstract
BACKGROUND Metagenomics is a rapidly emerging field aimed to analyze microbial diversity and dynamics by studying the genomic content of the microbiota. Metataxonomics tools analyze high-throughput sequencing data, primarily from 16S rRNA gene sequencing and DNAseq, to identify microorganisms and viruses within a complex mixture. With the growing demand for analysis of the functional microbiome, metatranscriptome studies attract more interest. To make metatranscriptomic data sufficient for metataxonomics, new analytical workflows are needed to deal with sparse and taxonomically less informative sequencing data. RESULTS We present a new protocol, IMSA+A, for accurate taxonomy classification based on metatranscriptome data of any read length that can efficiently and robustly identify bacteria, fungi, and viruses in the same sample. The new protocol improves accuracy by using a conservative reference database, employing a new counting scheme, and by assembling shotgun reads. Assembly also reduces analysis runtime. Simulated data were utilized to evaluate the protocol by permuting common experimental variables. When applied to the real metatranscriptome data for mouse intestines colonized by ASF, the protocol showed superior performance in detection of the microorganisms compared to the existing metataxonomics tools. IMSA+A is available at https://github.com/JeremyCoxBMI/IMSA-A . CONCLUSIONS The developed protocol addresses the need for taxonomy classification from RNAseq data. Previously not utilized, i.e., unmapped to a reference genome, RNAseq reads can now be used to gather taxonomic information about the microbiota present in a biological sample without conducting additional sequencing. Any metatranscriptome pipeline that includes assembly of reads can add this analysis with minimal additional cost of compute time. The new protocol also creates an opportunity to revisit old metatranscriptome data, where taxonomic content may be important but was not analyzed.
Affiliation(s)
- Jeremy W Cox
- Department of Electrical Engineering and Computing Systems, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45221, USA
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Richard A Ballweg
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Diana H Taft
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Prakash Velayutham
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- David B Haslam
- Division of Infectious Diseases, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- Aleksey Porollo
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA.
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
|
33
|
PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data. Sci Rep 2017; 7:39194. [PMID: 28051068 PMCID: PMC5209729 DOI: 10.1038/srep39194] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 07/08/2016] [Accepted: 11/18/2016] [Indexed: 12/20/2022] Open
Abstract
The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results.
|
34
|
Trappe K, Marschall T, Renard BY. Detecting horizontal gene transfer by mapping sequencing reads across species boundaries. Bioinformatics 2016; 32:i595-i604. [DOI: 10.1093/bioinformatics/btw423] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 02/06/2023] Open
|
35
|
MetLab: An In Silico Experimental Design, Simulation and Analysis Tool for Viral Metagenomics Studies. PLoS One 2016; 11:e0160334. [PMID: 27479078 PMCID: PMC4968819 DOI: 10.1371/journal.pone.0160334] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/24/2016] [Accepted: 07/18/2016] [Indexed: 02/07/2023] Open
Abstract
Metagenomics, the sequence characterization of all genomes within a sample, is widely used as a virus discovery tool as well as a tool to study viral diversity of animals. Metagenomics can be considered to have three main steps: sample collection and preparation, sequencing, and finally bioinformatics. Bioinformatic analysis of metagenomic datasets is in itself a complex process, involving few standardized methodologies, thereby hampering comparison of metagenomics studies between research groups. In this publication the new bioinformatics framework MetLab is presented, aimed at providing scientists with an integrated tool for experimental design and analysis of viral metagenomes. MetLab provides support in designing the metagenomics experiment by estimating the sequencing depth needed for the complete coverage of a species. This is achieved by applying a methodology to calculate the probability of coverage using an adaptation of Stevens' theorem. It also provides scientists with several pipelines aimed at simplifying the analysis of viral metagenomes, including quality control, assembly, and taxonomic binning. We also implement a tool for simulating metagenomics datasets from several sequencing platforms. The overall aim is to provide virologists with an easy-to-use tool for designing, simulating, and analyzing viral metagenomes. The results presented here include a benchmark against other existing software, with emphasis on detection of viruses as well as speed of applications. MetLab is packaged as comprehensive software, readily available for Linux and OSX users at https://github.com/norling/metlab.
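The coverage estimate mentioned in this abstract can be illustrated with the classical form of Stevens' theorem, which gives the probability that n arcs of equal length, placed uniformly at random, cover a whole circle. The sketch below is not MetLab's adaptation (the abstract does not describe it); it assumes a circular genome, equal-length reads, and uniformly random read positions, and the function names `stevens_cover_probability` and `min_reads_for_coverage` are hypothetical.

```python
from fractions import Fraction
from math import comb, floor


def stevens_cover_probability(n_reads: int, read_fraction) -> Fraction:
    """Classical Stevens' theorem: P(n random arcs of length a cover the circle).

    read_fraction is a = read length / genome length (assumed circular genome).
    Exact rational arithmetic avoids cancellation in the alternating sum.
    """
    a = Fraction(read_fraction)
    if n_reads < 1:
        raise ValueError("n_reads must be at least 1")
    if not 0 < a <= 1:
        raise ValueError("read_fraction must be in (0, 1]")
    if a == 1:
        return Fraction(1)  # a single full-length arc already covers everything
    total = Fraction(0)
    # Terms with k*a > 1 vanish, so the sum stops at floor(1/a).
    for k in range(floor(1 / a) + 1):
        total += (-1) ** k * comb(n_reads, k) * (1 - k * a) ** (n_reads - 1)
    return total


def min_reads_for_coverage(read_fraction, target=Fraction(95, 100), max_reads=10_000):
    """Smallest read count whose full-coverage probability reaches target, or None."""
    for n in range(1, max_reads + 1):
        if stevens_cover_probability(n, read_fraction) >= target:
            return n
    return None
```

For example, two reads that each span three quarters of the genome give `stevens_cover_probability(2, Fraction(3, 4)) == Fraction(1, 2)`. Realistic read fractions are far smaller, so the exact sum becomes slow for real genomes; this sketch is meant only to show the shape of the calculation.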
|
36
|
Mulcahy-O'Grady H, Workentine ML. The Challenge and Potential of Metagenomics in the Clinic. Front Immunol 2016; 7:29. [PMID: 26870044 PMCID: PMC4737888 DOI: 10.3389/fimmu.2016.00029] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 08/14/2015] [Accepted: 01/19/2016] [Indexed: 12/27/2022] Open
Abstract
The bacteria, fungi, and viruses that live on and in us have a tremendous impact on our day-to-day health and are often linked to many diseases, including autoimmune disorders and infections. Diagnosing and treating these disorders relies on accurate identification and characterization of the microbial community. Current sequencing technologies allow the sequencing of the entire nucleic acid complement of a sample providing an accurate snapshot of the community members present in addition to the full genetic potential of that microbial community. There are a number of clinical applications that stand to benefit from these data sets, such as the rapid identification of pathogens present in a sample. Other applications include the identification of antibiotic-resistance genes, diagnosis and treatment of gastrointestinal disorders, and many other diseases associated with bacterial, viral, and fungal microbiomes. Metagenomics also allows the physician to probe more complex phenotypes such as microbial dysbiosis with intestinal disorders and disruptions of the skin microbiome that may be associated with skin disorders. Many of these disorders are not associated with a single pathogen but emerge as a result of complex ecological interactions within microbiota. Currently, we understand very little about these complex phenotypes, yet clearly they are important and in some cases, as with fecal microbiota transplants in Clostridium difficile infections, treating the microbiome of the patient is effective. Here, we give an overview of metagenomics and discuss a number of areas where metagenomics is applicable in the clinic, and progress being made in these areas. 
This includes (1) the identification of unknown pathogens, and those pathogens particularly hard to culture, (2) utilizing functional information and gene content to understand complex infections such as Clostridium difficile, and (3) predicting antimicrobial resistance of the community using genetic determinants of resistance identified from the sequencing data. All of these applications rely on sophisticated computational tools, and we also discuss the importance of skilled bioinformatic support for the implementation and use of metagenomics in the clinic.
Affiliation(s)
- Heidi Mulcahy-O'Grady
- Infection Prevention and Control, Alberta Health Services, and Faculty of Medicine, Calgary, AB, Canada
|
37
|
Groah SL, Pérez-Losada M, Caldovic L, Ljungberg IH, Sprague BM, Castro-Nallar E, Chandel NJ, Hsieh MH, Pohl HG. Redefining Healthy Urine: A Cross-Sectional Exploratory Metagenomic Study of People With and Without Bladder Dysfunction. J Urol 2016; 196:579-87. [PMID: 26807926 DOI: 10.1016/j.juro.2016.01.088] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Accepted: 01/08/2016] [Indexed: 12/29/2022]
Abstract
PURPOSE We used the PathoScope platform to perform species level analyses of publicly available, 16S rRNA pyrosequenced, asymptomatic urine data to determine relationships between microbiomes, and clinical and functional phenotypes. MATERIALS AND METHODS We reanalyzed previously reported, cross-sectionally acquired urine samples from 47 asymptomatic subjects, including 23 controls and 24 subjects with neuropathic bladder. Urine was originally collected by the usual method of bladder drainage and analyzed by urinalysis, culture and pyrosequencing. Urinalysis and culture values were stratified as leukocyte esterase (0, or 1 or greater), nitrite (positive or negative), pyuria (fewer than 5, or 5 or greater white blood cells per high power field), cloudy urine (positive or negative) and urine culture bacterial growth (less than 50,000, or 50,000 or greater cfu/ml). PathoScope was used for next generation sequencing alignment, bacterial classification and microbial diversity characterization. RESULTS Subjects with neuropathic bladder were significantly more likely to have positive leukocyte esterase and pyuria, cloudy urine and bacterial growth. Of 47 samples 23 showed bacterial growth on culture and in all samples bacteria were identified by pyrosequencing. Nonneuropathic bladder urine microbiomes included greater proportions of Lactobacillus crispatus in females and Staphylococcus haemolyticus in males. The Lactobacillus community differed significantly among females depending on bladder function. Irrespective of gender the subjects with neuropathic bladder had greater proportions of Enterococcus faecalis, Proteus mirabilis and Klebsiella pneumoniae. In 4 subjects with neuropathic bladder Actinobaculum sp. was detected by sequencing and by PathoScope but not by cultivation and in all cases it was associated with pyuria. CONCLUSIONS Using PathoScope plus 16S pyrosequencing we were able to identify unique, phenotype dependent, species level microbes. 
Novel findings included absent L. crispatus in the urine of females with neuropathic bladder and the presence of Actinobaculum only in subjects with neuropathic bladder.
Affiliation(s)
- Suzanne L Groah
- MedStar National Rehabilitation Hospital, Washington, D.C.; Department of Rehabilitation Medicine, Georgetown University Hospital, Washington, D.C.
- Marcos Pérez-Losada
- Department of Integrative Systems Biology, Children's National Health System, Washington, D.C.; Computational Biology Institute, George Washington University, Ashburn, Virginia; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
- Ljubica Caldovic
- Department of Integrative Systems Biology, Children's National Health System, Washington, D.C.
- Bruce M Sprague
- Division of Urology, Children's National Health System, Washington, D.C.
- Eduardo Castro-Nallar
- Computational Biology Institute, George Washington University, Ashburn, Virginia; Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Universidad Andres Bello, Santiago, Chile
- Neel J Chandel
- MedStar National Rehabilitation Hospital, Washington, D.C.
- Michael H Hsieh
- Division of Urology, Children's National Health System, Washington, D.C.
- Hans G Pohl
- Division of Urology, Children's National Health System, Washington, D.C.
|
38
|
Kilianski A, Carcel P, Yao S, Roth P, Schulte J, Donarum GB, Fochler ET, Hill JM, Liem AT, Wiley MR, Ladner JT, Pfeffer BP, Elliot O, Petrosov A, Jima DD, Vallard TG, Melendrez MC, Skowronski E, Quan PL, Lipkin WI, Gibbons HS, Hirschberg DL, Palacios GF, Rosenzweig CN. Pathosphere.org: pathogen detection and characterization through a web-based, open source informatics platform. BMC Bioinformatics 2015; 16:416. [PMID: 26714571 PMCID: PMC4696252 DOI: 10.1186/s12859-015-0840-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/30/2015] [Accepted: 12/08/2015] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND The detection of pathogens in complex sample backgrounds has been revolutionized by wide access to next-generation sequencing (NGS) platforms. However, analytical methods to support NGS platforms are not as uniformly available. Pathosphere (found at Pathosphere.org) is a cloud-based, open-source community tool that allows for communication, collaboration and sharing of NGS analytical tools and data amongst scientists working in academia, industry and government. The architecture allows users to upload data and run available bioinformatics pipelines without the need for onsite processing hardware or technical support. RESULTS The pathogen detection capabilities hosted on Pathosphere were tested by analyzing pathogen-containing samples sequenced by NGS with both spiked human samples as well as human and zoonotic host backgrounds. Pathosphere analytical pipelines developed by Edgewood Chemical Biological Center (ECBC) identified spiked pathogens within a common sample analyzed by 454, Ion Torrent, and Illumina sequencing platforms. ECBC pipelines also correctly identified pathogens in human samples containing arenavirus in addition to animal samples containing flavivirus and coronavirus. These analytical methods were limited in the detection of sequences with limited homology to previous annotations within NCBI databases, such as parvovirus. Utilizing the pipeline-hosting adaptability of Pathosphere, the analytical suite was supplemented by analytical pipelines designed by the United States Army Medical Research Institute of Infectious Diseases and Walter Reed Army Institute of Research (USAMRIID-WRAIR). These pipelines were implemented and detected parvovirus sequence in the sample that the ECBC iterative analysis previously failed to identify.
CONCLUSIONS By accurately detecting pathogens in a variety of samples, this work demonstrates the utility of Pathosphere and provides a platform for utilizing, modifying and creating pipelines for a variety of NGS technologies developed to detect pathogens in complex sample backgrounds. These results serve as an exhibition for the existing pipelines and web-based interface of Pathosphere as well as the plug-in adaptability that allows for integration of newer NGS analytical software as it becomes available.
Affiliation(s)
- Andy Kilianski
- Biosciences Division, Edgewood Chemical and Biological Center, 5183 Blackhawk Rd, Aberdeen Proving Ground, Edgewood, MD, 21010, USA.
- Shijie Yao
- OptiMetrics, Inc, Abingdon, MD, USA; Joint Genome Institute, Department of Energy, LBNL, Berkeley, CA, USA.
- Pierce Roth
- Biosciences Division, Edgewood Chemical and Biological Center, 5183 Blackhawk Rd, Aberdeen Proving Ground, Edgewood, MD, 21010, USA; OptiMetrics, Inc, Abingdon, MD, USA.
- Jessica M Hill
- Biosciences Division, Edgewood Chemical and Biological Center, 5183 Blackhawk Rd, Aberdeen Proving Ground, Edgewood, MD, 21010, USA; OptiMetrics, Inc, Abingdon, MD, USA.
- Alvin T Liem
- Biosciences Division, Edgewood Chemical and Biological Center, 5183 Blackhawk Rd, Aberdeen Proving Ground, Edgewood, MD, 21010, USA; OptiMetrics, Inc, Abingdon, MD, USA.
- Michael R Wiley
- Center for Genome Sciences, United States Medical Research Institute of Infectious Diseases, Ft. Detrick, Frederick, MD, USA.
- Jason T Ladner
- Center for Genome Sciences, United States Medical Research Institute of Infectious Diseases, Ft. Detrick, Frederick, MD, USA.
- Bradley P Pfeffer
- Center for Genome Sciences, United States Medical Research Institute of Infectious Diseases, Ft. Detrick, Frederick, MD, USA.
- Oliver Elliot
- Department of Biomedical Informatics, Columbia University, New York, NY, USA.
- Alexandra Petrosov
- The Center for Infection and Immunity, Columbia University, New York, NY, USA.
- Dereje D Jima
- Walter Reed Army Institute of Research, Viral Diseases Branch, Silver Spring, MD, USA.
- Tyghe G Vallard
- Walter Reed Army Institute of Research, Viral Diseases Branch, Silver Spring, MD, USA.
- Melanie C Melendrez
- Walter Reed Army Institute of Research, Viral Diseases Branch, Silver Spring, MD, USA.
- Phenix-Lan Quan
- The Center for Infection and Immunity, Columbia University, New York, NY, USA.
- W Ian Lipkin
- The Center for Infection and Immunity, Columbia University, New York, NY, USA.
- Henry S Gibbons
- Biosciences Division, Edgewood Chemical and Biological Center, 5183 Blackhawk Rd, Aberdeen Proving Ground, Edgewood, MD, 21010, USA.
- David L Hirschberg
- The Center for Infection and Immunity, Columbia University, New York, NY, USA; Department of Interdisciplinary Arts and Sciences, University of Washington Tacoma, Tacoma, WA, USA.
- Gustavo F Palacios
- Center for Genome Sciences, United States Medical Research Institute of Infectious Diseases, Ft. Detrick, Frederick, MD, USA.
- C Nicole Rosenzweig
- Biosciences Division, Edgewood Chemical and Biological Center, 5183 Blackhawk Rd, Aberdeen Proving Ground, Edgewood, MD, 21010, USA.
|
39
|
Abstract
Summary: Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers have made genomics feasible in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.
|
40
|
Aflitos SA, Severing E, Sanchez-Perez G, Peters S, de Jong H, de Ridder D. Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data. BMC Bioinformatics 2015; 16:352. [PMID: 26525298 PMCID: PMC4630969 DOI: 10.1186/s12859-015-0806-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/11/2015] [Accepted: 10/29/2015] [Indexed: 12/05/2022] Open
Abstract
Background Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. Results We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully clustered 169 genomic and transcriptomic datasets from 4 kingdoms simultaneously, achieving 100% identification accuracy at supra-species level and 78% accuracy at the species level. Conclusion Cnidaria allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental questions (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design).
Affiliation(s)
- Saulo Alves Aflitos
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands; Bioinformatics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands.
- Edouard Severing
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands.
- Gabino Sanchez-Perez
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands; Bioinformatics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands.
- Sander Peters
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands.
- Hans de Jong
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands.
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands.
|
41
|
Pérez-Losada M, Castro-Nallar E, Bendall ML, Freishtat RJ, Crandall KA. Dual Transcriptomic Profiling of Host and Microbiota during Health and Disease in Pediatric Asthma. PLoS One 2015; 10:e0131819. [PMID: 26125632 PMCID: PMC4488395 DOI: 10.1371/journal.pone.0131819] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Received: 04/17/2015] [Accepted: 06/07/2015] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND High-throughput sequencing (HTS) analysis of microbial communities from the respiratory airways has heavily relied on the 16S rRNA gene. Given the intrinsic limitations of this approach, airway microbiome research has focused on assessing bacterial composition during health and disease, and its variation in relation to clinical and environmental factors, or other microbiomes. Consequently, very little effort has been dedicated to describing the functional characteristics of the airway microbiota and even less to explore the microbe-host interactions. Here we present a simultaneous assessment of microbiome and host functional diversity and host-microbe interactions from the same RNA-seq experiment, while accounting for variation in clinical metadata. METHODS Transcriptomic (host) and metatranscriptomic (microbiota) sequences from the nasal epithelium of 8 asthmatics and 6 healthy controls were separated in silico and mapped to available human and NCBI-NR protein reference databases. Human genes differentially expressed in asthmatics and controls were then used to infer upstream regulators involved in immune and inflammatory responses. Concomitantly, microbial genes were mapped to metabolic databases (COG, SEED, and KEGG) to infer microbial functions differentially expressed in asthmatics and controls. Finally, multivariate analysis was applied to find associations between microbiome characteristics and host upstream regulators while accounting for clinical variation. RESULTS AND DISCUSSION Our study showed significant differences in the metabolism of microbiomes from asthmatic and non-asthmatic children for up to 25% of the functional properties tested. Enrichment analysis of 499 differentially expressed host genes for inflammatory and immune responses revealed 43 upstream regulators differentially activated in asthma. 
Microbial adhesion (virulence) and Proteobacteria abundance were significantly associated with variation in the expression of the upstream regulator IL1A, suggesting that microbiome characteristics modulate host inflammatory and immune systems during asthma.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, George Washington University, Ashburn, Virginia, United States of America
- Division of Emergency Medicine, Children’s National Medical Center, Washington, DC, United States of America
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Eduardo Castro-Nallar
- Computational Biology Institute, George Washington University, Ashburn, Virginia, United States of America
- Universidad Andrés Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biológicas, Santiago, Chile
- Matthew L. Bendall
- Computational Biology Institute, George Washington University, Ashburn, Virginia, United States of America
- Robert J. Freishtat
- Division of Emergency Medicine, Children’s National Medical Center, Washington, DC, United States of America
- Keith A. Crandall
- Computational Biology Institute, George Washington University, Ashburn, Virginia, United States of America
|
42
|
Ames SK, Gardner SN, Marti JM, Slezak TR, Gokhale MB, Allen JE. Using populations of human and microbial genomes for organism detection in metagenomes. Genome Res 2015; 25:1056-67. [PMID: 25926546 PMCID: PMC4484388 DOI: 10.1101/gr.184879.114] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 09/24/2014] [Accepted: 04/28/2015] [Indexed: 12/16/2022]
Abstract
Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.
Affiliation(s)
- Sasha K Ames
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Shea N Gardner
- Global Security Computer Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Tom R Slezak
- Global Security Computer Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Maya B Gokhale
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Jonathan E Allen
- Global Security Computer Applications Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
|
43
|
Scheuch M, Höper D, Beer M. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. BMC Bioinformatics 2015; 16:69. [PMID: 25886935 PMCID: PMC4351923 DOI: 10.1186/s12859-015-0503-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 07/03/2014] [Accepted: 02/20/2015] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. RESULTS To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. CONCLUSIONS RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
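The cascading idea in this abstract (try the strictest assignment first, then relax the criteria only for reads that remain unassigned) can be illustrated with a toy sketch. This is not the RIEMS implementation: the reference sequences, the mismatch thresholds, and the names `REFERENCE`, `exact_match`, `hamming_within`, `STAGES` and `classify` are all hypothetical, and real pipelines use alignment tools rather than Hamming distance.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy reference (sequence -> taxon); not taken from RIEMS.
REFERENCE: Dict[str, str] = {
    "ACGTACGT": "Virus A",
    "TTGGCCAA": "Bacterium B",
}


def exact_match(read: str) -> Optional[str]:
    """Strictest stage: the read must equal a reference sequence."""
    return REFERENCE.get(read)


def hamming_within(max_mismatches: int) -> Callable[[str], Optional[str]]:
    """Build a looser stage tolerating up to max_mismatches substitutions."""
    def stage(read: str) -> Optional[str]:
        for ref, taxon in REFERENCE.items():
            if len(ref) != len(read):
                continue
            mismatches = sum(a != b for a, b in zip(ref, read))
            if mismatches <= max_mismatches:
                return taxon
        return None
    return stage


# Stages are ordered from most to least stringent (assumed thresholds).
STAGES: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("exact", exact_match),
    ("<=1 mismatch", hamming_within(1)),
    ("<=3 mismatches", hamming_within(3)),
]


def classify(reads: List[str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Assign each read at the first (strictest) stage that accepts it."""
    results: Dict[str, Tuple[str, Optional[str]]] = {}
    for read in reads:
        results[read] = ("unassigned", None)
        for stage_name, stage in STAGES:
            taxon = stage(read)
            if taxon is not None:
                results[read] = (taxon, stage_name)
                break  # stop at the strictest stage that succeeds
    return results
```

Recording the stage name next to each assignment keeps the stringency of every call visible in the final report, which is one plausible way to organise a taxonomically structured result protocol.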
Affiliation(s)
- Matthias Scheuch
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany.
- Dirk Höper
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany.
- Martin Beer
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany.
|
44
|
Du R, Mercante D, An L, Fang Z. A Statistical Approach to Correcting Cross-Annotations in a Metagenomic Functional Profile Generated by Short Reads. J Biom Biostat 2014; 5. [PMID: 29710879 PMCID: PMC5922784 DOI: 10.4172/2155-6180.1000208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
Background Categorizing protein coding sequences into one family if the proteins they encode perform the same biochemical function, and then tabulating the relative abundances among all the families, is a widely adopted practice for functional profiling of a metagenomic sample. By homology searching of metagenomic sequencing reads against a protein database, the relative abundance of a family can be represented by the number of reads aligned to its members. However, it has been observed that some short reads generated by next-generation sequencing platforms are erroneously assigned to functional families with which they are not associated. This commonly occurring phenomenon is termed cross-annotation. Current methods for functional profiling of a metagenomic sample either use empirical cutoff values to select the alignments and ignore the cross-annotation problem, or employ a summarized equation to make a simple adjustment. Result By introducing latent variables, we use Probabilistic Latent Semantic Analysis to model the proportions of reads assigned to functional families in a metagenomic sample. The approach can be applied to a metagenomic sample after the list of true functional families has been obtained or estimated. It was implemented on metagenomic samples functionally characterized with the database of Clusters of Orthologous Groups of proteins, and successfully addressed the cross-annotation issue on in vitro-simulated samples, bioinformatics tool-simulated metagenomic samples, and a real-world dataset. Conclusions Correcting cross-annotation will increase the accuracy of the functional profiling of a metagenome generated by short reads. It will further benefit differential abundance analysis of metagenomic samples under different conditions.
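The uncorrected profiling step described in this abstract, where a family's relative abundance is the share of aligned reads assigned to it, can be sketched as follows. This is a minimal illustration only: the function name `relative_abundance`, the input layout (read ID mapped to a family ID, or `None` for reads with no alignment), and the example COG identifiers are assumptions, and the sketch does not perform the latent-variable correction for cross-annotation that the paper proposes.

```python
from collections import Counter
from typing import Dict, Optional


def relative_abundance(read_to_family: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Fraction of aligned reads assigned to each functional family.

    Reads mapped to None (no alignment) are excluded from the total.
    """
    counts = Counter(
        family for family in read_to_family.values() if family is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: n / total for family, n in counts.items()}


# Hypothetical assignments: two reads hit COG0001, one hits COG0002,
# and one read has no alignment.
profile = relative_abundance(
    {"r1": "COG0001", "r2": "COG0001", "r3": "COG0002", "r4": None}
)
```

Any read that is cross-annotated to the wrong family inflates that family's count directly in this tabulation, which is the bias the statistical correction is meant to address.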
Affiliation(s)
- Ruofei Du
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana, USA; Department of Agricultural and Bio-systems Engineering, University of Arizona, Tucson, Arizona, USA
- Donald Mercante
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana, USA
- Lingling An
- Department of Agricultural and Bio-systems Engineering, University of Arizona, Tucson, Arizona, USA
- Zhide Fang
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana, USA
|
45
|
Single-molecule long-read 16S sequencing to characterize the lung microbiome from mechanically ventilated patients with suspected pneumonia. J Clin Microbiol 2014; 52:3913-21. [PMID: 25143582 DOI: 10.1128/jcm.01678-14] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 12/11/2022] Open
Abstract
In critically ill patients, the development of pneumonia results in significant morbidity and mortality and additional health care costs. The accurate and rapid identification of the microbial pathogens in patients with pulmonary infections might lead to targeted antimicrobial therapy with potentially fewer adverse effects and lower costs. Major advances in next-generation sequencing (NGS) allow culture-independent identification of pathogens. The present study used NGS of essentially full-length PCR-amplified 16S ribosomal DNA from the bronchial aspirates of intubated patients with suspected pneumonia. The results from 61 patients demonstrated that sufficient DNA was obtained from 72% of samples, 44% of which (27 samples) yielded PCR amplimers suitable for NGS. Out of the 27 sequenced samples, only 20 had bacterial culture growth, while the microbiological and NGS identification of bacteria coincided in 17 (85%) of these samples. Despite the lack of bacterial growth in 7 samples that yielded amplimers and were sequenced, the NGS identified a number of bacterial species in these samples. Overall, a significant diversity of bacterial species was identified from the same genus as the predominant cultured pathogens. The numbers of NGS-identifiable bacterial genera were consistently higher than identified by standard microbiological methods. As technical advances reduce the processing and sequencing times, NGS-based methods will ultimately be able to provide clinicians with rapid, precise, culture-independent identification of bacterial, fungal, and viral pathogens and their antimicrobial sensitivity profiles.
|