1
Plyusnin I, Vapalahti O, Sironen T, Kant R, Smura T. Enhanced Viral Metagenomics with Lazypipe 2. Viruses 2023; 15:v15020431. [PMID: 36851645 PMCID: PMC9960287 DOI: 10.3390/v15020431] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/22/2022] [Revised: 01/29/2023] [Accepted: 01/31/2023] [Indexed: 02/08/2023] Open
Abstract
Viruses are the main agents causing emerging and re-emerging infectious diseases. It is therefore important to screen for and detect them and uncover the evolutionary processes that support their ability to jump species boundaries and establish themselves in new hosts. Metagenomic next-generation sequencing (mNGS) is a high-throughput, impartial technology that has enabled virologists to detect either known or novel, divergent viruses from clinical, animal, wildlife and environmental samples, with few a priori assumptions. mNGS is heavily dependent on bioinformatic analysis, with an emerging demand for integrated bioinformatic workflows. Here, we present Lazypipe 2, an updated mNGS pipeline with, as compared to Lazypipe1, significant improvements in code stability and transparency, with added functionality and support for new software components. We also present extensive benchmarking results, including evaluation of a novel canine simulated metagenome, precision and recall of virus detection at varying sequencing depth, and a low to extremely low proportion of viral genetic material. Additionally, we report accuracy of virus detection with two strategies: homology searches using nucleotide or amino acid sequences. We show that Lazypipe 2 with nucleotide-based annotation approaches near-perfect detection for eukaryotic viruses and, in terms of accuracy, outperforms the compared pipelines. We also discuss the importance of homology searches with amino acid sequences for the detection of highly divergent novel viruses.
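As a rough illustration of the precision and recall figures mentioned in this abstract, the sketch below scores a set of reported virus taxa against the known composition of a simulated metagenome. The function name and the taxon labels are hypothetical placeholders, not Lazypipe output or data from the paper.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for detected taxa versus a known ground truth."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Guard against empty sets so the sketch never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: four viruses spiked into a simulated sample,
# three taxa reported by a pipeline, two of which are correct.
truth = {"virus_A", "virus_B", "virus_C", "virus_D"}
predicted = {"virus_A", "virus_B", "virus_X"}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Repeating such a comparison at several sequencing depths would give one precision and recall pair per depth, which is the kind of curve the benchmarking described above refers to.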
Affiliation(s)
- Ilya Plyusnin (corresponding author)
  - Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
  - Department of Virology, University of Helsinki, 00014 Helsinki, Finland
- Olli Vapalahti
  - Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
  - Department of Virology, University of Helsinki, 00014 Helsinki, Finland
  - HUS Diagnostic Center, Clinical Microbiology, Helsinki University Hospital, University of Helsinki, 00029 Helsinki, Finland
- Tarja Sironen
  - Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
  - Department of Virology, University of Helsinki, 00014 Helsinki, Finland
- Ravi Kant
  - Department of Veterinary Biosciences, University of Helsinki, 00014 Helsinki, Finland
  - Department of Virology, University of Helsinki, 00014 Helsinki, Finland
  - Department of Tropical Parasitology, Institute of Maritime and Tropical Medicine, Medical University of Gdansk, 81-519 Gdynia, Poland
- Teemu Smura
  - Department of Virology, University of Helsinki, 00014 Helsinki, Finland
  - HUS Diagnostic Center, Clinical Microbiology, Helsinki University Hospital, University of Helsinki, 00029 Helsinki, Finland
2
Reteng P, Nguyen Thuy L, Tran Thi Minh T, Mares-Guia MAMDM, Torres MC, de Filippis AMB, Orba Y, Kobayashi S, Hayashida K, Sawa H, Hall WW, Nguyen Thi LA, Yamagishi J. A targeted approach with nanopore sequencing for the universal detection and identification of flaviviruses. Sci Rep 2021; 11:19031. [PMID: 34561471 PMCID: PMC8463598 DOI: 10.1038/s41598-021-98013-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2021] [Accepted: 08/27/2021] [Indexed: 11/09/2022] Open
Abstract
Nucleic acid testing (NAT), most typically quantitative PCR, is one of the standard methods for species-specific flavivirus diagnosis. Semi-comprehensive NATs such as pan-flavivirus PCR, which covers the genus Flavivirus, are also available; however, further specification by sequencing is required for species-level differentiation. In this study, a semi-comprehensive detection system that allows species differentiation of flaviviruses was developed by integrating pan-flavivirus PCR and Nanopore sequencing. In addition, a multiplexing method was established by adding index sequences through the PCR with a streamlined bioinformatics pipeline. This enables defining cut-off values for observed read counts. In the laboratory setting, this approach allowed the detection of up to nine different flaviviruses. Using clinical samples collected in Vietnam and Brazil, seven different flaviviruses were also detected. When compared to a commercial NAT, the sensitivity and specificity of our system were 66.7% and 95.4%, respectively. Conversely, when compared to our system, the sensitivity and specificity of the commercial NAT were 57.1% and 96.9%, respectively. In addition, Nanopore sequencing detected more positive samples (n = 8) compared to the commercial NAT (n = 6). Collectively, our study has established a semi-comprehensive sequencing-based diagnostic system for the detection of flaviviruses that is extremely affordable, has considerable sensitivity, and requires only simple experimental methods.
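The two pairs of sensitivity and specificity values in this abstract come from swapping which assay is treated as the reference. A minimal sketch of that calculation follows. The 2x2 counts are illustrative assumptions chosen so that the percentages round to the values quoted above; they are not taken from the paper, and the function name is hypothetical.

```python
def sens_spec(both_pos, index_only, ref_only, both_neg):
    """Sensitivity and specificity of an index test against a reference test.

    both_pos   : samples positive by both tests
    index_only : positive by the index test only (false positives w.r.t. reference)
    ref_only   : positive by the reference test only (false negatives)
    both_neg   : samples negative by both tests
    """
    sensitivity = both_pos / (both_pos + ref_only)
    specificity = both_neg / (both_neg + index_only)
    return sensitivity, specificity


# Illustrative counts (assumed, not reported in the abstract).
both_pos, nanopore_only, commercial_only, both_neg = 4, 3, 2, 62

# Sequencing-based system scored against the commercial NAT as reference.
nanopore_vs_commercial = sens_spec(both_pos, nanopore_only, commercial_only, both_neg)
# Commercial NAT scored against the sequencing-based system as reference.
commercial_vs_nanopore = sens_spec(both_pos, commercial_only, nanopore_only, both_neg)

for label, (sens, spec) in [
    ("nanopore vs commercial", nanopore_vs_commercial),
    ("commercial vs nanopore", commercial_vs_nanopore),
]:
    print(f"{label}: sensitivity={sens:.1%} specificity={spec:.1%}")
```

Because neither assay is a perfect gold standard, the discordant samples count as false positives in one direction and false negatives in the other, which is why both directions are reported.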
Affiliation(s)
- Patrick Reteng
  - Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Linh Nguyen Thuy
  - Center for Bio-Medical Research, National Institute of Hygiene and Epidemiology, Hanoi, Vietnam
- Tam Tran Thi Minh
  - Center for Bio-Medical Research, National Institute of Hygiene and Epidemiology, Hanoi, Vietnam
- Yasuko Orba
  - Division of Molecular Pathobiology, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Shintaro Kobayashi
  - Laboratory of Public Health, Faculty of Veterinary Medicine, Hokkaido University, Sapporo, Japan
- Kyoko Hayashida
  - Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
  - International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
- Hirofumi Sawa
  - Division of Molecular Pathobiology, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
  - International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
  - Global Virus Network, Baltimore, USA
- William W Hall
  - International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
  - Global Virus Network, Baltimore, USA
  - National Virus Reference Laboratory, University College Dublin, Dublin, Ireland
  - Ireland Vietnam Blood-Borne Virus Initiative (IVVI), Dublin, Ireland
- Lan Anh Nguyen Thi
  - Center for Bio-Medical Research, National Institute of Hygiene and Epidemiology, Hanoi, Vietnam
- Junya Yamagishi
  - Division of Collaboration and Education, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
  - International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Sapporo, Japan
3
Presence of complete murine viral genome sequences in patient-derived xenografts. Nat Commun 2021; 12:2031. [PMID: 33795676 PMCID: PMC8017013 DOI: 10.1038/s41467-021-22200-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/15/2019] [Accepted: 03/01/2021] [Indexed: 12/13/2022] Open
Abstract
Patient-derived xenografts are crucial for drug development but their use is challenged by issues such as murine viral infection. We evaluate the scope of viral infection and its impact on patient-derived xenografts by taking an unbiased data-driven approach to analyze unmapped RNA-Seq reads from 184 experiments. We find and experimentally validate the extensive presence of murine viral sequence reads covering entire viral genomes in patient-derived xenografts. The existence of viral sequences inside tumor cells is further confirmed by single cell sequencing data. Extensive chimeric reads containing both viral and human sequences are also observed. Furthermore, we find significantly changed expression levels of many cancer-, immune-, and drug metabolism-related genes in samples with high virus load. Our analyses indicate a need to carefully evaluate the impact of viral infection on patient-derived xenografts for drug development. They also point to a need for attention to quality control of patient-derived xenograft experiments.
4
Rodriguez RM, Khadka VS, Menor M, Hernandez BY, Deng Y. Tissue-associated microbial detection in cancer using human sequencing data. BMC Bioinformatics 2020; 21:523. [PMID: 33272199 PMCID: PMC7713026 DOI: 10.1186/s12859-020-03831-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/19/2022] Open
Abstract
Cancer is one of the leading causes of morbidity and mortality worldwide. Microbial infections account for up to 20% of the total global cancer burden. The human microbiota within each organ system is distinct, and its compositional variation and interactions with the human host are known to have both detrimental and beneficial effects on tumor progression. With the advent of next generation sequencing (NGS) technologies, NGS data are being used for pathogen detection in cancer. Numerous bioinformatics computational frameworks have been developed to study viral information from host-sequencing data and can be adapted to bacterial studies. This review highlights popular existing computational frameworks that take NGS data as input to decipher microbial composition and whose output can predict functional compositional differences, with clinically relevant applications in the development of treatment and prevention strategies.
Affiliation(s)
- Rebecca M. Rodriguez
  - Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI, USA
  - Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI, USA
  - NIDDK Central Repository, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, USA
- Vedbar S. Khadka
  - Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI, USA
- Mark Menor
  - Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI, USA
- Brenda Y. Hernandez
  - Epidemiology, University of Hawaii Cancer Center, University of Hawaii, Honolulu, HI, USA
  - Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI, USA
- Youping Deng
  - Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI, USA
5
Plyusnin I, Kant R, Jääskeläinen AJ, Sironen T, Holm L, Vapalahti O, Smura T. Novel NGS pipeline for virus discovery from a wide spectrum of hosts and sample types. Virus Evol 2020; 6:veaa091. [PMID: 33408878 PMCID: PMC7772471 DOI: 10.1093/ve/veaa091] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022] Open
Abstract
The study of microbiome data holds great potential for elucidating the biological and metabolic functioning of living organisms and their role in the environment. Metagenomic analyses have shown that humans, along with, for example, domestic animals, wildlife and arthropods, are colonized by an immense community of viruses. The current Coronavirus pandemic (COVID-19) heightens the need to rapidly detect previously unknown viruses in an unbiased way. The increasing availability of metagenomic data in this era of next-generation sequencing (NGS), along with increasingly affordable sequencing technologies, highlights the need for reliable and comprehensive methods to manage such data. In this article, we present a novel bioinformatics pipeline called LAZYPIPE for identifying both previously known and novel viruses in host-associated or environmental samples and give examples of virus discovery based on it. LAZYPIPE is a Unix-based pipeline for automated assembling and taxonomic profiling of NGS libraries implemented as a collection of C++, Perl, and R scripts.
Affiliation(s)
- Ilya Plyusnin
  - Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland
- Ravi Kant
  - Department of Veterinary Bioscience, University of Helsinki, Helsinki 00014, Finland
- Anne J Jääskeläinen
  - Department of Virology and Immunology, University of Helsinki and Helsinki University Hospital, Helsinki 00014, Finland
- Tarja Sironen
  - Department of Veterinary Bioscience, University of Helsinki, Helsinki 00014, Finland
- Liisa Holm
  - Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland
- Olli Vapalahti
  - Department of Veterinary Bioscience, University of Helsinki, Helsinki 00014, Finland
- Teemu Smura
  - Department of Virology, University of Helsinki, Helsinki 00014, Finland
6
Yuan Z, Ye X, Zhu L, Zhang N, An Z, Zheng WJ. Virome assembly and annotation in brain tissue based on next-generation sequencing. Cancer Med 2020; 9:6776-6790. [PMID: 32738030 PMCID: PMC7520322 DOI: 10.1002/cam4.3325] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/20/2019] [Revised: 06/20/2020] [Accepted: 07/01/2020] [Indexed: 12/15/2022] Open
Abstract
Glioblastoma multiforme (GBM) is one of the deadliest tumors. It has been speculated that viruses play a role in GBM, but the evidence is controversial. Published research is mainly limited to studies on the presence of human cytomegalovirus (HCMV) in GBM. No comprehensive assessment of the brain virome, the collection of viral material in the brain, based on recently sequenced data has been performed. Here, we characterized the virome from 111 GBM samples and 57 normal brain samples from eight projects in the SRA database by a tested and comprehensive assembly approach. The annotation of the assembled contigs showed that most viral sequences in the brain belong to the viral family Retroviridae. In some GBM samples, we also detected the full genome sequence of a novel picornavirus recently discovered in invertebrates. Unlike previous reports, our study did not detect herpesviruses such as HCMV in GBM in the data we used. However, some contigs that could not be annotated with any known genes exhibited antibody epitopes in their sequences. These findings provide several avenues for potential cancer therapy: the newly discovered picornavirus could be a starting point for engineering novel oncolytic viruses, and the exhibited antibody epitopes could be a source for exploring potential drug targets for immune cancer therapy. By characterizing the virosphere in GBM and normal brain at a global level, the results from this study strengthen the link between GBM and viral infection, which warrants further investigation.
Affiliation(s)
- Zihao Yuan
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
  - Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaohua Ye
  - Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Lisha Zhu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Ningyan Zhang
  - Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Zhiqiang An
  - Texas Therapeutics Institute, Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- W. Jim Zheng
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
7
Chen X, Kost J, Li D. Comprehensive comparative analysis of methods and software for identifying viral integrations. Brief Bioinform 2020; 20:2088-2097. [PMID: 30102374 DOI: 10.1093/bib/bby070] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/18/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022] Open
Abstract
Many viruses are capable of integrating in the human genome, particularly viruses involved in tumorigenesis. Viral integrations can be considered genetic markers for discovering virus-caused cancers and inferring cancer cell development. Next-generation sequencing (NGS) technologies have been widely used to screen for viral integrations in cancer genomes, and a number of bioinformatics tools have been developed to detect viral integrations using NGS data. However, there has been no systematic comparison of the methods or software. In this study, we performed a comprehensive comparative analysis of the designs, performance, functionality and limitations among the existing methods and software for detecting viral integrations. We further compared the sensitivity, precision and runtime of integration detection of four representative tools. Our analyses showed that each of the existing software had its own merits; however, none of them were sufficient for parallel or accurate virome-wide detection. After carefully evaluating the limitations shared by the existing methods, we proposed strategies and directions for developing virome-wide integration detection.
Affiliation(s)
- Xun Chen
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Jason Kost
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Dawei Li
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
  - Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA
  - Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont 05405, USA
  - Cancer Center, University of Vermont, Burlington, Vermont 05405, USA
8
Chen X, Kost J, Sulovari A, Wong N, Liang WS, Cao J, Li D. A virome-wide clonal integration analysis platform for discovering cancer viral etiology. Genome Res 2019; 29:819-830. [PMID: 30872350 PMCID: PMC6499315 DOI: 10.1101/gr.242529.118] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 07/31/2018] [Accepted: 03/11/2019] [Indexed: 12/31/2022]
Abstract
Oncoviral infection is responsible for 12%–15% of cancer in humans. Convergent evidence from epidemiology, pathology, and oncology suggests that new viral etiologies for cancers remain to be discovered. Oncoviral profiles can be obtained from cancer genome sequencing data; however, widespread viral sequence contamination and noncausal viruses complicate the process of identifying genuine oncoviruses. Here, we propose a novel strategy to address these challenges by performing virome-wide screening of early-stage clonal viral integrations. To implement this strategy, we developed VIcaller, a novel platform for identifying viral integrations that are derived from any characterized viruses and shared by a large proportion of tumor cells using whole-genome sequencing (WGS) data. The sensitivity and precision were confirmed with simulated and benchmark cancer data sets. By applying this platform to cancer WGS data sets with proven or speculated viral etiology, we newly identified or confirmed clonal integrations of hepatitis B virus (HBV), human papillomavirus (HPV), Epstein-Barr virus (EBV), and BK Virus (BKV), suggesting the involvement of these viruses in early stages of tumorigenesis in affected tumors, such as HBV in TERT and KMT2B (also known as MLL4) gene loci in liver cancer, HPV and BKV in bladder cancer, and EBV in non-Hodgkin's lymphoma. We also showed the capacity of VIcaller to identify integrations from some uncharacterized viruses. This is the first study to systematically investigate the strategy and method of virome-wide screening of clonal integrations to identify oncoviruses. Searching clonal viral integrations with our platform has the capacity to identify virus-caused cancers and discover cancer viral etiologies.
Affiliation(s)
- Xun Chen
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Jason Kost
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Arvis Sulovari
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
- Nathalie Wong
  - Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong 999077, P.R. China
- Winnie S Liang
  - Translational Genomics Research Institute, Phoenix, Arizona 85004, USA
- Jian Cao
  - Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA
  - Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA
- Dawei Li
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
  - Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont 05405, USA
  - Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA
9
Xia Y, Liu Y, Deng M, Xi R. Detecting virus integration sites based on multiple related sequencing data by VirTect. BMC Med Genomics 2019; 12:19. [PMID: 30704462 PMCID: PMC6357354 DOI: 10.1186/s12920-018-0461-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/14/2022] Open
Abstract
Background: Since tumors often have a high level of intra-tumor heterogeneity, multiple tumor samples from the same patient at different locations or different time points are often sequenced to study intra-tumor heterogeneity or tumor evolution. In virus-related tumors such as human papillomavirus- and hepatitis B virus-related tumors, virus genome integrations can be critical driving events. It is thus important to investigate the integration sites of the virus genomes. Currently, a few algorithms for detecting virus integration sites based on high-throughput sequencing have been developed, but their insufficient sensitivity, specificity and computational efficiency hinder their application to sequencing data from multiple related tumor samples.
Results: We develop VirTect for detecting virus integration sites simultaneously from multiple related-sample data. This algorithm is mainly based on the joint analysis of short reads spanning breakpoints of integration sites from multiple samples. To achieve high specificity and breakpoint accuracy, a local precise sandwich alignment algorithm is used. Simulation and real data analyses show that, compared with other algorithms, VirTect is significantly more sensitive and has a similar or lower false discovery rate.
Conclusions: VirTect can provide more accurate breakpoint positions and is computationally much more efficient in terms of both memory requirement and computational time.
Affiliation(s)
- Yuchao Xia
  - School of Mathematical Sciences, Peking University, Beijing, 100871, China
- Yun Liu
  - School of Mathematical Sciences, Peking University, Beijing, 100871, China
- Minghua Deng
  - School of Mathematical Sciences, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Ruibin Xi
  - School of Mathematical Sciences, Peking University, Beijing, 100871, China
  - Center for Statistical Science, Peking University, Beijing, 100871, China
  - Center for Data Science, Peking University, Beijing, 100871, China
10
Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 12/08/2017] [Accepted: 04/03/2018] [Indexed: 12/20/2022] Open
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Dennis Schmitz
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Harry Vennema
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Annelies Kroneman
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Marion P G Koopmans
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
11
Chesnais V, Ott A, Chaplais E, Gabillard S, Pallares D, Vauloup-Fellous C, Benachi A, Costa JM, Ginoux E. Using massively parallel shotgun sequencing of maternal plasmatic cell-free DNA for cytomegalovirus DNA detection during pregnancy: a proof of concept study. Sci Rep 2018; 8:4321. [PMID: 29531245 PMCID: PMC5847603 DOI: 10.1038/s41598-018-22414-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/28/2017] [Accepted: 02/22/2018] [Indexed: 12/29/2022] Open
Abstract
Human cytomegalovirus (HCMV) primary infections of pregnant women can lead to congenital infections of the fetus that could have severe impacts on the health of the newborn. Recent studies have shown that 10-100 billion cell-free DNA fragments circulate per milliliter of plasma. The study of this DNA has rapidly expanding applications in non-invasive prenatal testing (NIPT). In this study, we have shown that we can detect virus-specific reads in massively parallel shotgun sequencing (MPSS) NIPT data. We have also observed a strong correlation between the viral load of calibration samples and the number of reads aligned to the reference genome. Based on these observations, we have constructed a statistical model able to quantify the viral load of patient samples. We propose to use this new method to detect and quantify circulating DNA viruses such as HCMV during pregnancy using the same sequencing results as NIPT data. This method could be used to improve NIPT diagnosis.
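The abstract does not specify the form of the statistical model. As one plausible sketch, assuming a log-log linear relationship between aligned read counts and viral load, a calibration line can be fitted by ordinary least squares and then inverted for patient samples. All calibration values, units, and function names below are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration samples: (known viral load in copies/mL, aligned viral reads).
calibration = [(1e3, 2), (1e4, 20), (1e5, 200), (1e6, 2000)]

log_reads = [math.log10(reads) for _, reads in calibration]
log_load = [math.log10(load) for load, _ in calibration]
slope, intercept = fit_line(log_reads, log_load)


def estimate_load(reads):
    """Estimate viral load from a read count (reads must be > 0)."""
    return 10 ** (intercept + slope * math.log10(reads))


print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"estimated load for 50 reads: {estimate_load(50):.0f} copies/mL")
```

In practice the read counts would likely need normalising by sequencing depth before fitting, and samples with zero viral reads would need separate handling, since the logarithm is undefined there.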
Affiliation(s)
- Christelle Vauloup-Fellous
  - AP-HP, Hôpital Paul Brousse, Groupe Hospitalier Universitaire Paris-Sud, Virologie, Université Paris-Sud, INSERM U1193, Villejuif, France
- Alexandra Benachi
  - AP-HP, Hôpital Antoine Béclère, Service de Gynécologie-Obstétrique et Médecine de la Reproduction, Université Paris-Sud, Clamart, France
12
Cao J, Li D. Searching for human oncoviruses: Histories, challenges, and opportunities. J Cell Biochem 2018; 119:4897-4906. [PMID: 29377246 DOI: 10.1002/jcb.26717] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 01/19/2018] [Accepted: 01/24/2018] [Indexed: 01/05/2023]
Abstract
Oncoviruses contribute significantly to cancer burden. A century of tumor virological studies has led to the discovery of seven well-accepted human oncoviruses, cumulatively responsible for approximately 15% of human cancer cases. Virus-caused cancers are largely preventable through vaccination. Identifying additional oncoviruses and virus-caused tumors will advance cancer prevention and precision medicine, benefiting affected individuals and society as a whole. The historic success of finding human oncoviruses has provided a unique lesson for directing new research efforts in the post-sequencing era. Combining the experiences from these pioneer studies with emerging high-throughput techniques will certainly accelerate new discovery and advance our knowledge of the remaining human oncoviruses.
Affiliation(s)
- Jian Cao
  - Department of Pathology, Yale University, New Haven, Connecticut
- Dawei Li
  - Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont
  - Department of Computer Science, University of Vermont, Burlington, Vermont
  - Neuroscience, Behavior, Health Initiative, University of Vermont, Burlington, Vermont
  - University of Vermont Cancer Center, University of Vermont, Burlington, Vermont
13
|
Gannon OM, Antonsson A, Bennett IC, Saunders NA. Viral infections and breast cancer - A current perspective. Cancer Lett 2018; 420:182-189. [PMID: 29410005 DOI: 10.1016/j.canlet.2018.01.076] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 10/02/2017] [Revised: 01/08/2018] [Accepted: 01/31/2018] [Indexed: 01/25/2023]
Abstract
Sporadic human breast cancer is the most common cancer to afflict women. Since the discovery, decades ago, of the oncogenic mouse mammary tumour virus, there has been significant interest in the potential aetiologic role of infectious agents in sporadic human breast cancer. To address this, many studies have examined the presence of viruses (e.g. papillomaviruses, herpes viruses and retroviruses), endogenous retroviruses and, more recently, microbes, as a means of implicating them in the aetiology of human breast cancer. Such studies have generated conflicting experimental and clinical reports of the role of infection in breast cancer. This review evaluates the current evidence for a productive oncogenic viral infection in human breast cancer, with a focus on the integration of sensitive and specific next generation sequencing technologies with pathogen discovery. Collectively, the majority of the recent literature using the more powerful next generation sequencing technologies fails to support an oncogenic viral infection being involved in disease causality in breast cancer. On balance, the weight of the current experimental evidence supports the conclusion that viral infection is unlikely to play a significant role in the aetiology of breast cancer.
Affiliation(s)
- O M Gannon
- University of Queensland Diamantina Institute, The Faculty of Medicine, The University of Queensland, Brisbane, Australia
- A Antonsson
- Department of Population Health, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Queensland 4006, Australia; School of Medicine, The University of Queensland, Herston Road, Herston, Queensland 4006, Australia
- I C Bennett
- School of Medicine, The University of Queensland, Herston Road, Herston, Queensland 4006, Australia; Private Practice, The Wesley and St Andrews Hospital, Auchenflower 4066, Australia
- N A Saunders
- University of Queensland Diamantina Institute, The Faculty of Medicine, The University of Queensland, Brisbane, Australia
|
14
|
Haglund F, Hallström BM, Nilsson IL, Höög A, Juhlin CC, Larsson C. Inflammatory infiltrates in parathyroid tumors. Eur J Endocrinol 2017; 177:445-453. [PMID: 28855268 PMCID: PMC5642267 DOI: 10.1530/eje-17-0277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/06/2017] [Revised: 08/22/2017] [Accepted: 08/30/2017] [Indexed: 12/21/2022]
Abstract
CONTEXT Inflammatory infiltrates are sometimes present in solid tumors and may be coupled to clinical behavior or etiology. Infectious viruses contribute to tumorigenesis in a significant fraction of human neoplasias. OBJECTIVE Characterize inflammatory infiltrates and possible viral transcription in primary hyperparathyroidism. DESIGN From the period 2007 to 2016, a total of 55 parathyroid tumors (51 adenomas and 4 hyperplasias) with prominent inflammatory infiltrates were identified from more than 2000 parathyroid tumors in the pathology archives, and investigated by immunohistochemistry for CD4, CD8, CD20 and CD45 and scored as +0, +1 or +2. Clinicopathological data were compared to 142 parathyroid adenomas without histological evidence of inflammation. Transcriptome sequencing was performed for 13 parathyroid tumors (4 inflammatory, 9 non-inflammatory) to identify potential viral transcripts. RESULTS Tumors had prominent germinal center-like nodular (+2) lymphocytic infiltrates consisting of T and B lymphocytes (31%) and/or diffuse (+1-2) infiltrates of predominantly CD8+ T lymphocytes (84%). In the majority of cases with adjacent normal parathyroid tissue, the normal rim was unaffected by the inflammatory infiltrates (96%). Presence of inflammatory infiltrates was associated with higher levels of serum-PTH (P = 0.007) and oxyphilic differentiation (P = 0.002). Co-existent autoimmune disease was observed in 27% of patients with inflammatory infiltrates, which in turn was associated with oxyphilic differentiation (P = 0.041). Additionally, prescription of anti-inflammatory drugs was associated with lower serum ionized calcium (P = 0.037). CONCLUSIONS No evidence of virus-like sequences in the parathyroid tumors could be found by transcriptome sequencing, suggesting that other factors may contribute to attract the immune system to the parathyroid tumor tissue.
Affiliation(s)
- Felix Haglund
- Department of Oncology-Pathology, Karolinska Institutet, Cancer Center Karolinska (CCK), Karolinska University Hospital, Stockholm, Sweden
- Correspondence should be addressed to F Haglund
- Björn M Hallström
- Science for Life Laboratory, KTH-Royal Institute of Technology, Stockholm, Sweden
- Inga-Lena Nilsson
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Department of Breast and Endocrine Surgery, Karolinska University Hospital, Stockholm, Sweden
- Anders Höög
- Department of Oncology-Pathology, Karolinska Institutet, Cancer Center Karolinska (CCK), Karolinska University Hospital, Stockholm, Sweden
- C Christofer Juhlin
- Department of Oncology-Pathology, Karolinska Institutet, Cancer Center Karolinska (CCK), Karolinska University Hospital, Stockholm, Sweden
- Catharina Larsson
- Department of Oncology-Pathology, Karolinska Institutet, Cancer Center Karolinska (CCK), Karolinska University Hospital, Stockholm, Sweden
|
15
|
Doggett NA, Mukundan H, Lefkowitz EJ, Slezak TR, Chain PS, Morse S, Anderson K, Hodge DR, Pillai S. Culture-Independent Diagnostics for Health Security. Health Secur 2017; 14:122-42. [PMID: 27314653 DOI: 10.1089/hs.2015.0074] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/20/2022] Open
Abstract
The past decade has seen considerable development in the diagnostic application of nonculture methods, including nucleic acid amplification-based methods and mass spectrometry, for the diagnosis of infectious diseases. The implications of these new culture-independent diagnostic tests (CIDTs) include bypassing the need to culture organisms, thus potentially affecting public health surveillance systems, which continue to use isolates as the basis of their surveillance programs and to assess phenotypic resistance to antimicrobial agents. CIDTs may also affect the way public health practitioners detect and respond to a bioterrorism event. In response to a request from the Department of Homeland Security, Los Alamos National Laboratory and the Centers for Disease Control and Prevention cosponsored a workshop to review the impact of CIDTs on the rapid detection and identification of biothreat agents. Four panel discussions were held that covered nucleic acid amplification-based diagnostics, mass spectrometry, antibody-based diagnostics, and next-generation sequencing. Exploiting the extensive expertise available at this workshop, we identified the key features, benefits, and limitations of the various CIDT methods for providing rapid pathogen identification that are critical to the response and mitigation of a bioterrorism event. After the workshop we conducted a thorough review of the literature, investigating the current state of these 4 culture-independent diagnostic methods. This article combines information from the literature review and the insights obtained at the workshop.
|
16
|
VirusSeeker, a computational pipeline for virus discovery and virome composition analysis. Virology 2017; 503:21-30. [PMID: 28110145 DOI: 10.1016/j.virol.2017.01.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 11/18/2016] [Revised: 01/07/2017] [Accepted: 01/10/2017] [Indexed: 01/21/2023]
Abstract
The advent of Next Generation Sequencing (NGS) has vastly increased our ability to discover novel viruses and to systematically define the spectrum of viruses present in a given specimen. Such studies have led to the discovery of novel viral pathogens as well as broader associations of the virome with diverse diseases including inflammatory bowel disease, severe acute malnutrition and HIV/AIDS. Critical to the success of these efforts are robust bioinformatic pipelines for rapid classification of microbial sequences. Existing computational tools are typically focused on either eukaryotic virus discovery or virome composition analysis but not both. Here we present VirusSeeker, a BLAST-based NGS data analysis pipeline designed for both purposes. VirusSeeker has been successfully applied in several previously published virome studies. Here we demonstrate the functionality of VirusSeeker in both novel virus discovery and virome composition analysis.
|
17
|
Lux M, Krüger J, Rinke C, Maus I, Schlüter A, Woyke T, Sczyrba A, Hammer B. acdc - Automated Contamination Detection and Confidence estimation for single-cell genome data. BMC Bioinformatics 2016; 17:543. [PMID: 27998267 PMCID: PMC5168860 DOI: 10.1186/s12859-016-1397-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/19/2016] [Accepted: 11/29/2016] [Indexed: 01/05/2023] Open
Abstract
Background A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. Results We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. 
Conclusions Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
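The reference-free part of this approach starts from oligonucleotide signatures. As a rough illustration of what such a signature is, the sketch below computes a normalized k-mer frequency vector for one sequence. It is not acdc's implementation: the function name, the default k=4 and the handling of ambiguous bases are assumptions, and the dimensionality reduction and clustering steps described in the abstract are omitted.

```python
from itertools import product

def kmer_signature(seq, k=4):
    """Return the normalized frequency of every DNA k-mer in seq.

    Hypothetical helper for illustration. Windows containing
    characters other than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    # Frequencies are listed in lexicographic k-mer order (AA.., AC.., ...).
    return [counts[m] / total for m in kmers]

signature = kmer_signature("ACGTACGT", k=2)
```

Contigs from the same genome tend to have similar signatures, which is what makes clustering on these vectors useful for spotting contaminants.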
Affiliation(s)
- Markus Lux
- Computational Methods for the Analysis of the Diversity and Dynamics of Genomes, Bielefeld University, Universitätsstr. 25, Bielefeld, 33615, Germany
- Jan Krüger
- Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Christian Rinke
- Australian Centre for Ecogenomics, University of Queensland, ST LUCIA, Brisbane, QLD 4072, Australia
- Irena Maus
- Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Andreas Schlüter
- Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Tanja Woyke
- , 2800 Mitchell Drive, Walnut Creek, 94598, CA, USA
- Alexander Sczyrba
- Center for Biotechnology - CeBiTec, Bielefeld University, Universitätsstr. 27, Bielefeld, 33615, Germany
- Barbara Hammer
- CITEC centre of excellence, Bielefeld University, Inspiration 1, Bielefeld, 33619, Germany
|
18
|
Bullman S, Meyerson M, Kostic AD. Emerging Concepts and Technologies for the Discovery of Microorganisms Involved in Human Disease. Annual Review of Pathology: Mechanisms of Disease 2016; 12:217-244. [PMID: 27959634 DOI: 10.1146/annurev-pathol-012615-044305] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
Established infectious agents continue to be a major cause of human morbidity and mortality worldwide. However, the causative agent remains unknown for a wide range of diseases; many of these are suspected to be attributable to yet undiscovered microorganisms. The advent of unbiased high-throughput sequencing and bioinformatics has enabled rapid identification and molecular characterization of known and novel infectious agents in human disease. An exciting era of microbe discovery, now under way, holds great promise for the improvement of global health via the development of antimicrobial therapies, vaccination strategies, targeted public health measures, and probiotic-based preventions and therapies. Here, we review the history of pathogen discovery, discuss improvements and clinical applications for the detection of microbially associated diseases, and explore the challenges and strategies for establishing causation in human disease.
Affiliation(s)
- Susan Bullman
- Dana-Farber Cancer Institute, Boston, Massachusetts 02215; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Matthew Meyerson
- Dana-Farber Cancer Institute, Boston, Massachusetts 02215; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142; Harvard Medical School, Boston, Massachusetts 02115
- Aleksandar D Kostic
- Research Division, Joslin Diabetes Center, Boston, Massachusetts 02215; Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115
|
19
|
Divergent viral presentation among human tumors and adjacent normal tissues. Sci Rep 2016; 6:28294. [PMID: 27339696 PMCID: PMC4919655 DOI: 10.1038/srep28294] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 03/05/2016] [Accepted: 05/26/2016] [Indexed: 12/13/2022] Open
Abstract
We applied a newly developed bioinformatics system called VirusScan to investigate the viral basis of 6,813 human tumors and 559 adjacent normal samples across 23 cancer types and identified 505 virus positive samples with distinctive, organ system- and cancer type-specific distributions. We found that herpes viruses (e.g., subtypes HHV4, HHV5, and HHV6) that are highly prevalent across cancers of the digestive tract showed significantly higher abundances in tumor versus adjacent normal samples, supporting their association with these cancers. We also found three HPV16-positive samples in brain lower grade glioma (LGG). Further, recurrent HBV integration at the KMT2B locus is present in three liver tumors, but absent in their matched adjacent normal samples, indicating that viral integration induced host driver genetic alterations are required on top of viral oncogene expression for initiation and progression of liver hepatocellular carcinoma. Notably, viral integrations were found in many genes, including novel recurrent HPV integrations at PTPN13 in cervical cancer. Finally, we observed a set of HHV4 and HBV variants strongly associated with ethnic groups, likely due to viral sequence evolution under environmental influences. These findings provide important new insights into viral roles of tumor initiation and progression and potential new therapeutic targets.
|
20
|
Next-generation sequencing of elite berry germplasm and data analysis using a bioinformatics pipeline for virus detection and discovery. Methods Mol Biol 2016; 1302:301-13. [PMID: 25981263 DOI: 10.1007/978-1-4939-2620-6_22] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 02/09/2023]
Abstract
Berry crops (members of the genera Fragaria, Ribes, Rubus, Sambucus, and Vaccinium) are known hosts for more than 70 viruses and new ones are identified continually. In modern berry cultivars, viruses tend to be asymptomatic in single infections and symptoms only develop after plants accumulate multiple viruses. Most certification programs are based on visual observations. Infected, asymptomatic material may be propagated in the nursery system and shipped to farms where plants acquire additional viruses and develop symptoms. This practice may result in disease epidemics with great impact to producers and the natural ecosystem alike. In this chapter we present work that allows for the detection of known and discovery of new viruses in elite germplasm, having the potential to greatly reduce virus dispersal associated with movement of propagation material.
|
21
|
Mulcahy-O'Grady H, Workentine ML. The Challenge and Potential of Metagenomics in the Clinic. Front Immunol 2016; 7:29. [PMID: 26870044 PMCID: PMC4737888 DOI: 10.3389/fimmu.2016.00029] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 08/14/2015] [Accepted: 01/19/2016] [Indexed: 12/27/2022] Open
Abstract
The bacteria, fungi, and viruses that live on and in us have a tremendous impact on our day-to-day health and are often linked to many diseases, including autoimmune disorders and infections. Diagnosing and treating these disorders relies on accurate identification and characterization of the microbial community. Current sequencing technologies allow the sequencing of the entire nucleic acid complement of a sample providing an accurate snapshot of the community members present in addition to the full genetic potential of that microbial community. There are a number of clinical applications that stand to benefit from these data sets, such as the rapid identification of pathogens present in a sample. Other applications include the identification of antibiotic-resistance genes, diagnosis and treatment of gastrointestinal disorders, and many other diseases associated with bacterial, viral, and fungal microbiomes. Metagenomics also allows the physician to probe more complex phenotypes such as microbial dysbiosis with intestinal disorders and disruptions of the skin microbiome that may be associated with skin disorders. Many of these disorders are not associated with a single pathogen but emerge as a result of complex ecological interactions within microbiota. Currently, we understand very little about these complex phenotypes, yet clearly they are important and in some cases, as with fecal microbiota transplants in Clostridium difficile infections, treating the microbiome of the patient is effective. Here, we give an overview of metagenomics and discuss a number of areas where metagenomics is applicable in the clinic, and progress being made in these areas. 
This includes (1) the identification of unknown pathogens, and those pathogens particularly hard to culture, (2) utilizing functional information and gene content to understand complex infections such as Clostridium difficile, and (3) predicting antimicrobial resistance of the community using genetic determinants of resistance identified from the sequencing data. All of these applications rely on sophisticated computational tools, and we also discuss the importance of skilled bioinformatic support for the implementation and use of metagenomics in the clinic.
Affiliation(s)
- Heidi Mulcahy-O'Grady
- Infection Prevention and Control, Alberta Health Services, and Faculty of Medicine, Calgary, AB, Canada
|
22
|
No association between HPV positive breast cancer and expression of human papilloma viral transcripts. Sci Rep 2015; 5:18081. [PMID: 26658849 PMCID: PMC4677295 DOI: 10.1038/srep18081] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/08/2015] [Accepted: 11/11/2015] [Indexed: 12/15/2022] Open
Abstract
Infectious agents are thought to be responsible for approximately 16% of cancers worldwide; however, there are mixed reports in the literature as to the prevalence and potential pathogenicity of viruses in breast cancer. Furthermore, most studies to date have focused primarily on viral DNA rather than the expression of viral transcripts. We screened a large cohort of fresh frozen breast cancer and normal breast tissue specimens collected from patients in Australia for the presence of human papilloma virus (HPV) DNA, with an overall prevalence of HPV of 16% and 10% in malignant and non-malignant tissue, respectively. Samples that were positive for HPV DNA by nested PCR were screened by RNA-sequencing for the presence of transcripts of viral origin, using three different bioinformatic pipelines. We did not find any evidence for HPV or other viral transcripts in HPV DNA positive samples. In addition, we screened publicly available breast RNA-seq data sets for the presence of viral transcripts and did not find any evidence for the expression of viral transcripts (HPV or otherwise) in other data sets. These data suggest that transcription of viral genomes is unlikely to be a significant factor in breast cancer pathogenesis.
|
23
|
Abstract
Diagnostic Microbiology is the tool that makes it possible to identify the exact etiology of infectious diseases and the most optimal therapy at the level of individual patients as well as communities. Conventional methods require time to grow the microbes in vitro under specific conditions and not all microbes are easily cultivable. This is followed by biochemical methods for identification which also require hours and sometimes days. Transport of the specimens under less than ideal conditions, prior use of antibiotics and small number of organisms are among the factors that render culture-based methods less reliable. Newer methods depend on amplification of nucleic acids followed by use of probes for identification. This mitigates the need for higher microbial load, presence of metabolically active viable organisms and shortens the time to reporting. These methods can be used to detect antibiotic resistance genes directly from the specimen and help direct targeted therapy. Since these methods will not fulfill all the diagnostic needs, a second approach is being used to shorten the time to identification after the organism has already grown. Mass spectrometry and bioinformatics are the tools making this possible. This review gives a historical perspective on diagnostic microbiology, discusses the pitfalls of current methodology and provides an overview of newer and future methods.
Affiliation(s)
- N Khardori
- Department of Internal Medicine, Division of Infectious Disease, Department of Microbiology and Molecular Cell Biology, Eastern Virginia Medical School, Norfolk, Virginia, USA
|
24
|
Chandrani P, Kulkarni V, Iyer P, Upadhyay P, Chaubal R, Das P, Mulherkar R, Singh R, Dutt A. NGS-based approach to determine the presence of HPV and their sites of integration in human cancer genome. Br J Cancer 2015; 112:1958-65. [PMID: 25973533 PMCID: PMC4580395 DOI: 10.1038/bjc.2015.121] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 07/31/2014] [Revised: 03/03/2015] [Accepted: 03/07/2015] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Human papilloma virus (HPV) accounts for the most common cause of all virus-associated human cancers. Here, we describe the first graphical user interface (GUI)-based automated tool 'HPVDetector', for non-computational biologists, exclusively for detection and annotation of the HPV genome based on next-generation sequencing data sets. METHODS We developed a custom-made reference genome that comprises human chromosomes along with the annotated genomes of 143 HPV types as pseudochromosomes. The tool runs in a dual mode as defined by the user: a 'quick mode' to identify the presence of HPV types and an 'integration mode' to determine the genomic location of the site of integration. The input data can be a paired-end whole-exome, whole-genome or whole-transcriptome data set. The HPVDetector is available in the public domain for download: http://www.actrec.gov.in/pi-webpages/AmitDutt/HPVdetector/HPVDetector.html. RESULTS On the basis of our evaluation of 116 whole-exome, 23 whole-transcriptome and 2 whole-genome data sets, we were able to identify the presence of HPV in 20 exomes and 4 transcriptomes of cervical and head and neck cancer tumour samples. Using the inbuilt annotation module of HPVDetector, we found predominant integration of viral gene E7, a known oncogene, at known 17q21, 3q27, 7q35, Xq28 and novel sites of integration in the human genome. Furthermore, co-infection with high-risk HPVs such as 16 and 31 was found to be mutually exclusive compared with low-risk HPV71. CONCLUSIONS HPVDetector is a simple yet precise and robust tool for detecting HPV from tumour samples using a variety of next-generation sequencing platforms including whole genome, whole exome and transcriptome. Two different modes (quick detection and integration mode) along with a GUI widen the usability of HPVDetector for biologists and clinicians with minimal computational knowledge.
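The custom reference described in the METHODS (host chromosomes plus viral genomes added as pseudochromosomes) can be sketched as a simple FASTA merge. This is an illustration under assumptions, not HPVDetector's code: the function names, the `HPV_` prefix and the toy sequences are hypothetical, and a real workflow would go on to index the combined file for an aligner.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the identifier before the first whitespace.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def build_combined_reference(host_fasta, viral_fasta, prefix="HPV_"):
    """Append viral genomes to host chromosomes as pseudochromosomes.

    The prefix (an assumed naming scheme) makes viral records easy to
    recognize after alignment.
    """
    combined = list(parse_fasta(host_fasta))
    for name, seq in parse_fasta(viral_fasta):
        combined.append((prefix + name, seq))
    return "".join(f">{name}\n{seq}\n" for name, seq in combined)

# Toy inputs, not real genome sequences.
host = ">chr1 toy host chromosome\nACGT\nAC\n"
viral = ">type16\nGGCC\n"
reference = build_combined_reference(host, viral)
```

Aligning reads against one combined reference lets a read pair with one mate on a host chromosome and the other on a viral pseudochromosome point to a candidate integration site.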
Affiliation(s)
- P Chandrani
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- V Kulkarni
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- P Iyer
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- P Upadhyay
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- R Chaubal
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- P Das
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- R Mulherkar
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- R Singh
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
- A Dutt
- Advanced Centre for Treatment, Research and Education in Cancer, Tata Memorial Centre, Kharghar, Navi Mumbai, Maharashtra 410210, India
|
25
|
Scheuch M, Höper D, Beer M. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. BMC Bioinformatics 2015; 16:69. [PMID: 25886935 PMCID: PMC4351923 DOI: 10.1186/s12859-015-0503-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 07/03/2014] [Accepted: 02/20/2015] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. RESULTS To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. CONCLUSIONS RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
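The cascading strategy described in the RESULTS (each read is passed through analyses of decreasing stringency until one of them assigns it) can be sketched generically. The code below is not RIEMS itself, which chains external software applications; the stage functions, the toy reference dictionary and all names are hypothetical stand-ins chosen to keep the example self-contained.

```python
def cascade_classify(reads, stages):
    """Assign each read using the first stage that classifies it.

    reads  : dict mapping read id -> sequence
    stages : list of (stage_name, classify) ordered from most to least
             stringent; classify(seq) returns a taxon or None
    """
    assignments = {}
    remaining = dict(reads)
    for stage_name, classify in stages:
        still_unassigned = {}
        for read_id, seq in remaining.items():
            taxon = classify(seq)
            if taxon is None:
                still_unassigned[read_id] = seq
            else:
                assignments[read_id] = (taxon, stage_name)
        # Only unassigned reads move on to the next, less stringent stage.
        remaining = still_unassigned
    return assignments, sorted(remaining)

# Toy reference and stages standing in for real alignment tools.
REFERENCE = {"ACGTACGT": "virus_A", "TTTTCCCC": "virus_B"}

def exact_match(seq):
    return REFERENCE.get(seq)

def prefix_match(seq, n=4):
    for ref, taxon in REFERENCE.items():
        if ref[:n] == seq[:n]:
            return taxon
    return None

reads = {"r1": "ACGTACGT", "r2": "TTTTGGGG", "r3": "GGGGGGGG"}
assigned, unassigned = cascade_classify(
    reads, [("exact", exact_match), ("prefix", prefix_match)]
)
```

Recording the stage alongside each taxon keeps track of how stringent the supporting evidence was, which matters when interpreting assignments made by the later, looser stages.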
Affiliation(s)
- Matthias Scheuch
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany
- Dirk Höper
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany
- Martin Beer
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Südufer 10, 17493, Greifswald - Insel Riems, Germany
|
26
|
Wang Q, Jia P, Zhao Z. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization. Genome Med 2015; 7:2. [PMID: 25699093 PMCID: PMC4333248 DOI: 10.1186/s13073-015-0126-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 12/11/2014] [Accepted: 01/05/2015] [Indexed: 12/28/2022] Open
Abstract
Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Affiliation(s)
- Qingguo Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA; Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232 USA
|
27
|
Calistri A, Palu G. Editorial Commentary: Unbiased Next-Generation Sequencing and New Pathogen Discovery: Undeniable Advantages and Still-Existing Drawbacks. Clin Infect Dis 2015; 60:889-91. [DOI: 10.1093/cid/ciu913] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022] Open
|
28
|
Pyne S, Vullikanti AKS, Marathe MV. Big Data Applications in Health Sciences and Epidemiology. HANDBOOK OF STATISTICS 2015. [PMCID: PMC7152243 DOI: 10.1016/b978-0-444-63492-4.00008-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Affiliation(s)
- Saumyadipta Pyne
- Bioinformatics, CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, University of Hyderabad Campus, Hyderabad, India
- Public Health Foundation of India, New Delhi, India
- Madhav V. Marathe
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
- Network Dynamics and Simulation Science Laboratory, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
|
29
|
Ho T, Tzanetakis IE. Development of a virus detection and discovery pipeline using next generation sequencing. Virology 2014; 471-473:54-60. [PMID: 25461531 DOI: 10.1016/j.virol.2014.09.019] [Citation(s) in RCA: 105] [Impact Index Per Article: 10.5] [Received: 07/29/2014] [Revised: 08/28/2014] [Accepted: 09/22/2014] [Indexed: 12/13/2022]
Abstract
Next generation sequencing (NGS) has revolutionized virus discovery. Notwithstanding, a vertical pipeline, from sample preparation to data analysis, has not been available to the plant virology community. We developed a degenerate oligonucleotide primed RT-PCR method with multiple barcodes for NGS, and constructed VirFind, a bioinformatics tool specifically for virus detection and discovery able to: (i) map and filter out host reads, (ii) deliver files of virus reads with taxonomic information and corresponding Blastn and Blastx reports, and (iii) perform conserved domain search for reads of unknown origin. The pipeline was used to process more than 30 samples resulting in the detection of all viruses known to infect the processed samples, the extension of the genomic sequences of others, and the discovery of several novel viruses. VirFind was tested by four external users with datasets from plants or insects, demonstrating its potential as a universal virus detection and discovery tool.
Affiliation(s)
- Thien Ho
- Department of Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville, AR, USA.
- Ioannis E Tzanetakis
- Department of Plant Pathology, Division of Agriculture, University of Arkansas System, Fayetteville, AR, USA.
|
30
|
Byrd AL, Perez-Rogers JF, Manimaran S, Castro-Nallar E, Toma I, McCaffrey T, Siegel M, Benson G, Crandall KA, Johnson WE. Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data. BMC Bioinformatics 2014; 15:262. [PMID: 25091138 PMCID: PMC4131054 DOI: 10.1186/1471-2105-15-262] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 11/07/2013] [Accepted: 07/31/2014] [Indexed: 11/17/2022] Open
Abstract
Background The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens. Results Here we present Clinical PathoScope, a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We have accomplished three essential tasks in the development of Clinical PathoScope. First, we developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. Second, we have demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the subspecies level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that Clinical PathoScope outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity. Conclusions Clinical PathoScope is the only pathogen identification method currently available that can identify multiple pathogens from mixed samples and distinguish between very closely related species and strains in samples with very few reads per pathogen. Furthermore, Clinical PathoScope does not rely on genome assembly and thus can more rapidly complete the analysis of a clinical sample when compared with current assembly-based methods. Clinical PathoScope is freely available at:
http://sourceforge.net/projects/pathoscope/.
Affiliation(s)
- Keith A Crandall
- Department of Bioinformatics, Boston University, Boston, MA, USA.
|
31
|
Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk KC, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martínez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J, Miller S, Chiu CY. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res 2014; 24:1180-92. [PMID: 24899342 PMCID: PMC4079973 DOI: 10.1101/gr.171934.113] [Citation(s) in RCA: 311] [Impact Index Per Article: 31.1] [Indexed: 12/20/2022]
Abstract
Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI (“sequence-based ultrarapid pathogen identification”), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7–500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times.
Affiliation(s)
- Samia N Naccache
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Scot Federman
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Narayanan Veeraraghavan
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Matei Zaharia
- Department of Computer Science, University of California, Berkeley, California 94720, USA
- Deanna Lee
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Erik Samayoa
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Jerome Bouquet
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Ka-Cheung Luk
- Abbott Diagnostics, Abbott Park, Illinois 60064, USA
- Barryett Enge
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
- Debra A Wadford
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
- Sharon L Messenger
- Viral and Rickettsial Disease Laboratory, California Department of Public Health, Richmond, California 94804, USA
- Gillian L Genrich
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA
- Kristen Pellegrino
- Department of Family and Community Medicine, UCSF, San Francisco, California 94143, USA
- Gilda Grard
- Viral Emergent Diseases Unit, Centre International de Recherches Médicales de Franceville, Franceville, BP 769, Gabon
- Eric Leroy
- Viral Emergent Diseases Unit, Centre International de Recherches Médicales de Franceville, Franceville, BP 769, Gabon
- Joseph N Fair
- Metabiota, Inc., San Francisco, California 94104, USA
- Miguel A Martínez
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, 62260, Mexico
- Pavel Isa
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, 62260, Mexico
- John A Crump
- Division of Infectious Diseases and International Health and the Duke Global Health Institute, Duke University Medical Center, Durham, North Carolina 27708, USA; Kilimanjaro Christian Medical Centre, Moshi, Kilimanjaro, 7393, Tanzania; Centre for International Health, University of Otago, Dunedin, 9054, New Zealand
- Joseph L DeRisi
- Department of Biochemistry, UCSF, San Francisco, California 94107, USA
- Taylor Sittler
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA
- John Hackett
- Abbott Diagnostics, Abbott Park, Illinois 60064, USA
- Steve Miller
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA
- Charles Y Chiu
- Department of Laboratory Medicine, UCSF, San Francisco, California 94107, USA; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California 94107, USA; Department of Medicine, Division of Infectious Diseases, UCSF, San Francisco, California 94143, USA
|
32
|
Caboche S, Audebert C, Hot D. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology. Pathogens 2014; 3:258-79. [PMID: 25437800 PMCID: PMC4243446 DOI: 10.3390/pathogens3020258] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/20/2013] [Revised: 02/28/2014] [Accepted: 03/20/2014] [Indexed: 12/19/2022] Open
Abstract
The recent progresses of high-throughput sequencing (HTS) technologies enable easy and cost-reduced access to whole genome sequencing (WGS) or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis have allowed genome-based diagnosis, which has been consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose.
Affiliation(s)
- Ségolène Caboche
- FRE 3642 Molecular and Cellular Medecine, CNRS, Institut Pasteur de Lille and University Lille Nord de France, Lille 59019, France.
- David Hot
- FRE 3642 Molecular and Cellular Medecine, CNRS, Institut Pasteur de Lille and University Lille Nord de France, Lille 59019, France.
|
33
|
Cimino PJ, Zhao G, Wang D, Sehn JK, Lewis JS, Duncavage EJ. Detection of viral pathogens in high grade gliomas from unmapped next-generation sequencing data. Exp Mol Pathol 2014; 96:310-5. [PMID: 24704430 DOI: 10.1016/j.yexmp.2014.03.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 03/24/2014] [Accepted: 03/25/2014] [Indexed: 12/25/2022]
Abstract
Viral pathogens have been implicated in the development of certain cancers including human papillomavirus (HPV) in squamous cell carcinoma and Epstein-Barr virus (EBV) in Burkitt's lymphoma. The significance of viral pathogens in brain tumors is controversial, and human cytomegalovirus (HCMV) has been associated with glioblastoma (GBM) in some but not all studies, making the role of HCMV unclear. In this study we sought to determine if viral pathogen sequences could be identified in an unbiased manner from previously discarded, unmapped, non-human, next-generation sequencing (NGS) reads obtained from targeted oncology, panel-based sequencing of high grade gliomas (HGGs), including GBMs. Twenty one sequential HGG cases were analyzed by a targeted NGS clinical oncology panel containing 151 genes using DNA obtained from formalin-fixed, paraffin-embedded (FFPE) tissue. Sequencing reads that did not map to the human genome (average of 38,000 non-human reads/case (1.9%)) were filtered and low quality reads removed. Extracted high quality reads were then sequentially aligned to the National Center for Biotechnology Information (NCBI) non-redundant nucleotide (nt and nr) databases. Aligned reads were classified based on NCBI taxonomy database and all eukaryotic viral sequences were further classified into viral families. Two viral sequences (both herpesviruses), EBV and Roseolovirus were detected in 5/21 (24%) cases and in 1/21 (5%) cases, respectively. None of the cases had detectable HCMV. Of the five HGG cases with detectable EBV DNA, four had additional material for EBV in situ hybridization (ISH), all of which were negative for expressed viral sequence. Overall, a similar discovery approach using unmapped non-human NGS reads could be used to discover viral sequences in other cancer types.
Affiliation(s)
- Patrick J Cimino
- Division of Neuropathology, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, United States
- Guoyan Zhao
- Division of Laboratory and Genomic Medicine, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, United States
- David Wang
- Department of Molecular Microbiology, Washington University School of Medicine, Saint Louis, MO, United States; Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, United States
- Jennifer K Sehn
- Division of Anatomic and Molecular Pathology, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, United States
- James S Lewis
- Division of Anatomic and Molecular Pathology, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, United States; Department of Otolaryngology Head and Neck Surgery, Washington University School of Medicine, Saint Louis, MO, United States
- Eric J Duncavage
- Division of Anatomic and Molecular Pathology, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, United States.
|
34
|
Hong C, Manimaran S, Shen Y, Perez-Rogers JF, Byrd AL, Castro-Nallar E, Crandall KA, Johnson WE. PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples. MICROBIOME 2014; 2:33. [PMID: 25225611 PMCID: PMC4164323 DOI: 10.1186/2049-2618-2-33] [Citation(s) in RCA: 147] [Impact Index Per Article: 14.7] [Received: 03/14/2014] [Accepted: 07/23/2014] [Indexed: 05/20/2023]
Abstract
BACKGROUND Recent innovations in sequencing technologies have provided researchers with the ability to rapidly characterize the microbial content of an environmental or clinical sample with unprecedented resolution. These approaches are producing a wealth of information that is providing novel insights into the microbial ecology of the environment and human health. However, these sequencing-based approaches produce large and complex datasets that require efficient and sensitive computational analysis workflows. Many recent tools for analyzing metagenomic-sequencing data have emerged, however, these approaches often suffer from issues of specificity, efficiency, and typically do not include a complete metagenomic analysis framework. RESULTS We present PathoScope 2.0, a complete bioinformatics framework for rapidly and accurately quantifying the proportions of reads from individual microbial strains present in metagenomic sequencing data from environmental or clinical samples. The pipeline performs all necessary computational analysis steps; including reference genome library extraction and indexing, read quality control and alignment, strain identification, and summarization and annotation of results. We rigorously evaluated PathoScope 2.0 using simulated data and data from the 2011 outbreak of Shiga-toxigenic Escherichia coli O104:H4. CONCLUSIONS The results show that PathoScope 2.0 is a complete, highly sensitive, and efficient approach for metagenomic analysis that outperforms alternative approaches in scope, speed, and accuracy. The PathoScope 2.0 pipeline software is freely available for download at: http://sourceforge.net/projects/pathoscope/.
Affiliation(s)
- Changjin Hong
- Computational Biomedicine, Boston University School of Medicine, 72 E Concord St. E645, Boston, MA 02118, USA
- Solaiappan Manimaran
- Computational Biomedicine, Boston University School of Medicine, 72 E Concord St. E645, Boston, MA 02118, USA
- Ying Shen
- Computational Biomedicine, Boston University School of Medicine, 72 E Concord St. E645, Boston, MA 02118, USA
- Joseph F Perez-Rogers
- Computational Biomedicine, Boston University School of Medicine, 72 E Concord St. E645, Boston, MA 02118, USA
- Bioinformatics Program, Boston University, Boston, MA 02125, USA
- Allyson L Byrd
- Bioinformatics Program, Boston University, Boston, MA 02125, USA
- Eduardo Castro-Nallar
- Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
- Keith A Crandall
- Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
- William Evan Johnson
- Computational Biomedicine, Boston University School of Medicine, 72 E Concord St. E645, Boston, MA 02118, USA
- Bioinformatics Program, Boston University, Boston, MA 02125, USA
|
35
|
Bogich TL, Anthony SJ, Nichols JD. Surveillance theory applied to virus detection: a case for targeted discovery. Future Virol 2013. [DOI: 10.2217/fvl.13.105] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
Abstract
Virus detection and mathematical modeling have gone through rapid developments in the past decade. Both offer new insights into the epidemiology of infectious disease and characterization of future risk; however, modeling has not yet been applied to designing the best surveillance strategies for viral and pathogen discovery. We review recent developments and propose methods to integrate viral and pathogen discovery and mathematical modeling through optimal surveillance theory, arguing for a more targeted approach to novel virus detection guided by the principles of adaptive management and structured decision-making.
Affiliation(s)
- Tiffany L Bogich
- Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
- Princeton University, Dept of Ecology & Evolutionary Biology, Princeton, NJ, USA
- Simon J Anthony
- Center for Infection & Immunity, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, NY, USA
- EcoHealth Alliance, 17th Floor, 460 West 34th Street, New York, NY, USA
- James D Nichols
- US Geological Survey, Patuxent Wildlife Research Center, Laurel, MD, USA
|
36
|
Abstract
Pathogen discovery is critically important to infectious diseases and public health. Nearly all new outbreaks are caused by the emergence of novel viruses. Genomic tools for pathogen discovery include consensus PCR, microarrays, and deep sequencing. Downstream studies are often necessary to link a candidate novel virus to a disease.
Viral pathogen discovery is of critical importance to clinical microbiology, infectious diseases, and public health. Genomic approaches for pathogen discovery, including consensus polymerase chain reaction (PCR), microarrays, and unbiased next-generation sequencing (NGS), have the capacity to comprehensively identify novel microbes present in clinical samples. Although numerous challenges remain to be addressed, including the bioinformatics analysis and interpretation of large datasets, these technologies have been successful in rapidly identifying emerging outbreak threats, screening vaccines and other biological products for microbial contamination, and discovering novel viruses associated with both acute and chronic illnesses. Downstream studies such as genome assembly, epidemiologic screening, and a culture system or animal model of infection are necessary to establish an association of a candidate pathogen with disease.
|
37
|
Wang Q, Jia P, Zhao Z. VirusFinder: software for efficient and accurate detection of viruses and their integration sites in host genomes through next generation sequencing data. PLoS One 2013; 8:e64465. [PMID: 23717618 PMCID: PMC3663743 DOI: 10.1371/journal.pone.0064465] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Received: 03/12/2013] [Accepted: 04/08/2013] [Indexed: 11/23/2022] Open
Abstract
Next generation sequencing (NGS) technologies allow us to explore virus interactions with host genomes that lead to carcinogenesis or other diseases; however, this effort is largely hindered by the dearth of efficient computational tools. Here, we present a new tool, VirusFinder, for the identification of viruses and their integration sites in host genomes using NGS data, including whole transcriptome sequencing (RNA-Seq), whole genome sequencing (WGS), and targeted sequencing data. VirusFinder’s unique features include the characterization of insertion loci of virus of arbitrary type in the host genome and high accuracy and computational efficiency as a result of its well-designed pipeline. The source code as well as additional data of VirusFinder is publicly available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Affiliation(s)
- Qingguo Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
|
38
|
Lindner MS, Kollock M, Zickmann F, Renard BY. Analyzing genome coverage profiles with applications to quality control in metagenomics. Bioinformatics 2013; 29:1260-7. [DOI: 10.1093/bioinformatics/btt147] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
|