1
|
Cadenas-Castrejón E, Verleyen J, Boukadida C, Díaz-González L, Taboada B. Evaluation of tools for taxonomic classification of viruses. Brief Funct Genomics 2023; 22:31-41. [PMID: 36335985 DOI: 10.1093/bfgp/elac036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Revised: 09/21/2022] [Accepted: 09/28/2022] [Indexed: 11/09/2022] Open
Abstract
Viruses are the most abundant infectious agents on earth, and they infect living organisms such as bacteria, plants and animals, among others. They play an important role in the balance of different ecosystems by modulating microbial populations. In humans, they are responsible for some common diseases and may cause severe illnesses. Viral metagenomic studies have become essential and offer the possibility to understand and extend the knowledge of virus diversity and functionality. For these approaches, an essential step is the classification of viral sequences. In this work, 11 taxonomic classification tools were compared by analysing their performance, in terms of sensitivity and precision, in classifying reads at the species and family levels using the same (viral and nonviral) datasets and evaluation metrics, as well as their processing times and memory requirements. The results showed that factors such as richness (the number of viral species in a sample), the taxonomic level of the classification and read length influence tool performance. High viral richness in samples decreased the performance of most tools. Additionally, classifications were better at higher taxonomic levels, such as families, than at lower taxonomic levels, such as species, and this difference was more evident in short reads. The results also indicated that BLAST and Kraken2 were the best tools for classifying all types of reads, while FastViromeExplorer and VirusFinder performed well only on long reads, and Centrifuge, DIAMOND and One Codex only on short reads. Regarding nonviral datasets (human and bacterial), all tools correctly classified them as nonviral.
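To make the two metrics named in this abstract concrete, the following is a minimal sketch of how read-level sensitivity and precision could be computed. The function name, the example labels, and the convention that an unclassified read counts against sensitivity but not against precision are all assumptions made for illustration; the study's own evaluation may define these counts differently.

```python
def sensitivity_precision(true_labels, predicted_labels):
    """Return (sensitivity, precision) for per-read taxonomic labels.

    A prediction of None means the tool left the read unclassified
    (hypothetical convention chosen for this sketch).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Reads that were classified and assigned the correct taxon.
    correct = sum(
        1 for t, p in zip(true_labels, predicted_labels)
        if p is not None and p == t
    )
    # Reads that received any classification at all.
    classified = sum(1 for p in predicted_labels if p is not None)
    total = len(true_labels)
    sensitivity = correct / total if total else 0.0
    precision = correct / classified if classified else 0.0
    return sensitivity, precision


# Hypothetical example: four reads, one unclassified, one misclassified.
truth = ["Zika virus", "Zika virus", "Dengue virus", "Dengue virus"]
calls = ["Zika virus", None, "Dengue virus", "Zika virus"]
sens, prec = sensitivity_precision(truth, calls)
```

Under these assumptions, the same function can be applied at the family level by mapping each species label to its family before the comparison.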
|
2
|
VPipe: an Automated Bioinformatics Platform for Assembly and Management of Viral Next-Generation Sequencing Data. Microbiol Spectr 2022; 10:e0256421. [PMID: 35234489 PMCID: PMC8941893 DOI: 10.1128/spectrum.02564-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022] Open
Abstract
Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens; however, analysis and management of the enormous amounts of data generated from these technologies remains a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data, eliminating common bottlenecks in bioinformatics analyses for public health laboratories with limited on-site computational infrastructure. The performance of VPipe was validated through the analysis of publicly available NGS data sets for viral pathogens, generating high-quality assemblies for 12 data sets. VPipe also generated assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens. IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in the application of NGS to viral pathogens. As of September 2021, VPipe has been used by the U.S. Centers for Disease Control and Prevention (CDC) and 12 state public health laboratories to characterize >17,500 and 1,500 clinical specimens and isolates, respectively. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.
|
3
|
Tang X, Shang J, Sun Y. RdRp-based sensitive taxonomic classification of RNA viruses for metagenomic data. Brief Bioinform 2022; 23:6523411. [PMID: 35136930 PMCID: PMC8921650 DOI: 10.1093/bib/bbac011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Revised: 12/24/2021] [Accepted: 01/10/2022] [Indexed: 11/30/2022] Open
Abstract
With advances in library construction protocols and next-generation sequencing technologies, viral metagenomic sequencing has become the major source for novel virus discovery. Conducting taxonomic classification for metagenomic data is an important means to characterize the viral composition in the underlying samples. However, RNA viruses are abundant and highly diverse, jeopardizing the sensitivity of comparison-based classification methods. To improve the sensitivity of read-level taxonomic classification, we developed an RNA-dependent RNA polymerase (RdRp) gene-based read classification tool RdRpBin. It combines alignment-based strategy with machine learning models in order to fully exploit the sequence properties of RdRp. We tested our method and compared its performance with the state-of-the-art tools on the simulated and real sequencing data. RdRpBin competes favorably with all. In particular, when the query RNA viruses share low sequence similarity with the known viruses ($\sim 0.4$), our tool can still maintain a higher F-score than the state-of-the-art tools. The experimental results on real data also showed that RdRpBin can classify more RNA viral reads with a relatively low false-positive rate. Thus, RdRpBin can be utilized to classify novel and diverged RNA viruses.
Affiliation(s)
- Xubo Tang
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China SAR
- Jiayu Shang
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China SAR
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China SAR
|
4
|
Kohls M, Saremi B, Muchsin I, Fischer N, Becher P, Jung K. A resampling strategy for studying robustness in virus detection pipelines. Comput Biol Chem 2021; 94:107555. [PMID: 34364046 DOI: 10.1016/j.compbiolchem.2021.107555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2021] [Revised: 07/14/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Next-generation sequencing is regularly used to identify viral sequences in DNA or RNA samples of infected hosts. A major step of most pipelines for virus detection is to map sequence reads against known virus genomes. Due to small differences between the sequences of related viruses, and due to several biological or technical errors, mapping is subject to uncertainty. As a consequence, the resulting list of detected viruses can lack robustness. A new approach for generating artificial sequencing reads, together with a strategy of resampling from the original findings, is proposed that can help to assess the robustness of the originally identified list of viruses. From the original mapping result in the form of a SAM file, a set of statistical distributions is derived. These are used in the resampling pipeline to generate new artificial reads, which are again mapped against the reference genomes. By summarizing the resampling procedure, the analyst receives information about whether the presence of a particular virus in the sample gains or loses evidence, and thus about the robustness of the original mapping list as well as that of individual viruses in this list. To judge robustness, several indicators are derived from the resampling procedure, such as the correlation between original and resampling read counts, or the statistical detection of outliers in the differences of read counts. Additionally, graphical illustrations of read count shifts via Sankey diagrams are provided. To demonstrate its use, the resampling approach is applied to three real-world data samples, one of them with laboratory-confirmed Influenza sequences, and to artificially generated data where virus sequences have been spiked into the sequencing data of a host. By applying the resampling pipeline, several viruses drop from the original list while new viruses emerge, showing robustness of those viruses that remain in the list.
The evaluation of the new approach shows that the resampling approach is helpful to analyze the viral content of a biological sample, to rate the robustness of original findings and to better show the overall distribution of findings. The method is also applicable to other virus detection pipelines based on read mapping.
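One of the robustness indicators mentioned in this abstract, the correlation between original and resampled read counts, can be sketched as follows. This is only an illustration: the virus names, the counts, and the choice of the Pearson coefficient are assumptions, and the abstract does not state which correlation measure or outlier test the authors use.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equally long sequences of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant counts")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts per reference genome, before and after resampling.
original_counts = {"virus_A": 120, "virus_B": 15, "virus_C": 4}
resampled_counts = {"virus_A": 110, "virus_B": 2, "virus_C": 5}

viruses = sorted(original_counts)
r = pearson_correlation(
    [original_counts[v] for v in viruses],
    [resampled_counts[v] for v in viruses],
)
# Per-virus shift in read count; large negative shifts suggest weak evidence.
shifts = {v: resampled_counts[v] - original_counts[v] for v in viruses}
```

In this made-up example, `virus_B` loses most of its reads after resampling, which is the kind of shift the abstract describes as a virus losing evidence.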
Affiliation(s)
- Moritz Kohls
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17p, 30559 Hannover, Germany.
- Babak Saremi
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17p, 30559 Hannover, Germany.
- Ihsan Muchsin
  - Institute for Virology and Immunobiology, University of Würzburg, Versbacher Straße 7, 97078 Würzburg, Germany.
- Nicole Fischer
  - Institute of Medical Microbiology, Virology and Hygiene, University Medical Center Hamburg-Eppendorf (UKE), Martinistraße 52, 20251 Hamburg, Germany.
- Paul Becher
  - Institute of Virology, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17, 30559 Hannover, Germany.
- Klaus Jung
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17p, 30559 Hannover, Germany.
|
5
|
Manso CF, Bibby DF, Mohamed H, Brown DWG, Zuckerman M, Mbisa JL. Enhanced Detection of DNA Viruses in the Cerebrospinal Fluid of Encephalitis Patients Using Metagenomic Next-Generation Sequencing. Front Microbiol 2020; 11:1879. [PMID: 32903437 PMCID: PMC7435129 DOI: 10.3389/fmicb.2020.01879] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/14/2020] [Accepted: 07/16/2020] [Indexed: 12/16/2022] Open
Abstract
The long and expanding list of viral pathogens associated with causing encephalitis confounds current diagnostic procedures, and in up to 50% of cases, the etiology remains undetermined. Sequence-agnostic metagenomic next-generation sequencing (mNGS) obviates the need to specify targets in advance and thus has great potential in encephalitis diagnostics. However, the low relative abundance of viral nucleic acids in clinical specimens poses a significant challenge. Our protocol employs two novel techniques to selectively remove human material at two stages, significantly increasing the representation of viral material. Our bioinformatic workflow using open source protein- and nucleotide sequence-matching software balances sensitivity and specificity in diagnosing and characterizing any DNA viruses present. A panel of 12 cerebrospinal fluid (CSF) samples from encephalitis cases was retrospectively interrogated by mNGS, with concordant results in seven of nine samples with a definitive DNA virus diagnosis; a different herpesvirus was identified in the other two. In two samples with an inconclusive diagnosis, DNA viruses were detected, and in a virus-negative sample, no viruses were detected. This assay has the potential to detect DNA virus infections in cases of encephalitis of unknown etiology and to improve the current screening tests by identifying new and emerging agents.
Affiliation(s)
- Carmen F Manso
  - Virus Reference Department, Public Health England, London, United Kingdom
- David F Bibby
  - Virus Reference Department, Public Health England, London, United Kingdom
- Hodan Mohamed
  - Virus Reference Department, Public Health England, London, United Kingdom
- David W G Brown
  - Virus Reference Department, Public Health England, London, United Kingdom
  - Laboratorio de Virus Respiratorios e do Sarampo, Instituto Oswaldo Cruz/Fiocruz, Rio de Janeiro, Brazil
- Mark Zuckerman
  - South London Specialist Virology Centre, King's College Hospital NHS Foundation Trust, London, United Kingdom
- Jean L Mbisa
  - Virus Reference Department, Public Health England, London, United Kingdom
|
6
|
Dovrolis N, Kolios G, Spyrou GM, Maroulakou I. Computational profiling of the gut-brain axis: microflora dysbiosis insights to neurological disorders. Brief Bioinform 2020; 20:825-841. [PMID: 29186317 DOI: 10.1093/bib/bbx154] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 12/14/2022] Open
Abstract
Almost 2500 years after Hippocrates' observations on health and its direct association to the gastrointestinal tract, a paradigm shift has recently occurred, making the gut and its symbionts (bacteria, fungi, archaea and viruses) a point of convergence for studies. It is nowadays well established that the gut microflora's compositional diversity regulates via its genes (the microbiome) the host's health and provides preliminary insights into disease progression and regulation. The microbiome's involvement is evident in immunological and physiological studies that link changes in its biodiversity to its contributions to the host's phenotype but also in neurological investigations, substantiating the aptly named gut-brain axis. The definitive mechanisms of this last bidirectional interaction will be our main focus because it presents researchers with a new conundrum. In this review, we prospect current literature for computational analysis methodologies that accommodate the need for better understanding of the microbiome-gut-brain interactions and neurological disorder onset and progression, through cross-disciplinary systems biology applications. We will present bioinformatics tools used in exploring these synergies that help build and interpret microbial 16S ribosomal RNA data sets, produced by shotgun and high-throughput sequencing of healthy and neurological disorder samples stored in biological databases. These approaches provide alternative means for researchers to form hypotheses to their inquests faster, cheaper and with precision. The goal of these studies relies on the integration of combined metagenomics and metabolomics assessments. An accurate characterization of the microbiome and its functionality can support new diagnostic, prognostic and therapeutic strategies for neurological disorders, customized for each individual host.
|
7
|
Mendes CI, Lizarazo E, Machado MP, Silva DN, Tami A, Ramirez M, Couto N, Rossen JWA, Carriço JA. DEN-IM: dengue virus genotyping from amplicon and shotgun metagenomic sequencing. Microb Genom 2020; 6:e000328. [PMID: 32134380 PMCID: PMC7200064 DOI: 10.1099/mgen.0.000328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2019] [Accepted: 12/23/2019] [Indexed: 11/18/2022] Open
Abstract
Dengue virus (DENV) represents a public health threat and economic burden in affected countries. The availability of genomic data is key to understanding viral evolution and dynamics, supporting improved control strategies. Currently, the use of high-throughput sequencing (HTS) technologies, which can be applied both directly to patient samples (shotgun metagenomics) and to PCR-amplified viral sequences (amplicon sequencing), is potentially the most informative approach to monitor viral dissemination and genetic diversity by providing, in a single methodological step, identification and characterization of the whole viral genome at the nucleotide level. Despite many advantages, these technologies require bioinformatics expertise and appropriate infrastructure for the analysis and interpretation of the resulting data. In addition, the many software solutions available can hamper the reproducibility and comparison of results. Here we present DEN-IM, a one-stop, user-friendly, containerized and reproducible workflow for the analysis of DENV short-read sequencing data from both amplicon and shotgun metagenomics approaches. It is able to infer the DENV coding sequence (CDS), identify the serotype and genotype, and generate a phylogenetic tree. It can easily be run on any UNIX-like system, from local machines to high-performance computing clusters, performing a comprehensive analysis without the requirement for extensive bioinformatics expertise. Using DEN-IM, we successfully analysed two types of DENV datasets. The first comprised 25 shotgun metagenomic sequencing samples from patients with variable serotypes and genotypes, including an in vitro spiked sample containing the four known serotypes. The second consisted of 106 paired-end and 76 single-end amplicon sequences of DENV 3 genotype III and DENV 1 genotype I, respectively, where DEN-IM allowed detection of the intra-genotype diversity. 
The DEN-IM workflow, parameters and execution configuration files, and documentation are freely available at https://github.com/B-UMMI/DEN-IM.
Affiliation(s)
- Catarina I. Mendes
  - Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
  - University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- Erley Lizarazo
  - University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- Miguel P. Machado
  - Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Diogo N. Silva
  - Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Adriana Tami
  - University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- Mário Ramirez
  - Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Natacha Couto
  - University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- John W. A. Rossen
  - University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
- João A. Carriço
  - Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
|
8
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. Infect Genet Evol 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted at new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples, including transmission clusters and outbreak characterization.
Affiliation(s)
- Marcos Pérez-Losada
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
- Miguel Arenas
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
- Juan Carlos Galán
  - Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain.
- Mª Alma Bracho
  - CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain.
- Julia Hillung
  - Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
- Neris García-González
  - Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
- Fernando González-Candelas
  - CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
|
9
|
Soliman HK, Abouelhoda M, El Rouby MN, Ahmed OS, Esmat G, Hassan ZK, Hafez MM, Mehaney DA, Selvaraju M, Darwish RK, Osman YA, Zekri ARN. Whole-genome sequencing of human Pegivirus variant from an Egyptian patient co-infected with hepatitis C virus: a case report. Virol J 2019; 16:132. [PMID: 31711510 PMCID: PMC6849219 DOI: 10.1186/s12985-019-1242-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/11/2019] [Accepted: 10/14/2019] [Indexed: 01/02/2023] Open
Abstract
Background Human pegivirus (HPgV) is structurally similar to hepatitis C virus (HCV) and was discovered 20 years ago. Its distribution, natural history and the exact role of this viral group in human hosts remain unclear. Our aim was to determine, by deep next-generation sequencing (NGS), the entire genome sequence of HPgV that was discovered in an Egyptian patient while analyzing HCV sequence from the same patient. We also inspected whether the co-infection of HCV and HPgV will affect the patient response to HCV viral treatment. To the best of our knowledge, this is the first report for a newly isolated HPgV in an Egyptian patient who is co-infected with HCV. Case presentation The deep Next Generation Sequencing (NGS) technique was used to detect HCV sequence in hepatitis C patient’s plasma. The results revealed the presence of HPgV with HCV. This co-infection was confirmed using conventional PCR of the HPgV 5′ untranslated region. The patient was then subjected to direct-acting-antiviral treatment (DAA). At the end of the treatment, the patient showed a good response to the HCV treatment (i.e., no HCV-RNA was detected in the plasma), while the HPgV-RNA was still detected. Sequence alignment and phylogenetic analyses demonstrated that the detected HPgV was a novel isolate and was not previously published. Conclusion We report a new variant of HPgV in a patient suffering from hepatitis C viral infection.
Affiliation(s)
- Hany K Soliman
  - Cancer Biology Department, Virology and Immunology Unit, National Cancer Institute, Cairo University, Cairo, 11796, Egypt
- Mohamed Abouelhoda
  - Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University, Cairo, 12613, Egypt
- Mahmoud N El Rouby
  - Cancer Biology Department, Virology and Immunology Unit, National Cancer Institute, Cairo University, Cairo, 11796, Egypt
- Ola S Ahmed
  - Cancer Biology Department, Virology and Immunology Unit, National Cancer Institute, Cairo University, Cairo, 11796, Egypt
- G Esmat
  - Endemic Medicine and Hepatology Department, Faculty of Medicine, Cairo University, Cairo, 11562, Egypt
- Zeinab K Hassan
  - Cancer Biology Department, Virology and Immunology Unit, National Cancer Institute, Cairo University, Cairo, 11796, Egypt
- Mohammed M Hafez
  - Cancer Biology Department, Virology and Immunology Unit, National Cancer Institute, Cairo University, Cairo, 11796, Egypt
- Dina Ahmed Mehaney
  - Clinical and chemical pathology Department, Faculty of Medicine, Cairo University, Cairo, 11562, Egypt
- Rania Kamal Darwish
  - Clinical and chemical pathology Department, Faculty of Medicine, Cairo University, Cairo, 11562, Egypt
- Yehia A Osman
  - Botany Department, Faculty of Science, Mansoura University, Mansoura, 33516, Egypt
- Abdel-Rahman N Zekri
  - Cancer Biology Department, Virology and Immunology Unit, National Cancer Institute, Cairo University, Cairo, 11796, Egypt.
|
10
|
Genome Sequences of Ambystoma Tigrinum Virus Recovered during a Mass Die-off of Western Tiger Salamanders in Alberta, Canada. Microbiol Resour Announc 2019; 8:8/29/e00265-19. [PMID: 31320425 PMCID: PMC6639604 DOI: 10.1128/mra.00265-19] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022] Open
Abstract
Complete genome sequences of six Ambystoma tigrinum viruses (ATV) were determined directly from tail clips of western tiger salamanders (Ambystoma mavortium) from 2013 (high-mortality year) and 2014 (low-mortality year) in Alberta, Canada. The genome lengths ranged from 106,258 to 106,915 bp and contained 108 open reading frames encoding predicted proteins larger than 50 amino acids.
|
11
|
Chen J, Huang J, Sun Y. TAR-VIR: a pipeline for TARgeted VIRal strain reconstruction from metagenomic data. BMC Bioinformatics 2019; 20:305. [PMID: 31164077 PMCID: PMC6549370 DOI: 10.1186/s12859-019-2878-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/10/2018] [Accepted: 05/07/2019] [Indexed: 12/15/2022] Open
Abstract
Background Strain-level RNA virus characterization is essential for developing prevention and treatment strategies. Viral metagenomic data, which can contain sequences of both known and novel viruses, provide new opportunities for characterizing RNA viruses. Although there are a number of pipelines for analyzing viruses in metagenomic data, they have different limitations. First, viruses that lack closely related reference genomes cannot be detected with high sensitivity. Second, strain-level analysis is usually missing. Results In this study, we developed a hybrid pipeline named TAR-VIR that reconstructs viral strains without relying on complete or high-quality reference genomes. It is optimized for identifying RNA viruses from metagenomic data by combining an effective read classification method and our in-house strain-level de novo assembly tool. TAR-VIR was tested on both simulated and real viral metagenomic data sets. The results demonstrated that TAR-VIR competes favorably with other tested tools. Conclusion TAR-VIR can be used standalone for viral strain reconstruction from metagenomic data. Or, its read recruiting stage can be used with other de novo assembly tools for superior viral functional and taxonomic analyses. The source code and the documentation of TAR-VIR are available at https://github.com/chjiao/TAR-VIR. Electronic supplementary material The online version of this article (10.1186/s12859-019-2878-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jiao Chen
  - Computer Science and Engineering, Michigan State University, East Lansing, 48824, USA
- Jiating Huang
  - Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, 510006, China
- Yanni Sun
  - Electronic Engineering, City University of Hong Kong, Hong Kong, China.
|
12
|
Vilsker M, Moosa Y, Nooij S, Fonseca V, Ghysens Y, Dumon K, Pauwels R, Alcantara LC, Vanden Eynden E, Vandamme AM, Deforche K, de Oliveira T. Genome Detective: an automated system for virus identification from high-throughput sequencing data. Bioinformatics 2019; 35:871-873. [PMID: 30124794 PMCID: PMC6524403 DOI: 10.1093/bioinformatics/bty695] [Citation(s) in RCA: 213] [Impact Index Per Article: 42.6] [Received: 04/04/2018] [Revised: 07/09/2018] [Accepted: 08/14/2018] [Indexed: 12/20/2022] Open
Abstract
SUMMARY Genome Detective is an easy to use web-based software application that assembles the genomes of viruses quickly and accurately. The application uses a novel alignment method that constructs genomes by reference-based linking of de novo contigs by combining amino-acids and nucleotide scores. The software was optimized using synthetic datasets to represent the great diversity of virus genomes. The application was then validated with next generation sequencing data of hundreds of viruses. User time is minimal and it is limited to the time required to upload the data. AVAILABILITY AND IMPLEMENTATION Available online: http://www.genomedetective.com/app/typingtool/virus/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yumna Moosa
  - KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, Nelson R Mandela School of Medicine, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Sam Nooij
  - The Dutch National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Vagner Fonseca
  - KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, Nelson R Mandela School of Medicine, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
  - Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
  - Laboratory of Hematology Genetic and computational Biology, Goncalo Moniz Research Center, Oswaldo Cruz Foundation (LHGB/CPqGM/FIOCRUZ), Bahia, Brazil
- Luiz Carlos Alcantara
  - Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
  - Laboratory of Hematology Genetic and computational Biology, Goncalo Moniz Research Center, Oswaldo Cruz Foundation (LHGB/CPqGM/FIOCRUZ), Bahia, Brazil
  - Laboratório de Flavivírus, IOC, Fundação Oswaldo Cruz
- Ewout Vanden Eynden
  - KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
- Anne-Mieke Vandamme
  - KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
  - Center for Global Health and Tropical Medicine, Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
- Tulio de Oliveira
  - KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, Nelson R Mandela School of Medicine, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
13
Maximal viral information recovery from sequence data using VirMAP. Nat Commun 2018; 9:3205. [PMID: 30097567 PMCID: PMC6086868 DOI: 10.1038/s41467-018-05658-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 09/06/2017] [Accepted: 07/05/2018] [Indexed: 12/31/2022]
Abstract
Accurate classification of the human virome is critical to a full understanding of the role viruses play in health and disease. This implies the need for sensitive, specific, and practical pipelines that return precise outputs while still enabling case-specific post hoc analysis. Viral taxonomic characterization from metagenomic data suffers from high background noise and signal crosstalk that confound current methods. Here we develop VirMAP, which overcomes these limitations using techniques that merge nucleotide and protein information to taxonomically classify viral reconstructions independent of genome coverage or read overlap. We validate VirMAP using published data sets and viral mock communities containing RNA and DNA viruses and bacteriophages. VirMAP offers opportunities to enhance metagenomic studies seeking to define virome-host interactions, improve biosurveillance capabilities, and strengthen molecular epidemiology reporting.
14
Lin HH, Liao YC. drVM: a new tool for efficient genome assembly of known eukaryotic viruses from metagenomes. Gigascience 2017; 6:1-10. [PMID: 28369462 PMCID: PMC5466706 DOI: 10.1093/gigascience/gix003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/25/2016] [Accepted: 01/15/2017] [Indexed: 11/29/2022]
Abstract
Background: Virus discovery using high-throughput next-generation sequencing has become more commonplace. However, although analysis of deep next-generation sequencing data allows us to identify potential pathogens, the entire analytical procedure requires competency in the bioinformatics domain, which includes implementing proper software packages and preparing prerequisite databases. Simple and user-friendly bioinformatics pipelines are urgently required to obtain complete viral genome sequences from metagenomic data. Results: This manuscript presents a pipeline, drVM (detect and reconstruct known viral genomes from metagenomes), for rapid viral read identification, genus-level read partition, read normalization, de novo assembly, sequence annotation, and coverage profiling. The first two procedures and sequence annotation rely on known viral genomes as a reference database. drVM was validated via the analysis of over 300 sequencing runs generated by Illumina and Ion Torrent platforms to provide complete viral genome assemblies for a variety of virus types including DNA viruses, RNA viruses, and retroviruses. drVM is available for free download at: https://sourceforge.net/projects/sb2nhri/files/drVM/ and is also assembled as a Docker container, an Amazon machine image, and a virtual machine to facilitate seamless deployment. Conclusions: drVM was compared with other viral detection tools to demonstrate its merits in terms of viral genome completeness and reduced computation time. This substantiates the platform's potential to produce prompt and accurate viral genome sequences from clinical samples.
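Two of the steps named in this abstract, genus-level read partition and coverage profiling, can be illustrated with a minimal Python sketch. This is not drVM's code: the function names, the `(read_id, genus)` input pairs and the 0-based half-open `(start, end)` alignment intervals are all assumptions made for illustration, and a real pipeline would derive them from aligner output.

```python
from collections import defaultdict


def partition_by_genus(read_hits):
    """Group read IDs by genus label (hypothetical input: (read_id, genus) pairs)."""
    bins = defaultdict(list)
    for read_id, genus in read_hits:
        bins[genus].append(read_id)
    return dict(bins)


def coverage_profile(ref_length, alignments):
    """Per-base depth over a reference of ref_length bases.

    alignments: iterable of 0-based, half-open (start, end) intervals
    (an assumed representation, not a drVM data structure).
    """
    # Difference array: +1 where an alignment starts, -1 where it ends.
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(ref_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth


def breadth_of_coverage(depth, min_depth=1):
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= min_depth) / len(depth)


if __name__ == "__main__":
    hits = [("r1", "Enterovirus"), ("r2", "Norovirus"), ("r3", "Enterovirus")]
    print(partition_by_genus(hits))
    depth = coverage_profile(10, [(0, 5), (3, 8)])
    print(depth, breadth_of_coverage(depth))
```

Breadth of coverage is one simple way to express the genome completeness that the abstract's conclusions refer to; the difference-array approach keeps the profile linear in reference length plus number of alignments.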