1
Ortigas-Vasquez A, Szpara M. Embracing Complexity: What Novel Sequencing Methods Are Teaching Us About Herpesvirus Genomic Diversity. Annu Rev Virol 2024; 11:67-87. [PMID: 38848592 DOI: 10.1146/annurev-virology-100422-010336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2024]
Abstract
The arrival of novel sequencing technologies throughout the past two decades has led to a paradigm shift in our understanding of herpesvirus genomic diversity. Previously, herpesviruses were seen as a family of DNA viruses with low genomic diversity. However, a growing body of evidence now suggests that herpesviruses exist as dynamic populations that possess standing variation and evolve at much faster rates than previously assumed. In this review, we explore how strategies such as deep sequencing, long-read sequencing, and haplotype reconstruction are allowing scientists to dissect the genomic composition of herpesvirus populations. We also discuss the challenges that need to be addressed before a detailed picture of herpesvirus diversity can emerge.
Affiliation(s)
- Alejandro Ortigas-Vasquez
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
- Moriah Szpara
  - Departments of Biology and of Biochemistry and Molecular Biology; Center for Infectious Disease Dynamics; and Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
2
Santos JD, Sobral D, Pinheiro M, Isidro J, Bogaardt C, Pinto M, Eusébio R, Santos A, Mamede R, Horton DL, Gomes JP, Borges V. INSaFLU-TELEVIR: an open web-based bioinformatics suite for viral metagenomic detection and routine genomic surveillance. Genome Med 2024; 16:61. [PMID: 38659008 PMCID: PMC11044337 DOI: 10.1186/s13073-024-01334-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 04/15/2024] [Indexed: 04/26/2024] Open
Abstract
BACKGROUND Implementation of clinical metagenomics and pathogen genomic surveillance can be particularly challenging due to the lack of bioinformatics tools and/or expertise. In order to face this challenge, we have previously developed INSaFLU, a free web-based bioinformatics platform for virus next-generation sequencing data analysis. Here, we considerably expanded its genomic surveillance component and developed a new module (TELEVIR) for metagenomic virus identification.
RESULTS The routine genomic surveillance component was strengthened with new workflows and functionalities, including (i) a reference-based genome assembly pipeline for Oxford Nanopore Technologies (ONT) data; (ii) automated SARS-CoV-2 lineage classification; (iii) Nextclade analysis; (iv) Nextstrain phylogeographic and temporal analysis (SARS-CoV-2, human and avian influenza, monkeypox, respiratory syncytial virus (RSV A/B), as well as a "generic" build for other viruses); and (v) algn2pheno for screening mutations of interest. Both INSaFLU pipelines for reference-based consensus generation (Illumina and ONT) were benchmarked against commonly used command line bioinformatics workflows for SARS-CoV-2, and an INSaFLU snakemake version was released. In parallel, a new module (TELEVIR) for virus detection was developed, after extensive benchmarking of state-of-the-art metagenomics software and following up-to-date recommendations and practices in the field. TELEVIR allows running complex workflows, covering several combinations of steps (e.g., with/without viral enrichment or host depletion), classification software (e.g., Kaiju, Kraken2, Centrifuge, FastViromeExplorer), and databases (RefSeq viral genome, Virosaurus, etc.), while culminating in user- and diagnosis-oriented reports. Finally, to potentiate real-time virus detection during ONT runs, we developed findONTime, a tool aimed at reducing costs and the time between sample reception and diagnosis.
CONCLUSIONS The accessibility, versatility, and functionality of INSaFLU-TELEVIR are expected to supply public and animal health laboratories and researchers with a user-oriented and pan-viral bioinformatics framework that promotes a strengthened and timely viral metagenomic detection and routine genomics surveillance. INSaFLU-TELEVIR is compatible with Illumina, Ion Torrent, and ONT data and is freely available at https://insaflu.insa.pt/ (online tool) and https://github.com/INSaFLU (code).
Affiliation(s)
- João Dourado Santos
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
- Daniel Sobral
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
- Miguel Pinheiro
  - Institute of Biomedicine-iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Joana Isidro
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
- Carlijn Bogaardt
  - Department of Comparative Biomedical Sciences, School of Veterinary Medicine, University of Surrey, Surrey, UK
- Miguel Pinto
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
- Rodrigo Eusébio
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
- André Santos
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
- Rafael Mamede
  - Faculdade de Medicina, Instituto de Microbiologia, Instituto de Medicina Molecular, Universidade de Lisboa, Lisbon, Portugal
- Daniel L Horton
  - Department of Comparative Biomedical Sciences, School of Veterinary Medicine, University of Surrey, Surrey, UK
- João Paulo Gomes
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
  - Veterinary and Animal Research Centre (CECAV), Faculty of Veterinary Medicine, Lusófona University, Lisbon, Portugal
- Vítor Borges
  - Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
3
Strutt JPB, Natarajan M, Lee E, Teo DBL, Sin WX, Cheung KW, Chew M, Thazin K, Barone PW, Wolfrum JM, Williams RBH, Rice SA, Springs SL. Machine learning-based detection of adventitious microbes in T-cell therapy cultures using long-read sequencing. Microbiol Spectr 2023; 11:e0135023. [PMID: 37646508 PMCID: PMC10580871 DOI: 10.1128/spectrum.01350-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 07/03/2023] [Indexed: 09/01/2023] Open
Abstract
Assuring that cell therapy products are safe before releasing them for use in patients is critical. Currently, compendial sterility testing for bacteria and fungi can take 7-14 days. The goal of this work was to develop a rapid untargeted approach for the sensitive detection of microbial contaminants at low abundance from low volume samples during the manufacturing process of cell therapies. We developed a long-read sequencing methodology using Oxford Nanopore Technologies MinION platform with 16S and 18S amplicon sequencing to detect USP <71> organisms and other microbial species. Reads are classified metagenomically to predict the microbial species. We used an extreme gradient boosting machine learning algorithm (XGBoost) to first assess if a sample is contaminated, and second, determine whether the predicted contaminant is correctly classified or misclassified. The model was used to make a final decision on the sterility status of the input sample. An optimized experimental and bioinformatics pipeline starting from spiked species through to sequenced reads allowed for the detection of microbial samples at 10 colony-forming units (CFU)/mL using metagenomic classification. Machine learning can be coupled with long-read sequencing to detect and identify sample sterility status and microbial species present in T-cell cultures, including the USP <71> organisms to 10 CFU/mL. IMPORTANCE This research presents a novel method for rapidly and accurately detecting microbial contaminants in cell therapy products, which is essential for ensuring patient safety. Traditional testing methods are time-consuming, taking 7-14 days, while our approach can significantly reduce this time. By combining advanced long-read nanopore sequencing techniques and machine learning, we can effectively identify the presence and types of microbial contaminants at low abundance levels. 
This breakthrough has the potential to improve the safety and efficiency of cell therapy manufacturing, leading to better patient outcomes and a more streamlined production process.
Affiliation(s)
- James P. B. Strutt
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Elizabeth Lee
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Denise Bei Lin Teo
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Wei-Xiang Sin
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Ka-Wai Cheung
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Marvin Chew
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Khaing Thazin
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Paul W. Barone
  - MIT Center for Biomedical Innovation, Massachusetts Institute of Technology, Boston, USA
- Jacqueline M. Wolfrum
  - MIT Center for Biomedical Innovation, Massachusetts Institute of Technology, Boston, USA
- Rohan B. H. Williams
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
  - Singapore Centre for Environmental Life Sciences Engineering, Life Sciences Institute, National University of Singapore, Singapore, Singapore
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- Scott A. Rice
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
  - CSIRO Microbiomes for One Systems Health, Agriculture and Food, Westmead, Australia
- Stacy L. Springs
  - Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
  - MIT Center for Biomedical Innovation, Massachusetts Institute of Technology, Boston, USA
4
Javaran VJ, Poursalavati A, Lemoyne P, Ste-Croix DT, Moffett P, Fall ML. NanoViromics: long-read sequencing of dsRNA for plant virus and viroid rapid detection. Front Microbiol 2023; 14:1192781. [PMID: 37415816 PMCID: PMC10320856 DOI: 10.3389/fmicb.2023.1192781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 06/06/2023] [Indexed: 07/08/2023] Open
Abstract
There is a global need for identifying viral pathogens, as well as for providing certified clean plant materials, in order to limit the spread of viral diseases. A key component of management programs for viral-like diseases is having a diagnostic tool that is quick, reliable, inexpensive, and easy to use. We have developed and validated a dsRNA-based nanopore sequencing protocol as a reliable method for detecting viruses and viroids in grapevines. We compared our method, which we term direct-cDNA sequencing from dsRNA (dsRNAcD), to direct RNA sequencing from rRNA-depleted total RNA (rdTotalRNA), and found that it provided more viral reads from infected samples. Indeed, dsRNAcD was able to detect all of the viruses and viroids detected using Illumina MiSeq sequencing (dsRNA-MiSeq). Furthermore, dsRNAcD sequencing was also able to detect low-abundance viruses that rdTotalRNA sequencing failed to detect. Additionally, rdTotalRNA sequencing resulted in a false-positive viroid identification due to the misannotation of a host-driven read. Two taxonomic classification workflows, DIAMOND & MEGAN (DIA & MEG) and Centrifuge & Recentrifuge (Cent & Rec), were also evaluated for quick and accurate read classification. Although the results from both workflows were similar, we identified pros and cons for both workflows. Our study shows that dsRNAcD sequencing and the proposed data analysis workflows are suitable for consistent detection of viruses and viroids, particularly in grapevines where mixed viral infections are common.
Affiliation(s)
- Vahid J. Javaran
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
  - Centre SÈVE, Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Abdonaser Poursalavati
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
  - Centre SÈVE, Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Pierre Lemoyne
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
- Dave T. Ste-Croix
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
  - Département de phytologie, Faculté des Sciences de l’Agriculture et de l’Alimentation, Université Laval, Québec, QC, Canada
- Peter Moffett
  - Centre SÈVE, Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Mamadou L. Fall
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC, Canada
5
Zarei A, Javid H, Sanjarian S, Senemar S, Zarei H. Metagenomics studies for the diagnosis and treatment of prostate cancer. Prostate 2022; 82:289-297. [PMID: 34855234 DOI: 10.1002/pros.24276] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/17/2021] [Revised: 11/09/2021] [Accepted: 11/19/2021] [Indexed: 12/19/2022]
Abstract
AIM Mutations in prostate cell genes lead to abnormal prostate proliferation and, ultimately, cancer. Prostate cancer (PC) is one of the most common cancers among men, and its worldwide prevalence increases with age. About 16% of the world's cancers are the result of microbes in the human body. An impaired population balance of symbiotic microbes in the human reproductive system is linked to PC development. DISCUSSION With the advent of metagenomics, the genome sequences of the microbiota of the human body have been unveiled. It is therefore now possible to identify a wider range of microbiome changes in PC tissue via next-generation sequencing, which will have positive consequences for personalized medicine. In this review, we examine the role of metagenomics studies in the diagnosis and treatment of PC. CONCLUSION Microbial imbalance in the male genital tract might affect prostate health. Based on next-generation sequencing data, Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes are among the nine frequent phyla detected in a PC sample, which might be involved in inducing the mutations in prostate cells that cause cancer.
Affiliation(s)
- Ali Zarei
  - Department of Human Genetics, Iranian Academic Center for Education, Culture and Research (ACECR)-Fars Branch Institute for Human Genetics Research, Shiraz, Iran
- Hossein Javid
  - Department of Human Genetics, Iranian Academic Center for Education, Culture and Research (ACECR)-Fars Branch Institute for Human Genetics Research, Shiraz, Iran
- Sara Sanjarian
  - Department of Human Genetics, Iranian Academic Center for Education, Culture and Research (ACECR)-Fars Branch Institute for Human Genetics Research, Shiraz, Iran
- Sara Senemar
  - Department of Human Genetics, Iranian Academic Center for Education, Culture and Research (ACECR)-Fars Branch Institute for Human Genetics Research, Shiraz, Iran
- Hanieh Zarei
  - Department of Physical Therapy, School of Rehabilitation Sciences, Shiraz University of Medical Sciences, Shiraz, Iran
6
Alama-Bermejo G, Meyer E, Atkinson SD, Holzer AS, Wiśniewska MM, Kolísko M, Bartholomew JL. Transcriptome-Wide Comparisons and Virulence Gene Polymorphisms of Host-Associated Genotypes of the Cnidarian Parasite Ceratonova shasta in Salmonids. Genome Biol Evol 2021; 12:1258-1276. [PMID: 32467979 PMCID: PMC7487138 DOI: 10.1093/gbe/evaa109] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 05/25/2020] [Indexed: 12/15/2022] Open
Abstract
Ceratonova shasta is an important myxozoan pathogen affecting the health of salmonid fishes in the Pacific Northwest of North America. Ceratonova shasta exists as a complex of host-specific genotypes, some with low to moderate virulence, and one that causes a profound, lethal infection in susceptible hosts. High throughput sequencing methods are powerful tools for discovering the genetic basis of these host/virulence differences, but deep sequencing of myxozoans has been challenging due to extremely fast molecular evolution of this group, yielding strongly divergent sequences that are difficult to identify, and unavoidable host contamination. We designed and optimized different bioinformatic pipelines to address these challenges. We obtained a unique set of comprehensive, host-free myxozoan RNA-seq data from C. shasta genotypes of varying virulence from different salmonid hosts. Analyses of transcriptome-wide genetic distances and maximum likelihood multigene phylogenies elucidated the evolutionary relationship between lineages and demonstrated the limited resolution of the established Internal Transcribed Spacer marker for C. shasta genotype identification, as this marker fails to differentiate between biologically distinct genotype II lineages from coho salmon and rainbow trout. We further analyzed the data sets based on polymorphisms in two gene groups related to virulence: cell migration and proteolytic enzymes including their inhibitors. The developed single-nucleotide polymorphism-calling pipeline identified polymorphisms between genotypes and demonstrated that variations in both motility and protease genes were associated with different levels of virulence of C. shasta in its salmonid hosts. The prospective use of proteolytic enzymes as promising candidates for targeted interventions against myxozoans in aquaculture is discussed. 
We developed host-free transcriptomes of a myxozoan model organism from strains that exhibited different degrees of virulence, as a unique source of data that will foster functional gene analyses and serve as a base for the development of potential therapeutics for efficient control of these parasites.
Affiliation(s)
- Gema Alama-Bermejo
  - Department of Microbiology, Oregon State University
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czech Republic
  - Centro de Investigación Aplicada y Transferencia Tecnológica en Recursos Marinos Almirante Storni (CIMAS), CCT CONICET - CENPAT, San Antonio Oeste, Argentina
- Eli Meyer
  - Department of Integrative Biology, Oregon State University
- Astrid S Holzer
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czech Republic
- Monika M Wiśniewska
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czech Republic
- Martin Kolísko
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czech Republic
  - Department of Molecular Biology and Genetics, Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
7
MacDonald ML, Polson SW, Lee KH. k-mer-Based Metagenomics Tools Provide a Fast and Sensitive Approach for the Detection of Viral Contaminants in Biopharmaceutical and Vaccine Manufacturing Applications Using Next-Generation Sequencing. mSphere 2021; 6:e01336-20. [PMID: 33883263 PMCID: PMC8546726 DOI: 10.1128/msphere.01336-20] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/23/2020] [Accepted: 03/30/2021] [Indexed: 11/20/2022] Open
Abstract
Adventitious agent detection during the production of vaccines and biotechnology-based medicines is of critical importance to ensure the final product is free from any possible viral contamination. Increasing the speed and accuracy of viral detection is beneficial as a means to accelerate development timelines and to ensure patient safety. Here, several rapid viral metagenomics approaches were tested on simulated next-generation sequencing (NGS) data sets and existing data sets from virus spike-in studies done in CHO-K1 and HeLa cell lines. It was observed that these rapid methods had comparable sensitivity to full-read alignment methods used for NGS viral detection for these data sets, but their specificity could be improved. A method that first filters host reads using KrakenUniq and then selects the virus classification tool based on the number of remaining reads is suggested as the preferred approach among those tested to detect nonlatent and nonendogenous viruses. Such an approach shows reasonable sensitivity and specificity for the data sets examined and requires less time and memory than full-read alignment methods. IMPORTANCE Next-generation sequencing (NGS) has been proposed as a method to detect adventitious viruses in the production of biotherapeutics and vaccines that is complementary to current in vivo and in vitro methods. Before NGS can be established in industry as a main viral detection technology, further investigation into the various aspects of bioinformatics analyses required to identify and classify viral NGS reads is needed. In this study, the ability of rapid metagenomics tools to detect viruses in biopharmaceutically relevant samples is tested and compared to recommend an efficient approach. The results showed that KrakenUniq can quickly and accurately filter host sequences and classify viral reads and had comparable sensitivity and specificity to slower full-read alignment approaches, such as BLASTn, for the data sets examined.
Affiliation(s)
- Madolyn L MacDonald
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware, USA
  - Ammon Pinizzotto Biopharmaceutical Innovation Center, Newark, Delaware, USA
- Shawn W Polson
  - Ammon Pinizzotto Biopharmaceutical Innovation Center, Newark, Delaware, USA
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, USA
  - Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, USA
- Kelvin H Lee
  - Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, Delaware, USA
  - Ammon Pinizzotto Biopharmaceutical Innovation Center, Newark, Delaware, USA
8
Metagenomic Analysis of the Respiratory Microbiome of a Broiler Flock from Hatching to Processing. Microorganisms 2021; 9:721. [PMID: 33807233 PMCID: PMC8065701 DOI: 10.3390/microorganisms9040721] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/15/2021] [Accepted: 03/30/2021] [Indexed: 12/15/2022] Open
Abstract
Elucidating the complex microbial interactions in biological environments requires the identification and characterization of not only the bacterial component but also the eukaryotic viruses, bacteriophage, and fungi. In a proof of concept experiment, next generation sequencing approaches, accompanied by the development of novel computational and bioinformatics tools, were utilized to examine the evolution of the microbial ecology of the avian trachea during the growth of a healthy commercial broiler flock. The flock was sampled weekly, beginning at placement and concluding at 49 days, the day before processing. Metagenomic sequencing of DNA and RNA was utilized to examine the bacteria, virus, bacteriophage, and fungal components during flock growth. The utility of using a metagenomic approach to study the avian respiratory virome was confirmed by detecting the dysbiosis in the avian respiratory virome of broiler chickens diagnosed with infection with infectious laryngotracheitis virus. This study provides the first comprehensive analysis of the ecology of the avian respiratory microbiome and demonstrates the feasibility for the use of this approach in future investigations of avian respiratory diseases.
9
Advances and Discoveries in Myxozoan Genomics. Trends Parasitol 2021; 37:552-568. [PMID: 33619004 DOI: 10.1016/j.pt.2021.01.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/30/2020] [Revised: 01/20/2021] [Accepted: 01/23/2021] [Indexed: 12/21/2022]
Abstract
Myxozoans are highly diverse and globally distributed cnidarian endoparasites in freshwater and marine habitats. They have adopted a heteroxenous life cycle, including invertebrate and fish hosts, and have been associated with diseases in aquaculture and wild fish stocks. Despite their importance, genomic resources of myxozoans have proven difficult to obtain due to their miniaturized and derived genome character and close associations with fish tissues. The first 'omic' datasets have now become the main resource for a better understanding of host-parasite interactions, virulence, and diversity, but also the evolutionary history of myxozoans. In this review, we discuss recent genomic advances in the field and outline outstanding questions to be answered with continuous and improved efforts of generating myxozoan genomic data.
10
Leung CM, Li D, Xin Y, Law WC, Zhang Y, Ting HF, Luo R, Lam TW. MegaPath: sensitive and rapid pathogen detection using metagenomic NGS data. BMC Genomics 2020; 21:500. [PMID: 33349238 PMCID: PMC7751095 DOI: 10.1186/s12864-020-06875-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/21/2020] [Accepted: 06/30/2020] [Indexed: 12/02/2022] Open
Abstract
Background Next-generation sequencing (NGS) enables unbiased detection of pathogens by mapping the sequencing reads of a patient sample to the known reference sequences of bacteria and viruses. However, for a new pathogen without a reference sequence of a close relative, or with a high load of mutations compared to its predecessors, read mapping fails due to a low similarity between the pathogen and reference sequence, which in turn leads to insensitive and inaccurate pathogen detection outcomes. Results We developed MegaPath, which runs fast and provides high sensitivity in detecting new pathogens. In MegaPath, we have implemented and tested a combination of polishing techniques to remove non-informative human reads and spurious alignments. MegaPath applies a global optimization to the read alignments and reassigns the reads incorrectly aligned to multiple species to a unique species. The reassignment not only significantly increased the number of reads aligned to distant pathogens, but also significantly reduced incorrect alignments. MegaPath implements an enhanced maximum-exact-match prefix seeding strategy and a SIMD-accelerated Smith-Waterman algorithm to run fast. Conclusions In our benchmarks, MegaPath demonstrated superior sensitivity by detecting eight times more reads from a low-similarity pathogen than other tools. Meanwhile, MegaPath ran much faster than the other state-of-the-art alignment-based pathogen detection tools (and was comparable with the less sensitive profile-based pathogen detection tools). The running time of MegaPath is about 20 min on a typical 1 Gb dataset.
Affiliation(s)
- Chi-Ming Leung
  - Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
  - L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong
- Dinghua Li
  - Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
- Yan Xin
  - Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
  - L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong
- Wai-Chun Law
  - L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong
- Yifan Zhang
  - Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
  - L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong
- Hing-Fung Ting
  - Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
- Ruibang Luo
  - Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
  - L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong
- Tak-Wah Lam
  - Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
  - L3 Bioinformatics Limited, Rm 2114, Hong Kong Plaza, 188 Connaught Road West, Sai Ying Pun, Hong Kong
11
Rodriguez RM, Khadka VS, Menor M, Hernandez BY, Deng Y. Tissue-associated microbial detection in cancer using human sequencing data. BMC Bioinformatics 2020; 21:523. [PMID: 33272199 PMCID: PMC7713026 DOI: 10.1186/s12859-020-03831-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/19/2022] Open
Abstract
Cancer is one of the leading causes of morbidity and mortality worldwide. Microbiological infections account for up to 20% of the total global cancer burden. The human microbiota within each organ system is distinct, and its compositional variation and interactions with the human host are known to have both detrimental and beneficial effects on tumor progression. With the advent of next-generation sequencing (NGS) technologies, NGS data are being used for pathogen detection in cancer. Numerous bioinformatics computational frameworks have been developed to study viral information from host-sequencing data and can be adapted to bacterial studies. This review highlights existing popular computational frameworks that take NGS data as input to decipher microbial composition, and whose output can predict functional compositional differences with clinically relevant applicability in the development of treatment and prevention strategies.
Affiliation(s)
- Rebecca M. Rodriguez
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
- Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI USA
- NIDDK Central Repository, National Institute of Diabetes and Digestive and Kidney Diseases, NIH, Bethesda, USA
| | - Vedbar S. Khadka
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
| | - Mark Menor
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
| | - Brenda Y. Hernandez
- Epidemiology, University of Hawaii Cancer Center, University of Hawaii, Honolulu, HI USA
- Population Sciences in the Pacific Program-Cancer Epidemiology, Honolulu, HI USA
| | - Youping Deng
- Bioinformatics Core, Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii, Mānoa, Honolulu, HI USA
| |
|
12
|
Mordecai GJ, Di Cicco E, Günther OP, Schulze AD, Kaukinen KH, Li S, Tabata A, Ming TJ, Ferguson HW, Suttle CA, Miller KM. Discovery and surveillance of viruses from salmon in British Columbia using viral immune-response biomarkers, metatranscriptomics, and high-throughput RT-PCR. Virus Evol 2020; 7:veaa069. [PMID: 33623707 PMCID: PMC7887441 DOI: 10.1093/ve/veaa069] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022] Open
Abstract
The emergence of infectious agents poses a continual economic and environmental challenge to aquaculture production, yet the diversity, abundance, and epidemiology of aquatic viruses are poorly characterised. In this study, we applied salmon host transcriptional biomarkers to identify and select fish in a viral disease state, but only those that were negative for known viruses based on RT-PCR screening. These fish were selected for metatranscriptomic sequencing to discover potential viral pathogens of dead and dying farmed Atlantic (Salmo salar) and Chinook (Oncorhynchus tshawytscha) salmon in British Columbia (BC). We found that the application of the biomarker panel increased the probability of discovering viruses in aquaculture populations. We discovered two viruses that have not previously been characterised in Atlantic salmon farms in BC (Atlantic salmon calicivirus and Cutthroat trout virus-2), as well as partially sequenced three putative novel viruses. To determine the epidemiology of the newly discovered or emerging viruses, we conducted high-throughput reverse transcription polymerase chain reaction (RT-PCR) and screened over 9,000 farmed and wild salmon sampled over one decade. Atlantic salmon calicivirus and Cutthroat trout virus-2 were in more than half of the farmed Atlantic salmon we tested. Importantly we detected some of the viruses we first discovered in farmed Atlantic salmon in Chinook salmon, suggesting a broad host range. Finally, we applied in situ hybridisation to determine infection and found differing cell tropism for each virus tested. Our study demonstrates that continual discovery and surveillance of emerging viruses in these ecologically important salmon will be vital for management of both aquaculture and wild resources in the future.
Affiliation(s)
- Gideon J Mordecai
- Department of Medicine, University of British Columbia, 2775 Laurel Street, 10th Floor Vancouver, BC Canada V5Z 1M9, Canada
- Corresponding author
| | - Emiliano Di Cicco
- Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Rd, Nanaimo, BC V9T 6N7, Canada
- Pacific Salmon Foundation, 1682 W 7th Ave, Vancouver, BC V6J 4S6, Canada
| | - Oliver P Günther
- Günther Analytics, 402-5775 Hampton Place, Vancouver, BC, V6T 2G6, Canada
| | - Angela D Schulze
- Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Rd, Nanaimo, BC V9T 6N7, Canada
| | - Karia H Kaukinen
- Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Rd, Nanaimo, BC V9T 6N7, Canada
| | - Shaorong Li
- Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Rd, Nanaimo, BC V9T 6N7, Canada
| | - Amy Tabata
- Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Rd, Nanaimo, BC V9T 6N7, Canada
| | - Tobi J Ming
- Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Rd, Nanaimo, BC V9T 6N7, Canada
| | - Hugh W Ferguson
- School of Veterinary Medicine, St George’s University, True Blue, Grenada, West Indies
| | - Curtis A Suttle
- Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, Vancouver, Canada
- Department of Microbiology and Immunology, University of British Columbia, 1365 - 2350 Health Sciences Mall Vancouver, British Columbia Canada V6T 1Z3
- Department of Botany, University of British Columbia, 3156-6270 University Blvd. Vancouver, BC Canada V6T 1Z4, Canada
- Institute for the Oceans and Fisheries, University of British Columbia, 2202 Main Mall, Vancouver, BC V6T 1Z4, Canada
| | - Kristina M Miller
- Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Rd, Nanaimo, BC V9T 6N7, Canada
| |
|
13
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. INFECTION GENETICS AND EVOLUTION 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples including transmission clusters and outbreak characterization.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
| | - Juan Carlos Galán
- Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain.
| | - Mª Alma Bracho
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain.
| | - Julia Hillung
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Neris García-González
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Fernando González-Candelas
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| |
|
14
|
Gihawi A, Rallapalli G, Hurst R, Cooper CS, Leggett RM, Brewer DS. SEPATH: benchmarking the search for pathogens in human tissue whole genome sequence data leads to template pipelines. Genome Biol 2019; 20:208. [PMID: 31639030 PMCID: PMC6805339 DOI: 10.1186/s13059-019-1819-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/17/2019] [Accepted: 09/11/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Human tissue is increasingly being whole genome sequenced as we transition into an era of genomic medicine. With this arises the potential to detect sequences originating from microorganisms, including pathogens amid the plethora of human sequencing reads. In cancer research, the tumorigenic ability of pathogens is being recognized, for example, Helicobacter pylori and human papillomavirus in the cases of gastric non-cardia and cervical carcinomas, respectively. As of yet, no benchmark has been carried out on the performance of computational approaches for bacterial and viral detection within host-dominated sequence data. RESULTS We present the results of benchmarking over 70 distinct combinations of tools and parameters on 100 simulated cancer datasets spiked with realistic proportions of bacteria. mOTUs2 and Kraken are the highest performing individual tools achieving median genus-level F1 scores of 0.90 and 0.91, respectively. mOTUs2 demonstrates a high performance in estimating bacterial proportions. Employing Kraken on unassembled sequencing reads produces a good but variable performance depending on post-classification filtering parameters. These approaches are investigated on a selection of cervical and gastric cancer whole genome sequences where Alphapapillomavirus and Helicobacter are detected in addition to a variety of other interesting genera. CONCLUSIONS We provide the top-performing pipelines from this benchmark in a unifying tool called SEPATH, which is amenable to high throughput sequencing studies across a range of high-performance computing clusters. SEPATH provides a benchmarked and convenient approach to detect pathogens in tissue sequence data helping to determine the relationship between metagenomics and disease.
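The genus-level F1 score quoted in this abstract can be illustrated with a minimal sketch. It assumes that each dataset is reduced to a set of predicted genera and a set of genera truly present; the function names `genus_f1` and `median_f1` are hypothetical and are not taken from SEPATH, whose exact scoring procedure may differ.

```python
from statistics import median


def genus_f1(predicted, truth):
    """F1 score between a predicted and a true set of genus names (illustrative)."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # fraction of calls that are correct
    recall = true_positives / len(truth)         # fraction of true genera recovered
    return 2 * precision * recall / (precision + recall)


def median_f1(datasets):
    """Median F1 over an iterable of (predicted, truth) pairs, one pair per dataset."""
    return median(genus_f1(predicted, truth) for predicted, truth in datasets)
```

A single false-positive genus alongside two correct calls gives a precision of 2/3 and a recall of 1, so the F1 score is 0.8.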
Affiliation(s)
- Abraham Gihawi
- Norwich Medical School, University of East Anglia, Bob Champion Research and Education Building, Norwich, NR4 7UQ UK
| | - Ghanasyam Rallapalli
- Norwich Medical School, University of East Anglia, Bob Champion Research and Education Building, Norwich, NR4 7UQ UK
| | - Rachel Hurst
- Norwich Medical School, University of East Anglia, Bob Champion Research and Education Building, Norwich, NR4 7UQ UK
| | - Colin S. Cooper
- Norwich Medical School, University of East Anglia, Bob Champion Research and Education Building, Norwich, NR4 7UQ UK
- Functional Crosscutting Genomics England Clinical Interpretation Partnership (GeCIP) Domain Lead, 100,000 Genomes Project, Genomics England, London, UK
| | | | - Daniel S. Brewer
- Norwich Medical School, University of East Anglia, Bob Champion Research and Education Building, Norwich, NR4 7UQ UK
- Norwich Research Park, Earlham Institute, Norwich, NR4 7UZ UK
| |
|
15
|
Santiago-Rodriguez TM, Hollister EB. Human Virome and Disease: High-Throughput Sequencing for Virus Discovery, Identification of Phage-Bacteria Dysbiosis and Development of Therapeutic Approaches with Emphasis on the Human Gut. Viruses 2019; 11:v11070656. [PMID: 31323792 PMCID: PMC6669467 DOI: 10.3390/v11070656] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 06/19/2019] [Revised: 07/14/2019] [Accepted: 07/15/2019] [Indexed: 02/06/2023] Open
Abstract
The virome is comprised of endogenous retroviruses, eukaryotic viruses, and bacteriophages and is increasingly being recognized as an essential part of the human microbiome. The human virome is associated with Type-1 diabetes (T1D), Type-2 diabetes (T2D), Inflammatory Bowel Disease (IBD), Human Immunodeficiency Virus (HIV) infection, and cancer. Increasing evidence also supports trans-kingdom interactions of viruses with bacteria, small eukaryotes and host in disease progression. The present review focuses on virus ecology and biology and how this translates mostly to human gut virome research. Current challenges in the field and how the development of bioinformatic tools and controls are aiding to overcome some of these challenges are also discussed. Finally, the present review also focuses on how human gut virome research could result in translational and clinical studies that may facilitate the development of therapeutic approaches.
Affiliation(s)
| | - Emily B Hollister
- Diversigen Inc., 2450 Holcombe Blvd, Suite BCMA, 77021 Houston, TX, USA.
| |
|
16
|
Iacoangeli A, Al Khleifat A, Sproviero W, Shatunov A, Jones AR, Morgan SL, Pittman A, Dobson RJ, Newhouse SJ, Al-Chalabi A. DNAscan: personal computer compatible NGS analysis, annotation and visualisation. BMC Bioinformatics 2019; 20:213. [PMID: 31029080 PMCID: PMC6487045 DOI: 10.1186/s12859-019-2791-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/02/2018] [Accepted: 04/02/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Next Generation Sequencing (NGS) is a commonly used technology for studying the genetic basis of biological processes and it underpins the aspirations of precision medicine. However, there are significant challenges when dealing with NGS data. Firstly, a huge number of bioinformatics tools for a wide range of uses exist, therefore it is challenging to design an analysis pipeline. Secondly, NGS analysis is computationally intensive, requiring expensive infrastructure, and many medical and research centres do not have adequate high performance computing facilities and cloud computing is not always an option due to privacy and ownership issues. Finally, the interpretation of the results is not trivial and most available pipelines lack the utilities to favour this crucial step. RESULTS We have therefore developed a fast and efficient bioinformatics pipeline that allows for the analysis of DNA sequencing data, while requiring little computational effort and memory usage. DNAscan can analyse a whole exome sequencing sample in 1 h and a 40x whole genome sequencing sample in 13 h, on a midrange computer. The pipeline can look for single nucleotide variants, small indels, structural variants, repeat expansions and viral genetic material (or any other organism). Its results are annotated using a customisable variety of databases and are available for an on-the-fly visualisation with a local deployment of the gene.iobio platform. DNAscan is implemented in Python. Its code and documentation are available on GitHub: https://github.com/KHP-Informatics/DNAscan . Instructions for an easy and fast deployment with Docker and Singularity are also provided on GitHub. CONCLUSIONS DNAscan is an extremely fast and computationally efficient pipeline for analysis, visualization and interpretation of NGS data. It is designed to provide a powerful and easy-to-use tool for applications in biomedical research and diagnostic medicine, at minimal computational cost. 
Its comprehensive approach will maximise the potential audience of users, bringing such analyses within the reach of non-specialist laboratories, and those from centres with limited funding available.
Affiliation(s)
- A Iacoangeli
- Department of Biostatistics and Health Informatics, King's College London, London, UK.
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK.
| | - A Al Khleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - W Sproviero
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - A Shatunov
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - A R Jones
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - S L Morgan
- Department of Molecular Neuroscience, UCL, Institute of Neurology, London, UK
| | - A Pittman
- Department of Molecular Neuroscience, UCL, Institute of Neurology, London, UK
| | - R J Dobson
- Department of Biostatistics and Health Informatics, King's College London, London, UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, UK
- National Institute for Health Research (NIHR) Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
| | - S J Newhouse
- Department of Biostatistics and Health Informatics, King's College London, London, UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, UK
- National Institute for Health Research (NIHR) Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
| | - A Al-Chalabi
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
- King's College Hospital, Bessemer Road, London, SE5 9RS, UK
| |
|
17
|
Shates TM, Sun P, Malmstrom CM, Dominguez C, Mauck KE. Addressing Research Needs in the Field of Plant Virus Ecology by Defining Knowledge Gaps and Developing Wild Dicot Study Systems. Front Microbiol 2019; 9:3305. [PMID: 30687284 PMCID: PMC6333650 DOI: 10.3389/fmicb.2018.03305] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/31/2018] [Accepted: 12/19/2018] [Indexed: 11/30/2022] Open
Abstract
Viruses are ubiquitous within all habitats that support cellular life and represent the most important emerging infectious diseases of plants. Despite this, it is only recently that we have begun to describe the ecological roles of plant viruses in unmanaged systems and the influence of ecosystem properties on virus evolution. We now know that wild plants frequently harbor infections by diverse virus species, but much remains to be learned about how viruses influence host traits and how hosts influence virus evolution and vector interactions. To identify knowledge gaps and suggest avenues for alleviating research deficits, we performed a quantitative synthesis of a representative sample of virus ecology literature, developed criteria for expanding the suite of pathosystems serving as models, and applied these criteria through a case study. We found significant gaps in the types of ecological systems studied, which merit more attention. In particular, there is a strong need for a greater diversity of logistically tractable, wild dicot perennial study systems suitable for experimental manipulations of infection status. Based on criteria developed from our quantitative synthesis, we evaluated three California native dicot perennials typically found in Mediterranean-climate plant communities as candidate models: Cucurbita foetidissima (buffalo gourd), Cucurbita palmata (coyote gourd), and Datura wrightii (sacred thorn-apple). We used Illumina sequencing and network analyses to characterize viromes and viral links among species, using samples taken from multiple individuals at two different reserves. We also compared our Illumina workflow with targeted RT-PCR detection assays of varying costs. To make this process accessible to ecologists looking to incorporate virology into existing studies, we describe our approach in detail and discuss advantages and challenges of different protocols. 
We also provide a bioinformatics workflow based on open-access tools with graphical user interfaces. Our study provides evidence that dicot perennials in xeric habitats support multiple, asymptomatic infections by viruses known to be pathogenic in related crop hosts. Quantifying the impacts of these interactions on plant performance and virus epidemiology in our logistically tractable host systems will provide fundamental information about plant virus ecology outside of crop environments.
Affiliation(s)
- Tessa M. Shates
- Department of Entomology, University of California, Riverside, Riverside, CA, United States
| | - Penglin Sun
- Department of Entomology, University of California, Riverside, Riverside, CA, United States
| | - Carolyn M. Malmstrom
- Department of Plant Biology, Michigan State University, East Lansing, MI, United States
- Graduate Program in Ecology, Evolutionary Biology and Behavior, Michigan State University, East Lansing, MI, United States
| | - Chrysalyn Dominguez
- Department of Entomology, University of California, Riverside, Riverside, CA, United States
| | - Kerry E. Mauck
- Department of Entomology, University of California, Riverside, Riverside, CA, United States
| |
|
18
|
Miller JR, Koren S, Dilley KA, Puri V, Brown DM, Harkins DM, Thibaud-Nissen F, Rosen B, Chen XG, Tu Z, Sharakhov IV, Sharakhova MV, Sebra R, Stockwell TB, Bergman NH, Sutton GG, Phillippy AM, Piermarini PM, Shabman RS. Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation. Gigascience 2018; 7:1-13. [PMID: 29329394 PMCID: PMC5869287 DOI: 10.1093/gigascience/gix135] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 06/27/2017] [Accepted: 12/23/2017] [Indexed: 12/25/2022] Open
Abstract
Background The 50-year-old Aedes albopictus C6/36 cell line is a resource for the detection, amplification, and analysis of mosquito-borne viruses including Zika, dengue, and chikungunya. The cell line is derived from an unknown number of larvae from an unspecified strain of Aedes albopictus mosquitoes. Toward improved utility of the cell line for research in virus transmission, we present an annotated assembly of the C6/36 genome. Results The C6/36 genome assembly has the largest contig N50 (3.3 Mbp) of any mosquito assembly, presents the sequences of both haplotypes for most of the diploid genome, reveals independent null mutations in both alleles of the Dicer locus, and indicates a male-specific genome. Gene annotation was computed with publicly available mosquito transcript sequences. Gene expression data from cell line RNA sequence identified enrichment of growth-related pathways and conspicuous deficiency in aquaporins and inward rectifier K+ channels. As a test of utility, RNA sequence data from Zika-infected cells were mapped to the C6/36 genome and transcriptome assemblies. Host subtraction reduced the data set by 89%, enabling faster characterization of nonhost reads. Conclusions The C6/36 genome sequence and annotation should enable additional uses of the cell line to study arbovirus vector interactions and interventions aimed at restricting the spread of human disease.
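The contig N50 reported in this abstract is a standard assembly statistic: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. A minimal sketch, assuming contig lengths are available as a plain list of integers (the function name `contig_n50` is hypothetical):

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_n50 requires at least one contig")
```

For contigs of 10, 8, 6, 4 and 2 bases (30 in total), the two longest contigs together cover 18 bases, which passes the halfway mark of 15, so the N50 is 8.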
Affiliation(s)
- Jason R Miller
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA; College of Natural Sciences and Mathematics, Shepherd University, Shepherdstown, WV 25443, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD 20892, USA
| | - Kari A Dilley
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
| | - Vinita Puri
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
| | - David M Brown
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
| | - Derek M Harkins
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
| | | | - Benjamin Rosen
- USDA 10300 Baltimore Ave., Bldg 306 Barc-East, Beltsville, MD 20705-2350, USA
| | - Xiao-Guang Chen
- Department of Pathogen Biology, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou 510515, China
| | - Zhijian Tu
- Department of Biochemistry and the Fralin Life Science Institute, Virginia Tech, Blacksburg, VA, USA
| | - Igor V Sharakhov
- Department of Entomology and the Fralin Life Science Institute, Virginia Tech, Blacksburg, VA, USA; Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia
| | - Maria V Sharakhova
- Department of Entomology and the Fralin Life Science Institute, Virginia Tech, Blacksburg, VA, USA; Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia
| | - Robert Sebra
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | | | | | - Granger G Sutton
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD 20892, USA
| | - Peter M Piermarini
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA; Department of Entomology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, OH 44691, USA
| | - Reed S Shabman
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA; ATCC, 217 Perry Parkway, Gaithersburg, MD 20877, USA
| |
|
19
|
Miller JR, Koren S, Dilley KA, Harkins DM, Stockwell TB, Shabman RS, Sutton GG. A draft genome sequence for the Ixodes scapularis cell line, ISE6. F1000Res 2018; 7:297. [PMID: 29707202 PMCID: PMC5883391 DOI: 10.12688/f1000research.13635.1] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Accepted: 01/29/2018] [Indexed: 12/02/2022] Open
Abstract
Background: The tick cell line ISE6, derived from Ixodes scapularis, is commonly used for amplification and detection of arboviruses in environmental or clinical samples. Methods: To assist with sequence-based assays, we sequenced the ISE6 genome with single-molecule, long-read technology. Results: The draft assembly appears near complete based on gene content analysis, though it appears to lack some instances of repeats in this highly repetitive genome. The assembly appears to have separated the haplotypes at many loci. DNA short read pairs, used for validation only, mapped to the cell line assembly at a higher rate than they mapped to the Ixodes scapularis reference genome sequence. Conclusions: The assembly could be useful for filtering host genome sequence from sequence data obtained from cells infected with pathogens.
Affiliation(s)
- Jason R Miller
- J. Craig Venter Institute, Rockville, MD, 20850, USA; Shepherd University, Shepherdstown, WV, 25443, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA
| | - Kari A Dilley
- J. Craig Venter Institute, Rockville, MD, 20850, USA
| | | | - Timothy B Stockwell
- J. Craig Venter Institute, Rockville, MD, 20850, USA; NBACC, Fort Detrick, MD, 21702, USA
| | - Reed S Shabman
- J. Craig Venter Institute, Rockville, MD, 20850, USA; ATCC, Gaithersburg, MD, 20877, USA
| | | |
|
20
|
Miller JR, Dilley KA, Harkins DM, Stockwell TB, Shabman RS, Sutton GG. A host subtraction database for virus discovery in human cell line sequencing data. F1000Res 2018; 7:98. [PMID: 31231504 PMCID: PMC6556987 DOI: 10.12688/f1000research.13580.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/17/2019] [Indexed: 11/28/2022] Open
Abstract
The human cell lines HepG2, HuH-7, and Jurkat are commonly used for amplification of the RNA viruses present in environmental samples. To assist with assays by RNAseq, we sequenced these cell lines and developed a subtraction database that contains sequences expected in sequence data from uninfected cells. RNAseq data from cell lines infected with Sendai virus were analyzed to test host subtraction. The process of mapping RNAseq reads to our subtraction database vastly reduced the number of non-viral reads in the dataset to allow for efficient secondary analyses.
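The idea of host subtraction can be illustrated with a toy sketch. It is not the authors' method: the abstract describes mapping reads to a subtraction database, whereas the code below substitutes exact k-mer matching as a simple stand-in for alignment. The function names, the default `k=8`, and the example sequences are all hypothetical.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of a sequence (empty if the sequence is shorter than k)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def subtract_host(reads, host_sequences, k=8):
    """Keep only reads that share no exact k-mer with any host sequence.

    Toy stand-in for read mapping: a real pipeline would use an aligner.
    Reads shorter than k have no k-mers and are therefore always kept.
    """
    host_index = set()
    for host in host_sequences:
        host_index |= kmers(host, k)
    return [read for read in reads if not (kmers(read, k) & host_index)]
```

With the host sequence `ACGTACGTTTGACCA`, the read `ACGTACGTTT` shares 8-mers with the host and is discarded, while `GGGCCCAAATTT` shares none and is retained for downstream analysis.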
Affiliation(s)
- Jason R Miller
- J. Craig Venter Institute, Rockville, MD, 20850, USA; Shepherd University, Shepherdstown, WV, 25443, USA
| | - Kari A Dilley
- J. Craig Venter Institute, Rockville, MD, 20850, USA
| | | | - Timothy B Stockwell
- J. Craig Venter Institute, Rockville, MD, 20850, USA; National Biodefense Analysis and Countermeasures Center (NBACC), Fort Detrick, MD, 21702, USA
| | - Reed S Shabman
- J. Craig Venter Institute, Rockville, MD, 20850, USA; American Type Culture Collection, Gaithersburg, MD, 20877, USA
| | | |
|
21
|
Bovo S, Mazzoni G, Ribani A, Utzeri VJ, Bertolini F, Schiavo G, Fontanesi L. A viral metagenomic approach on a non-metagenomic experiment: Mining next generation sequencing datasets from pig DNA identified several porcine parvoviruses for a retrospective evaluation of viral infections. PLoS One 2017; 12:e0179462. [PMID: 28662150 PMCID: PMC5491021 DOI: 10.1371/journal.pone.0179462] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/20/2017] [Accepted: 05/29/2017] [Indexed: 12/14/2022] Open
Abstract
Shot-gun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that are not mapped (i.e. unmapped reads) on the host reference genome and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools constructed each from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine unmapped reads on the reference pig genome that were obtained from the two NGS datasets. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5 and PPV6 and porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were all identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18) previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus infected pigs providing information that could be important to define occurrence and prevalence of different parvoviruses in South Europe. This study demonstrated the potential of mining NGS datasets non-originally derived by metagenomics experiments for viral metagenomics analyses in a livestock species.
Affiliation(s)
- Samuele Bovo
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
  - Department of Biological, Geological, and Environmental Sciences (BiGeA), Biocomputing Group, University of Bologna, Bologna, Italy
- Gianluca Mazzoni
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
  - Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark
- Anisa Ribani
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Valerio Joe Utzeri
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Francesca Bertolini
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
  - Department of Animal Science, Iowa State University, Iowa, United States of America
- Giuseppina Schiavo
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Luca Fontanesi
  - Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
22
Doggett NA, Mukundan H, Lefkowitz EJ, Slezak TR, Chain PS, Morse S, Anderson K, Hodge DR, Pillai S. Culture-Independent Diagnostics for Health Security. Health Secur 2017; 14:122-42. [PMID: 27314653 DOI: 10.1089/hs.2015.0074] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/20/2022] Open
Abstract
The past decade has seen considerable development in the diagnostic application of nonculture methods, including nucleic acid amplification-based methods and mass spectrometry, for the diagnosis of infectious diseases. The implications of these new culture-independent diagnostic tests (CIDTs) include bypassing the need to culture organisms, thus potentially affecting public health surveillance systems, which continue to use isolates as the basis of their surveillance programs and to assess phenotypic resistance to antimicrobial agents. CIDTs may also affect the way public health practitioners detect and respond to a bioterrorism event. In response to a request from the Department of Homeland Security, Los Alamos National Laboratory and the Centers for Disease Control and Prevention cosponsored a workshop to review the impact of CIDTs on the rapid detection and identification of biothreat agents. Four panel discussions were held that covered nucleic acid amplification-based diagnostics, mass spectrometry, antibody-based diagnostics, and next-generation sequencing. Exploiting the extensive expertise available at this workshop, we identified the key features, benefits, and limitations of the various CIDT methods for providing rapid pathogen identification that are critical to the response and mitigation of a bioterrorism event. After the workshop we conducted a thorough review of the literature, investigating the current state of these 4 culture-independent diagnostic methods. This article combines information from the literature review and the insights obtained at the workshop.
23
Díez-Vives C, Moitinho-Silva L, Nielsen S, Reynolds D, Thomas T. Expression of eukaryotic-like protein in the microbiome of sponges. Mol Ecol 2017; 26:1432-1451. [PMID: 28036141 DOI: 10.1111/mec.14003] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 08/18/2016] [Revised: 12/08/2016] [Accepted: 12/09/2016] [Indexed: 01/04/2023]
Abstract
Eukaryotic-like proteins (ELPs) are classes of proteins that are found in prokaryotes, but have a likely evolutionary origin in eukaryotes. ELPs have been postulated to mediate host-microbiome interactions. Recent work has discovered that prokaryotic symbionts of sponges contain abundant and diverse genes for ELPs, which could modulate interactions with their filter-feeding and phagocytic host. However, the extent to which these ELP genes are actually used and expressed by the symbionts is poorly understood. Here, we use metatranscriptomics to investigate ELP expression in the microbiomes of three different sponges (Cymbastella concentrica, Scopalina sp. and Tedania anhelens). We developed a workflow with optimized rRNA removal and in silico subtraction of host sequences to obtain a reliable symbiont metatranscriptome. This showed that between 1.3% and 2.3% of all symbiont transcripts contain genes for ELPs. Two classes of ELPs (cadherin and tetratricopeptide repeats) were abundantly expressed in the C. concentrica and Scopalina sp. microbiomes, while ankyrin repeat ELPs were predominant in the T. anhelens metatranscriptome. Comparison with transcripts that do not encode ELPs indicated a constitutive expression of ELPs across a range of bacterial and archaeal symbionts. Expressed ELPs also contained domains involved in protein secretion and/or were co-expressed with proteins involved in extracellular transport. This suggests these ELPs are likely exported, which could allow for direct interaction with the sponge. Our study shows that ELP genes in sponge symbionts represent actively expressed functions that could mediate molecular interaction between symbiosis partners.
Affiliation(s)
- C Díez-Vives
  - Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- L Moitinho-Silva
  - Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- S Nielsen
  - Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- D Reynolds
  - Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- T Thomas
  - Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
24
Rose R, Constantinides B, Tapinos A, Robertson DL, Prosperi M. Challenges in the analysis of viral metagenomes. Virus Evol 2016; 2:vew022. [PMID: 29492275 PMCID: PMC5822887 DOI: 10.1093/ve/vew022] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Indexed: 11/17/2022] Open
Abstract
Genome sequencing technologies continue to develop with remarkable pace, yet analytical approaches for reconstructing and classifying viral genomes from mixed samples remain limited in their performance and usability. Existing solutions generally target expert users and often have unclear scope, making it challenging to critically evaluate their performance. There is a growing need for intuitive analytical tooling for researchers lacking specialist computing expertise and that is applicable in diverse experimental circumstances. Notable technical challenges have impeded progress; for example, fragments of viral genomes are typically orders of magnitude less abundant than those of host, bacteria, and/or other organisms in clinical and environmental metagenomes; observed viral genomes often deviate considerably from reference genomes demanding use of exhaustive alignment approaches; high intrapopulation viral diversity can lead to ambiguous sequence reconstruction; and finally, the relatively few documented viral reference genomes compared to the estimated number of distinct viral taxa renders classification problematic. Various software tools have been developed to accommodate the unique challenges and use cases associated with characterizing viral sequences; however, the quality of these tools varies, and their use often necessitates computing expertise or access to powerful computers, thus limiting their usefulness to many researchers. In this review, we consider the general and application-specific challenges posed by viral sequencing and analysis, outline the landscape of available tools and methodologies, and propose ways of overcoming the current barriers to effective analysis.
Affiliation(s)
- Rebecca Rose
  - BioInfoExperts, Norfolk, VA, USA
  - Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK
  - Department of Epidemiology, University of Florida, Gainesville, FL, USA
- Bede Constantinides
  - BioInfoExperts, Norfolk, VA, USA
  - Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK
  - Department of Epidemiology, University of Florida, Gainesville, FL, USA
- Avraam Tapinos
  - BioInfoExperts, Norfolk, VA, USA
  - Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK
  - Department of Epidemiology, University of Florida, Gainesville, FL, USA
- David L Robertson
  - BioInfoExperts, Norfolk, VA, USA
  - Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK
  - Department of Epidemiology, University of Florida, Gainesville, FL, USA
- Mattia Prosperi
  - BioInfoExperts, Norfolk, VA, USA
  - Computational and Evolutionary Biology Faculty of Life Sciences, University of Manchester, Manchester, UK
  - Department of Epidemiology, University of Florida, Gainesville, FL, USA
25
Smits SL, Bodewes R, Ruiz-González A, Baumgärtner W, Koopmans MP, Osterhaus ADME, Schürch AC. Recovering full-length viral genomes from metagenomes. Front Microbiol 2015; 6:1069. [PMID: 26483782 PMCID: PMC4589665 DOI: 10.3389/fmicb.2015.01069] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 08/19/2015] [Accepted: 09/17/2015] [Indexed: 12/17/2022] Open
Abstract
Infectious disease metagenomics is driven by the question: “what is causing the disease?” in contrast to classical metagenome studies which are guided by “what is out there?” In case of a novel virus, a first step to eventually establishing etiology can be to recover a full-length viral genome from a metagenomic sample. However, retrieval of a full-length genome of a divergent virus is technically challenging and can be time-consuming and costly. Here we discuss different assembly and fragment linkage strategies such as iterative assembly, motif searches, k-mer frequency profiling, coverage profile binning, and other strategies used to recover genomes of potential viral pathogens in a timely and cost-effective manner.
Affiliation(s)
- Saskia L Smits
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands
- Rogier Bodewes
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands
- Aritz Ruiz-González
  - Department of Zoology and Animal Cell Biology, University of the Basque Country (UPV/EHU), Vitoria-Gasteiz, Spain
  - Systematics, Biogeography and Population Dynamics Research Group, Lascaray Research Center, University of the Basque Country (UPV/EHU), Vitoria-Gasteiz, Spain
  - Conservation Genetics Laboratory, National Institute for Environmental Protection and Research, Bologna, Italy
- Wolfgang Baumgärtner
  - Department of Pathology, University of Veterinary Medicine Hannover, Hannover, Germany
- Marion P Koopmans
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands
  - Centre for Infectious Diseases Research, Diagnostics and Screening, National Institute for Public Health and the Environment, Bilthoven, Netherlands
- Albert D M E Osterhaus
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands
  - Center for Infection Medicine and Zoonoses Research, Hannover, Germany
- Anita C Schürch
  - Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands