1
Medina JE, Castañeda S, Camargo M, Garcia-Corredor DJ, Muñoz M, Ramírez JD. Exploring viral diversity and metagenomics in livestock: insights into disease emergence and spillover risks in cattle. Vet Res Commun 2024; 48:2029-2049. [PMID: 38865041 DOI: 10.1007/s11259-024-10403-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 05/01/2024] [Indexed: 06/13/2024]
Abstract
Cattle have a significant impact on human societies in terms of both economics and health. Viral infections pose a relevant problem as they directly or indirectly disrupt the balance within cattle populations. This has negative consequences at the economic level for producers and territories, and also jeopardizes human health through the transmission of zoonotic diseases that can escalate into outbreaks or pandemics. To establish prevention strategies and control measures at various levels (animal, farm, region, or global), it is crucial to identify the viral agents present in animals. Various techniques, including virus isolation, serological tests, and molecular techniques like PCR, are typically employed for this purpose. However, these techniques have two major drawbacks: they are ineffective for non-culturable viruses, and they only detect a small fraction of the viruses present. In contrast, metagenomics offers a promising approach by providing a comprehensive and unbiased analysis for detecting all viruses in a given sample. It has the potential to identify rare or novel infectious agents promptly and establish a baseline of healthy animals. Nevertheless, the routine application of viral metagenomics for epidemiological surveillance and diagnostics faces challenges related to socioeconomic variables, such as resource availability and space dedicated to metagenomics, as well as the lack of standardized protocols and resulting heterogeneity in presenting results. This review aims to provide an overview of the current knowledge and prospects for using viral metagenomics to detect and identify viruses in cattle raised for livestock, while discussing the epidemiological and clinical implications.
Affiliation(s)
- Julián Esteban Medina
  - Centro de Investigaciones en Microbiología y Biotecnología - UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Sergio Castañeda
  - Centro de Investigaciones en Microbiología y Biotecnología - UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Milena Camargo
  - Centro de Investigaciones en Microbiología y Biotecnología - UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
  - Centro de Tecnología en Salud (CETESA), Innovaseq SAS, Mosquera, Cundinamarca, Colombia
- Diego J Garcia-Corredor
  - Centro de Investigaciones en Microbiología y Biotecnología - UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
  - Grupo de Investigación en Medicina Veterinaria y Zootecnia, Facultad de Ciencias Agropecuarias, Universidad Pedagógica y Tecnológica de Colombia, Tunja, Colombia
- Marina Muñoz
  - Centro de Investigaciones en Microbiología y Biotecnología - UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Juan David Ramírez
  - Centro de Investigaciones en Microbiología y Biotecnología - UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
  - Molecular Microbiology Laboratory, Department of Pathology, Molecular and Cell-Based Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2
Shi Z, Long X, Zhang C, Chen Z, Usman M, Zhang Y, Zhang S, Luo G. Viral and Bacterial Community Dynamics in Food Waste and Digestate from Full-Scale Biogas Plants. Environmental Science & Technology 2024; 58:13010-13022. [PMID: 38989650 DOI: 10.1021/acs.est.4c04109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Anaerobic digestion (AD) is commonly used in food waste treatment. Prokaryotic microbial communities in AD of food waste have been comprehensively studied. The role of viruses, known to affect microbial dynamics and metabolism, remains largely unexplored. This study employed metagenomic analysis and recovered 967 high-quality viral bins within food waste and digestate derived from 8 full-scale biogas plants. The diversity of viral communities was higher in digestate. In silico predictions linked 20.8% of viruses to microbial host populations, highlighting possible virus predators of key functional microbes. Lineage-specific virus-host ratio varied, indicating that viral infection dynamics might differentially affect microbial responses to the varying process parameters. Evidence for virus-mediated gene transfer was identified, emphasizing the potential role of viruses in controlling the microbiome. AD altered the specific process parameters, potentially promoting a shift in viral lifestyle from lysogenic to lytic. Viruses encoding auxiliary metabolic genes (AMGs) were involved in microbial carbon and nutrient cycling, and most AMGs were transcriptionally expressed in digestate, meaning that viruses with active functional states were likely actively involved in AD. These findings provided a comprehensive profile of viral and bacterial communities and expanded knowledge of the interactions between viruses and hosts in food waste and digestate.
Affiliation(s)
- Zhijian Shi
  - Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3), Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China
- Xinyi Long
  - Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3), Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China
- Chao Zhang
  - Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3), Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China
- Zheng Chen
  - Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3), Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China
- Muhammad Usman
  - Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Yalei Zhang
  - Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
  - State Key Laboratory of Pollution Control and Resources Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Shicheng Zhang
  - Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3), Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China
  - Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
  - Shanghai Technical Service Platform for Pollution Control and Resource Utilization of Organic Wastes, Shanghai 200438, China
- Gang Luo
  - Shanghai Key Laboratory of Atmospheric Particle Pollution and Prevention (LAP3), Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China
  - Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
  - Shanghai Technical Service Platform for Pollution Control and Resource Utilization of Organic Wastes, Shanghai 200438, China
3
Jansz N, Faulkner GJ. Viral genome sequencing methods: benefits and pitfalls of current approaches. Biochem Soc Trans 2024; 52:1431-1447. [PMID: 38747720 DOI: 10.1042/bst20231322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 04/30/2024] [Accepted: 05/02/2024] [Indexed: 06/27/2024]
Abstract
Whole genome sequencing of viruses provides high-resolution molecular insights, enhancing our understanding of viral genome function and phylogeny. Beyond fundamental research, viral sequencing is increasingly vital for pathogen surveillance, epidemiology, and clinical applications. As sequencing methods rapidly evolve, the diversity of viral genomics applications and catalogued genomes continues to expand. Advances in long-read, single molecule, real-time sequencing methodologies present opportunities to sequence contiguous, haplotype resolved viral genomes in a range of research and applied settings. Here we present an overview of nucleic acid sequencing methods and their applications in studying viral genomes. We emphasise the advantages of different viral sequencing approaches, with a particular focus on the benefits of third-generation sequencing technologies in elucidating viral evolution, transmission networks, and pathogenesis.
Affiliation(s)
- Natasha Jansz
  - Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
- Geoffrey J Faulkner
  - Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
  - Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
4
Nie W, Qiu T, Wei Y, Ding H, Guo Z, Qiu J. Advances in phage-host interaction prediction: in silico method enhances the development of phage therapies. Brief Bioinform 2024; 25:bbae117. [PMID: 38555471 PMCID: PMC10981677 DOI: 10.1093/bib/bbae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 01/15/2024] [Accepted: 03/02/2024] [Indexed: 04/02/2024]
Abstract
Phages can specifically recognize and kill bacteria, which lead to important application value of bacteriophage in bacterial identification and typing, livestock aquaculture and treatment of human bacterial infection. Considering the variety of human-infected bacteria and the continuous discovery of numerous pathogenic bacteria, screening suitable therapeutic phages that are capable of infecting pathogens from massive phage databases has been a principal step in phage therapy design. Experimental methods to identify phage-host interaction (PHI) are time-consuming and expensive; high-throughput computational method to predict PHI is therefore a potential substitute. Here, we systemically review bioinformatic methods for predicting PHI, introduce reference databases and in silico models applied in these methods and highlight the strengths and challenges of current tools. Finally, we discuss the application scope and future research direction of computational prediction methods, which contribute to the performance improvement of prediction models and the development of personalized phage therapy.
Affiliation(s)
- Wanchun Nie
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Tianyi Qiu
  - Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, 200032, China
- Yiwen Wei
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Hao Ding
  - Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Zhixiang Guo
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Jingxuan Qiu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
5
Du Y, Fuhrman JA, Sun F. ViralCC retrieves complete viral genomes and virus-host pairs from metagenomic Hi-C data. Nat Commun 2023; 14:502. [PMID: 36720887 PMCID: PMC9889337 DOI: 10.1038/s41467-023-35945-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/22/2022] [Accepted: 01/09/2023] [Indexed: 02/01/2023]
Abstract
The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables reconstructing high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few of Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Compared to other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as a complementary information source for the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow fecal, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at https://github.com/dyxstat/ViralCC .
Affiliation(s)
- Yuxuan Du
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Jed A Fuhrman
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
6
Gupta AK, Kumar M. Benchmarking and Assessment of Eight De Novo Genome Assemblers on Viral Next-Generation Sequencing Data, Including the SARS-CoV-2. OMICS: A Journal of Integrative Biology 2022; 26:372-381. [PMID: 35759429 DOI: 10.1089/omi.2022.0042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Viral genomics has become crucial in clinical diagnostics and ecology, not to mention to stem the COVID-19 pandemic. Whole-genome sequencing (WGS) is pivotal in gaining an improved understanding of viral evolution, genomic epidemiology, infectious outbreaks, pathobiology, clinical management, and vaccine development. Genome assembly is one of the crucial steps in WGS data analyses. A series of different assemblers has been developed with the advent of high-throughput next-generation sequencing (NGS). Various studies have reported the evaluation of these assembly tools on distinct datasets; however, these lack data from viral origin. In this study, we performed a comparative evaluation and benchmarking of eight de novo assemblers: SOAPdenovo, Velvet, assembly by short sequences (ABySS), iterative De Bruijn graph assembler (IDBA), SPAdes, Edena, iterative virus assembler, and VICUNA on the viral NGS data from distinct Illumina (GAIIx, Hiseq, Miseq, and Nextseq) platforms. WGS data of diverse viruses, that is, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), dengue virus 3, human immunodeficiency virus 1, hepatitis B virus, human herpesvirus 8, human papillomavirus 16, rhinovirus A, and West Nile virus, were utilized to assess these assemblers. Performance metrics such as genome fraction recovery, assembly lengths, NG50, N50, contig length, contig numbers, mismatches, and misassemblies were analyzed. Overall, three assemblers, that is, SPAdes, IDBA, and ABySS, performed consistently well, including for genome assembly of SARS-CoV-2. These assembly methods should be considered and recommended for future studies of viruses. The study also suggests that implementing two or more assembly approaches should be considered in viral NGS studies, especially in clinical settings. 
Taken together, the benchmarking of eight de novo genome assemblers reported in this study can inform future public health and ecology research concerning the viruses, the COVID-19 pandemic, and viral outbreaks.
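Two of the performance metrics named in this abstract, N50 and NG50, can be illustrated with a short sketch. It uses the standard definitions (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the assembly size for N50, or half of a reference genome size for NG50). The function names and the example contig lengths are illustrative assumptions and are not taken from the study.

```python
from typing import Iterable, Optional


def nx_length(contig_lengths: Iterable[int], total: float,
              fraction: float = 0.5) -> Optional[int]:
    """Length of the contig at which the running sum of contigs
    (sorted longest first) first reaches fraction * total.
    Returns None if the contigs never reach that target."""
    target = fraction * total
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def n50(contig_lengths: Iterable[int]) -> Optional[int]:
    """N50: the target is half of the total assembly length."""
    lengths = list(contig_lengths)
    return nx_length(lengths, sum(lengths))


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """NG50: the target is half of the known or estimated genome size."""
    return nx_length(list(contig_lengths), genome_size)


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]  # hypothetical contig lengths
    print(n50(contigs), ng50(contigs, genome_size=400))
```

Because NG50 is tied to the genome size rather than to the assembly size, it stays comparable across assemblers that recover different fractions of the same viral genome, and it is undefined (here `None`) when an assembly covers less than half of the genome.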
Affiliation(s)
- Amit Kumar Gupta
  - Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
- Manoj Kumar
  - Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
7
Clement Dobbins G, Kimberlin D, Ross S. Cytomegalovirus variation among newborns treated with valganciclovir. Antiviral Res 2022; 203:105326. [DOI: 10.1016/j.antiviral.2022.105326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 04/19/2022] [Accepted: 04/20/2022] [Indexed: 11/02/2022]
8
Johansen J, Plichta DR, Nissen JN, Jespersen ML, Shah SA, Deng L, Stokholm J, Bisgaard H, Nielsen DS, Sørensen SJ, Rasmussen S. Genome binning of viral entities from bulk metagenomics data. Nat Commun 2022; 13:965. [PMID: 35181661 PMCID: PMC8857322 DOI: 10.1038/s41467-022-28581-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 08/06/2021] [Accepted: 01/28/2022] [Indexed: 12/26/2022]
Abstract
Despite the accelerating number of uncultivated virus sequences discovered in metagenomics and their apparent importance for health and disease, the human gut virome and its interactions with bacteria in the gastrointestinal tract are not well understood. This is partly due to a paucity of whole-virome datasets and limitations in current approaches for identifying viral sequences in metagenomics data. Here, combining a deep-learning based metagenomics binning algorithm with paired metagenome and metavirome datasets, we develop Phages from Metagenomics Binning (PHAMB), an approach that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations. When applied on the Human Microbiome Project 2 (HMP2) dataset, PHAMB recovered 6,077 high-quality genomes from 1,024 viral populations, and identified viral-microbial host interactions. PHAMB can be advantageously applied to existing and future metagenomes to illuminate viral ecological dynamics with other microbiome constituents.
Affiliation(s)
- Joachim Johansen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Damian R Plichta
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jakob Nybo Nissen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Statens Serum Institut, Viral & Microbial Special diagnostics, Copenhagen, Denmark
- Marie Louise Jespersen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Shiraz A Shah
  - Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Ling Deng
  - Section of Food Microbiology and Fermentation, Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Jakob Stokholm
  - Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
  - Section of Food Microbiology and Fermentation, Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Hans Bisgaard
  - Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Dennis Sandris Nielsen
  - Section of Food Microbiology and Fermentation, Department of Food Science, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Søren J Sørensen
  - Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Simon Rasmussen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
9
Abstract
Viruses are the most abundant biological entity on Earth, infect cellular organisms from all domains of life, and are central players in the global biosphere. Over the last century, the discovery and characterization of viruses have progressed steadily alongside much of modern biology. In terms of outright numbers of novel viruses discovered, however, the last few years have been by far the most transformative for the field. Advances in methods for identifying viral sequences in genomic and metagenomic datasets, coupled to the exponential growth of environmental sequencing, have greatly expanded the catalog of known viruses and fueled the tremendous growth of viral sequence databases. Development and implementation of new standards, along with careful study of the newly discovered viruses, have transformed and will continue to transform our understanding of microbial evolution, ecology, and biogeochemical cycles, leading to new biotechnological innovations across many diverse fields, including environmental, agricultural, and biomedical sciences.
Affiliation(s)
- Lee Call
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Stephen Nayfach
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Nikos C Kyrpides
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
10
Chiara M, D’Erchia AM, Gissi C, Manzari C, Parisi A, Resta N, Zambelli F, Picardi E, Pavesi G, Horner DS, Pesole G. Next generation sequencing of SARS-CoV-2 genomes: challenges, applications and opportunities. Brief Bioinform 2021; 22:616-630. [PMID: 33279989 PMCID: PMC7799330 DOI: 10.1093/bib/bbaa297] [Citation(s) in RCA: 118] [Impact Index Per Article: 39.3] [Received: 08/08/2020] [Revised: 09/27/2020] [Accepted: 10/07/2020] [Indexed: 12/31/2022]
Abstract
Various next generation sequencing (NGS) based strategies have been successfully used in the recent past for tracing origins and understanding the evolution of infectious agents, investigating the spread and transmission chains of outbreaks, as well as facilitating the development of effective and rapid molecular diagnostic tests and contributing to the hunt for treatments and vaccines. The ongoing COVID-19 pandemic poses one of the greatest global threats in modern history and has already caused severe social and economic costs. The development of efficient and rapid sequencing methods to reconstruct the genomic sequence of SARS-CoV-2, the etiological agent of COVID-19, has been fundamental for the design of diagnostic molecular tests and to devise effective measures and strategies to mitigate the diffusion of the pandemic. Diverse approaches and sequencing methods can, as testified by the number of available sequences, be applied to SARS-CoV-2 genomes. However, each technology and sequencing approach has its own advantages and limitations. In the current review, we will provide a brief, but hopefully comprehensive, account of currently available platforms and methodological approaches for the sequencing of SARS-CoV-2 genomes. We also present an outline of current repositories and databases that provide access to SARS-CoV-2 genomic data and associated metadata. Finally, we offer general advice and guidelines for the appropriate sharing and deposition of SARS-CoV-2 data and metadata, and suggest that more efficient and standardized integration of current and future SARS-CoV-2-related data would greatly facilitate the struggle against this new pathogen. We hope that our 'vademecum' for the production and handling of SARS-CoV-2-related sequencing data, will contribute to this objective.
Affiliation(s)
- Matteo Chiara
  - molecular biology and bioinformatics at the University of Milan
- Anna Maria D’Erchia
  - molecular biology at the University of Bari and research associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Carmela Gissi
  - molecular biology at the University of Bari and research associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Caterina Manzari
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Antonio Parisi
  - Genetic and Molecular Epidemiology Laboratory at the Experimental Zooprophylactic Institute of Apulia and Basilicata
- Nicoletta Resta
  - Medical Genetics at the University of Bari. She heads the Laboratory Unit of Medical Genetics and the School of Specialization in Medical Genetics
- Ernesto Picardi
  - molecular biology and bioinformatics at the University of Bari and research associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
- Giulio Pavesi
  - Associate Professor of bioinformatics at the University of Milan (Italy)
- David S Horner
  - molecular biology and bioinformatics at the University of Milan
- Graziano Pesole
  - molecular biology at the University of Bari and Research Associate at the Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council in Bari
11
CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat Biotechnol 2020; 39:578-585. [PMID: 33349699 PMCID: PMC8116208 DOI: 10.1038/s41587-020-00774-7] [Citation(s) in RCA: 555] [Impact Index Per Article: 138.8] [Received: 05/19/2020] [Accepted: 11/12/2020] [Indexed: 02/07/2023]
Abstract
Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.
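The completeness idea in this abstract can be made concrete with a deliberately simplified sketch. It assumes completeness is approximated as the ratio of a fragment's length to the length of its closest complete reference genome, and it applies the ">90% complete" cut-off quoted above. This is an illustrative reading only, not CheckV's actual algorithm or interface; the function names and the example lengths are hypothetical.

```python
def estimated_completeness(contig_length: int,
                           expected_genome_length: int) -> float:
    """Percent completeness of a viral fragment relative to the expected
    length of its closest complete reference genome (assumed to be known).
    Capped at 100 so contigs longer than the reference do not exceed it."""
    if expected_genome_length <= 0:
        raise ValueError("expected_genome_length must be positive")
    return min(100.0, 100.0 * contig_length / expected_genome_length)


def is_high_quality(completeness_pct: float, threshold: float = 90.0) -> bool:
    """Apply the '>90% complete' cut-off quoted in the abstract."""
    return completeness_pct > threshold


if __name__ == "__main__":
    # Hypothetical fragment of 38 kb matched to a 40 kb reference genome.
    pct = estimated_completeness(38_000, 40_000)
    print(pct, is_high_quality(pct))
```

The strict inequality follows the wording of the abstract, so a fragment estimated at exactly 90% would not count as high quality under this sketch.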
12
Abstract
Colorectal cancer (CRC) is a leading cause of cancer-related deaths in both the USA and the world. Recent research has demonstrated the involvement of the gut microbiota in CRC development and progression. Microbial biomarkers of disease have focused primarily on the bacterial component of the microbiome; however, the viral portion of the microbiome, consisting of both bacteriophages and eukaryotic viruses, together known as the virome, has been lesser studied. Here we review the recent advancements in high-throughput sequencing (HTS) technologies and bioinformatics, which have enabled scientists to better understand how viruses might influence the development of colorectal cancer. We discuss the contemporary findings revealing modulations in the virome and their correlation with CRC development and progression. While a variety of challenges still face viral HTS detection in clinical specimens, we consider herein numerous next steps for future basic and clinical research. Clinicians need to move away from a single infectious agent model for disease etiology by grasping new, more encompassing etiological paradigms, in which communities of various microbial components interact with each other and the host. The reporting and indexing of patient health information, socioeconomic data, and other relevant metadata will enable identification of predictive variables and covariates of viral presence and CRC development. Altogether, the virome has a more profound role in carcinogenesis and cancer progression than once thought, and viruses, specific for either human cells or bacteria, are clinically relevant in understanding CRC pathology, patient prognosis, and treatment development.
13
Petrovich ML, Zilberman A, Kaplan A, Eliraz GR, Wang Y, Langenfeld K, Duhaime M, Wigginton K, Poretsky R, Avisar D, Wells GF. Microbial and Viral Communities and Their Antibiotic Resistance Genes Throughout a Hospital Wastewater Treatment System. Front Microbiol 2020; 11:153. [PMID: 32140141 PMCID: PMC7042388 DOI: 10.3389/fmicb.2020.00153] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 09/29/2019] [Accepted: 01/22/2020] [Indexed: 11/16/2022]
Abstract
Antibiotic resistance poses a serious threat to global public health, and antibiotic resistance determinants can enter natural aquatic systems through discharge of wastewater effluents. Hospital wastewater in particular is expected to contain high abundances of antibiotic resistance genes (ARGs) compared to municipal wastewater because it contains human enteric bacteria that may include antibiotic-resistant organisms originating from hospital patients, and can also have high concentrations of antibiotics and antimicrobials relative to municipal wastewater. Viruses also play an important role in wastewater treatment systems since they can influence the bacterial community composition through killing bacteria, facilitating transduction of genetic material between organisms, and modifying the chromosomal content of bacteria as prophages. However, little is known about the fate and connections between ARGs, viruses, and their associated bacteria in hospital wastewater systems. To address this knowledge gap, we characterized the composition and persistence of ARGs, dsDNA viruses, and bacteria from influent to effluent in a pilot-scale hospital wastewater treatment system in Israel using shotgun metagenomics. Results showed that ARGs, including genes conferring resistance to antibiotics of high clinical relevance, were detected in all sampling locations throughout the pilot-scale system, with only 16% overall depletion of ARGs per genome equivalent between influent and effluent. The most common classes of ARGs detected throughout the system conferred resistance to aminoglycoside, cephalosporin, macrolide, penam, and tetracycline antibiotics. A greater proportion of total ARGs were associated with plasmid-associated genes in effluent compared to in influent. No strong associations between viral sequences and ARGs were identified in viral metagenomes from the system, suggesting that phage may not be a significant vector for ARG transfer in this system. 
The majority of viruses in the pilot-scale system belonged to the families Myoviridae, Podoviridae, and Siphoviridae. Gammaproteobacteria was the dominant class of bacteria harboring ARGs and the most common putative viral host in all samples, followed by Bacilli and Betaproteobacteria. In the total bacterial community, the dominant class was Betaproteobacteria for each sample. Overall, we found that a variety of different types of ARGs and viruses were persistent throughout this hospital wastewater treatment system, which can be released to the environment through effluent discharge.
Affiliation(s)
- Morgan L. Petrovich
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, United States
- Adi Zilberman
  - The Water Research Center, School of The Environment and Earth Sciences, Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel
- Aviv Kaplan
  - The Water Research Center, School of The Environment and Earth Sciences, Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel
- Gefen R. Eliraz
  - The Water Research Center, School of The Environment and Earth Sciences, Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel
- Yubo Wang
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, United States
- Kathryn Langenfeld
  - Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, United States
- Melissa Duhaime
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, United States
- Krista Wigginton
  - Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, United States
- Rachel Poretsky
  - Department of Biological Sciences, The University of Illinois at Chicago, Chicago, IL, United States
- Dror Avisar
  - The Water Research Center, School of The Environment and Earth Sciences, Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel
- George F. Wells
  - Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL, United States
|
14
|
Dobbins GC, Patki A, Chen D, Tiwari HK, Hendrickson C, Britt WJ, Fowler K, Chen JY, Boppana SB, Ross SA. Association of CMV genomic mutations with symptomatic infection and hearing loss in congenital CMV infection. BMC Infect Dis 2019; 19:1046. [PMID: 31822287 PMCID: PMC6905059 DOI: 10.1186/s12879-019-4681-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/30/2019] [Accepted: 11/29/2019] [Indexed: 12/23/2022]
Abstract
Background: Congenital cytomegalovirus (cCMV) infection is the most common congenital infection and a leading cause of long-term neurological and sensory sequelae, the most common being sensorineural hearing loss (SNHL). Despite extensive research, clinical or laboratory markers to identify CMV-infected children with increased risk for disease have not been identified. This study utilizes viral whole-genome next-generation sequencing (NGS) of specimens from congenitally infected infants to explore viral diversity and specific viral variants that may be associated with symptomatic infection and SNHL. Methods: CMV DNA from urine specimens of 30 infants (17 asymptomatic, 13 symptomatic) was target-enriched and next-generation sequenced, resulting in 93% coverage of the CMV genome and allowing analysis of viral diversity. Results: Variant frequency distribution was compared between children with symptomatic and asymptomatic cCMV and those with (n = 13) and without (n = 17) hearing loss. The CMV genes UL48A, UL88, US19 and US22 were found to have an increase in nucleotide diversity in symptomatic children, while UL57, UL20, UL104, US14, UL115, and UL35 had an increase in diversity in children with hearing loss. An analysis of single variant differences between symptomatic and asymptomatic children found UL55 to have the highest number, while the most variants associated with SNHL were in the RL11 gene family. In asymptomatic infants with SNHL, mutations were observed more frequently in UL33 and UL20. Conclusion: CMV genomes from infected newborns can be mapped to 93% of the genome at a depth allowing accurate and reproducible analysis of polymorphisms for variant and gene discovery that may be linked to symptomatic and hearing loss outcomes.
Affiliation(s)
- G Clement Dobbins
  - Department of Pediatrics, The University of Alabama School of Medicine, CHB 116, 1600 6th Avenue South, Birmingham, AL, USA
- Amit Patki
  - Department of Biostatistics, The University of Alabama School of Public Health, Birmingham, AL, USA
- Dongquan Chen
  - Informatics Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
  - Department of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hemant K Tiwari
  - Department of Biostatistics, The University of Alabama School of Public Health, Birmingham, AL, USA
- Curtis Hendrickson
  - Department of Microbiology, The University of Alabama at Birmingham, Birmingham, AL, USA
- William J Britt
  - Department of Pediatrics, The University of Alabama School of Medicine, CHB 116, 1600 6th Avenue South, Birmingham, AL, USA
  - Informatics Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Karen Fowler
  - Department of Pediatrics, The University of Alabama School of Medicine, CHB 116, 1600 6th Avenue South, Birmingham, AL, USA
- Jake Y Chen
  - Informatics Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Suresh B Boppana
  - Department of Pediatrics, The University of Alabama School of Medicine, CHB 116, 1600 6th Avenue South, Birmingham, AL, USA
  - Department of Microbiology, The University of Alabama at Birmingham, Birmingham, AL, USA
- Shannon A Ross
  - Department of Pediatrics, The University of Alabama School of Medicine, CHB 116, 1600 6th Avenue South, Birmingham, AL, USA
  - Department of Microbiology, The University of Alabama at Birmingham, Birmingham, AL, USA
|
15
|
Detecting viral sequences in NGS data. Curr Opin Virol 2019; 39:41-48. [DOI: 10.1016/j.coviro.2019.07.010] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 06/05/2019] [Revised: 07/29/2019] [Accepted: 07/30/2019] [Indexed: 01/03/2023]
|
16
|
Coding-Complete Genome Sequence of a Pollen-Associated Virus Belonging to the Secoviridae Family Recovered from a Japanese Apricot (Prunus mume) Metagenome Data Set. Microbiol Resour Announc 2019; 8:e00881-19. [PMID: 31582454 PMCID: PMC6776771 DOI: 10.1128/mra.00881-19] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/11/2023]
Abstract
We report the coding-complete genome sequence of Japanese apricot pollen-associated secovirus 1 (JAPSV1), a virus belonging to the Secoviridae family, recovered from Japanese apricot (Prunus mume) pollen that is closely related to Peach leaf pitting-associated virus (PLPAV). This discovery adds to the number of known pollen-associated viruses.
|
17
|
Garretto A, Hatzopoulos T, Putonti C. virMine: automated detection of viral sequences from complex metagenomic samples. PeerJ 2019; 7:e6695. [PMID: 30993039 PMCID: PMC6462185 DOI: 10.7717/peerj.6695] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/30/2018] [Accepted: 02/26/2019] [Indexed: 12/29/2022]
Abstract
Metagenomics has enabled sequencing of viral communities from a myriad of different environments. Viral metagenomic studies routinely uncover sequences with no recognizable homology to known coding regions or genomes. Nevertheless, complete viral genomes have been constructed directly from complex community metagenomes, often through tedious manual curation. To address this, we developed the software tool virMine to identify viral genomes from raw reads representative of viral or mixed (viral and bacterial) communities. virMine automates sequence read quality control, assembly, and annotation. Researchers can easily refine their search for a specific study system and/or feature(s) of interest. In contrast to other viral genome detection tools that often rely on the recognition of viral signature sequences, virMine is not restricted by the insufficient representation of viral diversity in public data repositories. Rather, viral genomes are identified through an iterative approach, first omitting non-viral sequences. Thus, both relatives of previously characterized viruses and novel species can be detected, including both eukaryotic viruses and bacteriophages. Here we present virMine and its analysis of synthetic communities as well as metagenomic data sets from three distinctly different environments: the gut microbiota, the urinary microbiota, and freshwater viromes. Several new viral genomes were identified and annotated, thus contributing to our understanding of viral genetic diversity in these three environments.
Affiliation(s)
- Andrea Garretto
  - Bioinformatics Program, Loyola University of Chicago, Chicago, IL, United States of America
- Thomas Hatzopoulos
  - Department of Computer Science, Loyola University of Chicago, Chicago, IL, United States of America
- Catherine Putonti
  - Bioinformatics Program, Loyola University of Chicago, Chicago, IL, United States of America
  - Department of Computer Science, Loyola University of Chicago, Chicago, IL, United States of America
  - Department of Biology, Loyola University of Chicago, Chicago, IL, United States of America
  - Department of Microbiology and Immunology, Loyola University of Chicago, Maywood, IL, United States of America
|
18
|
Sutton TDS, Clooney AG, Ryan FJ, Ross RP, Hill C. Choice of assembly software has a critical impact on virome characterisation. Microbiome 2019; 7:12. [PMID: 30691529 PMCID: PMC6350398 DOI: 10.1186/s40168-019-0626-5] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 10/16/2018] [Accepted: 01/14/2019] [Indexed: 05/19/2023]
Abstract
BACKGROUND: The viral component of microbial communities plays a vital role in driving bacterial diversity, facilitating nutrient turnover and shaping community composition. Despite their importance, the vast majority of viral sequences are poorly annotated and share little or no homology to reference databases. As a result, investigation of the viral metagenome (virome) relies heavily on de novo assembly of short sequencing reads to recover compositional and functional information. Metagenomic assembly is particularly challenging for virome data, often resulting in fragmented assemblies and poor recovery of viral community members. Despite the essential role of assembly in virome analysis and difficulties posed by these data, current assembly comparisons have been limited to subsections of virome studies or bacterial datasets. DESIGN: This study presents the most comprehensive virome assembly comparison to date, featuring 16 metagenomic assembly approaches which have featured in human virome studies. Assemblers were assessed using four independent virome datasets, namely, simulated reads, two mock communities, viromes spiked with a known phage and human gut viromes. RESULTS: Assembly performance varied significantly across all test datasets, with SPAdes (meta) performing consistently well. Performance of MIRA and VICUNA varied, highlighting the importance of using a range of datasets when comparing assembly programs. It was also found that while some assemblers addressed the challenges of virome data better than others, all assemblers had limitations. Low read coverage and genomic repeats resulted in assemblies with poor genome recovery, high degrees of fragmentation and low-accuracy contigs across all assemblers. These limitations must be considered when setting thresholds for downstream analysis and when drawing conclusions from virome data.
Affiliation(s)
- Thomas D S Sutton
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
- Adam G Clooney
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
- Feargal J Ryan
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
  - Present Address: South Australian Health and Medical Research Institute, Adelaide, Australia
- R Paul Ross
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
  - Teagasc Food Research Centre, Fermoy, Cork, Ireland
- Colin Hill
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
|
19
|
Nasko DJ, Chopyk J, Sakowski EG, Ferrell BD, Polson SW, Wommack KE. Family A DNA Polymerase Phylogeny Uncovers Diversity and Replication Gene Organization in the Virioplankton. Front Microbiol 2018; 9:3053. [PMID: 30619142 PMCID: PMC6302109 DOI: 10.3389/fmicb.2018.03053] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/15/2018] [Accepted: 11/27/2018] [Indexed: 12/20/2022]
Abstract
Shotgun metagenomics, which allows for broad sampling of viral diversity, has uncovered genes that are widely distributed among virioplankton populations and show linkages to important biological features of unknown viruses. Over 25% of known dsDNA phage carry the DNA polymerase I (polA) gene, making it one of the most widely distributed phage genes. Because of its pivotal role in DNA replication, this enzyme is linked to phage lifecycle characteristics. Previous research has suggested that a single amino acid substitution might be predictive of viral lifestyle. In this study, Chesapeake Bay virioplankton were sampled by shotgun metagenomic sequencing (using long- and short-read technologies). More polA sequences were predicted from this single viral metagenome (virome) than from 86 globally distributed virome libraries (ca. 2,100 and 1,200, respectively). The PolA peptides predicted from the Chesapeake Bay virome clustered with 69% of PolA peptides from global viromes; thus, remarkably, the Chesapeake Bay virome captured the majority of known PolA peptide diversity in viruses. This deeply sequenced virome also expanded the diversity of PolA sequences, increasing the number of PolA clusters by 44%. Contigs containing polA sequences were also used to examine relationships between phylogenetic clades of PolA and other genes within unknown viral populations. Phylogenetic analysis revealed five distinct groups of phages distinguished by the amino acids at their 762 (Escherichia coli IAI39 numbering) positions and replication genes. DNA polymerase I sequences from Tyr762 and Phe762 groups were most often neighbored by ring-shaped superfamily IV helicases and ribonucleotide reductases (RNRs). The Leu762 groups had non-ring-shaped helicases from superfamily II and were further distinguished by an additional helicase gene from superfamily I and the lack of any identifiable RNR genes. 
Moreover, we found that the inclusion of ribonucleotide reductase associated with PolA helped to further differentiate phage diversity, chiefly within lytic podovirus populations. Altogether, these data show that DNA Polymerase I is a useful marker for observing the diversity and composition of the virioplankton and may be a driving factor in the divergence of phage replication components.
Affiliation(s)
- Daniel J Nasko
  - Delaware Biotechnology Institute, University of Delaware, Newark, DE, United States
- Jessica Chopyk
  - School of Public Health, University of Maryland, College Park, MD, United States
- Eric G Sakowski
  - Department of Environmental Health and Engineering, Johns Hopkins University, Baltimore, MD, United States
- Barbra D Ferrell
  - Delaware Biotechnology Institute, University of Delaware, Newark, DE, United States
- Shawn W Polson
  - Delaware Biotechnology Institute, University of Delaware, Newark, DE, United States
- K Eric Wommack
  - Delaware Biotechnology Institute, University of Delaware, Newark, DE, United States
|
20
|
Galiez C, Siebert M, Enault F, Vincent J, Söding J. WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs. Bioinformatics 2018; 33:3113-3114. [PMID: 28957499 PMCID: PMC5870724 DOI: 10.1093/bioinformatics/btx383] [Citation(s) in RCA: 138] [Impact Index Per Article: 23.0] [Received: 03/03/2017] [Accepted: 07/11/2017] [Indexed: 11/24/2022]
Abstract
Summary: WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Compared with the best current tool, WIsH shows much improved accuracy on phage sequences only a few kbp long and runs hundreds of times faster, making it suited for metagenomics studies. Availability and implementation: OpenMP-parallelized GPL-licensed C++ code available at https://github.com/soedinglab/wish. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Clovis Galiez
  - Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- Matthias Siebert
  - Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- François Enault
  - Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France
- Jonathan Vincent
  - Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France
- Johannes Söding
  - Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
|
21
|
Tomasik J, Smits SL, Leweke FM, Eljasz P, Pas S, Kahn RS, Osterhaus ADME, Bahn S, de Witte LD. Virus discovery analyses on post-mortem brain tissue and cerebrospinal fluid of schizophrenia patients. Schizophr Res 2018; 197:605-606. [PMID: 29478863 DOI: 10.1016/j.schres.2018.02.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/13/2017] [Revised: 12/05/2017] [Accepted: 02/14/2018] [Indexed: 10/18/2022]
Affiliation(s)
- Jakub Tomasik
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Saskia L Smits
  - Department of Viroscience, Erasmus University Medical Center, Rotterdam, The Netherlands
- F Markus Leweke
  - Brain and Mind Centre, The University of Sydney, Sydney, Australia
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Paweł Eljasz
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Suzan Pas
  - Department of Viroscience, Erasmus University Medical Center, Rotterdam, The Netherlands
- René S Kahn
  - Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Albert D M E Osterhaus
  - Department of Viroscience, Erasmus University Medical Center, Rotterdam, The Netherlands
  - Research Center for Emerging Infections and Zoonoses (RIZ), University of Veterinary Medicine, Hannover, Germany
- Sabine Bahn
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Lot D de Witte
  - Department of Psychiatry, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
|
22
|
Parras-Moltó M, Rodríguez-Galet A, Suárez-Rodríguez P, López-Bueno A. Evaluation of bias induced by viral enrichment and random amplification protocols in metagenomic surveys of saliva DNA viruses. Microbiome 2018; 6:119. [PMID: 29954453 PMCID: PMC6022446 DOI: 10.1186/s40168-018-0507-3] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 02/28/2018] [Accepted: 06/19/2018] [Indexed: 05/02/2023]
Abstract
BACKGROUND: Viruses are key players regulating microbial ecosystems. Exploration of viral assemblages is now possible thanks to the development of metagenomics, the most powerful tool available for studying viral ecology and discovering new viruses. Unfortunately, several sources of bias lead to the misrepresentation of certain viruses within metagenomics workflows, hindering the shift from merely descriptive studies towards quantitative comparisons of communities. Therefore, benchmark studies on virus enrichment and random amplification protocols are required to better understand the sources of bias. RESULTS: We assessed the bias introduced by viral enrichment on mock assemblages composed of seven DNA viruses, and the bias from random amplification methods on human saliva DNA viromes, using qPCR and deep sequencing, respectively. While iodixanol cushions and 0.45 μm filtration preserved the original composition of nuclease-protected viral genomes, low-force centrifugation and 0.22 μm filtration removed large viruses. Comparison of unamplified and randomly amplified saliva viromes revealed that multiple displacement amplification (MDA) induced stochastic bias from picograms of DNA template. However, the type of bias shifted to systematic using 1 ng, with only a marginal influence by amplification time. Systematic bias consisted of over-amplification of small circular genomes, and under-amplification of those with extreme GC content, a negative bias that was shared with the PCR-based sequence-independent, single-primer amplification (SISPA) method. MDA based on random priming provided by a DNA primase activity slightly outperformed those based on random hexamers and SISPA, which may reflect differences in ability to handle sequences with extreme GC content. SISPA viromes showed uneven coverage profiles, with high coverage peaks in regions with low linguistic sequence complexity. 
Despite misrepresentation of certain viruses after random amplification, ordination plots based on dissimilarities among contig profiles showed perfect overlapping of related amplified and unamplified saliva viromes and strong separation from unrelated saliva viromes. This result suggests that random amplification bias has a minor impact on beta diversity studies. CONCLUSIONS: Benchmark analyses of mock and natural communities of viruses improve understanding and mitigate bias in metagenomics surveys. Bias induced by random amplification methods has only a minor impact on beta diversity studies of human saliva viromes.
Affiliation(s)
- Marcos Parras-Moltó
  - Centro de Biología Molecular Severo Ochoa (Universidad Autónoma de Madrid/Consejo Superior de Investigaciones Científicas), Madrid, Spain
- Ana Rodríguez-Galet
  - Centro de Biología Molecular Severo Ochoa (Universidad Autónoma de Madrid/Consejo Superior de Investigaciones Científicas), Madrid, Spain
- Patricia Suárez-Rodríguez
  - Centro de Biología Molecular Severo Ochoa (Universidad Autónoma de Madrid/Consejo Superior de Investigaciones Científicas), Madrid, Spain
- Alberto López-Bueno
  - Centro de Biología Molecular Severo Ochoa (Universidad Autónoma de Madrid/Consejo Superior de Investigaciones Científicas), Madrid, Spain
|
23
|
Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Received: 12/08/2017] [Accepted: 04/03/2018] [Indexed: 12/20/2022]
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Dennis Schmitz
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
- Harry Vennema
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Annelies Kroneman
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Marion P G Koopmans
  - Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
  - Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
|
24
|
Bodewes R. Novel viruses in birds: Flying through the roof or is a cage needed? Vet J 2018; 233:55-62. [PMID: 29486880 DOI: 10.1016/j.tvjl.2017.12.023] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/15/2017] [Revised: 09/28/2017] [Accepted: 12/28/2017] [Indexed: 01/17/2023]
Abstract
Emerging viral diseases continue to have a major global impact on human beings and animals. To be able to take adequate measures in case of an outbreak of an emerging disease, rapid detection of the causative agent is a crucial first step. In this review, various aspects of virus discovery are discussed, with a special focus on recently discovered viruses in birds. Novel viruses with a potential major impact have been discovered in domestic and wild bird species in recent years using various virus discovery methods. Only a few studies report the detection of novel viruses in endangered bird species, although increased knowledge about viruses circulating in these species is important. Additional studies focusing on the exact role of a novel virus in disease and on the impact of a novel virus on bird populations are often lacking. Intensive collaboration between different disciplines is needed to obtain useful information about the role of these novel viruses.
Affiliation(s)
- R Bodewes
  - Department of Farm Animal Health, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
|
25
|
White DJ, Wang J, Hall RJ. Assessing the Impact of Assemblers on Virus Detection in a De Novo Metagenomic Analysis Pipeline. J Comput Biol 2017; 24:874-881. [PMID: 28414526 PMCID: PMC5610382 DOI: 10.1089/cmb.2017.0008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
Applying high-throughput sequencing to pathogen discovery is a relatively new field, the objective of which is to find disease-causing agents when little or no background information on disease is available. Key steps in the process are the generation of millions of sequence reads from an infected tissue sample, followed by assembly of these reads into longer, contiguous stretches of nucleotide sequences, and then identification of the contigs by matching them to known databases, such as those stored at GenBank or Ensembl. This technique, de novo metagenomics, is particularly useful when the pathogen is viral and strong discriminatory power can be achieved. However, we recently found that results can differ strikingly depending on which assembler is used. In this study, we formally test the impact of five popular assemblers (MIRA, VELVET, METAVELVET, SPADES, and OMEGA) on the detection of a novel virus and assembly of its whole genome in a data set for which we have confirmed the presence of the virus by empirical laboratory techniques, and compare the overall performance between assemblers. Our results show that if results from only one assembler are considered, biologically important reads can easily be overlooked. The impacts of these results on the field of pathogen discovery are considered.
Affiliation(s)
- Jing Wang
  - Institute of Environmental Science and Research at the National Centre for Biosecurity and Infectious Disease, Upper Hutt, New Zealand
- Richard J. Hall
  - Animal Health Laboratory, Investigation and Diagnostic Centres and Response, Ministry for Primary Industries—Manatū Ahu Matua, Upper Hutt, New Zealand
|
26
|
Bovo S, Mazzoni G, Ribani A, Utzeri VJ, Bertolini F, Schiavo G, Fontanesi L. A viral metagenomic approach on a non-metagenomic experiment: Mining next generation sequencing datasets from pig DNA identified several porcine parvoviruses for a retrospective evaluation of viral infections. PLoS One 2017; 12:e0179462. [PMID: 28662150 PMCID: PMC5491021 DOI: 10.1371/journal.pone.0179462] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/20/2017] [Accepted: 05/29/2017] [Indexed: 12/14/2022]
Abstract
Shotgun next generation sequencing (NGS) on whole DNA extracted from specimens collected from mammals often produces reads that do not map to the host reference genome (i.e. unmapped reads) and that are usually discarded as by-products of the experiments. In this study, we mined Ion Torrent reads obtained by sequencing DNA isolated from archived blood samples collected from 100 performance-tested Italian Large White pigs. Two reduced representation libraries were prepared from two DNA pools, each constructed from 50 equimolar DNA samples. Bioinformatic analyses were carried out to mine the reads from the two NGS datasets that did not map to the reference pig genome. In silico analyses included read mapping and sequence assembly approaches for a viral metagenomic analysis using the NCBI Viral Genome Resource. Our approach identified sequences matching several viruses of the Parvoviridae family: porcine parvovirus 2 (PPV2), PPV4, PPV5 and PPV6 and porcine bocavirus 1-H18 isolate (PBoV1-H18). The presence of these viruses was confirmed by PCR and Sanger sequencing of individual DNA samples. PPV2, PPV4, PPV5, PPV6 and PBoV1-H18 were identified in samples collected in 1998-2007, 1998-2000, 1997-2000, 1998-2004 and 2003, respectively. For most of these viruses (PPV4, PPV5, PPV6 and PBoV1-H18), previous studies reported their first occurrence much later (from 5 to more than 10 years) than our identification period and in different geographic areas. Our study provided a retrospective evaluation of apparently asymptomatic parvovirus-infected pigs, providing information that could be important to define the occurrence and prevalence of different parvoviruses in Southern Europe. This study demonstrated the potential of mining NGS datasets not originally derived from metagenomics experiments for viral metagenomics analyses in a livestock species.
Affiliation(s)
- Samuele Bovo
- Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Department of Biological, Geological, and Environmental Sciences (BiGeA), Biocomputing Group, University of Bologna, Bologna, Italy
- Gianluca Mazzoni
- Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark
- Anisa Ribani
- Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Valerio Joe Utzeri
- Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Francesca Bertolini
- Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Department of Animal Science, Iowa State University, Iowa, United States of America
- Giuseppina Schiavo
- Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
- Luca Fontanesi
- Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Bologna, Italy
|
27
|
Gupta A, Kumar S, Prasoodanan VPK, Harish K, Sharma AK, Sharma VK. Reconstruction of Bacterial and Viral Genomes from Multiple Metagenomes. Front Microbiol 2016; 7:469. [PMID: 27148174 PMCID: PMC4828583 DOI: 10.3389/fmicb.2016.00469] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/25/2015] [Accepted: 03/21/2016] [Indexed: 11/13/2022] Open
Abstract
Several metagenomic projects have been accomplished or are in progress. However, in most cases, it is not feasible to generate complete genomic assemblies of species from the metagenomic sequencing of a complex environment. Only a few studies have reported the reconstruction of bacterial genomes from complex metagenomes. In this work, a Binning-Assembly approach is proposed and demonstrated for the reconstruction of bacterial and viral genomes from 72 human gut metagenomic datasets. A total of 1156 bacterial genomes belonging to 219 bacterial families and 279 viral genomes belonging to 84 viral families could be identified. Draft genome sequences that were more than 80% complete could be reconstructed for a total of 126 bacterial and 11 viral genomes. Selected draft assembled genomes could be validated with 99.8% accuracy using their ORFs. The study provides useful information on the assembly expected for a species given its number of reads and abundance. This approach, along with spiking, was also demonstrated to be useful in improving the draft assembly of a bacterial genome. The Binning-Assembly approach can be successfully used to reconstruct bacterial and viral genomes from multiple metagenomic datasets obtained from similar environments.
Affiliation(s)
- Ankit Gupta
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
- Sanjiv Kumar
- Department of Medicine, University of Connecticut Health Center Farmington, CT, USA
- Vishnu P K Prasoodanan
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
- K Harish
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
- Ashok K Sharma
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
- Vineet K Sharma
- Metagenomics and Systems Biology Group, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, India
|
28
|
|
29
|
Bekliz M, Verneau J, Benamar S, Raoult D, La Scola B, Colson P. A New Zamilon-like Virophage Partial Genome Assembled from a Bioreactor Metagenome. Front Microbiol 2015; 6:1308. [PMID: 26640459 PMCID: PMC4661282 DOI: 10.3389/fmicb.2015.01308] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/13/2015] [Accepted: 11/09/2015] [Indexed: 12/27/2022] Open
Abstract
Virophages replicate within viral factories inside the Acanthamoeba cytoplasm, and decrease the infectivity and replication of their associated giant viruses. Culture isolation and metagenome analyses have suggested that they are common in our environment. By screening metagenomic databases in search of amoebal viruses, we detected virophage-related sequences among sequences generated from the same non-aerated bioreactor metagenome as recently screened by another team for virophage capsid-encoding genes. We describe here the assembled partial genome of a virophage closely related to Zamilon, which infects Acanthamoeba with mimiviruses of lineages B and C but not A. Searches for sequences related to amoebal giant viruses, other Megavirales representatives and virophages were conducted using BLAST against this bioreactor metagenome (PRJNA73603). Comparative genomic and phylogenetic analyses were performed using sequences from previously identified virophages. A total of 72 metagenome contigs generated from the bioreactor were identified as best matching sequences from Megavirales representatives, mostly Pithovirus sibericum, pandoraviruses and amoebal mimiviruses from the three lineages A–C, as well as from virophages. In addition, a partial genome from a Zamilon-like virophage, which we named Zamilon 2, was assembled. This genome has a size of 6716 base pairs, corresponding to 39% of the Zamilon genome, and comprises partial or full-length homologs for 15 Zamilon predicted open reading frames (ORFs). Mean nucleotide and amino acid identities for these 15 Zamilon 2 ORFs with their Zamilon counterparts were 89% (range, 81–96%) and 91% (range, 78–99%), respectively. Notably, these ORFs included two encoding a capsid protein and a packaging ATPase. Comparative genomics and phylogenetic analyses indicated that the partial genome was that of a new Zamilon-like virophage.
Further studies are needed to gain better knowledge of the tropism and prevalence of virophages in our biosphere and in humans.
Affiliation(s)
- Meriem Bekliz
- URMITE, UM 63, Centre National de la Recherche Scientifique 7278, IRD 198, INSERM U1095, Aix-Marseille University Marseille, France
- Jonathan Verneau
- URMITE, UM 63, Centre National de la Recherche Scientifique 7278, IRD 198, INSERM U1095, Aix-Marseille University Marseille, France
- Samia Benamar
- URMITE, UM 63, Centre National de la Recherche Scientifique 7278, IRD 198, INSERM U1095, Aix-Marseille University Marseille, France
- Didier Raoult
- URMITE, UM 63, Centre National de la Recherche Scientifique 7278, IRD 198, INSERM U1095, Aix-Marseille University Marseille, France; IHU Méditerranée Infection, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, Centre Hospitalo-Universitaire Timone Marseille, France; Special Infectious Agents Unit, King Fahd Medical Research Center, King Abdulaziz University Jeddah, Saudi Arabia
- Bernard La Scola
- URMITE, UM 63, Centre National de la Recherche Scientifique 7278, IRD 198, INSERM U1095, Aix-Marseille University Marseille, France; IHU Méditerranée Infection, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, Centre Hospitalo-Universitaire Timone Marseille, France
- Philippe Colson
- URMITE, UM 63, Centre National de la Recherche Scientifique 7278, IRD 198, INSERM U1095, Aix-Marseille University Marseille, France; IHU Méditerranée Infection, Pôle des Maladies Infectieuses et Tropicales Clinique et Biologique, Fédération de Bactériologie-Hygiène-Virologie, Centre Hospitalo-Universitaire Timone Marseille, France
|
30
|
Baumgärtner W. Combatting the Myth of Neuropathology. Vet Pathol 2015; 52:994-7. [PMID: 26542276 DOI: 10.1177/0300985815600501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Affiliation(s)
- W Baumgärtner
- Center of Systems Neuroscience, Department of Pathology, University of Veterinary Medicine, Hannover, Germany
|
31
|
Aflitos SA, Severing E, Sanchez-Perez G, Peters S, de Jong H, de Ridder D. Cnidaria: fast, reference-free clustering of raw and assembled genome and transcriptome NGS data. BMC Bioinformatics 2015; 16:352. [PMID: 26525298 PMCID: PMC4630969 DOI: 10.1186/s12859-015-0806-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/11/2015] [Accepted: 10/29/2015] [Indexed: 12/05/2022] Open
Abstract
Background: Identification of biological specimens is a requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but generally do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances. Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully clustered 169 genomic and transcriptomic datasets from 4 kingdoms simultaneously, achieving 100% identification accuracy at supra-species level and 78% accuracy at the species level. Conclusion: Cnidaria allows for fast, resource-efficient comparison and identification of both raw and assembled genome and transcriptome data. This can help answer both fundamental questions (e.g. in phylogeny, ecological diversity analysis) and practical questions (e.g. sequencing quality control, primer design).
Affiliation(s)
- Saulo Alves Aflitos
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands; Bioinformatics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands.
- Edouard Severing
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands.
- Gabino Sanchez-Perez
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands; Bioinformatics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands.
- Sander Peters
- Applied Bioinformatics, Plant Research International, Wageningen, The Netherlands.
- Hans de Jong
- Laboratory of Genetics, Wageningen University, Wageningen, The Netherlands.
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University, Wageningen, The Netherlands.
|
32
|
Smits SL, Bodewes R, Ruiz-González A, Baumgärtner W, Koopmans MP, Osterhaus ADME, Schürch AC. Recovering full-length viral genomes from metagenomes. Front Microbiol 2015; 6:1069. [PMID: 26483782 PMCID: PMC4589665 DOI: 10.3389/fmicb.2015.01069] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 08/19/2015] [Accepted: 09/17/2015] [Indexed: 12/17/2022] Open
Abstract
Infectious disease metagenomics is driven by the question “what is causing the disease?”, in contrast to classical metagenome studies, which are guided by “what is out there?” In the case of a novel virus, a first step towards eventually establishing etiology can be to recover a full-length viral genome from a metagenomic sample. However, retrieval of a full-length genome of a divergent virus is technically challenging and can be time-consuming and costly. Here we discuss different assembly and fragment linkage strategies, such as iterative assembly, motif searches, k-mer frequency profiling, coverage profile binning, and other strategies used to recover genomes of potential viral pathogens in a timely and cost-effective manner.
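As a rough illustration of the k-mer frequency profiling named in this abstract, the Python sketch below builds a normalized k-mer profile for each contig and compares profiles by cosine similarity, so that fragments with similar composition can be flagged as candidates for the same genome. It is a generic sketch, not the procedure used by the authors; the function names (`kmer_profile`, `cosine_similarity`), the toy sequences, and the choice of k = 2 are all assumptions made for the example (real analyses more commonly use tetranucleotides on much longer contigs).

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=4):
    """Return the relative frequency of every A/C/G/T k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # ignores k-mers containing N etc.
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy contigs: a and b share an AT-rich composition, c is GC-rich.
contig_a = "ATATATATATATATATATAT"
contig_b = "TATATATATATATATATATA"
contig_c = "GCGCGCGCGGCCGCGCGCGC"

profile_a = kmer_profile(contig_a, k=2)
profile_b = kmer_profile(contig_b, k=2)
profile_c = kmer_profile(contig_c, k=2)

sim_ab = cosine_similarity(profile_a, profile_b)  # similar composition
sim_ac = cosine_similarity(profile_a, profile_c)  # no shared dinucleotides
```

In this toy case the two AT-rich contigs score close to 1 and the GC-rich contig scores 0 against them; on real data the similarity threshold for linking fragments would have to be chosen empirically.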
Affiliation(s)
- Saskia L Smits
- Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands
- Rogier Bodewes
- Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands
- Aritz Ruiz-González
- Department of Zoology and Animal Cell Biology, University of the Basque Country (UPV/EHU) Vitoria-Gasteiz, Spain; Systematics, Biogeography and Population Dynamics Research Group, Lascaray Research Center, University of the Basque Country (UPV/EHU) Vitoria-Gasteiz, Spain; Conservation Genetics Laboratory, National Institute for Environmental Protection and Research Bologna, Italy
- Wolfgang Baumgärtner
- Department of Pathology, University of Veterinary Medicine Hannover Hannover, Germany
- Marion P Koopmans
- Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands; Centre for Infectious Diseases Research, Diagnostics and Screening, National Institute for Public Health and the Environment Bilthoven, Netherlands
- Albert D M E Osterhaus
- Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands; Center for Infection Medicine and Zoonoses Research Hannover, Germany
- Anita C Schürch
- Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands
|
33
|
García-López R, Vázquez-Castellanos JF, Moya A. Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations. Front Bioeng Biotechnol 2015; 3:141. [PMID: 26442255 PMCID: PMC4585024 DOI: 10.3389/fbioe.2015.00141] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 05/29/2015] [Accepted: 09/03/2015] [Indexed: 01/01/2023] Open
Abstract
Metagenomic libraries consist of DNA fragments from diverse species, with varying genome size and abundance. High-throughput sequencing platforms produce large volumes of reads from these libraries, which may be assembled into contigs, ideally resembling the original larger genomic sequences. The uneven species distribution, along with the stochasticity in sample processing and sequencing bias, impacts the success of accurate sequence assembly. Several assemblers enable the processing of viral metagenomic data de novo, generally using overlap layout consensus or de Bruijn graph approaches for contig assembly. The success of viral genomic reconstruction in these datasets is limited by the degree of fragmentation of each genome in the sample, which is dependent on the sequencing effort and the genome length. Depending on ecological, biological, or procedural biases, some fragments have a higher prevalence, or coverage, in the assembly. However, assemblers must face challenges, such as the formation of chimeric structures and intra-species variability. Diversity calculation relies on the classification of the sequences that comprise a metagenomic dataset. Whenever the corresponding genomic and taxonomic information is available, contigs matching the same species can be classified accordingly and the coverage of its genome can be calculated for that species. This may be used to compare populations by estimating abundance and assessing species distribution from these data. Nevertheless, the coverage does not take into account the degree of fragmentation, or genome completeness, and is not necessarily representative of actual species distribution in the samples. Furthermore, undetermined sequences are abundant in viral metagenomic datasets, resulting in several independent contigs that cannot be assigned by homology or genomic information. These may only be classified as different operational taxonomic units (OTUs), sometimes remaining inadvisably unrelated.
Thus, calculations using contigs as different OTUs ultimately overestimate diversity when compared to diversity calculated from species coverage. In order to compare the effect of coverage and fragmentation, we generated three sets of simulated Illumina paired-end reads with different sequencing depths. We compared different assemblies performed with RayMeta, CLC Assembly Cell, MEGAHIT, SPAdes, Meta-IDBA, SOAPdenovo, Velvet, Metavelvet, and MIRA with the best attainable assemblies for each dataset (formed by arranging data using known genome coordinates) by calculating different assembly statistics. A new fragmentation score was included to estimate the degree of genome fragmentation of each taxon and adjust the coverage accordingly. The abundance in the metagenome was compared by bootstrapping the assembly data and hierarchically clustering them with the best possible assembly. Additionally, richness and diversity indexes were calculated for all the resulting assemblies and were assessed under two distributions: contigs as independent OTUs and sequences classified by species. Finally, we searched for the strongest correlations between the diversity indexes and the different assembly statistics. Although fragmentation was dependent on genome coverage, it was not as heavily influenced by the assembler. The sequencing depth was the predominant factor that influenced the success of the assemblies. The coverage increased markedly in larger datasets, whereas fragmentation values remained lower and unsaturated. While still far from obtaining the ideal assemblies, the RayMeta, SPAdes, and CLC assemblers managed to build the most accurate contigs with larger datasets, while Meta-IDBA showed a good performance with the medium-sized dataset, even after the adjusted coverage was calculated. Their resulting assemblies showed the highest coverage scores and the lowest fragmentation values.
Alpha diversity calculated from contigs as OTUs resulted in significantly higher values for all assemblies when compared with actual species distribution, showing an overestimation due to the increased predicted abundance. Conversely, using PHACCS resulted in lower values for all assemblers. Different association methods (random forest, generalized linear models, and the Spearman correlation index) support the number of contigs, the coverage, and fragmentation as the assembly parameters that most affect the estimation of the alpha diversity. Coverage calculations may provide an insight into the relative completeness of a genome, but they overlook missing fragments or overly separated sequences in a genome. The assembly of highly fragmented genomes with high coverage may still lead to the clustering of different OTUs that are actually different fragments of a genome. Thus, it proves useful to penalize coverage with a fragmentation score. Using contigs for calculating alpha diversity results in overestimation, but it is usually the only approach available. Still, it is enough for sample comparison. The best approach may be determined by choosing the assembler that best fits the sequencing depth and adjusting the parameters for longer accurate contigs whenever possible, whereas diversity may be calculated considering taxonomical and genomic information if available.
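The overestimation described in this abstract can be seen with a small worked example. The Python sketch below computes the Shannon index, one common alpha diversity measure, for the same toy community twice: once treating every contig as an independent OTU and once after pooling contigs by species. The data, the names (`shannon`, `contig_reads`), and the use of the Shannon index specifically are assumptions for illustration; the abstract does not give the formula for the fragmentation score, so it is not reproduced here.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical toy assembly: contig -> (true species, reads assigned).
# virus_A is fragmented into three contigs; virus_B assembled into one.
contig_reads = {
    "contig_1": ("virus_A", 40),
    "contig_2": ("virus_A", 30),
    "contig_3": ("virus_A", 30),
    "contig_4": ("virus_B", 100),
}

# Distribution 1: every contig counted as an independent OTU.
h_contigs = shannon([reads for _, reads in contig_reads.values()])

# Distribution 2: contigs pooled by species before computing diversity.
species_totals = {}
for species, reads in contig_reads.values():
    species_totals[species] = species_totals.get(species, 0) + reads
h_species = shannon(list(species_totals.values()))
```

Here the two species are equally abundant, so the species-level index equals ln 2, while the contig-level index is higher simply because one genome was split into three pieces.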
Affiliation(s)
- Rodrigo García-López
- Área de Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO)-Salud Pública, Valencia, Spain; Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Paterna, Spain; Consorcio de Investigación Biomédica en Red especializado en Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
- Jorge Francisco Vázquez-Castellanos
- Área de Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO)-Salud Pública, Valencia, Spain; Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Paterna, Spain; Consorcio de Investigación Biomédica en Red especializado en Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
- Andrés Moya
- Área de Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO)-Salud Pública, Valencia, Spain; Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Paterna, Spain; Consorcio de Investigación Biomédica en Red especializado en Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
|