1
Hegarty B, Riddell V J, Bastien E, Langenfeld K, Lindback M, Saini JS, Wing A, Zhang J, Duhaime M. Benchmarking informatics approaches for virus discovery: caution is needed when combining in silico identification methods. mSystems 2024; 9:e0110523. [PMID: 38376167 PMCID: PMC10949488 DOI: 10.1128/msystems.01105-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 01/24/2024] [Indexed: 02/21/2024]
Abstract
Understanding the ecological impacts of viruses on natural and engineered ecosystems relies on the accurate identification of viral sequences from community sequencing data. To maximize viral recovery from metagenomes, researchers frequently combine viral identification tools. However, the effectiveness of this strategy is unknown. Here, we benchmarked combinations of six widely used informatics tools for viral identification and analysis (VirSorter, VirSorter2, VIBRANT, DeepVirFinder, CheckV, and Kaiju), called "rulesets." Rulesets were tested against mock metagenomes composed of taxonomically diverse sequence types and diverse aquatic metagenomes to assess the effects of the degree of viral enrichment and habitat on tool performance. We found that six rulesets achieved equivalent accuracy [Matthews Correlation Coefficient (MCC) = 0.77, Padj ≥ 0.05]. Each contained VirSorter2, and five used our "tuning removal" rule designed to remove non-viral contamination. While DeepVirFinder, VIBRANT, and VirSorter were each found once in these high-accuracy rulesets, they were not found in combination with each other: combining tools does not lead to optimal performance. Our validation suggests that the MCC plateau at 0.77 is partly caused by inaccurate labeling within reference sequence databases. In aquatic metagenomes, our highest MCC ruleset identified more viral sequences in virus-enriched (44%-46%) than in cellular metagenomes (7%-19%). While improved algorithms may lead to more accurate viral identification tools, this should be done in tandem with careful curation of sequence databases. We recommend using the VirSorter2 ruleset and our empirically derived tuning removal rule. Our analysis provides insight into methods for in silico viral identification and will enable more robust viral identification from metagenomic data sets. 
IMPORTANCE: The identification of viruses from environmental metagenomes using informatics tools has offered critical insights in microbial ecology. However, it remains difficult for researchers to know which tools optimize viral recovery for their specific study. In an attempt to recover more viruses, studies are increasingly combining the outputs from multiple tools without validating this approach. After benchmarking combinations of six viral identification tools against mock metagenomes and environmental samples, we found that these tools should only be combined cautiously. Two- to four-tool combinations maximized viral recovery and minimized non-viral contamination compared with either the single-tool or the five- to six-tool ones. By providing a rigorous overview of the behavior of in silico viral identification strategies and a pipeline to replicate our process, our findings guide the use of existing viral identification tools and offer a blueprint for feature engineering of new tools that will lead to higher-confidence viral discovery in microbiome studies.
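The Matthews Correlation Coefficient used to score the rulesets in this abstract can be computed directly from confusion-matrix counts. The block below is a minimal sketch rather than the authors' pipeline: the function names `confusion_counts` and `mcc`, the contig IDs, and the tool calls are all hypothetical, and a "ruleset" is assumed here to be a plain union of per-tool viral calls, which is simpler than the rules the study describes.

```python
import math


def confusion_counts(predicted_viral, true_viral, all_contigs):
    """Count TP, FP, TN, FN for a set of contigs called viral.

    Assumes predicted_viral and true_viral are subsets of all_contigs.
    """
    tp = len(predicted_viral & true_viral)
    fp = len(predicted_viral - true_viral)
    fn = len(true_viral - predicted_viral)
    tn = len(all_contigs) - tp - fp - fn
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical mock metagenome: ten contigs, four of them truly viral.
all_contigs = {f"c{i}" for i in range(1, 11)}
true_viral = {"c1", "c2", "c3", "c4"}

# Hypothetical per-tool calls (not output from any real tool).
tool_calls = {
    "tool_a": {"c1", "c2", "c3"},
    "tool_b": {"c2", "c3", "c4", "c5", "c6", "c7"},
}

single = mcc(*confusion_counts(tool_calls["tool_a"], true_viral, all_contigs))

# Assumed combination rule: a contig is viral if any tool calls it viral.
combined_calls = tool_calls["tool_a"] | tool_calls["tool_b"]
combined = mcc(*confusion_counts(combined_calls, true_viral, all_contigs))

print(f"tool_a alone: {single:.2f}, union of both tools: {combined:.2f}")
```

In this toy data the union recovers every true virus but also inherits the second tool's false positives, so its MCC is lower than that of the first tool alone. That is one way a combination can fail to improve accuracy, in the spirit of the caution stated in the abstract.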
Affiliation(s)
- Bridget Hegarty
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio, USA
- James Riddell V
  - Department of Microbiology, The Ohio State University, Columbus, Ohio, USA
- Eric Bastien
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Kathryn Langenfeld
  - Department of Civil and Environmental Engineering, Stanford University, Palo Alto, California, USA
- Morgan Lindback
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Jaspreet S. Saini
  - Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Anthony Wing
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Jessica Zhang
  - Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Melissa Duhaime
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
2
Da Silva AG, Bach E, Ellwanger JH, Chies JAB. Tips and tools to obtain and assess mosquito viromes. Arch Microbiol 2024; 206:132. [PMID: 38436750 DOI: 10.1007/s00203-023-03813-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 12/06/2023] [Accepted: 12/22/2023] [Indexed: 03/05/2024]
Abstract
Due to their vectorial capacity, mosquitoes (Diptera: Culicidae) receive special attention from health authorities and entomologists. These cosmopolitan insects are responsible for the transmission of many viral diseases, such as dengue and yellow fever, causing huge impacts on human health and justifying the intensification of research focused on mosquito-borne diseases. In this context, the study of the virome of mosquitoes can help anticipate the emergence and/or the reemergence of infectious diseases. The assessment of mosquito viromes also contributes to the surveillance of a wide variety of viruses found in these insects, allowing the early detection of pathogens with public health importance. However, the study of mosquito viromes can be challenging due to the number and complexity of the steps involved in this type of research. Therefore, this article aims to describe, in a straightforward and simplified way, the steps necessary for obtaining and assessing mosquito viromes. In brief, this article explores: the capture and preservation of specimens; sampling strategies; treatment of samples before DNA/RNA extraction; extraction methodologies; enrichment and purification processes; sequencing choices; and bioinformatics analysis.
Affiliation(s)
- Amanda Gonzalez Da Silva
  - Laboratory of Immunobiology and Immunogenetics, Department of Genetics, Postgraduate Program in Genetics and Molecular Biology (PPGBM), Universidade Federal do Rio Grande do Sul (UFRGS), UFRGS. Av. Bento Gonçalves, 9500, Porto Alegre, Rio Grande do Sul, Brazil
- Evelise Bach
  - Laboratory of Immunobiology and Immunogenetics, Department of Genetics, Postgraduate Program in Genetics and Molecular Biology (PPGBM), Universidade Federal do Rio Grande do Sul (UFRGS), UFRGS. Av. Bento Gonçalves, 9500, Porto Alegre, Rio Grande do Sul, Brazil
- Joel Henrique Ellwanger
  - Laboratory of Immunobiology and Immunogenetics, Department of Genetics, Postgraduate Program in Genetics and Molecular Biology (PPGBM), Universidade Federal do Rio Grande do Sul (UFRGS), UFRGS. Av. Bento Gonçalves, 9500, Porto Alegre, Rio Grande do Sul, Brazil
- José Artur Bogo Chies
  - Laboratory of Immunobiology and Immunogenetics, Department of Genetics, Postgraduate Program in Genetics and Molecular Biology (PPGBM), Universidade Federal do Rio Grande do Sul (UFRGS), UFRGS. Av. Bento Gonçalves, 9500, Porto Alegre, Rio Grande do Sul, Brazil
3
Kaszab E, Bali K, Marton S, Ursu K, Farkas SL, Fehér E, Domán M, Martella V, Bányai K. Metagenomic Identification of Novel Eukaryotic Viruses with Small DNA Genomes in Pheasants. Animals (Basel) 2024; 14:237. [PMID: 38254406 PMCID: PMC10812470 DOI: 10.3390/ani14020237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/05/2024] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
A panel of intestinal samples collected from common pheasants (Phasianus colchicus) between 2008 and 2017 was used for metagenomic investigation using an unbiased enrichment protocol and different bioinformatic pipelines. The number of sequence reads in the metagenomic analysis ranged from 1,419,265 to 17,507,704 with a viral sequence read rate ranging from 0.01% to 59%. When considering the sequence reads of eukaryotic viruses, RNA and DNA viruses were identified in the samples, including but not limited to coronaviruses, reoviruses, parvoviruses, and CRESS DNA viruses (i.e., circular Rep-encoding single-stranded DNA viruses). Partial or nearly complete genome sequences were reconstructed of at least three different parvoviruses (dependoparvovirus, aveparvovirus and chaphamaparvovirus), as well as gyroviruses and diverse CRESS DNA viruses. Generating information of virus diversity will serve as a basis for developing specific diagnostic tools and for structured epidemiological investigations, useful to assess the impact of these novel viruses on animal health.
Affiliation(s)
- Eszter Kaszab
  - HUN-REN Veterinary Medical Research Institute, 1143 Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, 1143 Budapest, Hungary
  - One Health Institute, Faculty of Health Sciences, University of Debrecen, 4032 Debrecen, Hungary
- Krisztina Bali
  - HUN-REN Veterinary Medical Research Institute, 1143 Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, 1143 Budapest, Hungary
- Szilvia Marton
  - HUN-REN Veterinary Medical Research Institute, 1143 Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, 1143 Budapest, Hungary
- Krisztina Ursu
  - Veterinary Diagnostic Directorate, National Food Chain Safety Office, 1143 Budapest, Hungary
- Szilvia L. Farkas
  - Department of Obstetrics and Food Animal Medicine Clinic, University of Veterinary Medicine, 1078 Budapest, Hungary
- Enikő Fehér
  - HUN-REN Veterinary Medical Research Institute, 1143 Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, 1143 Budapest, Hungary
- Marianna Domán
  - HUN-REN Veterinary Medical Research Institute, 1143 Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, 1143 Budapest, Hungary
- Vito Martella
  - Department of Veterinary Medicine, University of Bari Aldo Moro, 70010 Valenzano, Italy
- Krisztián Bányai
  - HUN-REN Veterinary Medical Research Institute, 1143 Budapest, Hungary
  - National Laboratory for Infectious Animal Diseases, Antimicrobial Resistance, Veterinary Public Health and Food Chain Safety, 1143 Budapest, Hungary
  - Department of Pharmacology and Toxicology, University of Veterinary Medicine, 1078 Budapest, Hungary
4
Pavia G, Marascio N, Matera G, Quirino A. Does the Human Gut Virome Contribute to Host Health or Disease? Viruses 2023; 15:2271. [PMID: 38005947 PMCID: PMC10674713 DOI: 10.3390/v15112271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 11/04/2023] [Accepted: 11/16/2023] [Indexed: 11/26/2023]
Abstract
The human gastrointestinal (GI) tract harbors eukaryotic and prokaryotic viruses and their genomes, metabolites, and proteins, collectively known as the "gut virome". This complex community of viruses colonizing the enteric mucosa is pivotal in regulating host immunity. The mechanisms involved in cross communication between mucosal immunity and the gut virome, as well as their relationship in health and disease, remain largely unknown. Herein, we review the literature on the human gut virome's composition and evolution and the interplay between the gut virome and enteric mucosal immunity and their molecular mechanisms. Our review suggests that future research efforts should focus on unraveling the mechanisms of gut viruses in human homeostasis and pathophysiology and on developing virus-prompted precision therapies.
Affiliation(s)
- Nadia Marascio
  - Unit of Clinical Microbiology, Department of Health Sciences, “Magna Græcia” University Hospital of Catanzaro, 88100 Catanzaro, Italy
5
Shen K, Din AU, Sinha B, Zhou Y, Qian F, Shen B. Translational informatics for human microbiota: data resources, models and applications. Brief Bioinform 2023; 24:7152256. [PMID: 37141135 DOI: 10.1093/bib/bbad168] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/26/2022] [Revised: 04/07/2023] [Accepted: 04/11/2023] [Indexed: 05/05/2023]
Abstract
With the rapid development of human intestinal microbiology and diverse microbiome-related studies and investigations, a large amount of data have been generated and accumulated. Meanwhile, different computational and bioinformatics models have been developed for pattern recognition and knowledge discovery using these data. Given the heterogeneity of these resources and models, we aimed to provide a landscape of the data resources, a comparison of the computational models and a summary of the translational informatics applied to microbiota data. We first review the existing databases, knowledge bases, knowledge graphs and standardizations of microbiome data. Then, the high-throughput sequencing techniques for the microbiome and the informatics tools for their analyses are compared. Finally, translational informatics for the microbiome, including biomarker discovery, personalized treatment and smart healthcare for complex diseases, are discussed.
Affiliation(s)
- Ke Shen
  - Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Ahmad Ud Din
  - Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Baivab Sinha
  - Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Yi Zhou
  - Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
- Fuliang Qian
  - Center for Systems Biology, Suzhou Medical College of Soochow University, Suzhou 215123, China
  - Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Suzhou 215123, China
- Bairong Shen
  - Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610212, China
6
Ho SFS, Wheeler NE, Millard AD, van Schaik W. Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data. Microbiome 2023; 11:84. [PMID: 37085924 PMCID: PMC10120246 DOI: 10.1186/s40168-023-01533-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/11/2022] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND: The prediction of bacteriophage sequences in metagenomic datasets has become a topic of considerable interest, leading to the development of many novel bioinformatic tools. A comparative analysis of ten state-of-the-art phage identification tools was performed to inform their usage in microbiome research. METHODS: Artificial contigs generated from complete RefSeq genomes representing phages, plasmids, and chromosomes, and a previously sequenced mock community containing four phage species, were used to evaluate the precision, recall, and F1 scores of the tools. We also generated a dataset of randomly shuffled sequences to quantify false-positive calls. In addition, a set of previously simulated viromes was used to assess diversity bias in each tool's output. RESULTS: VIBRANT and VirSorter2 achieved the highest F1 scores (0.93) in the RefSeq artificial contigs dataset, with several other tools also performing well. Kraken2 had the highest F1 score (0.86) in the mock community benchmark by a large margin (0.3 higher than DeepVirFinder in second place), mainly due to its high precision (0.96). Generally, k-mer-based tools performed better than reference similarity tools and gene-based methods. Several tools, most notably PPR-Meta, called a high number of false positives in the randomly shuffled sequences. When analysing the diversity of the genomes that each tool predicted from a virome set, most tools produced a viral genome set that had similar alpha- and beta-diversity patterns to the original population, with Seeker being a notable exception. CONCLUSIONS: This study provides key metrics used to assess performance of phage detection tools, offers a framework for further comparison of additional viral discovery tools, and discusses optimal strategies for using these tools. We highlight that the choice of tool for identification of phages in metagenomic datasets, as well as their parameters, can bias the results and provide pointers for different use case scenarios. We have also made our benchmarking dataset available for download in order to facilitate future comparisons of phage identification tools.
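The precision, recall, and F1 scores named in this abstract follow from counts of true positives, false positives, and false negatives. The block below is a small illustrative sketch: the function name `precision_recall_f1` and the counts are hypothetical and are not taken from the study's benchmark.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts.

    Each metric falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one tool on one benchmark set:
# 90 phage contigs correctly called, 10 non-phage contigs called phage,
# 30 phage contigs missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a tool with high precision can still post a modest F1 if it misses many true phage contigs.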
Affiliation(s)
- Siu Fung Stanley Ho
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Nicole E. Wheeler
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Andrew D. Millard
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Willem van Schaik
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
7
Schackart KE, Graham JB, Ponsero AJ, Hurwitz BL. Evaluation of computational phage detection tools for metagenomic datasets. Front Microbiol 2023; 14:1078760. [PMID: 36760501 PMCID: PMC9902911 DOI: 10.3389/fmicb.2023.1078760] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/24/2022] [Accepted: 01/09/2023] [Indexed: 01/25/2023]
Abstract
Introduction: As new computational tools for detecting phage in metagenomes are being rapidly developed, a critical need has emerged to develop systematic benchmarks. Methods: In this study, we surveyed 19 metagenomic phage detection tools, 9 of which could be installed and run at scale. Those 9 tools were assessed on several benchmark challenges. Fragmented reference genomes are used to assess the effects of fragment length, low viral content, phage taxonomy, robustness to eukaryotic contamination, and computational resource usage. Simulated metagenomes are used to assess the effects of sequencing and assembly quality on tool performance. Finally, real human gut metagenomes and viromes are used to assess the differences and similarities in the phage communities predicted by the tools. Results: We find that the various tools yield strikingly different results. Generally, tools that use a homology approach (VirSorter, MARVEL, viralVerify, VIBRANT, and VirSorter2) demonstrate low false positive rates and robustness to eukaryotic contamination. Conversely, tools that use a sequence composition approach (VirFinder, DeepVirFinder, Seeker), and MetaPhinder, have higher sensitivity, including to phages with less representation in reference databases. These differences led to widely differing predicted phage communities in human gut metagenomes, with nearly 80% of contigs being marked as phage by at least one tool and a maximum overlap of 38.8% between any two tools. While the results were more consistent among the tools on viromes, the differences in results were still significant, with a maximum overlap of 60.65%. Discussion: Importantly, the benchmark datasets developed in this study are publicly available and reusable to enable the future comparability of newly developed tools.
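The agreement figures in this abstract (contigs marked by at least one tool, and the overlap between pairs of tools) can be illustrated with set operations. The sketch below is an assumption-laden example: the abstract does not define how overlap was calculated, so the Jaccard index (intersection over union) is used here as one plausible choice, and the function names, tool names, and contig IDs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection over union of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(calls):
    """Jaccard overlap for every pair of tools, keyed by (tool_x, tool_y)."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


def fraction_called_by_any(calls, all_contigs):
    """Share of contigs marked as phage by at least one tool."""
    called = set().union(*calls.values()) if calls else set()
    return len(called & all_contigs) / len(all_contigs) if all_contigs else 0.0


# Hypothetical phage calls per tool on a ten-contig assembly.
all_contigs = {f"c{i}" for i in range(1, 11)}
calls = {
    "tool_a": {"c1", "c2", "c3"},
    "tool_b": {"c2", "c3", "c4"},
    "tool_c": {"c5"},
}

overlaps = pairwise_overlap(calls)
best_pair = max(overlaps, key=overlaps.get)
print(f"called by any tool: {fraction_called_by_any(calls, all_contigs):.0%}")
print(f"highest pairwise overlap: {best_pair} at {overlaps[best_pair]:.0%}")
```

A symmetric measure such as Jaccard is convenient because it does not depend on which tool is treated as the reference. An asymmetric measure (shared calls divided by one tool's total) would give different numbers for the same data.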
Affiliation(s)
- Kenneth E. Schackart
  - Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- Jessica B. Graham
  - BIO5 Institute, The University of Arizona, Tucson, AZ, United States
- Alise J. Ponsero
  - Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
  - BIO5 Institute, The University of Arizona, Tucson, AZ, United States
  - Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Bonnie L. Hurwitz
  - Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
  - BIO5 Institute, The University of Arizona, Tucson, AZ, United States
8
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022]
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. This review has been divided into the following two sections: general information and computer-aided phage therapy (CAPT). In the case of general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and clinical trials) and the challenges. This review emphasizes CAPT, where we have covered primary phage-associated resources, phage prediction methods and pipelines. This review covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternate treatments for drug-resistant bacteria and this remains a challenging problem. Thus, we compile all phage-associated prediction methods that include the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discussed recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe phage-based therapy's advantages, challenges and opportunities.
Affiliation(s)
- Nisha Bajiya
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Suchet Aggarwal
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
9
Waite DW, Liefting L, Delmiglio C, Chernyavtseva A, Ha HJ, Thompson JR. Development and Validation of a Bioinformatic Workflow for the Rapid Detection of Viruses in Biosecurity. Viruses 2022; 14:2163. [PMID: 36298719 PMCID: PMC9610911 DOI: 10.3390/v14102163] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/15/2022] [Accepted: 09/25/2022] [Indexed: 11/05/2022]
Abstract
The field of biosecurity has greatly benefited from the widespread adoption of high-throughput sequencing technologies, for its ability to deeply query plant and animal samples for pathogens for which no tests exist. However, the bioinformatics analysis tools designed for rapid analysis of these sequencing datasets are not developed with this application in mind, limiting the ability of diagnosticians to standardise their workflows using published tool kits. We sought to assess previously published bioinformatic tools for their ability to identify plant- and animal-infecting viruses while distinguishing from the host genetic material. We discovered that many of the current generation of virus-detection pipelines are not adequate for this task, being outperformed by more generic classification tools. We created synthetic MinION and HiSeq libraries simulating plant and animal infections of economically important viruses and assessed a series of tools for their suitability for rapid and accurate detection of infection, and further tested the top performing tools against the VIROMOCK Challenge dataset to ensure that our findings were reproducible when compared with international standards. Our work demonstrated that several methods provide sensitive and specific detection of agriculturally important viruses in a timely manner and provides a key piece of ground truthing for method development in this space.
Affiliation(s)
- David W. Waite
  - Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
- Lia Liefting
  - Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
- Catia Delmiglio
  - Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
- Hye Jeong Ha
  - Animal Health Laboratory, Ministry for Primary Industries, Upper Hutt 5018, New Zealand
- Jeremy R. Thompson
  - Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
10
Chitcharoen S, Sivapornnukul P, Payungporn S. Revolutionized virome research using systems microbiology approaches. Exp Biol Med (Maywood) 2022; 247:1135-1147. [PMID: 35723062 PMCID: PMC9335507 DOI: 10.1177/15353702221102895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
Abstract
Currently, both pathogenic and commensal viruses are continuously being discovered and acknowledged as ubiquitous components of microbial communities. Advances in systems microbiology approaches have changed the face of virome research. Here, we focus on the viral metagenomic approach to studying viral communities and their interactions with other microbial members as well as with their hosts. This review also summarizes the challenges, limitations, and benefits of current virome approaches. Potentially, virome studies can be further applied in various biological and clinical fields.
Affiliation(s)
- Suwalak Chitcharoen
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Pavaret Sivapornnukul
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Sunchai Payungporn
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
11
virMine 2.0: Identifying Viral Sequences in Microbial Communities. Microbiol Resour Announc 2022; 11:e0010722. [PMID: 35499341 PMCID: PMC9119091 DOI: 10.1128/mra.00107-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Here, we present virMine 2.0, the next generation of the virMine software tool. virMine 2.0 uses an exclusion technique to remove nonviral data from sequencing reads and scores the remaining data based on relatedness to viral elements, eliminating the sole dependency on homology identification.
12
Salabura A, Łuniewski A, Kucharska M, Myszak D, Dołęgowska B, Ciechanowski K, Kędzierska-Kapuza K, Wojciuk B. Urinary Tract Virome as an Urgent Target for Metagenomics. Life (Basel) 2021; 11:1264. [PMID: 34833140 PMCID: PMC8618529 DOI: 10.3390/life11111264] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/21/2021] [Revised: 11/13/2021] [Accepted: 11/15/2021] [Indexed: 12/19/2022]
Abstract
Virome—a part of a microbiome—is a term used to describe all viruses found in the specific organism or system. Recently, as new technologies emerged, it has been confirmed that kidneys and the lower urinary tract are colonized not only by the previously described viruses, but also completely novel species. Viruses can be both pathogenic and protective, as they often carry important virulence factors, while at the same time represent anti-inflammatory functions. This paper aims to show and compare the viral species detected in various, specific clinical conditions. Because of the unique characteristics of viruses, new sequencing techniques and databases had to be developed to conduct research on the urinary virome. The dynamic development of research on the human microbiome suggests that the detailed studies on the urinary system virome will provide answers to many questions about the risk factors for civilization, cancer, and autoimmune diseases.
Affiliation(s)
- Agata Salabura
- Clinic of Nephrology, Internal Medicine and Transplantation, Pomeranian Medical University in Szczecin, 70-123 Szczecin, Poland;
- Correspondence: Tel.: +48-664-477-450
| | - Aleksander Łuniewski
- Department of Immunological Diagnostics, Pomeranian Medical University in Szczecin, 70-123 Szczecin, Poland; (A.Ł.); (M.K.); (D.M.); (B.D.); (B.W.)
| | - Maria Kucharska
- Department of Immunological Diagnostics, Pomeranian Medical University in Szczecin, 70-123 Szczecin, Poland; (A.Ł.); (M.K.); (D.M.); (B.D.); (B.W.)
| | - Denis Myszak
- Department of Immunological Diagnostics, Pomeranian Medical University in Szczecin, 70-123 Szczecin, Poland; (A.Ł.); (M.K.); (D.M.); (B.D.); (B.W.)
| | - Barbara Dołęgowska
- Department of Immunological Diagnostics, Pomeranian Medical University in Szczecin, 70-123 Szczecin, Poland; (A.Ł.); (M.K.); (D.M.); (B.D.); (B.W.)
| | - Kazimierz Ciechanowski
- Clinic of Nephrology, Internal Medicine and Transplantation, Pomeranian Medical University in Szczecin, 70-123 Szczecin, Poland;
| | - Karolina Kędzierska-Kapuza
- Center of Postgraduate Medical Education in Warsaw, Department of Gastroenterological Surgery and Transplantology, 137 Wołoska St., 02-507 Warsaw, Poland;
| | - Bartosz Wojciuk
- Department of Immunological Diagnostics, Pomeranian Medical University in Szczecin, 70-123 Szczecin, Poland; (A.Ł.); (M.K.); (D.M.); (B.D.); (B.W.)
| |
|
13
|
Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples. Viruses 2021; 13:v13102006. [PMID: 34696436 PMCID: PMC8541124 DOI: 10.3390/v13102006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/30/2021] [Revised: 09/30/2021] [Accepted: 10/02/2021] [Indexed: 12/27/2022] Open
Abstract
According to various estimates, only a small percentage of existing viruses have been discovered, naturally much less being represented in the genomic databases. High-throughput sequencing technologies develop rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening, due to vast heterogeneity in viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, it was shown that alignment-based methods were unable to identify the taxon for a large proportion of reads, and we additionally applied other approaches, showing that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in the studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.
|
14
|
Wu S, Fang Z, Tan J, Li M, Wang C, Guo Q, Xu C, Jiang X, Zhu H. DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach. Gigascience 2021; 10:giab056. [PMID: 34498685 PMCID: PMC8427542 DOI: 10.1093/gigascience/giab056] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Prokaryotic viruses referred to as phages can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage-derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and regulation of microbial communities. However, there is no experimental or computational approach to effectively classify their sequences in culture-independent metavirome. We present a new computational method, DeePhage, which can directly and rapidly judge each read or contig as a virulent or temperate phage-derived fragment. FINDINGS DeePhage uses a "one-hot" encoding form to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage on 5-fold cross-validation reaches as high as 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS. On real metavirome, DeePhage correctly predicts the highest proportion of contigs when using BLAST as annotation, without apparent preferences. Besides, DeePhage reduces running time vs PhagePred and PHACTS by 245 and 810 times, respectively, under the same computational configuration. By direct detection of the temperate viral fragments from metagenome and metavirome, we furthermore propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides us a new insight into the potential treatment for human disease. CONCLUSIONS DeePhage is a novel tool developed to rapidly and efficiently identify 2 kinds of phage fragments especially for metagenomics analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
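As a rough illustration of the "one-hot" encoding mentioned in this abstract, the sketch below turns a DNA string into one 4-element row per base. It is not taken from DeePhage: the function name `one_hot_encode`, the column order A, C, G, T, and the choice to map unknown bases such as N to an all-zero row are assumptions made here for illustration only.

```python
# Illustrative sketch, not DeePhage's code.
# Assumption: columns are ordered A, C, G, T.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element row per base.

    Assumption: bases outside A/C/G/T (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        index = BASES.find(base)  # -1 when the base is not A, C, G or T
        if index != -1:
            row[index] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print each base next to its encoded row.
    for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
        print(base, row)
```

A matrix of this shape (sequence length by 4) is the kind of input a convolutional network can scan for local sequence signatures.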
Affiliation(s)
- Shufang Wu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
| | - Zhencheng Fang
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
| | - Jie Tan
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
| | - Mo Li
- Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
| | - Chunhui Wang
- Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
| | - Qian Guo
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
| | - Congmin Xu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
| | - Xiaoqing Jiang
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
| | - Huaiqiu Zhu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
- Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, Beijing, China
| |
|
15
|
Glickman C, Hendrix J, Strong M. Simulation study and comparative evaluation of viral contiguous sequence identification tools. BMC Bioinformatics 2021; 22:329. [PMID: 34130621 PMCID: PMC8207588 DOI: 10.1186/s12859-021-04242-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/01/2021] [Accepted: 06/04/2021] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Viruses, including bacteriophages, are important components of environmental and human associated microbial communities. Viruses can act as extracellular reservoirs of bacterial genes, can mediate microbiome dynamics, and can influence the virulence of clinical pathogens. Various targeted metagenomic analysis techniques detect viral sequences, but these methods often exclude large and genome integrated viruses. In this study, we evaluate and compare the ability of nine state-of-the-art bioinformatic tools, including Vibrant, VirSorter, VirSorter2, VirFinder, DeepVirFinder, MetaPhinder, Kraken 2, Phybrid, and a BLAST search using identified proteins from the Earth Virome Pipeline to identify viral contiguous sequences (contigs) across simulated metagenomes with different read distributions, taxonomic compositions, and complexities. RESULTS Of the tools tested in this study, VirSorter achieved the best F1 score while Vibrant had the highest average F1 score at predicting integrated prophages. Though less balanced in its precision and recall, Kraken2 had the highest average precision by a substantial margin. We introduced the machine learning tool, Phybrid, which demonstrated an improvement in average F1 score over tools such as MetaPhinder. The tool utilizes machine learning with both gene content and nucleotide features. The addition of nucleotide features improves the precision and recall compared to the gene content features alone. Viral identification by all tools was not impacted by underlying read distribution but did improve with contig length. Tool performance was inversely related to taxonomic complexity and varied by the phage host. For instance, Rhizobium and Enterococcus phages were identified consistently by the tools; whereas, Neisseria prophage sequences were commonly missed in this study.
CONCLUSION This study benchmarked the performance of nine state-of-the-art bioinformatic tools to identify viral contigs across different simulation conditions. This study explored the ability of the tools to identify integrated prophage elements traditionally excluded from targeted sequencing approaches. Our comprehensive analysis of viral identification tools to assess their performance in a variety of situations provides valuable insights to viral researchers looking to mine viral elements from publicly available metagenomic data.
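For readers comparing the precision, recall, and F1 figures quoted in this abstract, the following sketch shows how the three metrics relate when contigs are labeled viral (1) or non-viral (0). The function name `precision_recall_f1` and the toy labels are hypothetical and are not drawn from the study's data or code.

```python
# Illustrative sketch; labels below are made up, not results from the study.
def precision_recall_f1(true_labels, predicted_labels):
    """Compute precision, recall and F1 for binary labels (1 = viral, 0 = non-viral)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # viral, called viral
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-viral, called viral
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # viral, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0]
    conservative_calls = [1, 0, 0, 0, 0, 0]  # few calls, all of them correct
    print(precision_recall_f1(truth, conservative_calls))
```

A tool that makes few but correct calls, as in the toy `conservative_calls` list, reaches perfect precision while its recall and F1 stay low, which is the kind of imbalance the abstract describes for Kraken2.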
Affiliation(s)
- Cody Glickman
- Center for Genes, Environment, and Health, National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA.
- Computational Bioscience, University of Colorado Anschutz, 12801 E 17th Avenue, Aurora, CO, 80045, USA.
| | - Jo Hendrix
- Center for Genes, Environment, and Health, National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
- Computational Bioscience, University of Colorado Anschutz, 12801 E 17th Avenue, Aurora, CO, 80045, USA
| | - Michael Strong
- Center for Genes, Environment, and Health, National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
- Computational Bioscience, University of Colorado Anschutz, 12801 E 17th Avenue, Aurora, CO, 80045, USA
| |
|
16
|
Boeckaerts D, Stock M, Criel B, Gerstmans H, De Baets B, Briers Y. Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins. Sci Rep 2021; 11:1467. [PMID: 33446856 PMCID: PMC7809048 DOI: 10.1038/s41598-021-81063-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 05/04/2020] [Accepted: 12/30/2020] [Indexed: 12/04/2022] Open
Abstract
Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
Affiliation(s)
- Dimitri Boeckaerts
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
| | - Michiel Stock
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Bjorn Criel
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
| | - Hans Gerstmans
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Leuven, Belgium
- MeBioS-Biosensors group, Department of BioSystems, KU Leuven, Leuven, Belgium
| | - Bernard De Baets
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Yves Briers
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium.
| |
|
17
|
Plyusnin I, Kant R, Jääskeläinen AJ, Sironen T, Holm L, Vapalahti O, Smura T. Novel NGS pipeline for virus discovery from a wide spectrum of hosts and sample types. Virus Evol 2020; 6:veaa091. [PMID: 33408878 PMCID: PMC7772471 DOI: 10.1093/ve/veaa091] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022] Open
Abstract
The study of the microbiome data holds great potential for elucidating the biological and metabolic functioning of living organisms and their role in the environment. Metagenomic analyses have shown that humans, along with for example, domestic animals, wildlife and arthropods, are colonized by an immense community of viruses. The current Coronavirus pandemic (COVID-19) heightens the need to rapidly detect previously unknown viruses in an unbiased way. The increasing availability of metagenomic data in this era of next-generation sequencing (NGS), along with increasingly affordable sequencing technologies, highlight the need for reliable and comprehensive methods to manage such data. In this article, we present a novel bioinformatics pipeline called LAZYPIPE for identifying both previously known and novel viruses in host associated or environmental samples and give examples of virus discovery based on it. LAZYPIPE is a Unix-based pipeline for automated assembling and taxonomic profiling of NGS libraries implemented as a collection of C++, Perl, and R scripts.
Affiliation(s)
- Ilya Plyusnin
- Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland
| | - Ravi Kant
- Department of Veterinary Bioscience, University of Helsinki, Helsinki 00014, Finland
| | - Anne J Jääskeläinen
- Department of Virology and Immunology, University of Helsinki and Helsinki University Hospital, Helsinki 00014, Finland
| | - Tarja Sironen
- Department of Veterinary Bioscience, University of Helsinki, Helsinki 00014, Finland
| | - Liisa Holm
- Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland
| | - Olli Vapalahti
- Department of Veterinary Bioscience, University of Helsinki, Helsinki 00014, Finland
| | - Teemu Smura
- Department of Virology, University of Helsinki, Helsinki 00014, Finland
| |
|
18
|
Yan A, Butcher J, Mack D, Stintzi A. Virome Sequencing of the Human Intestinal Mucosal-Luminal Interface. Front Cell Infect Microbiol 2020; 10:582187. [PMID: 33194818 PMCID: PMC7642909 DOI: 10.3389/fcimb.2020.582187] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/10/2020] [Accepted: 09/14/2020] [Indexed: 01/12/2023] Open
Abstract
While the human gut virome has been increasingly explored in recent years, nearly all studies have been limited to fecal sampling. The mucosal-luminal interface has been established as a viable sample type for profiling the microbial biogeography of the gastrointestinal tract. We have developed a protocol to extract nucleic acids from viruses at the mucosal-luminal interface of the proximal and distal colon. Colonic viromes from pediatric patients with Crohn's disease demonstrated high interpatient diversity and low but significant intrapatient variation between sites. Whole metagenomics was also performed to explore virome-bacteriome interactions and to compare the viral communities observed in virome and whole metagenomic sequencing. A site-specific study of the human gut virome is a necessary step to advance our understanding of virome-bacteriome-host interactions in human diseases.
Affiliation(s)
- Austin Yan
- Department of Biochemistry, Microbiology, and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
| | - James Butcher
- Department of Biochemistry, Microbiology, and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
| | - David Mack
- Department of Pediatrics, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada; Inflammatory Bowel Disease Centre and CHEO Research Institute, Children's Hospital of Eastern Ontario, Ottawa, ON, Canada
| | - Alain Stintzi
- Department of Biochemistry, Microbiology, and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
| |
|
19
|
Saak CC, Dinh CB, Dutton RJ. Experimental approaches to tracking mobile genetic elements in microbial communities. FEMS Microbiol Rev 2020; 44:606-630. [PMID: 32672812 PMCID: PMC7476777 DOI: 10.1093/femsre/fuaa025] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/24/2020] [Accepted: 06/29/2020] [Indexed: 12/19/2022] Open
Abstract
Horizontal gene transfer is an important mechanism of microbial evolution and is often driven by the movement of mobile genetic elements between cells. Due to the fact that microbes live within communities, various mechanisms of horizontal gene transfer and types of mobile elements can co-occur. However, the ways in which horizontal gene transfer impacts and is impacted by communities containing diverse mobile elements has been challenging to address. Thus, the field would benefit from incorporating community-level information and novel approaches alongside existing methods. Emerging technologies for tracking mobile elements and assigning them to host organisms provide promise for understanding the web of potential DNA transfers in diverse microbial communities more comprehensively. Compared to existing experimental approaches, chromosome conformation capture and methylome analyses have the potential to simultaneously study various types of mobile elements and their associated hosts. We also briefly discuss how fermented food microbiomes, given their experimental tractability and moderate species complexity, make ideal models to which to apply the techniques discussed herein and how they can be used to address outstanding questions in the field of horizontal gene transfer in microbial communities.
Affiliation(s)
- Christina C Saak
- Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Cong B Dinh
- Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Rachel J Dutton
- Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| |
|
20
|
Su X, Jing G, Zhang Y, Wu S. Method development for cross-study microbiome data mining: Challenges and opportunities. Comput Struct Biotechnol J 2020; 18:2075-2080. [PMID: 32802279 PMCID: PMC7419250 DOI: 10.1016/j.csbj.2020.07.020] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/04/2020] [Revised: 07/22/2020] [Accepted: 07/24/2020] [Indexed: 01/26/2023] Open
Abstract
During the past decade, a tremendous amount of microbiome sequencing data has been generated to study the dynamic associations between microbial profiles and environments. How to precisely and efficiently decipher large-scale microbiome data and take further advantage of it has become one of the most essential bottlenecks for microbiome research at present. In this mini-review, we focus on the three key steps of analyzing cross-study microbiome datasets: microbiome profiling, data integration, and data mining. By introducing the current bioinformatics approaches and discussing their limitations, we outline the opportunities in the development of computational methods for the three steps, and propose promising solutions for multi-omics data analysis for comprehensive understanding and rapid investigation of the microbiome from different angles, which could potentially promote data-driven research by providing a broader view of the "microbiome data space".
Affiliation(s)
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071 China
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101 China
| | - Gongchao Jing
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101 China
| | - Yufeng Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071 China
- Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101 China
| | - Shunyao Wu
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071 China
| |
|
21
|
Khan Mirzaei M, Xue J, Costa R, Ru J, Schulz S, Taranu ZE, Deng L. Challenges of Studying the Human Virome - Relevant Emerging Technologies. Trends Microbiol 2020; 29:171-181. [PMID: 32622559 DOI: 10.1016/j.tim.2020.05.021] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 03/16/2020] [Revised: 05/27/2020] [Accepted: 05/28/2020] [Indexed: 01/17/2023]
Abstract
In this review we provide an overview of current challenges and advances in bacteriophage research within the growing field of viromics. In particular, we discuss, from a human virome study perspective, the current and emerging technologies available, their limitations in terms of de novo discoveries, and possible solutions to overcome present experimental and computational biases associated with low abundance of viral DNA or RNA. We summarize recent breakthroughs in metagenomics assembling tools and single-cell analysis, which have the potential to increase our understanding of phage biology, diversity, and interactions with both the microbial community and the human body. We expect that these recent and future advances in the field of viromics will have a strong impact on how we develop phage-based therapeutic approaches.
Affiliation(s)
- Mohammadali Khan Mirzaei
- Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
| | - Jinling Xue
- Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
| | - Rita Costa
- Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
| | - Jinlong Ru
- Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
| | - Sarah Schulz
- Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany
| | - Zofia E Taranu
- Aquatic Contaminants Research Division (ACRD), Environment and Climate Change Canada (ECCC), Montréal, QC H2Y 2E7, Canada
| | - Li Deng
- Institute of Virology, Helmholtz Centre Munich and Technical University of Munich, Neuherberg, Bavaria 85764, Germany.
| |
|
22
|
Khot V, Strous M, Hawley AK. Computational approaches in viral ecology. Comput Struct Biotechnol J 2020; 18:1605-1612. [PMID: 32670501 PMCID: PMC7334295 DOI: 10.1016/j.csbj.2020.06.019] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/28/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 01/21/2023] Open
Abstract
Dynamic virus-host interactions play a critical role in regulating microbial community structure and function. Yet for decades prior to the genomics era, viruses were largely overlooked in microbial ecology research, as only low-throughput culture-based methods of discovering viruses were available. With the advent of metagenomics, culture-independent techniques have provided exciting opportunities to discover and study new viruses. Here, we review recently developed computational methods for identifying viral sequences, exploring viral diversity in environmental samples, and predicting hosts from metagenomic sequence data. Methods to analyze viruses in silico utilize unconventional approaches to tackle challenges unique to viruses, such as vast diversity, mosaic viral genomes, and the lack of universal marker genes. As the field of viral ecology expands exponentially, computational advances have become increasingly important to gain insight into the role of viruses in diverse habitats.
Affiliation(s)
- Varada Khot
- Department of Geoscience, University of Calgary, Calgary, AB T2N 1N4, Canada
| | - Marc Strous
- Department of Geoscience, University of Calgary, Calgary, AB T2N 1N4, Canada
| | - Alyse K. Hawley
- Department of Geoscience, University of Calgary, Calgary, AB T2N 1N4, Canada
| |
|
23
|
MetaviralSPAdes: assembly of viruses from metagenomic data. Bioinformatics 2020; 36:4126-4129. [DOI: 10.1093/bioinformatics/btaa490] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 10/08/2019] [Revised: 02/21/2020] [Accepted: 05/08/2020] [Indexed: 11/14/2022] Open
Abstract
Motivation
Although the set of currently known viruses has been steadily expanding, only a tiny fraction of the Earth’s virome has been sequenced so far. Shotgun metagenomic sequencing provides an opportunity to reveal novel viruses but faces the computational challenge of identifying viral genomes that are often difficult to detect in metagenomic assemblies.
Results
We describe a MetaviralSPAdes tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in the coverage depth between viruses and bacterial chromosomes. We benchmarked MetaviralSPAdes on diverse metagenomic datasets, verified our predictions using a set of virus-specific Hidden Markov Models and demonstrated that it improves on the state-of-the-art viral identification pipelines.
Availability and implementation
Metaviral SPAdes includes ViralAssembly, ViralVerify and ViralComplete modules that are available as standalone packages: https://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/ and https://github.com/ablab/viralComplete/.
Contact
d.antipov@spbu.ru
Supplementary information
Supplementary data are available at Bioinformatics online.
|
24
|
Xavier JB, Young VB, Skufca J, Ginty F, Testerman T, Pearson AT, Macklin P, Mitchell A, Shmulevich I, Xie L, Caporaso JG, Crandall KA, Simone NL, Godoy-Vitorino F, Griffin TJ, Whiteson KL, Gustafson HH, Slade DJ, Schmidt TM, Walther-Antonio MRS, Korem T, Webb-Robertson BJM, Styczynski MP, Johnson WE, Jobin C, Ridlon JM, Koh AY, Yu M, Kelly L, Wargo JA. The Cancer Microbiome: Distinguishing Direct and Indirect Effects Requires a Systemic View. Trends Cancer 2020; 6:192-204. [PMID: 32101723 PMCID: PMC7098063 DOI: 10.1016/j.trecan.2020.01.004] [Citation(s) in RCA: 140] [Impact Index Per Article: 35.0] [Received: 10/01/2019] [Revised: 12/29/2019] [Accepted: 01/06/2020] [Indexed: 02/06/2023]
Abstract
The collection of microbes that live in and on the human body - the human microbiome - can impact on cancer initiation, progression, and response to therapy, including cancer immunotherapy. The mechanisms by which microbiomes impact on cancers can yield new diagnostics and treatments, but much remains unknown. The interactions between microbes, diet, host factors, drugs, and cell-cell interactions within the cancer itself likely involve intricate feedbacks, and no single component can explain all the behavior of the system. Understanding the role of host-associated microbial communities in cancer systems will require a multidisciplinary approach combining microbial ecology, immunology, cancer cell biology, and computational biology - a systems biology approach.
Affiliation(s)
- Joao B Xavier
- Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA.
- Vincent B Young
- Department of Internal Medicine, Division of Infectious Diseases, The University of Michigan Medical School, Ann Arbor, MI, USA
- Joseph Skufca
- Department of Mathematics, Clarkson University, Potsdam, NY, USA
- Traci Testerman
- Department of Pathology, Microbiology, and Immunology, University of South Carolina School of Medicine, Columbia, SC, USA
- Alexander T Pearson
- Section of Hematology/Oncology, Department of Medicine, Comprehensive Cancer Center, University of Chicago, Chicago, IL, USA
- Paul Macklin
- Intelligent Systems Engineering, Indiana University, Bloomington, IN, USA
- Amir Mitchell
- Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Lei Xie
- Hunter College, Department of Computer Science, New York, NY, USA
- J Gregory Caporaso
- Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Keith A Crandall
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
- Nicole L Simone
- Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, USA
- Filipa Godoy-Vitorino
- Department of Microbiology and Medical Zoology, School of Medicine, University of Puerto Rico, San Juan, Puerto Rico
- Timothy J Griffin
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Katrine L Whiteson
- Department of Molecular Biology and Biochemistry, University of California Irvine, Irvine, CA, USA
- Heather H Gustafson
- Seattle Children's Research Institute, Ben Towne Center for Childhood Cancer Research, Seattle, WA, USA
- Daniel J Slade
- Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Marina R S Walther-Antonio
- Department of Surgery, Department of Obstetrics and Gynecology, and Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Tal Korem
- Department of Systems Biology, Columbia University, New York, NY, USA
- Mark P Styczynski
- School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- W Evan Johnson
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
- Christian Jobin
- Departments of Medicine, Anatomy, and Cell Biology, and of Infectious Diseases and Immunology, University of Florida, Gainesville, FL, USA
- Jason M Ridlon
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Andrew Y Koh
- University of Texas Southwestern Medical Center, Dallas, TX, USA
- Michael Yu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
- Jennifer A Wargo
- The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
25
|
Du XP, Cai ZH, Zuo P, Meng FX, Zhu JM, Zhou J. Temporal Variability of Virioplankton during a Gymnodinium catenatum Algal Bloom. Microorganisms 2020; 8:microorganisms8010107. [PMID: 31940944 PMCID: PMC7023004 DOI: 10.3390/microorganisms8010107] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/13/2019] [Revised: 12/18/2019] [Accepted: 01/10/2020] [Indexed: 01/02/2023] Open
Abstract
Viruses are key biogeochemical engines in the regulation of the dynamics of phytoplankton. However, there has been little research on viral communities in relation to algal blooms. Using the virMine tool, we analyzed viral information from metagenomic data of field dinoflagellate (Gymnodinium catenatum) blooms at different stages. Species identification indicated that phages were the dominant viral group. UniFrac analysis showed clear temporal patterns in virioplankton dynamics. The viral community was dominated by Siphoviridae, Podoviridae, and Myoviridae throughout the whole bloom cycle. However, some changes were observed at different phases of the bloom; the relatively abundant Siphoviridae and Myoviridae dominated at pre-bloom and peak bloom stages, while at the post-bloom stage, the members of Phycodnaviridae and Microviridae were more abundant. Temperature and nutrients were the main contributors to the dynamic structure of the viral community. Some obvious correlations were found between dominant viral species and host biomass. Functional analysis indicated that some functional genes showed a dramatic response in algal-associated viral assemblages, especially the CAZyme-encoding genes. This work expands the existing knowledge of algal-associated viruses by characterizing viral composition and function across a complete algal bloom cycle. Our data provide supporting evidence that viruses participate in dinoflagellate bloom dynamics under natural conditions.
Affiliation(s)
- Xiao-Peng Du
- The Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Zhong-Hua Cai
- The Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Ping Zuo
- The School of Geography and Ocean Science, Nanjing University, Nanjing 210000, China
- Fan-Xu Meng
- Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310000, China
- Jian-Ming Zhu
- The Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Jin Zhou
- The Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
|
26
|
Douglas GM, Langille MGI. Current and Promising Approaches to Identify Horizontal Gene Transfer Events in Metagenomes. Genome Biol Evol 2019; 11:2750-2766. [PMID: 31504488 PMCID: PMC6777429 DOI: 10.1093/gbe/evz184] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Accepted: 08/19/2019] [Indexed: 12/16/2022] Open
Abstract
High-throughput shotgun metagenomics sequencing has enabled the profiling of myriad natural communities. These data are commonly used to identify gene families and pathways that were potentially gained or lost in an environment and which may be involved in microbial adaptation. Despite the widespread interest in these events, there are no established best practices for identifying gene gain and loss in metagenomics data. Horizontal gene transfer (HGT) represents several mechanisms of gene gain that are of particular interest in clinical microbiology due to the rapid spread of antibiotic resistance genes in natural communities. Several additional mechanisms of gene gain and loss, including gene duplication, gene loss-of-function events, and de novo gene birth, are also important to consider in the context of metagenomes but have been less studied. This review is largely focused on detecting HGT in prokaryotic metagenomes, but methods for detecting these other mechanisms are first discussed. For this article to be self-contained, we provide a general background on HGT and the different possible signatures of this process. Lastly, we discuss how improved assembly of genomes from metagenomes would be the most straightforward approach for improving the inference of gene gain and loss events. Several recent technological advances could help improve metagenome assemblies: long-read sequencing, determining the physical proximity of contigs, optical mapping of short sequences along chromosomes, and single-cell metagenomics. The benefits and limitations of these advances are discussed and open questions in this area are highlighted.
Affiliation(s)
- Gavin M Douglas
- Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
- Morgan G I Langille
- Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
|