51
Pratama AA, Bolduc B, Zayed AA, Zhong ZP, Guo J, Vik DR, Gazitúa MC, Wainaina JM, Roux S, Sullivan MB. Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation. PeerJ 2021; 9:e11447. [PMID: 34178438 PMCID: PMC8210812 DOI: 10.7717/peerj.11447] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 11/27/2020] [Accepted: 04/22/2021] [Indexed: 12/18/2022]
Abstract
BACKGROUND Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tools, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs).
RESULTS The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and blast-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and blast-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ∼5% for virome and ∼75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignations ranging from ∼95% (whole genomes) down to ∼80% (3 kbp sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments rather than full genomes. Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes.
CONCLUSION Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses 'hidden' in diverse sequence datasets.
Affiliation(s)
- Akbar Adjie Pratama
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Benjamin Bolduc
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Ahmed A. Zayed
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Zhi-Ping Zhong
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
  - Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, United States of America
- Jiarong Guo
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Dean R. Vik
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- James M. Wainaina
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
  - Infectious Diseases Institute at The Ohio State University, Ohio State University, Columbus, OH, United States of America
- Simon Roux
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Matthew B. Sullivan
  - Department of Microbiology, Ohio State University, Columbus, OH, United States of America
  - Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
  - Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America
52
Global overview and major challenges of host prediction methods for uncultivated phages. Curr Opin Virol 2021; 49:117-126. [PMID: 34126465 DOI: 10.1016/j.coviro.2021.05.003] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 04/01/2021] [Revised: 05/20/2021] [Accepted: 05/22/2021] [Indexed: 12/14/2022]
Abstract
Bacterial communities play critical roles across all of Earth's biomes, affecting human health and global ecosystem functioning. They do so under strong constraints exerted by viruses, that is, bacteriophages or 'phages'. Phages can reshape the structure of bacterial communities, influence long-term evolution of bacterial populations, and alter host cell metabolism during infection. Metagenomics approaches, that is, shotgun sequencing of environmental DNA or RNA, recently enabled large-scale exploration of phage genomic diversity, yielding several million phage genomes that remain to be further analyzed and characterized. One major challenge, however, is the lack of direct host information for these phages. Several methods and tools have been proposed to bioinformatically predict the potential host(s) of uncultivated phages based only on genome sequence information. Here we review these different approaches and highlight their distinct strengths and limitations. We also outline complementary experimental assays which are being proposed to validate and refine these bioinformatic predictions.