51
Versoza CJ, Howell AA, Aftab T, Blanco M, Brar A, Chaffee E, Howell N, Leach W, Lobatos J, Luca M, Maddineni M, Mirji R, Mitra C, Strasser M, Munig S, Patel Z, So M, Sy M, Weiss S, Pfeifer SP. Comparative Genomics of Closely-Related Gordonia Cluster DR Bacteriophages. Viruses 2022; 14:v14081647. [PMID: 36016269 PMCID: PMC9413003 DOI: 10.3390/v14081647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 07/16/2022] [Accepted: 07/25/2022] [Indexed: 12/10/2022]
Abstract
Bacteriophages infecting bacteria of the genus Gordonia have attracted increasing interest in the scientific community for their diverse applications in agriculture, biotechnology, and medicine, ranging from biocontrol agents in wastewater management to the treatment of opportunistic pathogens in pulmonary disease patients. However, due to the time and costs associated with experimental isolation and cultivation, host ranges for many bacteriophages remain poorly characterized, hindering their more efficient use in these areas. Here, we perform a series of computational genomic inferences to predict the putative host ranges of all Gordonia cluster DR bacteriophages known to date. Our analyses suggest that BiggityBass (as well as several of its close relatives) is likely able to infect host bacteria from a wide range of genera, including Gordonia, Nocardia, and Rhodococcus, making it a suitable candidate for future phage therapy and wastewater treatment strategies.
Affiliation(s)
- Cyril J. Versoza
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Abigail A. Howell
- Biodesign Institute, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Tanya Aftab
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Madison Blanco
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Akarshi Brar
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Elaine Chaffee
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Nicholas Howell
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ 85281, USA
- Willow Leach
- School of Mathematical and Statistical Sciences, Arizona State University, Tempe, AZ 85281, USA
- Jackelyn Lobatos
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Michael Luca
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- School of Molecular Sciences, Arizona State University, Tempe, AZ 85281, USA
- Meghna Maddineni
- School of Molecular Sciences, Arizona State University, Tempe, AZ 85281, USA
- Ruchira Mirji
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Corinne Mitra
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Maria Strasser
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Saige Munig
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Zeel Patel
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Minerva So
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Makena Sy
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Sarah Weiss
- School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Susanne P. Pfeifer
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
52
Mining of Thousands of Prokaryotic Genomes Reveals High Abundance of Prophages with a Strictly Narrow Host Range. mSystems 2022; 7:e0032622. [PMID: 35880895 PMCID: PMC9426530 DOI: 10.1128/msystems.00326-22] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/24/2022]
Abstract
Phages and prophages are among the principal modulators of microbial populations. However, much of their diversity is still poorly understood. Here, we extracted 33,624 prophages from 13,713 complete prokaryotic genomes to explore prophage diversity and the relationships between prophages and their hosts. Our results reveal that prophages were present in 75% of the genomes studied. In addition, Enterobacterales were significantly enriched in prophages. We also found that pathogens are a significant reservoir of prophages. Finally, we determined that prophage relatedness and the range of genomic hosts were delimited by the evolutionary relationships of their hosts. More broadly, by comparing prophage distribution and relatedness with those of their hosts, we gained insights into the prophage population found in thousands of publicly available prokaryotic genomes.
IMPORTANCE Phages and prophages play an essential role in controlling their host populations, either by modulating host abundance or by providing genes that benefit the host. Continued growth in next-generation sequencing technology has driven the development of powerful computational tools that identify phages and prophages with high precision, making it possible to explore prophage populations integrated into host genomes on a large scale. However, this area is still new and underexplored, and further effort is required to identify prophage populations and understand their dynamics with their hosts.
53
Historical contingencies and phage induction diversify bacterioplankton communities at the microscale. Proc Natl Acad Sci U S A 2022; 119:e2117748119. [PMID: 35862452 PMCID: PMC9335236 DOI: 10.1073/pnas.2117748119] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/14/2022]
Abstract
In many natural environments, microorganisms decompose microscale resource patches made of complex organic matter. The growth and collapse of populations on these resource patches unfold within spatial ranges of a few hundred micrometers or less, making such microscale ecosystems hotspots of heterotrophic metabolism. Despite the potential importance of patch-level dynamics for the large-scale functioning of heterotrophic microbial communities, we have not yet been able to delineate the ecological processes that control natural populations at the microscale. Here, we address this challenge by characterizing the natural marine communities that assembled on over 1,000 individual microscale particles of chitin, the most abundant marine polysaccharide. Using low-template shotgun metagenomics and imaging, we find significant variation in microscale community composition despite the similarity in initial species pools across replicates. Chitin-degrading taxa that were rare in seawater established large populations on a subset of particles, resulting in a wide range of predicted chitinolytic abilities and biomass at the level of individual particles. We show, through a mathematical model, that this variability can be attributed to stochastic colonization and historical contingencies affecting the tempo of growth on particles. We find evidence that one biological process leading to such noisy growth across particles is differential predation by temperate bacteriophages of chitin-degrading strains, the keystone members of the community. Thus, initial stochasticity in assembly states on individual particles, amplified through ecological interactions, may have significant consequences for the diversity and functionality of systems of microscale patches.
54
Li J, Yang F, Xiao M, Li A. Advances and challenges in cataloging the human gut virome. Cell Host Microbe 2022; 30:908-916. [PMID: 35834962 DOI: 10.1016/j.chom.2022.06.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/16/2022] [Revised: 06/02/2022] [Accepted: 06/07/2022] [Indexed: 11/17/2022]
Abstract
The human gut virome, often referred to as the "dark matter" of the gut microbiome, remains understudied. A better understanding of the composition and variation of the gut virome across populations is critical for exploring its impact on health and disease. A series of advances in the characterization of the human gut virome have revealed high genetic diversity and various functional potentials of gut viruses. Here, we summarize the recently available human gut virome databases and discuss their features, procedures, and challenges, with the aim of giving researchers a reference for choosing a profiling database. We also propose a "best practice" for cataloging the viral population.
Affiliation(s)
- Junhua Li
- BGI-Shenzhen, Shenzhen 518083, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen 518083, China.
- Minfeng Xiao
- BGI-Shenzhen, Shenzhen 518083, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen 518083, China.
- Aixin Li
- BGI-Shenzhen, Shenzhen 518083, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen 518083, China
55
Chu Y, Zhao Z, Cai L, Zhang G. Viral diversity and biogeochemical potential revealed in different prawn-culture sediments by virus-enriched metagenome analysis. Environ Res 2022; 210:112901. [PMID: 35227678 DOI: 10.1016/j.envres.2022.112901] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/04/2021] [Revised: 02/01/2022] [Accepted: 02/03/2022] [Indexed: 06/14/2023]
Abstract
As the most numerous biological entities on Earth, viruses affect microbial dynamics, metabolism, and biogeochemical cycles in aquatic ecosystems. Viral diversity and function in the ocean have been relatively well studied, but our understanding of viruses in mariculture systems is limited. To fill this knowledge gap, we studied the viral diversity and potential biogeochemical impacts of sediments from four different prawn-mariculture ecosystems (monoculture of prawn and polyculture of prawn with jellyfish, sea cucumber, and clam) using a metagenomic approach with prior separation of virus-like particles (VLPs). We found that the order Caudovirales was the predominant viral category, accounting for the largest share (78.39%) of classified viruses. A phylogenetic tree constructed from the terL gene confirmed the high diversity of sediment viruses and identified three potential novel clades. Meanwhile, a gene-sharing network comparison with viruses inhabiting other ecosystems revealed that mariculture sediments harbored considerable unexplored viral diversity and that maricultural species were potentially important drivers of viral community structure. Notably, the viral auxiliary metabolic genes identified suggest that viruses influence carbon and sulfur cycling, as well as cofactor/vitamin and amino acid metabolism, and thereby participate indirectly in biogeochemical cycling. Overall, our findings reveal the genomic diversity and ecological function of viral communities in prawn mariculture sediments and suggest a role for viruses in microbial ecology and biogeochemistry.
Affiliation(s)
- Yunmeng Chu
- Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, 361021, Fujian, China
- Zelong Zhao
- Shanghai BIOZERON Biotechnology Co., Ltd., Shanghai, 201800, China
- Lixi Cai
- Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, 361021, Fujian, China; Faculty of Basic Medicine, Putian University, Putian, 351100, Fujian, China
- Guangya Zhang
- Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, 361021, Fujian, China.
56
Andrade-Martínez JS, Camelo Valera LC, Chica Cárdenas LA, Forero-Junco L, López-Leal G, Moreno-Gallego JL, Rangel-Pineros G, Reyes A. Computational Tools for the Analysis of Uncultivated Phage Genomes. Microbiol Mol Biol Rev 2022; 86:e0000421. [PMID: 35311574 PMCID: PMC9199400 DOI: 10.1128/mmbr.00004-21] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022]
Abstract
Over a century of bacteriophage research has uncovered a plethora of fundamental aspects of their biology, ecology, and evolution. Furthermore, community-level studies through metagenomics have provided unprecedented insights into the impact that phages have on a range of ecological and physiological processes. It was not until the introduction of viral metagenomics that we began to grasp the astonishing breadth of genetic diversity encompassed by phage genomes. Novel phage genomes have been reported from a diverse range of biomes at an increasing rate, which has prompted the development of computational tools that support the multilevel characterization of these novel phages based solely on their genome sequences. The impact of these technologies has been so large that, alongside MAGs (Metagenome-Assembled Genomes), we now have UViGs (Uncultivated Viral Genomes), which are officially recognized by the International Committee on Taxonomy of Viruses (ICTV), and new taxonomic groups can now be created based exclusively on genomic sequence information. Even though the available tools have contributed immensely to our knowledge of phage diversity and ecology, the ongoing surge of new software makes it challenging to keep track of the tools and the purpose each one is designed for. Therefore, in this review, we describe a comprehensive set of currently available computational tools designed for the characterization of phage genome sequences, focusing on five specific analyses: (i) assembly and identification of phage and prophage sequences, (ii) phage genome annotation, (iii) phage taxonomic classification, (iv) phage-host interaction analysis, and (v) phage microdiversity.
Affiliation(s)
- Juan Sebastián Andrade-Martínez
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Laura Carolina Camelo Valera
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Luis Alberto Chica Cárdenas
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Laura Forero-Junco
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Department of Plant and Environmental Science, University of Copenhagen, Frederiksberg, Denmark
- Gamaliel López-Leal
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- J. Leonardo Moreno-Gallego
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Guillermo Rangel-Pineros
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Alejandro Reyes
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, USA
57
Zhou F, Gan R, Zhang F, Ren C, Yu L, Si Y, Huang Z. PHISDetector: A Tool to Detect Diverse In Silico Phage-host Interaction Signals for Virome Studies. Genomics Proteomics Bioinformatics 2022; 20:508-523. [PMID: 35272051 PMCID: PMC9801046 DOI: 10.1016/j.gpb.2022.02.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 09/30/2021] [Revised: 12/22/2021] [Accepted: 02/28/2022] [Indexed: 01/26/2023]
Abstract
Phage-microbe interactions are appealing systems for studying coevolution and have received increasing attention because of their roles in human health, disease, and the development of novel therapeutics. Phage-microbe interactions leave diverse signals in bacterial and phage genomic sequences, defined as phage-host interaction signals (PHISs), which include clustered regularly interspaced short palindromic repeats (CRISPR) targeting, prophage, and protein-protein interaction signals. In the present study, we developed a novel tool, the phage-host interaction signal detector (PHISDetector), to predict phage-host interactions by detecting and integrating diverse in silico PHISs and scoring the probability of phage-host interactions using machine learning models based on PHIS features. We evaluated the performance of PHISDetector on multiple benchmark datasets and application cases. When tested on a dataset of 758 annotated phage-host pairs, PHISDetector yields prediction accuracies of 0.51 and 0.73 at the species and genus levels, respectively, outperforming other phage-host prediction tools. When applied to 125,842 metagenomic viral contigs (mVCs) derived from 3042 geographically diverse samples, it achieved a detection rate of 54.54%. Furthermore, PHISDetector could predict infecting phages for 85.6% of 368 multidrug-resistant (MDR) bacteria and 30% of 454 human gut bacteria obtained from the National Institutes of Health (NIH) Human Microbiome Project (HMP). PHISDetector can be run either as a web server (http://www.microbiome-bigdata.com/PHISDetector/) for general users studying individual inputs or as a stand-alone version (https://github.com/HIT-ImmunologyLab/PHISDetector) to process massive numbers of phage contigs from virome studies.
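The species- and genus-level accuracies quoted above are rank-specific scores: a prediction is typically counted as correct at a rank only when the predicted host's taxon at that rank matches the true host's. The sketch below illustrates that kind of scoring in Python; it is not PHISDetector's evaluation code, and the `rank_accuracy` function, the dict-based lineage format, and the toy lineages are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lineage format: taxonomic rank -> taxon name.
Lineage = Dict[str, str]


def rank_accuracy(pairs: List[Tuple[Lineage, Lineage]], rank: str) -> float:
    """Fraction of (predicted, true) host pairs whose taxa agree at `rank`.

    A prediction that has no taxon at `rank` is counted as incorrect.
    """
    if not pairs:
        raise ValueError("no phage-host pairs to score")
    correct = 0
    for predicted, true in pairs:
        predicted_taxon: Optional[str] = predicted.get(rank)
        if predicted_taxon is not None and predicted_taxon == true.get(rank):
            correct += 1
    return correct / len(pairs)


# Toy (predicted, true) host lineages; made up, not data from the study.
pairs = [
    ({"genus": "Escherichia", "species": "Escherichia coli"},
     {"genus": "Escherichia", "species": "Escherichia coli"}),
    ({"genus": "Klebsiella", "species": "Klebsiella oxytoca"},
     {"genus": "Klebsiella", "species": "Klebsiella pneumoniae"}),
    ({"genus": "Salmonella", "species": "Salmonella enterica"},
     {"genus": "Shigella", "species": "Shigella flexneri"}),
    ({"genus": "Pseudomonas", "species": "Pseudomonas aeruginosa"},
     {"genus": "Pseudomonas", "species": "Pseudomonas aeruginosa"}),
]
print(rank_accuracy(pairs, "species"))  # 0.5
print(rank_accuracy(pairs, "genus"))    # 0.75
```

For internally consistent lineages, a correct species implies a correct genus, so under this scheme genus-level accuracy can never fall below species-level accuracy, the same ordering seen in the reported 0.51 and 0.73.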
Affiliation(s)
- Fengxia Zhou
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Rui Gan
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Fan Zhang
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Chunyan Ren
- Department of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Ling Yu
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Yu Si
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Zhiwei Huang
- HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China (corresponding author)
58
Chen H, Mayer A, Balasubramanian V. A scaling law in CRISPR repertoire sizes arises from the avoidance of autoimmunity. Curr Biol 2022; 32:2897-2907.e5. [PMID: 35659862 DOI: 10.1016/j.cub.2022.05.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/16/2021] [Revised: 03/13/2022] [Accepted: 05/09/2022] [Indexed: 12/28/2022]
Abstract
Some prokaryotes possess CRISPR-Cas systems that use DNA segments called spacers, which are acquired from invading phages, to guide immune defense. Here, we propose that cross-reactive CRISPR targeting can, however, lead to "heterologous autoimmunity," whereby foreign spacers guide self-targeting in a spacer-length-dependent fashion. Balancing antiviral defense against autoimmunity predicts a scaling relation between spacer length and CRISPR repertoire size. We find evidence for this scaling through a comparative analysis of sequenced prokaryotic genomes and show that this association also holds at the level of CRISPR types. By contrast, the scaling is absent in strains with nonfunctional CRISPR loci. Finally, we demonstrate that stochastic spacer loss can explain variations around the scaling relation, even between strains of the same species. Our results suggest that heterologous autoimmunity is a selective factor shaping the evolution of CRISPR-Cas systems, analogous to the trade-offs between immune specificity, breadth, and autoimmunity that constrain the diversity of adaptive immune systems in vertebrates.
Affiliation(s)
- Hanrong Chen
- David Rittenhouse Laboratory, Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA; Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore 138672, Singapore.
- Andreas Mayer
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
- Vijay Balasubramanian
- David Rittenhouse Laboratory, Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA; Theoretische Natuurkunde, Vrije Universiteit Brussel, 1050 Brussels, Belgium
59
Shang J, Sun Y. CHERRY: a Computational metHod for accuratE pRediction of virus-pRokarYotic interactions using a graph encoder-decoder model. Brief Bioinform 2022; 23:6589865. [PMID: 35595715 PMCID: PMC9487644 DOI: 10.1093/bib/bbac182] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/03/2022] [Revised: 04/01/2022] [Accepted: 04/24/2022] [Indexed: 01/01/2023]
Abstract
Prokaryotic viruses, which infect bacteria and archaea, are key players in microbial communities. Predicting the hosts of prokaryotic viruses helps decipher the dynamic relationships between microbes. Experimental methods for host prediction cannot keep pace with the rapid accumulation of sequenced phages, so computational host prediction is needed. Despite some promising results, computational host prediction remains challenging because of the limited number of known interactions and the sheer number of phages sequenced by high-throughput technologies. State-of-the-art methods achieve only 43% accuracy at the species level. In this work, we formulate host prediction as link prediction in a knowledge graph that integrates multiple protein- and DNA-based sequence features. Our implementation, named CHERRY, can be applied to predict hosts for newly discovered viruses and to identify viruses infecting targeted bacteria. We demonstrated the utility of CHERRY for both applications and compared its performance with 11 popular host prediction methods. To the best of our knowledge, CHERRY has the highest accuracy in identifying virus–prokaryote interactions. It outperforms all existing methods at the species level, with an accuracy increase of 37%. In addition, CHERRY’s performance on short contigs is more stable than that of other tools.
Affiliation(s)
- Jiayu Shang
- Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong SAR, China
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong SAR, China
60
Shi LD, Dong X, Liu Z, Yang Y, Lin JG, Li M, Gu JD, Zhu LZ, Zhao HP. A mixed blessing of viruses in wastewater treatment plants. Water Res 2022; 215:118237. [PMID: 35245718 DOI: 10.1016/j.watres.2022.118237] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 12/01/2021] [Revised: 02/23/2022] [Accepted: 02/24/2022] [Indexed: 06/14/2023]
Abstract
Activated sludge in wastewater treatment plants harbors a very high diversity of both microorganisms and viruses, and the viruses control microbial dynamics and metabolism through infection and cell lysis. However, it remains poorly understood how viruses affect the biochemical processes of activated sludge, for example treatment efficiency and pollutant removal. Using deep metagenomic and metatranscriptomic sequencing, the present study recovered thousands of viral sequences from activated sludge samples of three conventional wastewater treatment plants. Gene-sharing network analysis indicated that most of the viruses could not be assigned to known viral genera, implying that activated sludge is an underexplored reservoir of new viruses and viral diversity. In silico predictions of virus-host linkages showed that the infected microbial hosts, mostly bacteria, were transcriptionally active and able to hydrolyze polymers including starches, celluloses, and proteins. Some viruses encode auxiliary metabolic genes (AMGs) involved in carbon, nitrogen, and sulfur cycling, as well as antibiotic resistance genes (ARGs) conferring resistance to multiple drugs. The virus-encoded AMGs may enhance the biodegradation of contaminants such as starches and celluloses, suggesting a positive role for viruses in strengthening the performance of activated sludge. However, viruses could also act as gene shuttles that disseminate ARGs among different microorganisms, indicating that they may facilitate the spread of antibiotic resistance in the environment. Collectively, this study highlights the mixed blessing of viruses in wastewater treatment plants and deciphers how they manipulate the biochemical processes in activated sludge, with implications for both environmental protection and ecosystem security.
Affiliation(s)
- Ling-Dong Shi
- MOE Key Lab of Environmental Remediation and Ecosystem Health, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Xiyang Dong
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China; Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China
- Zongbao Liu
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Yuchun Yang
- State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Guangzhou 510275, China
- Jih-Gaw Lin
- Institute of Environmental Engineering, National Yang Ming Chiao Tung University, 1001 University Road, Hsinchu 30010, Taiwan
- Meng Li
- Archaeal Biology Center, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China; Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen 518060, China
- Ji-Dong Gu
- Environmental Science and Engineering Program, Guangdong Technion - Israel Institute of Technology, 241 Daxue Road, Shantou, Guangdong 515063, China
- Li-Zhong Zhu
- MOE Key Lab of Environmental Remediation and Ecosystem Health, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- He-Ping Zhao
- MOE Key Lab of Environmental Remediation and Ecosystem Health, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
61
Host-Associated Phages Disperse across the Extraterrestrial Analogue Antarctica. Appl Environ Microbiol 2022; 88:e0031522. [PMID: 35499326 DOI: 10.1128/aem.00315-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
Extreme Antarctic conditions provide one of the closest analogues of extraterrestrial environments. Because air and snow samples, especially from polar regions, yield DNA amounts in the lower picogram range, binning of prokaryotic genomes is challenging, which makes it difficult to study the dispersal of biological entities across these environments. Here, we hypothesized that dispersal of host-associated bacteriophages (adsorbed, replicating, or prophages) across the Antarctic continent can be tracked via their genetic signatures, aiding our understanding of virus and host dispersal across long distances. Phage genome fragments (PGFs) reconstructed from surface snow metagenomes of three Antarctic stations were assigned to four host genomes, mainly Betaproteobacteria, including Ralstonia spp. We reconstructed the complete genome of a temperate phage with nearly complete alignment to a prophage in the reference genome of Ralstonia pickettii 12D. PGFs from different stations were related to each other at the genus level and matched similar hosts. Metagenomic read mapping and nucleotide polymorphism analysis revealed a wide dispersal of highly identical PGFs, 13 of which were detected in seawater from the Western Antarctic Peninsula at a distance of 5,338 km from the snow sampling stations. Our results suggest that host-associated phages, especially of Ralstonia sp., disperse over long distances despite the harsh conditions of the Antarctic continent. Because 14 phages associated with two R. pickettii draft genomes isolated from space equipment were identified, we conclude that Ralstonia phages are ideal mobile genetic elements for tracking dispersal and contamination in ecosystems relevant to astrobiology.
IMPORTANCE Host-associated phages of the bacterium Ralstonia identified in snow samples can be used to track microbial dispersal over thousands of kilometers across the Antarctic continent, which functions as an extraterrestrial analogue because of its harsh environmental conditions. Due to the presence of these bacteria carrying genome-integrated prophages on space-related equipment and the potential for dispersal of host-associated phages demonstrated here, our work has implications for planetary protection, a discipline in astrobiology interested in preventing contamination of celestial bodies with alien biomolecules or forms of life.
62
McKay LJ, Nigro OD, Dlakić M, Luttrell KM, Rusch DB, Fields MW, Inskeep WP. Sulfur cycling and host-virus interactions in Aquificales-dominated biofilms from Yellowstone's hottest ecosystems. ISME J 2022; 16:842-855. [PMID: 34650231 PMCID: PMC8857204 DOI: 10.1038/s41396-021-01132-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/10/2020] [Revised: 09/20/2021] [Accepted: 09/27/2021] [Indexed: 12/26/2022]
Abstract
Modern linkages among magmatic, geochemical, and geobiological processes provide clues about the importance of thermophiles in the origin of biogeochemical cycles. The aim of this study was to identify the primary chemoautotrophs and host-virus interactions involved in microbial colonization and biogeochemical cycling at sublacustrine, vapor-dominated vents that represent the hottest measured ecosystems in Yellowstone National Park (~140 °C). Filamentous microbial communities exposed to extreme thermal and geochemical gradients were sampled using a remotely operated vehicle and subjected to random metagenome sequencing and microscopic analyses. Sulfurihydrogenibium (phylum Aquificae) was the predominant lineage (up to 84% relative abundance) detected at vents that discharged high levels of dissolved H₂, H₂S, and CO₂. Metabolic analyses indicated that carbon fixation by Sulfurihydrogenibium spp. was powered by the oxidation of reduced sulfur and H₂, which provides organic carbon for heterotrophic community members. Highly variable Sulfurihydrogenibium genomes suggested the importance of intra-population diversity under extreme environmental and viral pressures. Numerous lytic viruses (primarily unclassified taxa) were associated with diverse archaea and bacteria in the vent community. Five circular dsDNA uncultivated virus genomes (UViGs) of ~40 kbp length were linked to the Sulfurihydrogenibium metagenome-assembled genome (MAG) by CRISPR spacer matches. Four UViGs contained consistent genome architecture and formed a monophyletic cluster with the recently proposed Pyrovirus genus within the Caudovirales. Sulfurihydrogenibium spp. also contained CRISPR arrays linked to plasmid DNA with genes for a novel type IV filament system and a highly expressed β-barrel porin. A diverse suite of transcribed secretion systems was consistent with direct microscopic analyses, which revealed an extensive extracellular matrix likely critical to community structure and function. We hypothesize these attributes are fundamental to the establishment and survival of microbial communities in highly turbulent, extreme-gradient environments.
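The abstract links five UViGs to the Sulfurihydrogenibium MAG through CRISPR spacer matches but does not say which search tool or mismatch tolerance was used. As a rough illustration only, the sketch below does a brute-force scan of a phage genome for near-exact spacer matches on both strands, wrapping around the origin because the UViGs are circular. The function names and the `max_mismatches` threshold are hypothetical, and real pipelines typically rely on dedicated aligners rather than a naive scan like this one.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def spacer_hits(spacer: str, genome: str, max_mismatches: int = 1,
                circular: bool = True) -> list[tuple[int, str]]:
    """Hypothetical helper: list (start, strand) of genome windows within
    `max_mismatches` substitutions of the spacer (indels are not allowed)."""
    k = len(spacer)
    # For a circular genome, append the first k-1 bases so that matches
    # spanning the arbitrary start coordinate are still found.
    search = genome + genome[:k - 1] if circular else genome
    n_starts = len(genome) if circular else len(genome) - k + 1
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for i in range(n_starts):
            window = search[i:i + k]
            if len(window) < k:  # genome shorter than the spacer
                continue
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((i, strand))
    return hits


# Example: an exact spacer match that wraps around the origin of a toy circular genome.
print(spacer_hits("GATTACAG", "ACAGTTTTTTTTGATT", max_mismatches=0))
```

The scan is O(genome length × spacer length) per spacer, which is fine for a handful of ~40 kbp genomes but not for screening whole spacer databases.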
Affiliation(s)
- Luke J. McKay
- Department of Land Resources & Environmental Sciences, Montana State University, Bozeman, MT 59717 USA; Thermal Biology Institute, Montana State University, Bozeman, MT 59717 USA; Center for Biofilm Engineering, Montana State University, Bozeman, MT 59717 USA
- Olivia D. Nigro
- Department of Natural Science, Hawaii Pacific University, Honolulu, HI 96813 USA
- Mensur Dlakić
- Department of Microbiology & Cell Biology, Montana State University, Bozeman, MT 59717 USA
- Karen M. Luttrell
- Department of Geology & Geophysics, Louisiana State University, Baton Rouge, LA 70803 USA
- Douglas B. Rusch
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405 USA
- Matthew W. Fields
- Thermal Biology Institute, Montana State University, Bozeman, MT 59717 USA; Department of Microbiology & Cell Biology, Montana State University, Bozeman, MT 59717 USA
- William P. Inskeep
- Department of Land Resources & Environmental Sciences, Montana State University, Bozeman, MT 59717 USA; Thermal Biology Institute, Montana State University, Bozeman, MT 59717 USA
63
Versoza CJ, Pfeifer SP. Computational Prediction of Bacteriophage Host Ranges. Microorganisms 2022; 10:149. [PMID: 35056598 PMCID: PMC8778386 DOI: 10.3390/microorganisms10010149] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/17/2021] [Revised: 01/06/2022] [Accepted: 01/11/2022] [Indexed: 12/27/2022]
Abstract
Increased antibiotic resistance has prompted the development of bacteriophage agents for a multitude of applications in agriculture, biotechnology, and medicine. A key factor in the choice of agents for these applications is the host range of a bacteriophage, i.e., the bacterial genera, species, and strains a bacteriophage is able to infect. Although experimental explorations of host ranges remain the gold standard, such investigations are inherently limited to a small number of viruses and bacteria amenable to cultivation. Here, we review recently developed bioinformatic tools that offer a promising and high-throughput alternative by computationally predicting the putative host ranges of bacteriophages, including those challenging to grow in laboratory environments.
Affiliation(s)
- Cyril J. Versoza
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Susanne P. Pfeifer
- Center for Mechanisms of Evolution, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
64
Abstract
Motivation Phage–host associations play important roles in microbial communities. But in natural communities, as opposed to culture-based lab studies, phages are discovered and characterized metagenomically, and their hosts are generally not known. Several programs have been developed for predicting which phage infects which host based on various sequence similarity measures or machine learning approaches. These are often based on whole viral and host genomes, but in metagenomics-based studies, we rarely have whole genomes and must instead rely on contigs that are sometimes only hundreds of bp long. Therefore, we need programs that predict the hosts of phages from these short contigs. Although most existing programs can be applied to metagenomic datasets for these predictions, their accuracies are generally low. Here, we develop ContigNet, a convolutional neural network-based model capable of predicting phage–host matches based on relatively short contigs, and compare it to the previously published VirHostMatcher (VHM) and WIsH. Results On the validation set, ContigNet achieves area under the receiver operating characteristic curve (AUROC) scores of 72–85%, compared to a maximum of 68% for VHM or WIsH, for contigs between 200 bp and 50 kbp in length. We also apply the model to the Metagenomic Gut Virus (MGV) catalogue, a dataset containing a wide range of draft genomes from metagenomic samples, and achieve AUROC scores of 60–70%, compared to 52% for VHM and WIsH. Surprisingly, ContigNet can also be used to predict plasmid-host contig associations with high accuracy, indicating a similar genetic exchange between mobile genetic elements and their hosts. Availability and implementation The source code of ContigNet and related datasets can be downloaded from https://github.com/tianqitang1/ContigNet.
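ContigNet, VHM, and WIsH are compared above by AUROC. As a reminder of what that number measures, the sketch below computes AUROC from its rank (Mann-Whitney) interpretation: the probability that a randomly chosen true phage–host pair receives a higher score than a randomly chosen false pair, with ties counted as one half. The scores and labels are hypothetical, and this is not ContigNet's evaluation code.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. `labels` uses 1 for true pairs, 0 for false pairs."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four contig pairs (1 = true phage-host pair).
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is O(P·N) for clarity; sorting-based implementations compute the same value in O(n log n). An AUROC of 0.5 corresponds to random ranking, which puts the 52% figure for VHM and WIsH on MGV into perspective.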
Affiliation(s)
- Tianqi Tang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Shengwei Hou
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Jed A Fuhrman
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Fengzhu Sun
- To whom correspondence should be addressed. E-mail:
65
Lawrence D, Campbell DE, Schriefer LA, Rodgers R, Walker FC, Turkin M, Droit L, Parkes M, Handley SA, Baldridge MT. Single-cell genomics for resolution of conserved bacterial genes and mobile genetic elements of the human intestinal microbiota using flow cytometry. Gut Microbes 2022; 14:2029673. [PMID: 35130125 PMCID: PMC8824198 DOI: 10.1080/19490976.2022.2029673] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/19/2021] [Revised: 12/03/2021] [Accepted: 01/07/2022] [Indexed: 02/04/2023]
Abstract
As our understanding of the importance of the human microbiota in health and disease grows, so does our need to carefully resolve and delineate its genomic content. 16S rRNA gene-based analyses yield important insights into taxonomic composition, and metagenomics-based approaches reveal the functional potential of microbial communities. However, these methods generally fail to directly link genetic features, including bacterial genes and mobile genetic elements, to each other and to their source bacterial genomes. Further, they are inadequate for capturing the microdiversity present within a genus, species, or strain of bacteria within these complex communities. Here, we present a method that uses fluorescence-activated cell sorting to isolate single bacterial cells, amplifies their genomes, screens them by 16S rRNA gene analysis, and selects cells for genomic sequencing. We apply this method to both a cultured laboratory strain of Escherichia coli and human stool samples. Our analyses show that this method provides nearly complete coverage of bacterial genomes when applied to isolates, and partial genomes of bacterial species recovered from complex communities. Additionally, this method permits exploration and comparison of conserved and variable genomic features between individual cells. We generate assemblies of novel genomes within the Ruminococcaceae family and the Holdemanella genus by combining several 16S rRNA gene-matched single cells, and report novel prophages and conjugative transposons for both Bifidobacterium and Ruminococcaceae. Thus, we demonstrate an approach for flow cytometric separation and sequencing of single bacterial cells from the human microbiota, which yields a variety of critical insights into both the functional potential of individual microbes and the variation among those microbes. This method definitively links a variety of conserved and mobile genomic features, and can be extended to further resolve diverse elements present in the human microbiota.
Affiliation(s)
- Dylan Lawrence
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Danielle E. Campbell
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Lawrence A. Schriefer
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Rachel Rodgers
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Forrest C. Walker
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Marissa Turkin
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Lindsay Droit
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Miles Parkes
- Division of Gastroenterology Addenbrooke’s Hospital and Department of Medicine, University of Cambridge, Cambridge, UK
- Scott A. Handley
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Megan T. Baldridge
- Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
66
Marine viruses and climate change: Virioplankton, the carbon cycle, and our future ocean. Adv Virus Res 2022. [DOI: 10.1016/bs.aivir.2022.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
67
Lood C, Boeckaerts D, Stock M, De Baets B, Lavigne R, van Noort V, Briers Y. Digital phagograms: predicting phage infectivity through a multilayer machine learning approach. Curr Opin Virol 2021; 52:174-181. [PMID: 34952265 DOI: 10.1016/j.coviro.2021.12.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/14/2021] [Revised: 11/26/2021] [Accepted: 12/04/2021] [Indexed: 12/19/2022]
Abstract
Machine learning has been broadly implemented to investigate biological systems. In this regard, the field of phage biology has embraced machine learning to elucidate and predict phage-host interactions, based on receptor-binding proteins, (anti-)defense systems, prophage detection, and life cycle recognition. Here, we highlight the enormous potential of integrating information from omics data with insights from systems biology to better understand phage-host interactions. We conceptualize and discuss the potential of a multilayer model that mirrors the phage infection process, integrating adsorption, bacterial pan-immune components and hijacking of the bacterial metabolism to predict phage infectivity. In the future, this model can offer insights into the underlying mechanisms of the infection process, and digital phagograms can support phage cocktail design and phage engineering.
Affiliation(s)
- Cédric Lood
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Leuven, Belgium; Centre of Microbial and Plant Genetics, Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Dimitri Boeckaerts
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium; KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Michiel Stock
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium; BIOBIX, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Bernard De Baets
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Rob Lavigne
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Leuven, Belgium
- Vera van Noort
- Centre of Microbial and Plant Genetics, Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium; Institute of Biology, Leiden University, Leiden, The Netherlands
- Yves Briers
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
68
Gu C, Liang Y, Li J, Shao H, Jiang Y, Zhou X, Gao C, Li X, Zhang W, Guo C, He H, Wang H, Sung YY, Mok WJ, Wong LL, Suttle CA, McMinn A, Tian J, Wang M. Saline lakes on the Qinghai-Tibet Plateau harbor unique viral assemblages mediating microbial environmental adaption. iScience 2021; 24:103439. [PMID: 34988389 PMCID: PMC8710556 DOI: 10.1016/j.isci.2021.103439] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2021] [Revised: 10/10/2021] [Accepted: 11/10/2021] [Indexed: 12/21/2022]
Abstract
The highest plateau on Earth, the Qinghai-Tibet Plateau, contains thousands of lakes spanning a broad range of salinities and harboring diverse and unique microbial communities. However, little is known about their co-occurring viruses. Herein, we identify 4,560 viral Operational Taxonomic Units (vOTUs) from six viromes of three saline lakes on the Qinghai-Tibet Plateau, fewer than 1% of which could be classified. Most of the predicted vOTUs were associated with the dominant bacterial and archaeal phyla. Virus-encoded auxiliary metabolic genes suggest that viruses influence microbial carbon, nitrogen, sulfur, and lipid metabolism, the mediation of antibiotic resistance, and microbial salinity adaptation. The six viromes clustered together with ice core and bathypelagic ocean viromes and might represent a new viral habitat. This study reveals the unique characteristics and potential ecological roles of DNA viromes in the lakes of the highest plateau and establishes a foundation for understanding viral roles in plateau lake ecosystems.
Affiliation(s)
- Chengxiang Gu
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Yantao Liang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Jiansen Li
- Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining 810008, China
- Key Laboratory of Crust-Mantle Materials and Environments, School of Earth and Space Sciences, University of Science and Technology of China, Hefei 230026, China
- Hongbing Shao
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Yong Jiang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Xinhao Zhou
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Chen Gao
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Xianrong Li
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Wenjing Zhang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Cui Guo
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Hui He
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Hualong Wang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Yeong Yik Sung
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu (UMT), 21030 Kuala Nerus, Malaysia
- Wen Jye Mok
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu (UMT), 21030 Kuala Nerus, Malaysia
- Li Lian Wong
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu (UMT), 21030 Kuala Nerus, Malaysia
- Curtis A. Suttle
- Departments of Earth, Ocean and Atmospheric Sciences, Microbiology and Immunology, and Botany and Institute for the Oceans and Fisheries, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Andrew McMinn
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, TAS 7001, Australia
- Jiwei Tian
- Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Laboratory of Physical Oceanography, Ministry of Education, Ocean University of China, Qingdao 266100, China
- Min Wang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, and Key Lab of Polar Oceanography and Global Ocean Change, Ocean University of China, Qingdao 266003, China
- UMT-OUC Joint Center for Marine Studies, Qingdao 266003, China
- The Affiliated Hospital of Qingdao University, Qingdao 266000, China
69
Shang J, Sun Y. Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning. BMC Biol 2021; 19:250. [PMID: 34819064 PMCID: PMC8611875 DOI: 10.1186/s12915-021-01180-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/07/2021] [Accepted: 10/29/2021] [Indexed: 11/23/2022]
Abstract
Background Prokaryotic viruses, which infect bacteria and archaea, are the most abundant and diverse biological entities in the biosphere. To understand their regulatory roles in various ecosystems and to harness the potential of bacteriophages for use in therapy, more knowledge of viral-host relationships is required. High-throughput sequencing and its application to the microbiome have offered new opportunities for computational approaches to predicting which hosts particular viruses can infect. However, computational host prediction faces two main challenges. First, empirically known virus-host relationships are very limited. Second, although sequence similarity between viruses and their prokaryotic hosts has been used as a major feature for host prediction, the alignment is either missing or ambiguous in many cases. Thus, there is still a need to improve the accuracy of host prediction. Results In this work, we present a semi-supervised learning model, named HostG, to conduct host prediction for novel viruses. We construct a knowledge graph by utilizing both virus-virus protein similarity and virus-host DNA sequence similarity. Then a graph convolutional network (GCN) is adopted that exploits viruses both with and without known hosts during training to enhance its learning ability. During GCN training, we minimize the expected calibration error (ECE) so that the confidence of the predictions is reliable. We tested HostG on both simulated and real sequencing data and compared its performance with other state-of-the-art methods specifically designed for virus host classification (VHM-net, WIsH, PHP, HoPhage, RaFAH, vHULK, and VPF-Class). Conclusion HostG outperforms other popular methods, demonstrating the efficacy of using a GCN-based semi-supervised learning approach. A particular advantage of HostG is its ability to predict hosts from new taxa. Supplementary Information The online version contains supplementary material available at (10.1186/s12915-021-01180-4).
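HostG is described as minimizing the expected calibration error (ECE) during training, but the abstract does not give its exact formulation. The sketch below therefore only shows the common binned definition of ECE as an evaluation metric: predictions are grouped into equal-width confidence bins, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by bin size. The bin count and the toy host labels are assumptions for illustration; a binned ECE like this is not differentiable, so it is not a drop-in training loss.

```python
def expected_calibration_error(confidences: list[float], predictions: list[str],
                               labels: list[str], n_bins: int = 10) -> float:
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    if n == 0:
        raise ValueError("no predictions given")
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes a confidence of exactly 0.0.
        members = [i for i, c in enumerate(confidences)
                   if lo < c <= hi or (b == 0 and c == 0.0)]
        if not members:
            continue
        accuracy = sum(predictions[i] == labels[i] for i in members) / len(members)
        mean_conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical host predictions: overconfident at 0.9, underconfident at 0.6.
conf = [0.9, 0.9, 0.6, 0.6]
pred = ["A", "A", "B", "B"]
true = ["A", "C", "B", "B"]
print(expected_calibration_error(conf, pred, true))
```

A well-calibrated model drives this value toward 0: among predictions made with confidence around 0.9, roughly 90% should be correct.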
Affiliation(s)
- Jiayu Shang
- Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Yanni Sun
- Electrical Engineering, City University of Hong Kong, Hong Kong, China
70
Zielezinski A, Barylski J, Karlowski WM. Taxonomy-aware, sequence similarity ranking reliably predicts phage-host relationships. BMC Biol 2021; 19:223. [PMID: 34625070 PMCID: PMC8501573 DOI: 10.1186/s12915-021-01146-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/30/2021] [Accepted: 09/06/2021] [Indexed: 12/02/2022]
Abstract
Background Characterizing phage–host interactions is critical to understanding the ecological role of both partners and to the effective isolation of phage therapeutics. Unfortunately, experimental methods for studying these interactions are markedly slow, low-throughput, and unsuitable for phages or hosts that are difficult to maintain under laboratory conditions. Therefore, a number of in silico methods have emerged to predict prokaryotic hosts based on viral sequences. One of the leading approaches is the application of the BLAST tool, which searches for local similarities between viral and microbial genomes. However, this prediction method has three major limitations: (i) top-scoring sequences do not always point to the actual host; (ii) mosaic virus genomes may match many, typically related, bacteria; and (iii) viral and host sequences may diverge beyond the point where their relationship can be detected by a BLAST alignment. Results We created an extension to BLAST, named Phirbo, that improves host prediction quality beyond what is obtainable from standard BLAST searches. The tool harnesses information concerning sequence similarity and bacterial relatedness to predict phage–host interactions. Phirbo was evaluated on three benchmark sets of known virus–host pairs, and it improved precision and recall by 11–40 percentage points over currently available, state-of-the-art, alignment-based, alignment-free, and machine-learning host prediction tools. Moreover, the discriminatory power of Phirbo for the recognition of virus–host relationships surpassed the results of other tools by at least 10 percentage points (area under the curve = 0.95), yielding mean host prediction accuracies of 57% and 68% at the genus and family levels, respectively; these accuracies drop by 12 percentage points when only a fraction of the viral genome sequence (3 kb) is used. Finally, we provide insights into a repertoire of protein and ncRNA genes that are shared between phages and hosts and may be prone to horizontal transfer during infection. Conclusions Our results suggest that Phirbo is a simple and effective tool for predicting phage–host relationships. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-021-01146-6.
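The abstract says Phirbo combines sequence similarity with bacterial relatedness by comparing rankings, but it does not name the ranking measure. Our understanding is that Phirbo scores a phage–host pair with rank-biased overlap (RBO) between the phage's ranked list of BLAST hits and the candidate host's ranked list of related bacteria; treat this as an assumption to check against the paper. The sketch below computes a truncated RBO for two ranked lists without duplicates; the persistence parameter `p` and the example genera are illustrative only.

```python
def rank_biased_overlap(ranking_a: list[str], ranking_b: list[str], p: float = 0.9) -> float:
    """Truncated RBO: (1 - p) * sum_{d=1..k} p**(d-1) * |A[:d] & B[:d]| / d,
    evaluated to depth k = length of the shorter list (a lower bound on full RBO).
    Assumes neither list contains duplicates."""
    depth = min(len(ranking_a), len(ranking_b))
    seen_a: set[str] = set()
    seen_b: set[str] = set()
    overlap = 0  # size of the intersection of the two prefixes at the current depth
    total = 0.0
    for d in range(1, depth + 1):
        x, y = ranking_a[d - 1], ranking_b[d - 1]
        if x == y:
            overlap += 1
        else:
            # x joins prefix A: counts if already in prefix B; likewise for y.
            overlap += (x in seen_b) + (y in seen_a)
        seen_a.add(x)
        seen_b.add(y)
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total


# Hypothetical rankings: bacteria ordered by BLAST score against a phage, and
# bacteria ordered by relatedness to one candidate host.
phage_hits = ["Gordonia", "Rhodococcus", "Nocardia", "Mycobacterium"]
host_relatives = ["Gordonia", "Nocardia", "Rhodococcus", "Mycobacterium"]
print(rank_biased_overlap(phage_hits, host_relatives))
```

Because RBO weights agreement at the top of the lists most heavily (controlled by `p`), two rankings that share their best hits score higher than two that agree only in their tails, which matches the intuition that a phage's strongest matches are the most informative about its host.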
Affiliation(s)
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Jakub Barylski
- Molecular Virology Research Unit, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Wojciech M Karlowski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
71
Diversity and distribution of viruses inhabiting the deepest ocean on Earth. ISME J 2021; 15:3094-3110. [PMID: 33972725 PMCID: PMC8443753 DOI: 10.1038/s41396-021-00994-y] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 12/09/2020] [Revised: 04/08/2021] [Accepted: 04/20/2021] [Indexed: 02/01/2023]
Abstract
As the most abundant biological entities on the planet, viruses significantly influence the overall functioning of marine ecosystems. The abundance, distribution, and biodiversity of viral communities in the upper ocean have been relatively well studied, but our understanding of viruses in the hadal biosphere remains poor. Here, we established the oceanic trench viral genome dataset (OTVGD) by analysing 19 microbial metagenomes derived from seawater and sediment samples of the Mariana, Yap, and Kermadec Trenches. The trench viral communities harbored remarkably high novelty, and they were predicted to infect ecologically important microbial clades, including Thaumarchaeota and Oleibacter. Significant inter-trench and intra-trench exchange of viral communities was proposed. Moreover, viral communities in different habitats (seawater/sediment and depth-stratified ocean zones) exhibited distinct niche-dependent distribution patterns and genomic properties. Notably, microbes and viruses in the hadopelagic seawater appeared to favor lysogenic lifestyles compared to those in the upper ocean. Furthermore, niche-specific auxiliary metabolic genes were identified in the hadal viral genomes, and a novel viral D-amino acid oxidase was functionally and phylogenetically characterized, suggesting that these genes contribute to the utilization of refractory organic matter. Together, these findings highlight the genomic novelty, dynamic movement, and environment-driven diversification of viral communities in oceanic trenches, and suggest that viruses may influence the hadal ecosystem by reprogramming the metabolism of their hosts and modulating the community of keystone microbes.

72
Current challenges to virus discovery by meta-transcriptomics. Curr Opin Virol 2021; 51:48-55. [PMID: 34592710 DOI: 10.1016/j.coviro.2021.09.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/25/2021] [Revised: 08/16/2021] [Accepted: 09/14/2021] [Indexed: 12/13/2022]
Abstract
Meta-transcriptomic next-generation sequencing has transformed virus discovery, dramatically expanding our knowledge of the known virosphere. Nevertheless, the use of meta-transcriptomics for virus discovery faces important challenges. As this technology becomes more widely adopted, the proportion of viral sequences in public databases with incorrect (e.g. mis-assignment of host) or limited information (e.g. lacking taxonomic classification) is likely to grow, limiting their utility in bioinformatic pipelines for virus discovery. In addition, we currently lack the bioinformatic tools that can accurately identify viruses showing little or no sequence similarity to database viruses or those that represent likely reagent contaminants. Herein, we outline some of the challenges to effective meta-transcriptomic virus discovery as well as their potential solutions.

73
Ruohan W, Xianglilan Z, Jianping W, Shuai Cheng LI. DeepHost: phage host prediction with convolutional neural network. Brief Bioinform 2021; 23:6374063. [PMID: 34553750 DOI: 10.1093/bib/bbab385] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/08/2021] [Revised: 08/10/2021] [Accepted: 08/27/2021] [Indexed: 01/21/2023]
Abstract
Next-generation sequencing is rapidly expanding the number of known phage genomes. Unlike phages isolated with culture-based methods, phages discovered from next-generation sequencing data have uncharacterized hosts. The high diversity of phage genomes makes host assignment challenging. To address this issue, we proposed a phage host prediction tool, DeepHost. To encode phage genomes into matrices, we designed a genome encoding method that applies various spaced k-mer pairs to tolerate sequence variations, including insertions, deletions, and mutations. DeepHost applies a convolutional neural network to predict host taxonomies. DeepHost achieves a prediction accuracy of 96.05% at the genus level (72 taxonomies) and 90.78% at the species level (118 taxonomies), outperforming existing phage host prediction tools by 10.16-30.48% and achieving results comparable to BLAST. For genomes without hits in BLAST, DeepHost obtains an accuracy of 38.00% at the genus level and 26.47% at the species level, making it suitable for genomes with little homology to existing datasets. DeepHost is alignment-free, and it is faster than BLAST, especially for large datasets. DeepHost is available at https://github.com/deepomicslab/DeepHost.
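The spaced k-mer pair idea in the DeepHost abstract can be illustrated with a small sketch. Everything here is an assumption for illustration, not DeepHost's published encoding: the k-mer length `k`, the spacing `gap`, the function name `spaced_pair_matrix` and the raw, unnormalized counts are hypothetical. Each matrix cell counts how often one k-mer is followed, `gap` bases later, by another; a substitution inside the gap leaves that particular pair unchanged, which is one way such an encoding can tolerate sequence variation.

```python
from itertools import product

def kmer_index(k):
    """Map every ACGT k-mer to a row/column index (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}

def spaced_pair_matrix(seq, k=2, gap=3):
    """Count (k-mer at i, k-mer starting k + gap bases later) pairs.

    Returns a 4**k x 4**k list-of-lists count matrix. Windows that
    contain characters other than A, C, G, T (e.g. N) are skipped.
    Hypothetical illustration, not DeepHost's actual encoder.
    """
    idx = kmer_index(k)
    n = len(idx)
    mat = [[0] * n for _ in range(n)]
    seq = seq.upper()
    step = k + gap  # distance between the starts of the two k-mers
    for i in range(len(seq) - step - k + 1):
        first = seq[i:i + k]
        second = seq[i + step:i + step + k]
        if first in idx and second in idx:
            mat[idx[first]][idx[second]] += 1
    return mat

# Several gap sizes could be stacked as separate input channels for a CNN.
channels = [spaced_pair_matrix("ACGTTGCAACGTAGCT", k=2, gap=g) for g in (0, 1, 2)]
```

Using several gaps at once (one matrix per gap) is a design choice assumed here to mirror the abstract's "various spaced k-mer pairs"; the real tool's parameters may differ.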
Affiliation(s)
- Wang Ruohan
- Department of Computer Science at City University of Hong Kong
- Zhang Xianglilan
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology
- Wang Jianping
- Department of Computer Science at City University of Hong Kong
- Li Shuai Cheng
- Department of Computer Science at City University of Hong Kong

74
Sørensen AN, Woudstra C, Sørensen MCH, Brøndsted L. Subtypes of tail spike proteins predicts the host range of Ackermannviridae phages. Comput Struct Biotechnol J 2021; 19:4854-4867. [PMID: 34527194 PMCID: PMC8432352 DOI: 10.1016/j.csbj.2021.08.030] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 06/22/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 12/01/2022]
Abstract
Phages belonging to the Ackermannviridae family encode up to four tail spike proteins (TSPs), each recognizing a specific receptor of their bacterial hosts. Here, we determined the TSP diversity of 99 Ackermannviridae phages by performing a comprehensive in silico analysis. Based on sequence diversity, we assigned all TSPs to distinct subtypes of TSP1, TSP2, TSP3 and TSP4, and found each TSP subtype to be specifically associated with the genera (Kuttervirus, Agtrevirus, Limestonevirus, Taipeivirus) of the Ackermannviridae family. Further analysis showed that the N-terminal XD1 and XD2 domains in TSP2 and TSP4, which hinge the four TSPs together, are preserved. In contrast, the C-terminal receptor binding modules were only conserved within TSP subtypes, except for some Kuttervirus TSP1s and TSP3s that were similar to specific TSP4s. A conserved motif in TSP1, TSP3 and TSP4 of Kuttervirus phages may allow recombination between receptor binding modules, thus altering host recognition. The receptors for numerous uncharacterized phages expressing TSPs of the same subtypes were predicted using previous host range data. To validate our predictions, we experimentally determined the host recognition of three of the four TSPs expressed by kuttervirus S117. We confirmed that S117 TSP1 and TSP2 bind to their predicted host receptors, and identified the receptor for TSP3, which is shared by 51 other Kuttervirus phages. Kuttervirus phages were thus shown to encode a vast genetic diversity of potentially exchangeable TSPs influencing host recognition. Overall, our study demonstrates that comprehensive in silico and host range analysis of TSPs can predict host recognition of Ackermannviridae phages.
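Assigning a TSP to a subtype by sequence similarity can be sketched as a nearest-representative search. This is a swapped-in, alignment-free stand-in (shared amino-acid 3-mers scored with Jaccard similarity), not the authors' method; the representative sequences, the `min_sim` cutoff and all function names are hypothetical. A query that is not similar enough to any representative is left unassigned, leaving room for a new subtype.

```python
def kmers(seq, k=3):
    """Set of overlapping length-k substrings of a protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_subtype(query, representatives, k=3, min_sim=0.3):
    """Return (subtype, similarity) of the most similar representative,
    or (None, best_similarity) when no representative reaches min_sim."""
    q = kmers(query, k)
    best_name, best_sim = None, 0.0
    for name, rep in representatives.items():
        sim = jaccard(q, kmers(rep, k))
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim < min_sim:
        return None, best_sim
    return best_name, best_sim

# Toy, made-up "representative" sequences, one per hypothetical subtype.
reps = {"TSP1-a": "MKLVVAGSTRP", "TSP2-b": "QWERTYHNDCE"}
print(assign_subtype("MKLVVAGSTRQ", reps))  # ('TSP1-a', 0.8)
```

In practice, subtype assignment would more likely rely on alignment-based identity (e.g. BLASTp); the k-mer version is used here only because it is short and self-contained.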
Key Words
- ANI, Average nucleotide identity
- Ackermannviridae family
- Bacteriophage
- CPS, Capsular polysaccharide
- EOP, Efficiency of plating
- Escherichia coli O157
- Host range
- LB, Luria-Bertani
- LPS, Lipopolysaccharide
- NCBI, National Center for Biotechnology Information
- O-antigen
- ORF, Open reading frame
- PFU, Plaque-forming unit
- RBP, Receptor binding protein
- Receptor-binding proteins
- Salmonella
- TSP, Tail spike protein
- Tail spike proteins
- VriC, Virulence-associated protein
Affiliation(s)
- Anders Nørgaard Sørensen
- Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark
- Cedric Woudstra
- Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark
- Martine C Holst Sørensen
- Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark
- Lone Brøndsted
- Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark

75
Wu S, Fang Z, Tan J, Li M, Wang C, Guo Q, Xu C, Jiang X, Zhu H. DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach. Gigascience 2021; 10:giab056. [PMID: 34498685 PMCID: PMC8427542 DOI: 10.1093/gigascience/giab056] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/02/2022]
Abstract
BACKGROUND Prokaryotic viruses, referred to as phages, can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage-derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and in the regulation of microbial communities. However, there is no experimental or computational approach that effectively classifies their sequences in culture-independent metavirome data. We present a new computational method, DeePhage, which can directly and rapidly classify each read or contig as a virulent or temperate phage-derived fragment. FINDINGS DeePhage uses a "one-hot" encoding to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage in 5-fold cross-validation reaches 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS. On real metavirome data, DeePhage correctly predicts the highest proportion of contigs when using BLAST as annotation, without apparent preferences. In addition, DeePhage runs 245 and 810 times faster than PhagePred and PHACTS, respectively, under the same computational configuration. By directly detecting temperate viral fragments in metagenome and metavirome data, we further propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides new insight into potential treatments for human disease. CONCLUSIONS DeePhage is a novel tool developed to rapidly and efficiently identify 2 kinds of phage fragments, especially for metagenomic analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
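The "one-hot" representation mentioned above can be sketched in a few lines. Assumptions, not taken from DeePhage: bases are ordered A, C, G, T; ambiguous characters such as N become all-zero rows; and the optional `length` argument, which pads with zero rows or truncates so that reads and contigs of different lengths share one input shape, is a hypothetical convenience. Each row would be one position along the sequence axis seen by a convolutional layer.

```python
BASES = "ACGT"

def one_hot(seq, length=None):
    """Encode a DNA string as a list of [A, C, G, T] indicator rows.

    Characters outside ACGT (e.g. N) become all-zero rows. If `length`
    is given, the result is truncated or zero-padded to that many rows.
    """
    rows = [[1 if base == b else 0 for b in BASES] for base in seq.upper()]
    if length is not None:
        rows = rows[:length]
        # Build padding rows independently so they are not aliased.
        rows += [[0, 0, 0, 0] for _ in range(length - len(rows))]
    return rows

print(one_hot("acgN"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```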
Affiliation(s)
- Shufang Wu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Zhencheng Fang
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Jie Tan
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Mo Li
- Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
- Chunhui Wang
- Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
- Qian Guo
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Congmin Xu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Xiaoqing Jiang
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Huaiqiu Zhu
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, Beijing, China

76
Li M, Wang Y, Li F, Zhao Y, Liu M, Zhang S, Bin Y, Smith AI, Webb GI, Li J, Song J, Xia J. A Deep Learning-Based Method for Identification of Bacteriophage-Host Interaction. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1801-1810. [PMID: 32813660 PMCID: PMC8703204 DOI: 10.1109/tcbb.2020.3017386] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 06/11/2023]
Abstract
Multi-drug resistance (MDR) has become one of the greatest threats to human health worldwide, and novel treatments for infections caused by MDR bacteria are urgently needed. Phage therapy is a promising alternative, and the key to it is correctly matching target pathogenic bacteria with the corresponding therapeutic phage. Deep learning is powerful for mining complex patterns to generate accurate predictions. In this study, we develop PredPHI (Predicting Phage-Host Interactions), a deep learning-based tool capable of predicting the host of phages from sequence data. We collect >3000 phage-host pairs along with their protein sequences from the PhagesDB and GenBank databases and extract a set of features. Then we select high-quality negative samples based on the K-Means clustering method and construct a balanced training set. Finally, we employ a deep convolutional neural network to build the predictive model. The results indicate that PredPHI achieves an area under the receiver operating characteristic curve of 81 percent on the test set, and that the clustering-based method is significantly more robust than randomly selecting negative samples. These results highlight that PredPHI is a useful and accurate tool for identifying phage-host interactions from sequence data.
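The abstract does not say exactly how K-Means is used to pick negatives, so the following is only one plausible reading, written as a hypothetical sketch: candidate non-interacting phage-host pairs (represented by made-up feature vectors) are clustered with plain Lloyd's k-means, and an equal share is drawn from each cluster, nearest-to-centroid first, so that the negatives cover the feature space instead of being a purely random draw. The function names, the per-cluster quota and the fixed iteration count are assumptions.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns (centroids, labels). A cluster that becomes empty keeps
    its previous centroid.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def select_negatives(candidates, n_needed, k, seed=0):
    """Return indices of up to n_needed candidates, taking about
    n_needed / k from each cluster, closest to the centroid first."""
    centroids, labels = kmeans(candidates, k, seed=seed)
    per_cluster = -(-n_needed // k)  # ceiling division
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: math.dist(candidates[i], centroids[c]))
        chosen.extend(members[:per_cluster])
    return chosen[:n_needed]

# Two made-up groups of candidate negative pairs in a 2-D feature space.
candidates = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(select_negatives(candidates, n_needed=2, k=2)))  # [0, 3]
```

Drawing evenly across clusters is the assumed design choice here; PredPHI's actual selection rule may differ.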

77
Li M, Zhang W. PHIAF: prediction of phage-host interactions with GAN-based data augmentation and sequence-based feature fusion. Brief Bioinform 2021; 23:6362109. [PMID: 34472593 DOI: 10.1093/bib/bbab348] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/02/2021] [Revised: 07/05/2021] [Accepted: 07/18/2021] [Indexed: 01/01/2023]
Abstract
Phage therapy has become one of the most promising alternatives to antibiotics in the treatment of bacterial diseases, and identifying phage-host interactions (PHIs) helps to understand the possible mechanisms through which a phage infects bacteria, guiding the development of phage therapy. Compared with wet-lab experiments, computational methods for identifying PHIs can reduce costs and save time, and are more effective and economical. In this paper, we propose a PHI prediction method with generative adversarial network (GAN)-based data augmentation and sequence-based feature fusion (PHIAF). First, PHIAF applies a GAN-based data augmentation module, which generates pseudo PHIs to alleviate data scarcity. Second, PHIAF fuses the features originating from DNA and protein sequences for better performance. Third, PHIAF uses an attention mechanism to weigh the different contributions of DNA/protein sequence-derived features, which also provides interpretability for the prediction model. In computational experiments, PHIAF outperforms other state-of-the-art PHI prediction methods when evaluated via 5-fold cross-validation (AUC and AUPR of 0.88 and 0.86, respectively). An ablation study shows that data augmentation, feature fusion and the attention mechanism all improve the prediction performance of PHIAF. Additionally, the four new PHIs with the highest PHIAF scores in the case study were verified by recent literature. In conclusion, PHIAF is a promising tool to accelerate the exploration of phage therapy.
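Attention-weighted fusion of DNA- and protein-derived features can be sketched as follows. This is a minimal illustration under stated assumptions, not PHIAF's architecture: both feature vectors are assumed to have the same length, a single scoring vector `w` (learned during training in a real model, fixed by hand here) produces one score per modality, and a softmax turns those scores into weights that sum to 1. The returned weights are what make such a model inspectable: they show how much each sequence type contributed to a given prediction.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, w):
    """Fuse per-modality feature vectors with attention weights.

    features: dict mapping a modality name to a vector (all of len(w)).
    w: scoring vector; a modality's score is its dot product with w.
    Returns (fused_vector, {modality: weight}).
    """
    names = list(features)
    scores = [sum(x * wi for x, wi in zip(features[n], w)) for n in names]
    weights = softmax(scores)
    fused = [sum(a * features[n][i] for a, n in zip(weights, names))
             for i in range(len(w))]
    return fused, dict(zip(names, weights))

# Made-up feature values; "protein" scores higher under this w.
fused, alpha = attention_fuse({"dna": [0.2, 0.9], "protein": [0.7, 0.1]}, w=[1.0, 0.0])
```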
Affiliation(s)
- Menglu Li
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China

78
Tan J, Fang Z, Wu S, Guo Q, Jiang X, Zhu H. HoPhage: an ab initio tool for identifying hosts of phage fragments from metaviromes. Bioinformatics 2021; 38:543-545. [PMID: 34383025 PMCID: PMC8723153 DOI: 10.1093/bioinformatics/btab585] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/10/2021] [Revised: 07/27/2021] [Accepted: 08/10/2021] [Indexed: 02/03/2023]
Abstract
SUMMARY We present HoPhage (Host of Phage) to identify the host of a given phage fragment from metavirome data at the genus level. HoPhage integrates two modules, one using a deep learning algorithm and the other a Markov chain model. HoPhage achieves 47.90% and 82.47% mean accuracy at the genus and phylum levels for ∼1-kb artificial phage fragments when predicting hosts among 50 genera, representing 7.54-20.22% and 13.55-24.31% improvement, respectively. When tested on three real virome samples, HoPhage yields 81.11% mean accuracy at the genus level within a much broader candidate host range. AVAILABILITY AND IMPLEMENTATION HoPhage is available at http://cqb.pku.edu.cn/ZhuLab/HoPhage/data/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
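The Markov chain half of a tool like HoPhage can be illustrated by a classic likelihood-based classifier: train one fixed-order nucleotide Markov model per candidate host genus, score a phage fragment under each, and report the genus with the highest log-likelihood. This is a hypothetical sketch, not HoPhage's implementation (which also integrates a deep learning module, and whose model order and training data are not given here); the order-2 model, the add-one (Laplace) smoothing and the toy sequences are assumptions.

```python
import math
from collections import defaultdict
from itertools import product

BASES = "ACGT"

def train_markov(seqs, order=2, pseudo=1.0):
    """Log P(next base | previous `order` bases), add-`pseudo` smoothed."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            ctx, nxt = s[i - order:i], s[i]
            if nxt in BASES and all(c in BASES for c in ctx):
                counts[ctx][nxt] += 1
    model = {}
    for ctx in ("".join(p) for p in product(BASES, repeat=order)):
        row = counts.get(ctx, dict.fromkeys(BASES, 0))
        total = sum(row.values()) + 4 * pseudo
        model[ctx] = {b: math.log((row[b] + pseudo) / total) for b in BASES}
    return model

def log_likelihood(fragment, model, order=2):
    """Sum of log transition probabilities; skips positions with N etc."""
    s = fragment.upper()
    return sum(model[s[i - order:i]][s[i]]
               for i in range(order, len(s))
               if s[i - order:i] in model and s[i] in BASES)

def predict_host(fragment, models, order=2):
    """Genus whose Markov model gives the fragment the highest likelihood."""
    return max(models, key=lambda g: log_likelihood(fragment, models[g], order))

# Toy "host genomes" with very different composition (made up for illustration).
models = {
    "GenusA": train_markov(["ACACACACACACACAC"] * 3),
    "GenusB": train_markov(["GGTTGGTTGGTTGGTT"] * 3),
}
print(predict_host("ACACACAC", models))  # GenusA
```

Smoothing keeps unseen contexts at a uniform probability instead of minus infinity, so a short fragment with one rare transition does not rule a genus out entirely.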
Affiliation(s)
- Jie Tan
- State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Zhencheng Fang
- State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Shufang Wu
- State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Qian Guo
- State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Xiaoqing Jiang
- State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China

79
Li Z, Pan D, Wei G, Pi W, Zhang C, Wang JH, Peng Y, Zhang L, Wang Y, Hubert CRJ, Dong X. Deep sea sediments associated with cold seeps are a subsurface reservoir of viral diversity. ISME J 2021; 15:2366-2378. [PMID: 33649554 PMCID: PMC8319345 DOI: 10.1038/s41396-021-00932-y] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 09/09/2020] [Revised: 02/04/2021] [Accepted: 02/08/2021] [Indexed: 12/11/2022]
Abstract
In marine ecosystems, viruses exert control on the composition and metabolism of microbial communities, influencing overall biogeochemical cycling. Deep sea sediments associated with cold seeps are known to host taxonomically diverse microbial communities, but little is known about viruses infecting these microorganisms. Here, we probed metagenomes from seven geographically diverse cold seeps across global oceans to assess viral diversity, virus-host interaction, and virus-encoded auxiliary metabolic genes (AMGs). Gene-sharing network comparisons with viruses inhabiting other ecosystems reveal that cold seep sediments harbour considerable unexplored viral diversity. Most cold seep viruses display high degrees of endemism with seep fluid flux being one of the main drivers of viral community composition. In silico predictions linked 14.2% of the viruses to microbial host populations with many belonging to poorly understood candidate bacterial and archaeal phyla. Lysis was predicted to be a predominant viral lifestyle based on lineage-specific virus/host abundance ratios. Metabolic predictions of prokaryotic host genomes and viral AMGs suggest that viruses influence microbial hydrocarbon biodegradation at cold seeps, as well as other carbon, sulfur and nitrogen cycling via virus-induced mortality and/or metabolic augmentation. Overall, these findings reveal the global diversity and biogeography of cold seep viruses and indicate how viruses may manipulate seep microbial ecology and biogeochemistry.
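A lineage-specific virus/host abundance ratio can be computed from per-contig read coverage once each virus has been linked to a host lineage. The sketch below is a minimal illustration under assumptions that are not stated in the abstract: coverage is used as the abundance measure, the coverages of all viruses linked to a lineage are summed, and a ratio above 1 (more virus than host) is read as consistent with active lytic replication. All identifiers and numbers are made up.

```python
def virus_host_ratios(virus_cov, host_cov, links):
    """Virus/host abundance ratio per host lineage.

    virus_cov: virus id -> coverage; host_cov: lineage -> coverage;
    links: virus id -> predicted host lineage. Lineages with zero or
    missing host coverage are left out.
    """
    summed = {}
    for vid, lineage in links.items():
        if vid in virus_cov:
            summed[lineage] = summed.get(lineage, 0.0) + virus_cov[vid]
    return {lin: total / host_cov[lin]
            for lin, total in summed.items() if host_cov.get(lin, 0) > 0}

def flag_lytic(ratios, threshold=1.0):
    """True where the ratio exceeds the (assumed) threshold."""
    return {lin: r > threshold for lin, r in ratios.items()}

ratios = virus_host_ratios(
    virus_cov={"v1": 30.0, "v2": 10.0, "v3": 2.0},
    host_cov={"LineageA": 20.0, "LineageB": 5.0},
    links={"v1": "LineageA", "v2": "LineageA", "v3": "LineageB"},
)
print(ratios)              # {'LineageA': 2.0, 'LineageB': 0.4}
print(flag_lytic(ratios))  # {'LineageA': True, 'LineageB': False}
```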
Affiliation(s)
- Zexin Li
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China
- Donald Pan
- Department of Ecology and Environmental Studies, The Water School, Florida Gulf Coast University, Fort Myers, FL, USA
- Guangshan Wei
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China
- Key Laboratory of Marine Genetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, China
- Weiling Pi
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China
- Chuwen Zhang
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China
- Jiang-Hai Wang
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China
- Yongyi Peng
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China
- Lu Zhang
- Key Laboratory of Coastal Environment and Resources of Zhejiang Province, School of Engineering, Westlake University, Hangzhou, China
- Institute of Advanced Technology, Westlake Institute for Advanced Study, Hangzhou, China
- Yong Wang
- Department of Life Science, Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, Sanya, China
- Casey R J Hubert
- Department of Biological Sciences, University of Calgary, Calgary, AB, Canada
- Xiyang Dong
- School of Marine Sciences, Sun Yat-Sen University, Zhuhai, China

80
Abstract
Viruses are the most abundant biological entity on Earth, infect cellular organisms from all domains of life, and are central players in the global biosphere. Over the last century, the discovery and characterization of viruses have progressed steadily alongside much of modern biology. In terms of outright numbers of novel viruses discovered, however, the last few years have been by far the most transformative for the field. Advances in methods for identifying viral sequences in genomic and metagenomic datasets, coupled to the exponential growth of environmental sequencing, have greatly expanded the catalog of known viruses and fueled the tremendous growth of viral sequence databases. Development and implementation of new standards, along with careful study of the newly discovered viruses, have transformed and will continue to transform our understanding of microbial evolution, ecology, and biogeochemical cycles, leading to new biotechnological innovations across many diverse fields, including environmental, agricultural, and biomedical sciences.
Affiliation(s)
- Lee Call
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Stephen Nayfach
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Nikos C Kyrpides
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA

81
Zhong ZP, Tian F, Roux S, Gazitúa MC, Solonenko NE, Li YF, Davis ME, Van Etten JL, Mosley-Thompson E, Rich VI, Sullivan MB, Thompson LG. Glacier ice archives nearly 15,000-year-old microbes and phages. Microbiome 2021; 9:160. [PMID: 34281625 PMCID: PMC8290583 DOI: 10.1186/s40168-021-01106-w] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 02/12/2021] [Accepted: 05/31/2021] [Indexed: 05/05/2023]
Abstract
BACKGROUND Glacier ice archives information, including microbiology, that helps reveal paleoclimate histories and predict future climate change. Though glacier-ice microbes are studied using culture or amplicon approaches, more challenging metagenomic approaches, which provide access to functional, genome-resolved information and viruses, are under-utilized, partly due to low biomass and potential contamination. RESULTS We expanded existing clean sampling procedures using controlled artificial ice-core experiments and adapted previously established low-biomass metagenomic approaches to study glacier-ice viruses. Controlled sampling experiments drastically reduced mock contaminants, including bacteria, viruses, and free DNA, to background levels. Amplicon sequencing from eight depths of two Tibetan Plateau ice cores revealed common glacier-ice lineages, including Janthinobacterium, Polaromonas, Herminiimonas, Flavobacterium, Sphingomonas, and Methylobacterium, as the dominant genera, while microbial communities differed significantly between the two ice cores, in association with different climate conditions during deposition. Separately, ~355- and ~14,400-year-old ice were subjected to viral enrichment and low-input quantitative sequencing, yielding genomic sequences for 33 vOTUs. These were virtually all unique to this study, representing 28 novel genera, with not a single species shared with 225 environmentally diverse viromes. Further, 42.4% of the vOTUs were identifiably temperate, which is significantly higher than in gut, soil, and marine viromes and indicates that temperate phages are possibly favored in glacier-ice environments before being frozen. In silico host predictions linked 18 vOTUs to co-occurring abundant bacteria (Methylobacterium, Sphingomonas, and Janthinobacterium), indicating that these phages infected ice-abundant bacterial groups before being archived. Functional genome annotation revealed four virus-encoded auxiliary metabolic genes; in particular, two motility genes suggest that viruses potentially facilitate nutrient acquisition for their hosts. Finally, given their possible importance to methane cycling in ice, we focused on Methylobacterium viruses by contextualizing our ice-observed viruses against 123 viromes and prophages extracted from 131 Methylobacterium genomes, revealing that the archived viruses might originate from soil or plants. CONCLUSIONS Together, these efforts further microbial and viral sampling procedures for glacier ice and provide a first window into viral communities and functions in ancient glacier environments. Such methods and datasets can potentially enable researchers to contextualize new discoveries and begin to incorporate glacier-ice microbes and their viruses relative to past and present climate change in geographically diverse regions globally.
Affiliation(s)
- Zhi-Ping Zhong
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Funing Tian
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Simon Roux
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Natalie E Solonenko
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Yueh-Fen Li
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Mary E Davis
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
- James L Van Etten
- Department of Plant Pathology and Nebraska Center for Virology, University of Nebraska-Lincoln, Lincoln, NE, USA
- Ellen Mosley-Thompson
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Department of Geography, Ohio State University, Columbus, OH, USA
- Virginia I Rich
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Matthew B Sullivan
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, USA
- Lonnie G Thompson
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- School of Earth Sciences, Ohio State University, Columbus, OH, USA

82
Nami Y, Imeni N, Panahi B. Application of machine learning in bacteriophage research. BMC Microbiol 2021; 21:193. [PMID: 34174831 PMCID: PMC8235560 DOI: 10.1186/s12866-021-02256-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/13/2021] [Accepted: 06/08/2021] [Indexed: 12/20/2022]
Abstract
Phages are key components in the structure, dynamics, and interactions of microbial communities in different bins, and they have a clear impact on human health and the food industry. Characterizing bacteriophages with in vitro approaches is a time-consuming, costly, and laborious task. With the advent of new high-throughput sequencing technologies, however, developing powerful computational frameworks to characterize newly identified bacteriophages is indispensable for future research. Machine learning includes powerful techniques that enable the analysis of complex datasets for knowledge discovery and pattern recognition. In this study, we comprehensively review how machine learning methods, using different types of features, have been applied to various aspects of bacteriophage research, including automated curation, identification, classification, host species recognition, virion protein identification, and life cycle prediction. We also discuss the potential limitations and advantages of the developed frameworks.
Affiliation(s)
- Yousef Nami
- Department of Food Biotechnology, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Nazila Imeni
- Young Researchers and Elite Club, Marand Branch, Islamic Azad University, Marand, Iran
- Bahman Panahi
- Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran

83
Coutinho FH, Zaragoza-Solas A, López-Pérez M, Barylski J, Zielezinski A, Dutilh BE, Edwards R, Rodriguez-Valera F. RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content. Patterns (N Y) 2021; 2:100274. [PMID: 34286299 PMCID: PMC8276007 DOI: 10.1016/j.patter.2021.100274] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 10/28/2020] [Revised: 11/23/2020] [Accepted: 05/07/2021] [Indexed: 02/06/2023]
Abstract
Culture-independent approaches have recently shed light on the genomic diversity of viruses of prokaryotes. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue, we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH) that uses scores against 43,644 protein clusters to assign hosts to complete or fragmented genomes of viruses of Archaea and Bacteria. RaFAH displayed performance comparable with that of other methods for virus-host prediction in three different benchmarks encompassing viruses from RefSeq, single amplified genomes, and metagenomes. RaFAH was applied to assembled metagenomic datasets of uncultured viruses from eight different biomes of medical, biotechnological, and environmental relevance. Our analyses led to the identification of 537 sequences of archaeal viruses representing unknown lineages, whose genomes encode novel auxiliary metabolic genes, shedding light on how these viruses interfere with the host molecular machinery. RaFAH is available at https://sourceforge.net/projects/rafah/.
- RaFAH was developed to predict the hosts of viruses of Bacteria and Archaea
- RaFAH displayed comparable or superior performance to other host-prediction tools
- RaFAH performed well across viromes from eight different ecosystems
- RaFAH identified hundreds of genomic sequences as derived from viruses of Archaea
Viruses that infect Bacteria and Archaea are ubiquitous and extremely abundant. Recent advances have led to the discovery of many thousands of complete and partial genomes of these biological entities. Understanding the biology of these viruses and how they influence their ecosystems depends on knowing which hosts they infect. We developed a tool that uses data from complete or fragmented genomes to predict the hosts of viruses using a machine-learning approach. Our tool, RaFAH, displayed performance comparable with or superior to that of other host-prediction tools. In addition, it identified hundreds of sequences as derived from the genomes of viruses of Archaea, which are one of the least characterized fractions of the global virosphere.
Affiliation(s)
- Felipe Hernandes Coutinho
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain
- Asier Zaragoza-Solas
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain
- Mario López-Pérez
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain
- Jakub Barylski
- Molecular Virology Research Unit, Faculty of Biology, Adam Mickiewicz University Poznan, 61-614 Poznan, Poland
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, 61-614 Poznan, Poland
- Bas E Dutilh
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Centre/Radboud Institute for Molecular Life Sciences, 6525 GA Nijmegen, the Netherlands; Theoretical Biology and Bioinformatics, Science for Life, Utrecht University (UU), 3584 CH Utrecht, the Netherlands
- Robert Edwards
- College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia
- Francisco Rodriguez-Valera
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain; Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
84
Global overview and major challenges of host prediction methods for uncultivated phages. Curr Opin Virol 2021; 49:117-126. [PMID: 34126465 DOI: 10.1016/j.coviro.2021.05.003] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 04/01/2021] [Revised: 05/20/2021] [Accepted: 05/22/2021] [Indexed: 12/14/2022]
Abstract
Bacterial communities play critical roles across all of Earth's biomes, affecting human health and global ecosystem functioning. They do so under strong constraints exerted by viruses, that is, bacteriophages or 'phages'. Phages can reshape the structure of bacterial communities, influence the long-term evolution of bacterial populations, and alter host cell metabolism during infection. Metagenomic approaches, that is, shotgun sequencing of environmental DNA or RNA, have recently enabled large-scale exploration of phage genomic diversity, yielding several million phage genomes that now need to be further analyzed and characterized. One major challenge, however, is the lack of direct host information for these phages. Several methods and tools have been proposed to predict bioinformatically the potential host(s) of uncultivated phages based only on genome sequence information. Here we review these different approaches and highlight their distinct strengths and limitations. We also outline complementary experimental assays that are being proposed to validate and refine these bioinformatic predictions.
85
Buchholz HH, Michelsen ML, Bolaños LM, Browne E, Allen MJ, Temperton B. Efficient dilution-to-extinction isolation of novel virus-host model systems for fastidious heterotrophic bacteria. ISME J 2021; 15:1585-1598. [PMID: 33495565 PMCID: PMC8163748 DOI: 10.1038/s41396-020-00872-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 05/20/2020] [Revised: 12/01/2020] [Accepted: 12/07/2020] [Indexed: 02/08/2023]
Abstract
Microbes and their associated viruses are key drivers of biogeochemical processes in marine and soil biomes. While viruses of phototrophic cyanobacteria are well represented in model systems, the challenges of isolating marine microbial heterotrophs and their viruses have hampered experimental approaches to quantify the importance of viruses in nutrient recycling. A resurgence in cultivation efforts has improved the availability of fastidious bacteria for hypothesis testing, but this has not been matched by similar efforts to cultivate their associated bacteriophages. Here, we describe a high-throughput method for isolating important virus-host systems for fastidious heterotrophic bacteria that couples advances in culturing of hosts with sequential enrichment and isolation of associated phages. Applied to six monthly samples from the Western English Channel, we first isolated one new member of the globally dominant bacterial SAR11 clade and three new members of the methylotrophic bacterial clade OM43. We used these as bait to isolate 117 new phages, including the first known siphophage infecting SAR11 and the first isolated phage for OM43. Genomic analyses of 13 novel viruses revealed representatives of three new viral genera, and infection assays showed that the viruses infecting SAR11 have ecotype-specific host ranges. Similar to the abundant human-associated phage ɸCrAss001, infection dynamics within the majority of isolates suggested either prevalent lysogeny or chronic infection, despite a lack of associated genes, or host phenotypic bistability with lysis putatively maintained within a susceptible subpopulation. Broader representation of important virus-host systems in culture collections and genomic databases will improve both our understanding of virus-host interactions and the accuracy of computational approaches to evaluate ecological patterns from metagenomic data.
Affiliation(s)
- Emily Browne
- School of Biosciences, University of Exeter, Exeter, UK
- Michael J Allen
- School of Biosciences, University of Exeter, Exeter, UK
- Plymouth Marine Laboratory, Plymouth, UK
- Ben Temperton
- School of Biosciences, University of Exeter, Exeter, UK.
86
Computational Viromics: Applications of the Computational Biology in Viromics Studies. Virol Sin 2021; 36:1256-1260. [PMID: 34057678 PMCID: PMC8165334 DOI: 10.1007/s12250-021-00395-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/02/2020] [Accepted: 04/14/2021] [Indexed: 12/30/2022]
87
Abstract
Viruses play an essential role in shaping microbial community structures and serve as reservoirs for genetic diversity in many ecosystems. In hyperarid desert environments, where life itself becomes scarce and loses diversity, the interactions between viruses and host populations have remained elusive. Here, we resolved host-virus interactions in the soil metagenomes of the Atacama Desert hyperarid core, one of the harshest terrestrial environments on Earth. We show evidence of diverse viruses infecting a wide range of hosts found in sites up to 205 km apart. Viral genomes carried putative extremotolerance features (i.e., spore formation proteins) and auxiliary metabolic genes, indicating that viruses could mediate the spread of microbial resilience against environmental stress across the desert. We propose a mutualistic model of host-virus interactions in the hyperarid core in which viruses seek protection in microbial cells as lysogens or pseudolysogens, while viral extremotolerance genes aid the survival of their hosts. Our results suggest that host-virus interactions in Atacama Desert soils are dynamic and complex, shaping uniquely adapted microbiomes in this highly selective and hostile environment.
IMPORTANCE Deserts are among the largest and most rapidly expanding terrestrial ecosystems and are characterized by low biodiversity and biomass. The hyperarid core of the Atacama Desert, previously thought to be devoid of life, is one of the harshest environments, supporting only a scant biomass of highly adapted microbes. While there is growing evidence that viruses play essential roles in shaping the diversity and structure of nearly every ecosystem, very little is known about the role of viruses in desert soils, especially where viral contact with viable hosts is significantly reduced. Our results demonstrate that diverse viruses are widely dispersed across the desert, potentially spreading key stress resilience and metabolic genes to ensure host survival. Desertification accelerated by climate change expands both the ecosystem cover and the ecological significance of the desert virome. This study sheds light on the complex virus-host interplay that shapes the unique microbiome in desert soils.
88
Mock F, Viehweger A, Barth E, Marz M. VIDHOP, viral host prediction with deep learning. Bioinformatics 2021; 37:318-325. [PMID: 32777818 PMCID: PMC7454304 DOI: 10.1093/bioinformatics/btaa705] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/07/2020] [Revised: 07/17/2020] [Accepted: 08/03/2020] [Indexed: 12/21/2022]
Abstract
Motivation: Zoonosis, the natural transmission of infections from animals to humans, is a far-reaching global problem. The recent outbreaks of Zika virus, Ebola virus, and coronavirus are examples of viral zoonoses, which occur more frequently due to globalization. In the case of a virus outbreak, it is helpful to know which host organism was the original carrier of the virus in order to prevent further spread of the infection. Recent approaches aim to predict a viral host based on the viral genome, often in combination with the potential host genome and arbitrarily selected features. These methods are limited either in the number of different hosts they can predict or in the accuracy of the prediction.
Results: Here, we present a fast and accurate deep learning approach for viral host prediction that is based on the viral genome sequence only. We tested our deep neural network (DNN) on three different virus species (influenza A virus, rabies lyssavirus, rotavirus A). For each virus species we achieved an AUC between 0.93 and 0.98, allowing highly accurate predictions while using only fragments (100-400 bp) of the viral genome sequences. We show that deep neural networks are suitable for predicting the host of a virus, even with a limited number of sequences and highly unbalanced data. The trained DNNs are the core of our virus-host prediction tool VIDHOP (VIrus Deep learning HOst Prediction). VIDHOP also allows users to train and use models for other viruses.
Availability: VIDHOP is freely available at https://github.com/flomock/vidhop
Supplementary information: Available at DOI 10.17605/OSF.IO/UXT7
Affiliation(s)
- Florian Mock
- RNA Bioinformatics/High Throughput Analysis, Faculty of Mathematics and Computer Science, Jena 07743, Germany
- Adrian Viehweger
- RNA Bioinformatics/High Throughput Analysis, Faculty of Mathematics and Computer Science, Jena 07743, Germany
- Emanuel Barth
- Bioinformatics Core Facility Jena, Friedrich Schiller University Jena, Jena 07743, Germany
- Manja Marz
- RNA Bioinformatics/High Throughput Analysis, Faculty of Mathematics and Computer Science, Jena 07743, Germany; RNA Bioinformatics/High Throughput Analysis, Leibniz Institute for Age Research - Fritz Lipmann Institute (FLI), Jena 07743, Germany; RNA Bioinformatics/High Throughput Analysis, German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig 04103, Germany; RNA Bioinformatics/High Throughput Analysis, European Virus Bioinformatics Center (EVBC), Jena 07743, Germany
89
Mongia A, Saha SK, Chouzenoux E, Majumdar A. A computational approach to aid clinicians in selecting anti-viral drugs for COVID-19 trials. Sci Rep 2021; 11:9047. [PMID: 33907209 PMCID: PMC8079380 DOI: 10.1038/s41598-021-88153-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/03/2020] [Accepted: 04/01/2021] [Indexed: 02/02/2023]
Abstract
The year 2020 witnessed a heavy death toll due to COVID-19, prompting a global emergency. Ongoing research and clinical trials paved the way for vaccines. However, long-term vaccine efficacy remains uncertain because the coronavirus keeps mutating, which makes drug re-positioning a reasonable alternative. COVID-19 has therefore accelerated drug re-positioning for the treatment of the disease and its symptoms. This work builds computational models using matrix completion techniques to predict drug-virus associations for drug re-positioning. The aim is to provide clinicians with a tool for selecting prospective antiviral treatments. Since the virus is known to mutate fast, the tool is likely to help clinicians select the right set of antivirals for a mutated isolate. The main contribution of this work is a publicly shared, manually curated database of existing associations between viruses and their corresponding antivirals. The database gathers similarity information using the chemical structure of drugs and the genomic structure of viruses. Along with this database, we make available a set of state-of-the-art computational drug re-positioning tools based on matrix completion. The tools are first analysed on a standard set of experimental protocols for drug-target interactions. The best-performing ones are applied to the task of re-positioning antivirals for COVID-19. These tools select six drugs, of which four are currently in various stages of trial, namely Remdesivir (as a cure), Ribavirin (in combination with others as a cure), Umifenovir (as a prophylactic and cure) and Sofosbuvir (as a cure). Another unanimous prediction is Tenofovir alafenamide, a novel Tenofovir prodrug developed to improve renal safety compared with its older counterpart, Tenofovir disoproxil. Both are under trial, the former as a cure and the latter as a prophylactic.
These results establish that the computational methods are consistent with current practice. We also demonstrate how the drugs to be used against the virus would vary as SARS-CoV-2 mutates over time by predicting drugs for the mutated strains, which suggests the importance of such a tool in drug prediction. We believe this work will open up possibilities for applying machine learning models to clinical research for drug-virus association prediction and other similar biological problems.
Affiliation(s)
- Sanjay Kr Saha
- Department of Community Medicine, IPGMER Kolkata, Kolkata, India
- Emilie Chouzenoux
- CVN, Inria Saclay, University of Paris Saclay, 91190, Gif-sur-Yvette, France.
90
Abdelsattar AS, Dawoud A, Makky S, Nofal R, Aziz RK, El-Shibiny A. Bacteriophages: from isolation to application. Curr Pharm Biotechnol 2021; 23:337-360. [PMID: 33902418 DOI: 10.2174/1389201022666210426092002] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/24/2020] [Revised: 01/29/2021] [Accepted: 03/11/2021] [Indexed: 11/22/2022]
Abstract
Bacteriophages are considered a potential alternative for fighting pathogenic bacteria in the era of antibiotic resistance. Owing to their high specificity, they are widely used in various applications: medicine, the food industry, agriculture, animal farming, biotechnology, diagnostics, etc. Many techniques have been designed by different researchers for phage isolation, purification, and amplification, each of which has strengths and weaknesses. However, all aim at obtaining a reasonably pure phage sample that can be further characterized. Phages can be characterized using physiological, morphological, or inactivation tests. Microscopy, in particular, has opened a wide gate not only for visualizing phage morphological structure but also for monitoring biochemistry and behavior. Meanwhile, computational analysis of phage genomes provides more details about phage history, lifestyle, and potential for toxigenic or lysogenic conversion, which has direct implications for safety in biocontrol and phage therapy applications. This review summarizes phage application pipelines at different levels and addresses specific restrictions and knowledge gaps in the field. Recently developed computational approaches used in phage genome analysis are critically assessed. We hope that this assessment provides researchers with useful insights for selecting suitable approaches for phage-related research aims and applications.
Affiliation(s)
- Abdallah S Abdelsattar
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
- Alyaa Dawoud
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
- Salsabil Makky
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
- Rana Nofal
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
- Ramy K Aziz
- Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Qasr El-Ainy St, Cairo, Egypt
- Ayman El-Shibiny
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza 12578, Egypt
91
Sequence Comparison Without Alignment: The SpaM Approaches. Methods Mol Biol 2021; 2231:121-134. [PMID: 33289890 DOI: 10.1007/978-1-0716-1036-7_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/22/2022]
Abstract
Sequence alignment is at the heart of DNA and protein sequence analysis. For the data volumes that massively parallel sequencing technologies now produce, however, pairwise and multiple alignment methods are often too slow. Therefore, fast alignment-free approaches to sequence comparison have become popular in recent years. Most of these approaches are based on the frequencies of words of a fixed length, or on word-matching statistics. Other approaches use the length of maximal word matches. While these methods are very fast, most of them rely on ad hoc measures of sequence similarity or dissimilarity that are hard to interpret. In this chapter, I describe a number of alignment-free methods that we developed in recent years. Our approaches are based on spaced-word matches ("SpaM"), i.e. inexact word matches that are allowed to contain mismatches at certain predefined positions. Unlike most previous alignment-free approaches, our approaches can accurately estimate phylogenetic distances between DNA or protein sequences using a stochastic model of molecular evolution.
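The spaced-word idea above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the SpaM software. The binary pattern (1 = match position, 0 = don't-care position), the helper names, and the plain Jukes-Cantor correction are all illustrative choices. The published methods also filter out random background matches before estimating distances, and this sketch does not.

```python
import math
from collections import defaultdict


def spaced_words(seq, pattern):
    """Map each spaced word (the characters at the pattern's '1' positions) to its start positions."""
    match_pos = [k for k, c in enumerate(pattern) if c == "1"]
    index = defaultdict(list)
    for i in range(len(seq) - len(pattern) + 1):
        key = "".join(seq[i + k] for k in match_pos)
        index[key].append(i)
    return index


def spaced_word_matches(s1, s2, pattern):
    """Yield (i, j) pairs where s1 and s2 agree at every '1' (match) position of the pattern."""
    idx1 = spaced_words(s1, pattern)
    idx2 = spaced_words(s2, pattern)
    for key, starts1 in idx1.items():
        for i in starts1:
            for j in idx2.get(key, []):
                yield i, j


def mismatch_rate(s1, s2, pattern):
    """Fraction of mismatches at the '0' (don't-care) positions, pooled over all spaced-word matches."""
    dont_care = [k for k, c in enumerate(pattern) if c == "0"]
    compared = mismatches = 0
    for i, j in spaced_word_matches(s1, s2, pattern):
        for k in dont_care:
            compared += 1
            mismatches += s1[i + k] != s2[j + k]
    return mismatches / compared if compared else float("nan")


def jukes_cantor(p):
    """Jukes-Cantor correction: estimated substitutions per site from a mismatch fraction p."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

With the pattern `11011`, the toy sequences `ACGTTGCA` and `ACGATGCA` share exactly one spaced-word match, at offset 1 in both. Their single substitution falls on that match's don't-care position, so `mismatch_rate` counts it. Indexing spaced words in a dictionary means only positions that share a spaced word are paired. Highly repetitive words can still produce many candidate pairs, which is one reason real implementations filter matches.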
92
Hernandez-Alias X, Benisty H, Schaefer MH, Serrano L. Translational adaptation of human viruses to the tissues they infect. Cell Rep 2021; 34:108872. [PMID: 33730572 PMCID: PMC7962955 DOI: 10.1016/j.celrep.2021.108872] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/01/2020] [Revised: 12/15/2020] [Accepted: 02/23/2021] [Indexed: 12/22/2022]
Abstract
Viruses need to hijack the translational machinery of the host cell for a productive infection to happen. However, given the dynamic landscape of tRNA pools among tissues, it is unclear whether different viruses infecting different tissues have adapted their codon usage toward their tropism. Here, we collect the coding sequences of 502 human-infecting viruses and determine that tropism explains changes in codon usage. Using the tRNA abundances across 23 human tissues from The Cancer Genome Atlas (TCGA), we build an in silico model of translational efficiency that validates the correspondence of the viral codon usage with the translational machinery of their tropism. For instance, we detect that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is specifically adapted to the upper respiratory tract and alveoli. Furthermore, this correspondence is specifically defined in early viral proteins. The observed tissue-specific translational efficiency could be useful for the development of antiviral therapies and vaccines.
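The notion of matching a coding sequence's codon usage to a tissue's tRNA supply can be illustrated with a tAI-style (tRNA adaptation index) score. The score is the geometric mean of per-codon weights, obtained by scaling tRNA abundances to their maximum. This is a simplified sketch, not the translational-efficiency model built in the study. It assumes a hypothetical table that already maps each codon to the abundance of its decoding tRNA, it ignores wobble pairing, and the abundances in the example are invented for illustration.

```python
import math


def relative_weights(trna_abundance):
    """Scale per-codon tRNA abundances so the most abundant codon has weight 1.0."""
    top = max(trna_abundance.values())
    return {codon: a / top for codon, a in trna_abundance.items()}


def adaptation_score(cds, weights, floor=1e-3):
    """Geometric mean of codon weights along a coding sequence (tAI-style, no wobble rules).

    Codons missing from `weights`, or with a weight below `floor`, are assigned `floor`
    so that the logarithm is defined. A trailing partial codon is ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    logs = [math.log(max(weights.get(c, floor), floor)) for c in codons]
    return math.exp(sum(logs) / len(logs))


# Hypothetical tRNA abundances for two tissues that favor different lysine codons.
tissue_a = relative_weights({"AAA": 8.0, "AAG": 2.0})
tissue_b = relative_weights({"AAA": 2.0, "AAG": 8.0})
viral_cds = "AAAAAAAAG"  # toy AAA-rich sequence: AAA, AAA, AAG
print(adaptation_score(viral_cds, tissue_a), adaptation_score(viral_cds, tissue_b))
```

The AAA-rich toy sequence scores higher against `tissue_a`, whose hypothetical tRNA pool favors AAA. This is the kind of correspondence between viral codon usage and tissue tRNA supply that the abstract describes.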
Affiliation(s)
- Xavier Hernandez-Alias
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain.
- Hannah Benisty
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Martin H Schaefer
- IEO European Institute of Oncology IRCCS, Department of Experimental Oncology, Via Adamello 16, Milan 20139, Italy.
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona 08002, Spain; ICREA, Pg. Lluís Companys 23, Barcelona 08010, Spain.
93
Ennis CC, Haeffner NN, Keyser CD, Leonard ST, Macdonald-Shedd AC, Savoie AM, Cronin TJ, Veldsman WP, Barden P, Chak STC, Baeza JA. Comparative mitochondrial genomics of sponge-dwelling snapping shrimps in the genus Synalpheus: Exploring differences between eusocial and non-eusocial species and insights into phylogenetic relationships in caridean shrimps. Gene 2021; 786:145624. [PMID: 33798681 DOI: 10.1016/j.gene.2021.145624] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/02/2021] [Revised: 03/18/2021] [Accepted: 03/26/2021] [Indexed: 11/29/2022]
Abstract
The genus Synalpheus is a cosmopolitan clade of marine shrimps found in most tropical regions. Species in this genus exhibit a range of social organizations, including pair-forming, communal breeding, and eusociality, the latter known to have evolved in the marine realm only within this genus. This study examines the complete mitochondrial genomes of seven species of Synalpheus and explores differences between eusocial and non-eusocial species, given that eusociality has previously been shown to affect the strength of purifying selection in mitochondrial protein-coding genes. The AT-rich mitochondrial genomes of Synalpheus range from 15,421 bp to 15,782 bp in length and invariably comprise 13 protein-coding genes (PCGs), two ribosomal RNA genes, and 22 transfer RNA genes. An intergenic region 648 bp to 994 bp long is assumed to be the D-loop. Mitochondrial gene synteny is identical among the studied shrimps. No major differences occur between eusocial and non-eusocial species in nucleotide composition and codon usage profiles of PCGs or in the secondary structure of tRNA genes. Maximum likelihood phylogenetic analysis of the complete concatenated PCG complement of 90 species supports the monophyly of the genus Synalpheus and its family Alpheidae. Moreover, the monophyly of the caridean families Alvinocaridae, Atyidae, Thoridae, Lysmatidae, Palaemonidae, and Pandalidae is fully or highly supported by the analysis. We therefore conclude that mitochondrial genomes contain sufficient phylogenetic information to resolve relationships at high taxonomic levels within the Caridea. Our analysis of mitochondrial genomes in the genus Synalpheus contributes to the understanding of the coevolution between genomic architecture and sociality in caridean shrimps and other marine organisms.
Affiliation(s)
- Caroline C Ennis
- Department of Biological Sciences, 132 Long Hall, Clemson University, Clemson, SC 29634, USA
- Nariah N Haeffner
- Department of Biological Sciences, 132 Long Hall, Clemson University, Clemson, SC 29634, USA
- Cameron D Keyser
- Department of Biological Sciences, 132 Long Hall, Clemson University, Clemson, SC 29634, USA
- Shannon T Leonard
- Department of Biological Sciences, 132 Long Hall, Clemson University, Clemson, SC 29634, USA
- Avery M Savoie
- Department of Biological Sciences, 132 Long Hall, Clemson University, Clemson, SC 29634, USA
- Timothy J Cronin
- Department of Biological Sciences, 132 Long Hall, Clemson University, Clemson, SC 29634, USA
- Werner P Veldsman
- Simon F. S. Li Marine Science Laboratory, School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong, SAR, China
- Phillip Barden
- Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ 07102, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA.
- Solomon T C Chak
- Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ 07102, USA; Department of Biological Sciences, SUNY College at Old Westbury, Old Westbury, NY 11568, USA.
- J Antonio Baeza
- Department of Biological Sciences, 132 Long Hall, Clemson University, Clemson, SC 29634, USA; Smithsonian Marine Station at Fort Pierce, 701 Seaway Drive, Fort Pierce, Florida 34949, USA; Departamento de Biología Marina, Facultad de Ciencias del Mar, Universidad Católica del Norte, Larrondo 1281, Coquimbo, Chile.
94
Li Y, Handley SA, Baldridge MT. The dark side of the gut: Virome-host interactions in intestinal homeostasis and disease. J Exp Med 2021; 218:211916. [PMID: 33760921 PMCID: PMC8006857 DOI: 10.1084/jem.20201044] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 09/21/2020] [Revised: 11/23/2020] [Accepted: 11/25/2020] [Indexed: 12/19/2022]
Abstract
The diverse enteric viral communities that infect microbes and the animal host collectively constitute the gut virome. Although recent advances in sequencing and analysis of metaviromes have revealed the complexity of the virome and facilitated discovery of new viruses, our understanding of the enteric virome is still incomplete. Recent studies have uncovered how virome–host interactions can contribute to beneficial or detrimental outcomes for the host. Understanding the complex interactions between enteric viruses and the intestinal immune system is a prerequisite for elucidating their role in intestinal diseases. In this review, we provide an overview of the enteric virome composition and summarize recent findings about how enteric viruses are sensed by and, in turn, modulate host immune responses during homeostasis and disease.
Affiliation(s)
- Yuhao Li
- Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO; Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO
- Scott A Handley
- Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO; Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO
- Megan T Baldridge
- Division of Infectious Diseases, Department of Medicine, Washington University School of Medicine, St. Louis, MO; Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO
95
Moon K, Cho JC. Metaviromics coupled with phage-host identification to open the viral 'black box'. J Microbiol 2021; 59:311-323. [PMID: 33624268 DOI: 10.1007/s12275-021-1016-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/11/2021] [Revised: 01/28/2021] [Accepted: 01/28/2021] [Indexed: 12/22/2022]
Abstract
Viruses are found in almost all biomes on Earth, with bacteriophages (phages) accounting for the majority of viral particles in most ecosystems. Phages have been isolated from natural environments using the plaque assay and liquid medium-based dilution culturing. However, phage cultivation is restricted by the limited number of currently culturable bacterial strains. Unlike prokaryotes, which possess universally conserved 16S rRNA genes, phages lack universal marker genes for viral taxonomy, which restricts culture-independent analyses of viral diversity. To circumvent these limitations, shotgun viral metagenome sequencing (i.e., metaviromics) has been developed to enable extensive sequencing of the variety of viral particles present in the environment, and it is now widely used. Using metaviromics, numerous studies of viral communities have been conducted in oceans, lakes, rivers, and soils, yielding many novel phage sequences. Furthermore, auxiliary metabolic genes such as ammonia monooxygenase C and β-lactamase have been discovered in viral contigs assembled from viral metagenomes. Current attempts to identify putative bacterial hosts of viral metagenome sequences based on sequence homology have been limited by viral sequence variation. Therefore, culture-independent approaches have been developed to predict bacterial hosts using single-cell genomics and fluorescent labeling. This review focuses on recent viral metagenome studies conducted in natural environments, especially in aquatic ecosystems, and their contributions to phage ecology. We conclude that although metaviromics is a key tool for the study of viral ecology, this approach must be supplemented with phage-host identification, which in turn requires the cultivation of phage-bacteria systems.
Affiliation(s)
- Kira Moon
- Biological Resources Utilization Division, Honam National Institute of Biological Resources, Mokpo, 58762, Republic of Korea
- Jang-Cheon Cho
- Department of Biological Sciences and Bioengineering, Inha University, Incheon, 22212, Republic of Korea
96
Yutin N, Benler S, Shmakov SA, Wolf YI, Tolstoy I, Rayko M, Antipov D, Pevzner PA, Koonin EV. Analysis of metagenome-assembled viral genomes from the human gut reveals diverse putative CrAss-like phages with unique genomic features. Nat Commun 2021; 12:1044. [PMID: 33594055 PMCID: PMC7886860 DOI: 10.1038/s41467-021-21350-w] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 08/12/2020] [Accepted: 01/19/2021] [Indexed: 12/11/2022]
Abstract
CrAssphage is the most abundant human-associated virus and the founding member of a large group of bacteriophages, discovered in animal-associated and environmental metagenomes, that infect bacteria of the phylum Bacteroidetes. We analyze 4907 Circular Metagenome Assembled Genomes (cMAGs) of putative viruses from human gut microbiomes and identify nearly 600 genomes of crAss-like phages that account for nearly 87% of the DNA reads mapped to these cMAGs. Phylogenetic analysis of conserved genes demonstrates the monophyly of crAss-like phages, a putative virus order, and of five branches, potential families within that order, two of which have not been identified previously. The genomes of phages in one of these families (145-192 kilobases) are almost twofold larger than the crAssphage genome and have a high density of self-splicing introns and inteins. Many crAss-like phages encode suppressor tRNAs that enable read-through of UGA or UAG stop codons, mostly in late phage genes. A distinctive feature of the crAss-like phages is the recurrent switch of the phage DNA polymerase type between the A and B families. Thus, comparative genomic analysis of the expanded assemblage of crAss-like phages reveals aspects of genome architecture and expression, as well as phage biology, that were not apparent from previous work on phage genomics.
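To make the effect of stop-codon read-through concrete, here is a toy Python sketch (not taken from the paper): it counts the sense codons of an in-frame reading frame and shows how decoding UGA as sense, as a suppressor tRNA would, lets translation continue to the next remaining stop codon. The function names and the example sequence are hypothetical.

```python
# Toy model (hypothetical, not from the cited study): how reassigning a stop
# codon to sense, e.g. UGA read through by a suppressor tRNA, extends an ORF.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def orf_length_codons(seq: str, stops=STANDARD_STOPS) -> int:
    """Count sense codons read in frame from position 0 before the first
    codon in `stops`; if no stop occurs, count all complete codons.
    Accepts DNA or RNA alphabet (U is treated as T)."""
    seq = seq.upper().replace("U", "T")
    n = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return n
        n += 1
    return n


def readthrough_stops(suppressed: str) -> frozenset:
    """Stop codons that still terminate translation when `suppressed`
    (written in DNA or RNA alphabet) is decoded as a sense codon."""
    codon = suppressed.upper().replace("U", "T")
    return STANDARD_STOPS - {codon}
```

Here `orf_length_codons("ATGAAATGAGGCTAA")` returns 2 with the standard stop set but 4 with `readthrough_stops("UGA")`, because translation continues through TGA to the TAA codon.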
Affiliation(s)
- Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Sean Benler
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Sergei A Shmakov
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Igor Tolstoy
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
- Mike Rayko
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Dmitry Antipov
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
97
Zhu L, Yan C, Duan G. Prediction of Virus-Receptor Interactions Based on Improving Similarities. J Comput Biol 2021; 28:650-659. [PMID: 33481654 DOI: 10.1089/cmb.2020.0544] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/03/2023]
Abstract
Viral infectious diseases pose a serious threat to human health. Receptor binding is the first step of viral infection. Predicting virus-receptor interactions can help elucidate how viruses interact with their receptors and, further, help identify effective ways of preventing and treating viral infectious diseases, thereby reducing the morbidity and mortality they cause. Several computational algorithms have been proposed for identifying potential virus-receptor interactions. However, a common problem with these methods is noise in the similarity network. A new computational model, Network Enhancement and the Regularized Least Squares (NERLS), is proposed to predict virus-receptor interactions based on similarities improved by Network Enhancement (NE). NERLS integrates virus sequence similarity, receptor sequence similarity, and known virus-receptor interactions. We use the virus sequence similarity and known virus-receptor interactions to construct the virus similarity network. The receptor similarity network is constructed from the Gaussian interaction profile kernel similarity and the receptor sequence similarity. To obtain the final virus and receptor similarity networks, NE is applied to each network separately to reduce its noise. Finally, NERLS employs regularized least squares to predict virus-receptor interactions. Experimental results show that NERLS achieves area under the curve (AUC) values of 0.893 and 0.921 in 10-fold cross-validation and leave-one-out cross-validation, respectively, consistently outperforming four related methods: Initial interaction scores method via the neighbors and the Laplacian regularized Least Square (IILLS), Bi-random walk on a heterogeneous network (BRWH), Laplacian regularized least squares classifier (LapRLS), and Collaborative matrix factorization (CMF). Furthermore, a case study demonstrates that NERLS effectively predicts potential virus-receptor interactions.
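As a rough illustration of two ingredients named above, the Python sketch below computes a Gaussian interaction profile (GIP) kernel from a binary interaction matrix and scores virus-receptor pairs with regularized least squares, F = K(K + σI)^-1 Y, averaged over the virus and receptor networks. It is not the NERLS implementation: the Network Enhancement denoising step is omitted, and the fusion rule (a plain average of sequence and GIP similarity), the use of GIP on the virus side, the γ normalization, σ = 1 and all function names are assumptions.

```python
import numpy as np


def gip_kernel(profiles: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """GIP kernel between rows of a binary interaction matrix:
    K[i, j] = exp(-gamma * ||y_i - y_j||^2), gamma = gamma_prime / mean ||y_i||^2."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else gamma_prime
    # Pairwise squared Euclidean distances; clip tiny negatives from round-off.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def fuse(seq_sim: np.ndarray, gip: np.ndarray) -> np.ndarray:
    """Hypothetical fusion rule: average sequence and GIP similarity
    (NERLS additionally denoises with Network Enhancement, omitted here)."""
    return (np.asarray(seq_sim, dtype=float) + gip) / 2.0


def rls_scores(Y: np.ndarray, K_virus: np.ndarray, K_receptor: np.ndarray,
               sigma: float = 1.0) -> np.ndarray:
    """Regularized least squares on each network, F = K (K + sigma*I)^-1 Y,
    averaged over the virus side and the receptor side."""
    Y = np.asarray(Y, dtype=float)
    n_v, n_r = Y.shape
    pred_v = K_virus @ np.linalg.solve(K_virus + sigma * np.eye(n_v), Y)
    pred_r = (K_receptor @ np.linalg.solve(K_receptor + sigma * np.eye(n_r), Y.T)).T
    return (pred_v + pred_r) / 2.0


if __name__ == "__main__":
    # Toy data: 3 viruses x 2 receptors (1 = known interaction).
    Y = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
    virus_seq_sim = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
    receptor_seq_sim = np.array([[1.0, 0.3], [0.3, 1.0]])
    K_v = fuse(virus_seq_sim, gip_kernel(Y))
    K_r = fuse(receptor_seq_sim, gip_kernel(Y.T))
    print(np.round(rls_scores(Y, K_v, K_r), 3))
```

Each score is a kernel-weighted combination of the known interactions of similar viruses (or receptors), so noisy similarities propagate directly into the predictions, which is the problem NE is meant to address.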
Affiliation(s)
- Lingzhi Zhu
- School of Computer Science and Engineering, Central South University, Changsha, China; School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Cheng Yan
- School of Computer Science and Engineering, Central South University, Changsha, China; School of Computer and Information, Qiannan Normal University for Nationalities, Duyun, China
- Guihua Duan
- School of Computer Science and Engineering, Central South University, Changsha, China
98
Pons JC, Paez-Espino D, Riera G, Ivanova N, Kyrpides NC, Llabrés M. VPF-Class: Taxonomic assignment and host prediction of uncultivated viruses based on viral protein families. Bioinformatics 2021; 37:1805-1813. [PMID: 33471063 PMCID: PMC8830756 DOI: 10.1093/bioinformatics/btab026] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 09/02/2020] [Revised: 12/11/2020] [Accepted: 01/13/2021] [Indexed: 12/03/2022]
Abstract
Motivation: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of their putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomic datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets.
Results: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class, a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. On the RefSeq database, VPF-Class achieved nearly 100% accuracy in classifying dsDNA viruses, ssDNA viruses and retroviruses at the genus level, with a membership ratio and confidence score of 0.2. Host prediction accuracy was 86.4%, also at the genus level, with a membership ratio of 0.3 and a confidence score of 0.5. On the prophage dataset, host prediction accuracy was 86% with a membership ratio of 0.6 and a confidence score of 0.8. Moreover, over 817K of 1 million viral contigs from the Global Ocean Virome dataset were classified.
Availability and implementation: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools.
Supplementary information: Supplementary data are available at Bioinformatics online.
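The abstract reports thresholds on a "membership ratio" and a "confidence score" but does not define them. The following Python sketch shows one plausible way such thresholds could gate a majority-vote label assignment over a contig's protein-to-VPF hits. Both definitions in the code are assumptions for illustration, not VPF-Class's actual formulas, and `classify_contig` is a hypothetical name.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence


def classify_contig(protein_hits: Sequence[Optional[str]],
                    vpf_labels: Mapping[str, str],
                    min_membership: float = 0.2,
                    min_confidence: float = 0.2) -> Optional[str]:
    """Assign a label (taxon or host) to a contig by majority vote over the
    classified VPFs its proteins map to (None = protein had no VPF hit).

    Hypothetical definitions used here:
      membership ratio = fraction of the contig's proteins that hit a
                         classified VPF;
      confidence score = fraction of those hits supporting the winning label.
    Returns None when either threshold is not met.
    """
    if not protein_hits:
        return None
    labels = [vpf_labels[v] for v in protein_hits
              if v is not None and v in vpf_labels]
    if not labels:
        return None
    membership = len(labels) / len(protein_hits)
    if membership < min_membership:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    confidence = votes / len(labels)
    return label if confidence >= min_confidence else None
```

For example, with `vpf_labels = {"vpf1": "Gordonia", "vpf2": "Gordonia", "vpf3": "Rhodococcus"}`, the hits `["vpf1", "vpf2", "vpf3", None, None]` give a membership ratio of 0.6 and a confidence of 2/3, so the contig is labeled `"Gordonia"`; raising `min_confidence` to 0.8 leaves it unclassified.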
Affiliation(s)
- Joan Carles Pons
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
- Gabriel Riera
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
- Natalia Ivanova
- Department of Energy Joint Genome Institute, Berkeley, 94720, USA
- Nikos C Kyrpides
- Department of Energy Joint Genome Institute, Berkeley, 94720, USA
- Mercè Llabrés
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, 07122, Spain
99
Boeckaerts D, Stock M, Criel B, Gerstmans H, De Baets B, Briers Y. Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins. Sci Rep 2021; 11:1467. [PMID: 33446856 PMCID: PMC7809048 DOI: 10.1038/s41598-021-81063-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 05/04/2020] [Accepted: 12/30/2020] [Indexed: 12/04/2022]
Abstract
Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
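PR-AUC summarizes how well a model's scores rank true hosts above non-hosts. A common estimator is average precision, sketched below in Python for a single host class (one-vs-rest). The abstract does not state which PR-AUC estimator was used, so treat this as an illustrative assumption; for untied scores it should match the step-wise sum computed by scikit-learn's `average_precision_score`. The function name is hypothetical.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision):
    the mean of the precision values at the rank of each positive, ranking
    by descending score. Ties keep input order (a simplifying assumption)."""
    n_pos = sum(1 for y in y_true if y)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Indices sorted by descending score; Python's sort is stable.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos
```

With labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, the positives sit at ranks 1 and 3, giving (1/1 + 2/3) / 2 ≈ 0.833.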
Affiliation(s)
- Dimitri Boeckaerts
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Michiel Stock
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Bjorn Criel
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Hans Gerstmans
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Leuven, Belgium
- MeBioS-Biosensors group, Department of BioSystems, KU Leuven, Leuven, Belgium
- Bernard De Baets
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Yves Briers
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
100
Lu C, Zhang Z, Cai Z, Zhu Z, Qiu Y, Wu A, Jiang T, Zheng H, Peng Y. Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC Biol 2021; 19:5. [PMID: 33441133 PMCID: PMC7807511 DOI: 10.1186/s12915-020-00938-6] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/29/2020] [Accepted: 12/09/2020] [Indexed: 12/19/2022]
Abstract
Background: Viruses are ubiquitous biological entities, estimated to be the largest reservoir of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses require tools for taxonomic assignment and for determining their host range and biological properties. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model that predicts hosts for prokaryotic viruses with better performance than previous computational methods.
Results: We present Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, with the differences in k-mer frequencies between viral and host genomic sequences as features. PHP achieved a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. This exceeded the accuracy of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also substantially outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses that cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
Conclusions: The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
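The core idea, scoring a candidate host by how typical the virus-host k-mer frequency difference is under a Gaussian fitted to known virus-host pairs, can be sketched in Python as follows. This is not the PHP code: the default k = 4, the diagonal covariance, the variance floor and all function names are assumptions, and the thresholding and top-30 consensus steps are left out.

```python
import math
from functools import lru_cache
from itertools import product
from typing import Dict, List, Sequence, Tuple


@lru_cache(maxsize=None)
def _kmer_index(k: int) -> Dict[str, int]:
    """Map each of the 4**k DNA k-mers to a fixed feature position."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def kmer_freqs(seq: str, k: int = 4) -> List[float]:
    """Relative k-mer frequencies; k-mers with non-ACGT symbols are skipped."""
    index = _kmer_index(k)
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(index)


def fit_diag_gaussian(diffs: Sequence[Sequence[float]],
                      var_floor: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood mean and per-feature variance of difference
    vectors from known virus-host pairs (diagonal covariance)."""
    n, d = len(diffs), len(diffs[0])
    mean = [sum(v[j] for v in diffs) / n for j in range(d)]
    var = [max(sum((v[j] - mean[j]) ** 2 for v in diffs) / n, var_floor)
           for j in range(d)]
    return mean, var


def log_likelihood(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log-density of x under the diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - m) ** 2 / s)
               for xi, m, s in zip(x, mean, var))


def rank_hosts(virus_seq: str, host_seqs: Dict[str, str],
               mean: Sequence[float], var: Sequence[float],
               k: int = 4) -> List[str]:
    """Rank candidate hosts by the likelihood of the virus-host k-mer
    frequency difference under the fitted model (best first)."""
    v = kmer_freqs(virus_seq, k)
    scores = {}
    for name, host_seq in host_seqs.items():
        diff = [a - b for a, b in zip(v, kmer_freqs(host_seq, k))]
        scores[name] = log_likelihood(diff, mean, var)
    return sorted(scores, key=scores.get, reverse=True)
```

In use, `fit_diag_gaussian` would be trained on difference vectors from known virus-host pairs, and the fitted mean and variance passed to `rank_hosts` for a new virus. A diagonal covariance keeps the sketch short and well conditioned with 256 features; a full covariance would capture correlations between k-mers at the cost of needing more training pairs.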
Affiliation(s)
- Congyu Lu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Zheng Zhang
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Zena Cai
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Zhaozhong Zhu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Ye Qiu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Aiping Wu
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China; Suzhou Institute of Systems Medicine, Suzhou, 215123, Jiangsu, China
- Taijiao Jiang
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China; Suzhou Institute of Systems Medicine, Suzhou, 215123, Jiangsu, China
- Heping Zheng
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Yousong Peng
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China