1
Chen J, Sun C, Dong Y, Jin M, Lai S, Jia L, Zhao X, Wang H, Gao NL, Bork P, Liu Z, Chen W, Zhao X. Efficient Recovery of Complete Gut Viral Genomes by Combined Short- and Long-Read Sequencing. Adv Sci (Weinh) 2024; 11:e2305818. [PMID: 38240578 PMCID: PMC10987132 DOI: 10.1002/advs.202305818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 12/01/2023] [Indexed: 04/04/2024]
Abstract
Current metagenome-assembled human gut phage catalogs contain mostly fragmented genomes. Here, a comprehensive gut virome detection procedure is developed involving virus-like particle (VLP) enrichment from ≈500 g of feces and combined short- and long-read sequencing. Applied to 135 samples, a Chinese Gut Virome Catalog (CHGV) is assembled consisting of 21,499 non-redundant viral operational taxonomic units (vOTUs) that are significantly longer than those obtained by short-read sequencing and contain ≈35% (7,675) complete genomes, which is ≈nine times more than those in the Gut Virome Database (GVD, ≈4%, 1,443). Interestingly, the majority (≈60%, 13,356) of the CHGV vOTUs are obtained by either long-read or hybrid assemblies, with little overlap with those assembled from only the short-read data. With this dataset, the vast diversity of the gut virome is elucidated, including the identification of 32% (6,962) novel vOTUs compared to public gut virome databases, dozens of phages that are more prevalent than the crAssphages and/or Gubaphages, and several viral clades that are more diverse than the two. Finally, the functional capacities of the CHGV-encoded proteins are characterized and a viral-host interaction network is constructed to facilitate future research and applications.
Affiliation(s)
- Jingchao Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Chuqing Sun
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yanqi Dong
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
- Menglu Jin
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
- Senying Lai
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
- Longhao Jia
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
- Xueyang Zhao
  - College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
- Huarui Wang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Na L. Gao
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - Department of Laboratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan 430071, China
- Peer Bork
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117 Heidelberg, Germany
  - Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
  - Yonsei Frontier Lab (YFL), Yonsei University, 03722 Seoul, South Korea
  - Department of Bioinformatics, Biocenter, University of Würzburg, 97070 Würzburg, Germany
- Zhi Liu
  - Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, China
- Wei‐Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
  - College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
  - Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai 264003, China
- Xing‐Ming Zhao
  - Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
  - MOE Key Laboratory of Computational Neuroscience and Brain‐Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - State Key Laboratory of Medical Neurobiology, Institute of Brain Science, Fudan University, Shanghai 200433, China
  - International Human Phenome Institutes (Shanghai), Shanghai 200433, China
2
Liu X, Liu Y, Liu J, Zhang H, Shan C, Guo Y, Gong X, Cui M, Li X, Tang M. Correlation between the gut microbiome and neurodegenerative diseases: a review of metagenomics evidence. Neural Regen Res 2024; 19:833-845. [PMID: 37843219 PMCID: PMC10664138 DOI: 10.4103/1673-5374.382223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/23/2023] [Revised: 04/19/2023] [Accepted: 06/17/2023] [Indexed: 10/17/2023]
Abstract
A growing body of evidence suggests that the gut microbiota contributes to the development of neurodegenerative diseases via the microbiota-gut-brain axis. As a contributing factor, microbiota dysbiosis always occurs in pathological changes of neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis. High-throughput sequencing technology has helped to reveal that the bidirectional communication between the central nervous system and the enteric nervous system is facilitated by the microbiota's diverse microorganisms, and for both neuroimmune and neuroendocrine systems. Here, we summarize the bioinformatics analysis and wet-biology validation for the gut metagenomics in neurodegenerative diseases, with an emphasis on multi-omics studies and the gut virome. The pathogen-associated signaling biomarkers for identifying brain disorders and potential therapeutic targets are also elucidated. Finally, we discuss the role of diet, prebiotics, probiotics, postbiotics and exercise interventions in remodeling the microbiome and reducing the symptoms of neurodegenerative diseases.
Affiliation(s)
- Xiaoyan Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yi Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
  - Institute of Animal Husbandry, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu Province, China
- Junlin Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Hantao Zhang
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Chaofan Shan
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yinglu Guo
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Xun Gong
  - Department of Rheumatology & Immunology, Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu Province, China
- Mengmeng Cui
  - Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Xiubin Li
  - Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
3
Mahony J. Biological and bioinformatic tools for the discovery of unknown phage-host combinations. Curr Opin Microbiol 2024; 77:102426. [PMID: 38246125 DOI: 10.1016/j.mib.2024.102426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 12/21/2023] [Accepted: 01/02/2024] [Indexed: 01/23/2024]
Abstract
The field of microbial ecology has been transformed by metagenomics in recent decades and has culminated in vast datasets that facilitate the bioinformatic dissection of complex microbial communities. Recently, attention has turned from defining the microbiota composition to the interactions and relationships that occur between members of the microbiota. Within complex microbiota, the identification of bacteriophage-host combinations has been a major challenge. Recent developments in artificial intelligence tools to predict protein structure and function as well as the relationships between bacteria and their infecting bacteriophages allow a strategic approach to identifying and validating phage-host relationships. However, biological validation of these predictions remains essential and will serve to improve the existing predictive tools. In this review, I provide an overview of the most recent developments in both bioinformatic and experimental approaches to predicting and experimentally validating unknown phage-host combinations.
Affiliation(s)
- Jennifer Mahony
  - School of Microbiology & APC Microbiome Ireland, University College Cork, Western Road, T12 YT20 Cork, Ireland
4
Yin H, Wu S, Tan J, Guo Q, Li M, Guo J, Wang Y, Jiang X, Zhu H. IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning. Gigascience 2024; 13:giae018. [PMID: 38649300 PMCID: PMC11034026 DOI: 10.1093/gigascience/giae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/14/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024]
Abstract
BACKGROUND: The virome obtained through virus-like particle enrichment contains a mixture of prokaryotic and eukaryotic virus-derived fragments. Accurate identification and classification of these elements are crucial to understanding their roles and functions in microbial communities. However, the rapid mutation rates of viral genomes pose challenges in developing high-performance tools for classification, potentially limiting downstream analyses. FINDINGS: We present IPEV, a novel method to distinguish prokaryotic and eukaryotic viruses in viromes, with a 2-dimensional convolutional neural network combining trinucleotide pair relative distance and frequency. Cross-validation assessments of IPEV demonstrate its state-of-the-art precision, significantly improving the F1-score by approximately 22% on an independent test set compared to existing methods when query viruses share less than 30% sequence similarity with known viruses. Furthermore, IPEV outperforms other methods in accuracy on marine and gut virome samples based on annotations by sequence alignments. IPEV reduces runtime by at most 1,225 times compared to existing methods under the same computing configuration. We also utilized IPEV to analyze longitudinal samples and found that the gut virome exhibits a higher degree of temporal stability than previously observed in persistent personal viromes, providing novel insights into the resilience of the gut virome in individuals. CONCLUSIONS: IPEV is a high-performance, user-friendly tool that assists biologists in identifying and classifying prokaryotic and eukaryotic viruses within viromes. The tool is available at https://github.com/basehc/IPEV.
Affiliation(s)
- Hengchuang Yin
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Shufang Wu
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Jie Tan
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Qian Guo
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Mo Li
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
  - School of Life Sciences, Peking University, Beijing 100871, China
- Jinyuan Guo
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Yaqi Wang
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Xiaoqing Jiang
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China
- Huaiqiu Zhu
  - Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
  - School of Life Sciences, Peking University, Beijing 100871, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
5
Zhang YZ, Liu Y, Bai Z, Fujimoto K, Uematsu S, Imoto S. Zero-shot-capable identification of phage-host relationships with whole-genome sequence representation by contrastive learning. Brief Bioinform 2023; 24:bbad239. [PMID: 37466138 PMCID: PMC10516345 DOI: 10.1093/bib/bbad239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 05/17/2023] [Accepted: 06/08/2023] [Indexed: 07/20/2023]
Abstract
Accurately identifying phage-host relationships from their genome sequences is still challenging, especially for those phages and hosts with less homologous sequences. In this work, focusing on identifying the phage-host relationships at the species and genus level, we propose a contrastive learning based approach to learn whole-genome sequence embeddings that can take account of phage-host interactions (PHIs). Contrastive learning is used to make phages infecting the same hosts close to each other in the new representation space. Specifically, we rephrase whole-genome sequences with frequency chaos game representation (FCGR) and learn latent embeddings that 'encapsulate' phages and host relationships through contrastive learning. The contrastive learning method works well on the imbalanced dataset. Based on the learned embeddings, a proposed pipeline named CL4PHI can predict known hosts and unseen hosts in training. We compare our method with two recently proposed state-of-the-art learning-based methods on their benchmark datasets. The experiment results demonstrate that the proposed method using contrastive learning improves the prediction accuracy on known hosts and demonstrates a zero-shot prediction capability on unseen hosts. In terms of potential applications, the rapid pace of genome sequencing across different species has resulted in a vast amount of whole-genome sequencing data that require efficient computational methods for identifying phage-host interactions. The proposed approach is expected to address this need by efficiently processing whole-genome sequences of phages and prokaryotic hosts and capturing features related to phage-host relationships for genome sequence representation. This approach can be used to accelerate the discovery of phage-host interactions and aid in the development of phage-based therapies for infectious diseases.
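As a rough illustration of the frequency chaos game representation (FCGR) mentioned in this abstract, the sketch below turns a DNA sequence into a 2^k x 2^k matrix of normalized k-mer frequencies. It is not the CL4PHI implementation: the function name `fcgr`, the corner assignment in `CORNERS`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for this example.

```python
from typing import List

# Assumed corner convention for the four bases (row bit, column bit).
# Other conventions exist; this one is chosen only for illustration.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> List[List[float]]:
    """Return a 2^k x 2^k matrix of normalized k-mer frequencies."""
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers with ambiguous bases such as N.
        if any(base not in CORNERS for base in kmer):
            continue
        row = col = 0
        for base in kmer:
            r, c = CORNERS[base]
            # Each base contributes one bit to the row and column index.
            row = (row << 1) | r
            col = (col << 1) | c
        counts[row][col] += 1
        total += 1
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    return [[value / total for value in row_counts] for row_counts in counts]


if __name__ == "__main__":
    for line in fcgr("ACGTACGTTTGA", 2):
        print(line)
```

A fixed-size matrix like this can be fed to an image-style encoder regardless of genome length, which is presumably why such representations suit whole-genome embedding.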
Affiliation(s)
- Yao-zhong Zhang
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
- Yunjie Liu
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
- Zeheng Bai
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
- Kosuke Fujimoto
  - Department of Immunology and Genomics, Graduate School of Medicine, Osaka Metropolitan University, Asahi-machi 1-4-3, Abeno-ku, 545-8585 Osaka, Japan
  - Division of Metagenome Medicine, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
- Satoshi Uematsu
  - Department of Immunology and Genomics, Graduate School of Medicine, Osaka Metropolitan University, Asahi-machi 1-4-3, Abeno-ku, 545-8585 Osaka, Japan
  - Division of Metagenome Medicine, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
- Seiya Imoto
  - Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
6
Gonzales MEM, Ureta JC, Shrestha AMS. Protein embeddings improve phage-host interaction prediction. PLoS One 2023; 18:e0289030. [PMID: 37486915 PMCID: PMC10365317 DOI: 10.1371/journal.pone.0289030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 07/07/2023] [Indexed: 07/26/2023]
Abstract
With the growing interest in using phages to combat antimicrobial resistance, computational methods for predicting phage-host interactions have been explored to help shortlist candidate phages. Most existing models consider entire proteomes and rely on manual feature engineering, which poses difficulty in selecting the most informative sequence properties to serve as input to the model. In this paper, we framed phage-host interaction prediction as a multiclass classification problem that takes as input the embeddings of a phage's receptor-binding proteins, which are known to be the key machinery for host recognition, and predicts the host genus. We explored different protein language models to automatically encode these protein sequences into dense embeddings without the need for additional alignment or structural information. We show that the use of embeddings of receptor-binding proteins presents improvements over handcrafted genomic and protein sequence features. The highest performance was obtained using the transformer-based protein language model ProtT5, resulting in a 3% to 4% increase in weighted F1 and recall scores across different prediction confidence thresholds, compared to using selected handcrafted sequence features.
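For readers unfamiliar with the weighted F1 score reported in this abstract, the following minimal sketch shows one standard way to compute it for a multiclass problem: per-class F1 averaged with weights proportional to each class's number of true instances. The function name `weighted_f1` and the genus labels in the usage example are hypothetical and are not taken from the paper's code or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = 0.0
    for label, n_true in support.items():
        tp = true_pos[label]
        fp = predicted[label] - tp
        fn = n_true - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because n_true >= 1.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += n_true * f1
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical host-genus labels, for illustration only.
    truth = ["Escherichia", "Escherichia", "Salmonella", "Vibrio"]
    guess = ["Escherichia", "Salmonella", "Salmonella", "Vibrio"]
    print(weighted_f1(truth, guess))
```

Weighting by support keeps frequent host genera from being drowned out by rare ones, at the cost of hiding poor performance on small classes.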
Affiliation(s)
- Mark Edward M Gonzales
  - Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
  - Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
- Jennifer C Ureta
  - Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
  - Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
- Anish M S Shrestha
  - Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
  - Systems and Computational Biology Research Unit, Center for Natural Sciences and Environmental Research, De La Salle University, Manila, Philippines
  - Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
7
Roux S, Camargo AP, Coutinho FH, Dabdoub SM, Dutilh BE, Nayfach S, Tritt A. iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria. PLoS Biol 2023; 21:e3002083. [PMID: 37083735 PMCID: PMC10155999 DOI: 10.1371/journal.pbio.3002083] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Received: 08/30/2022] [Revised: 05/03/2023] [Accepted: 03/15/2023] [Indexed: 04/22/2023]
Abstract
The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.
Affiliation(s)
- Simon Roux
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Antonio Pedro Camargo
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Shareef M Dabdoub
  - Division of Biostatistics and Computational Biology, University of Iowa College of Dentistry, Iowa City, Iowa, United States of America
- Bas E Dutilh
  - Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Jena, Germany
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, the Netherlands
- Stephen Nayfach
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Andrew Tritt
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
8
Viral Metagenomic Analysis of the Fecal Samples in Domestic Dogs (Canis lupus familiaris). Viruses 2023; 15:v15030685. [PMID: 36992396 PMCID: PMC10058366 DOI: 10.3390/v15030685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 02/24/2023] [Accepted: 03/02/2023] [Indexed: 03/08/2023]
Abstract
Canine diarrhea is a common intestinal illness that is usually caused by viruses, bacteria, and parasites, and canine diarrhea may induce morbidity and mortality of domestic dogs if treated improperly. Recently, viral metagenomics was applied to investigate the signatures of the enteric virome in mammals. In this research, the characteristics of the gut virome in healthy dogs and dogs with diarrhea were analyzed and compared using viral metagenomics. The alpha diversity analysis indicated that the richness and diversity of the gut virome in the dogs with diarrhea were much higher than the healthy dogs, while the beta diversity analysis revealed that the gut virome of the two groups was quite different. At the family level, the predominant viruses in the canine gut virome were certified to be Microviridae, Parvoviridae, Siphoviridae, Inoviridae, Podoviridae, Myoviridae, and others. At the genus level, the predominant viruses in the canine gut virome were certified to be Protoparvovirus, Inovirus, Chlamydiamicrovirus, Lambdavirus, Dependoparvovirus, Lightbulbvirus, Kostyavirus, Punavirus, Lederbergvirus, Fibrovirus, Peduovirus, and others. However, the viral communities between the two groups differed significantly. The unique viral taxa identified in the healthy dogs group were Chlamydiamicrovirus and Lightbulbvirus, while the unique viral taxa identified in the dogs with diarrhea group were Inovirus, Protoparvovirus, Lambdavirus, Dependoparvovirus, Kostyavirus, Punavirus, and other viruses. Phylogenetic analysis based on the near-complete genome sequences showed that the CPV strains collected in this study together with other CPV Chinese isolates clustered into a separate branch, while the identified CAV-2 strain D5-8081 and AAV-5 strain AAV-D5 were both the first near-complete genome sequences in China. 
Moreover, the predicted bacterial hosts of phages were certified to be Campylobacter, Escherichia, Salmonella, Pseudomonas, Acinetobacter, Moraxella, Mediterraneibacter, and other commensal microbiota. In conclusion, the enteric virome of the healthy dogs group and the dogs with diarrhea group was investigated and compared using viral metagenomics, and the viral communities might influence canine health and disease by interacting with the commensal gut microbiome.
9
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022]
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. This review has been divided into the following two sections: general information and computer-aided phage therapy (CAPT). In the case of general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and clinical trials) and the challenges. This review emphasizes CAPT, where we have covered primary phage-associated resources, phage prediction methods and pipelines. This review covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternate treatments for drug-resistant bacteria and this remains a challenging problem. Thus, we compile all phage-associated prediction methods that include the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discussed recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe phage-based therapy's advantages, challenges and opportunities.
Affiliation(s)
- Nisha Bajiya
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Suchet Aggarwal
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
10
Andrianjakarivony HF, Bettarel Y, Armougom F, Desnues C. Phage-Host Prediction Using a Computational Tool Coupled with 16S rRNA Gene Amplicon Sequencing. Viruses 2022; 15:76. [PMID: 36680116 PMCID: PMC9862649 DOI: 10.3390/v15010076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 12/13/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022]
Abstract
Metagenomics studies have revealed tremendous viral diversity in aquatic environments. Yet, while the genomic data they have provided is extensive, it is unannotated. For example, most phage sequences lack accurate information about their bacterial host, which prevents reliable phage identification and the investigation of phage-host interactions. This study aimed to take this knowledge further, using a viral metagenomic framework to decipher the composition and diversity of phage communities and to predict their bacterial hosts. To this end, we used water and sediment samples collected from seven sites with varying contamination levels in the Ebrié Lagoon in Abidjan, Ivory Coast. The bacterial communities were characterized using the 16S rRNA metabarcoding approach, and a framework was developed to investigate the virome datasets that: (1) identified phage contigs with VirSorter and VIBRANT; (2) classified these contigs with MetaPhinder using the phage database (taxonomic annotation); and (3) predicted the phages' bacterial hosts with a machine learning-based tool: the Prokaryotic Virus-Host Predictor. The findings showed that the taxonomic profiles of phages and bacteria were specific to sediment or water samples. Phage sequences assigned to the Microviridae family were widespread in sediment samples, whereas phage sequences assigned to the Siphoviridae, Myoviridae and Podoviridae families were predominant in water samples. In terms of bacterial communities, the phyla Latescibacteria, Zixibacteria, Bacteroidetes, Acidobacteria, Calditrichaeota, Gemmatimonadetes, Cyanobacteria and Patescibacteria were most widespread in sediment samples, while the phyla Epsilonbacteraeota, Tenericutes, Margulisbacteria, Proteobacteria, Actinobacteria, Planctomycetes and Marinimicrobia were most prevalent in water samples. 
Notably, the relative abundances of bacterial communities (at the major phylum level) estimated by 16S rRNA metabarcoding and by phage-host prediction were significantly similar. These results demonstrate the reliability of this novel approach for predicting the bacterial hosts of phages from shotgun metagenomic sequencing data.
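The phylum-level comparison described in this abstract can be illustrated with a small sketch. The abstract does not say which similarity measure or statistical test the authors used, so the Bray-Curtis dissimilarity below is an assumption chosen only for illustration, and the function names and toy phylum lists are hypothetical.

```python
from collections import Counter


def relative_abundance(labels):
    """Convert a list of phylum labels into relative abundances (fractions summing to 1)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical, 1 = disjoint)."""
    taxa = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return numerator / denominator


# Hypothetical toy data: phylum assignments from 16S amplicons and from predicted phage hosts.
amplicon_phyla = ["Proteobacteria"] * 6 + ["Actinobacteria"] * 3 + ["Planctomycetes"] * 1
predicted_host_phyla = ["Proteobacteria"] * 5 + ["Actinobacteria"] * 3 + ["Planctomycetes"] * 2

amplicon_profile = relative_abundance(amplicon_phyla)
host_profile = relative_abundance(predicted_host_phyla)
dissimilarity = bray_curtis(amplicon_profile, host_profile)
print(f"Bray-Curtis dissimilarity: {dissimilarity:.3f}")
```

A low dissimilarity would be consistent with the two profiles agreeing, but it is not a significance test; a real analysis would need the per-sample data and the test the authors actually applied.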
Affiliation(s)
- Harilanto Felana Andrianjakarivony
- Microbes, Evolution, Phylogeny, and Infection (MEΦI), IHU—Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005 Marseille, France
- Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
- Yvan Bettarel
- MARBEC, Marine Biodiversity, Exploitation & Conservation, Université de Montpellier, CNRS, Ifremer, IRD, 093 Place Eugène Bataillon, 34090 Montpellier, France
- Fabrice Armougom
- Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
- Christelle Desnues
- Microbes, Evolution, Phylogeny, and Infection (MEΦI), IHU—Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005 Marseille, France
- Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
|
11
|
Abstract
Motivation: Phage–host associations play important roles in microbial communities. But in natural communities, where phages are discovered and characterized metagenomically (as opposed to culture-based lab studies), their hosts are generally not known. Several programs have been developed for predicting which phage infects which host based on various sequence similarity measures or machine learning approaches. These are often based on whole viral and host genomes, but in metagenomics-based studies we rarely have whole genomes and must instead rely on contigs that are sometimes only a few hundred bp long. Therefore, we need programs that predict the hosts of phage contigs on the basis of these short contigs. Although most existing programs can be applied to metagenomic datasets for these predictions, their accuracies are generally low. Here, we develop ContigNet, a convolutional neural network-based model capable of predicting phage–host matches based on relatively short contigs, and compare it to the previously published VirHostMatcher (VHM) and WIsH. Results: On the validation set, ContigNet achieves 72–85% area under the receiver operating characteristic curve (AUROC) scores, compared to a maximum of 68% by VHM or WIsH, for contigs of lengths between 200 bp and 50 kbp. We also apply the model to the Metagenomic Gut Virus (MGV) catalogue, a dataset containing a wide range of draft genomes from metagenomic samples, and achieve 60–70% AUROC scores compared to 52% for VHM and WIsH. Surprisingly, ContigNet can also be used to predict plasmid-host contig associations with high accuracy, indicating a similar genetic exchange between mobile genetic elements and their hosts. Availability and implementation: The source code of ContigNet and related datasets can be downloaded from https://github.com/tianqitang1/ContigNet.
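Since the results above are reported as AUROC scores, a minimal sketch of how that metric is computed may help. It uses the pairwise-ranking definition (the probability that a true phage–host pair scores higher than a false pair, with ties counted as one half). The scores and labels are hypothetical toy values, not ContigNet output, and the function name is an assumption.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    scores: predicted match scores (higher = more likely a true phage-host pair)
    labels: 1 for a true pair, 0 for a false pair
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy predictions for five candidate phage-host contig pairs.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)
print(f"AUROC: {toy_auroc:.3f}")
```

An AUROC of 0.5 corresponds to random ranking, which is why the reported 52% for VHM and WIsH on the MGV catalogue is close to chance. The quadratic loop is fine for illustration; large evaluations would normally use a rank-based implementation.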
Affiliation(s)
- Tianqi Tang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Shengwei Hou
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Jed A Fuhrman
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Fengzhu Sun
- To whom correspondence should be addressed. E-mail:
|
12
|
Shang J, Sun Y. Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning. BMC Biol 2021; 19:250. [PMID: 34819064 PMCID: PMC8611875 DOI: 10.1186/s12915-021-01180-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/07/2021] [Accepted: 10/29/2021] [Indexed: 11/23/2022]
Abstract
Background: Prokaryotic viruses, which infect bacteria and archaea, are the most abundant and diverse biological entities in the biosphere. To understand their regulatory roles in various ecosystems and to harness the potential of bacteriophages for use in therapy, more knowledge of viral-host relationships is required. High-throughput sequencing and its application to the microbiome have offered new opportunities for computational approaches for predicting which hosts particular viruses can infect. However, there are two main challenges for computational host prediction. First, the empirically known virus-host relationships are very limited. Second, although sequence similarity between viruses and their prokaryote hosts has been used as a major feature for host prediction, the alignment is either missing or ambiguous in many cases. Thus, there is still a need to improve the accuracy of host prediction. Results: In this work, we present a semi-supervised learning model, named HostG, to conduct host prediction for novel viruses. We construct a knowledge graph by utilizing both virus-virus protein similarity and virus-host DNA sequence similarity. Then a graph convolutional network (GCN) is adopted to exploit viruses with or without known hosts in training to enhance the learning ability. During the GCN training, we minimize the expected calibrated error (ECE) to ensure the confidence of the predictions. We tested HostG on both simulated and real sequencing data and compared its performance with other state-of-the-art methods specifically designed for virus host classification (VHM-net, WIsH, PHP, HoPhage, RaFAH, vHULK, and VPF-Class). Conclusion: HostG outperforms other popular methods, demonstrating the efficacy of using a GCN-based semi-supervised learning approach. A particular advantage of HostG is its ability to predict hosts from new taxa. Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-021-01180-4.
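The ECE mentioned in this abstract is more commonly called the expected calibration error. The abstract does not give HostG's exact formulation, so the sketch below assumes the standard binned definition: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The function name, bin count, and toy values are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted probability of the chosen host label, each in [0, 1]
    correct: booleans, True where the predicted host label was right
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy host predictions: two confident hits, and one hit plus one miss at 0.55.
toy_confidences = [0.95, 0.95, 0.55, 0.55]
toy_correct = [True, True, True, False]
toy_ece = expected_calibration_error(toy_confidences, toy_correct)
print(f"ECE: {toy_ece:.3f}")
```

A value near zero means the stated confidence tracks the observed accuracy. This binned form is an evaluation metric and is not differentiable as written, so how it enters the GCN training in HostG is left to the paper itself.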
Affiliation(s)
- Jiayu Shang
- Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Yanni Sun
- Electrical Engineering, City University of Hong Kong, Hong Kong, China
|