1
Copeland CJ, Roddy JW, Schmidt AK, Secor P, Wheeler T. VIBES: a workflow for annotating and visualizing viral sequences integrated into bacterial genomes. NAR Genom Bioinform 2024; 6:lqae030. [PMID: 38584872 PMCID: PMC10993291 DOI: 10.1093/nargab/lqae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 02/05/2024] [Accepted: 03/18/2024] [Indexed: 04/09/2024] Open
Abstract
Bacteriophages are viruses that infect bacteria. Many bacteriophages integrate their genomes into the bacterial chromosome and become prophages. Prophages may substantially burden or benefit host bacteria fitness, acting in some cases as parasites and in others as mutualists. Some prophages have been demonstrated to increase host virulence. The increasing ease of bacterial genome sequencing provides an opportunity to deeply explore prophage prevalence and insertion sites. Here we present VIBES (Viral Integrations in Bacterial genomES), a workflow intended to automate prophage annotation in complete bacterial genome sequences. VIBES provides additional context to prophage annotations by annotating bacterial genes and viral proteins in user-provided bacterial and viral genomes. The VIBES pipeline is implemented as a Nextflow-driven workflow, providing a simple, unified interface for execution on local, cluster and cloud computing environments. For each step of the pipeline, a container including all necessary software dependencies is provided. VIBES produces results in simple tab-separated format and generates intuitive and interactive visualizations for data exploration. Despite VIBES's primary emphasis on prophage annotation, its generic alignment-based design allows it to be deployed as a general-purpose sequence similarity search manager. We demonstrate the utility of the VIBES prophage annotation workflow by searching for 178 Pf phage genomes across 1072 Pseudomonas spp. genomes.
Affiliation(s)
- Conner J Copeland
  - Division of Biological Sciences, University of Montana, Missoula, MT, 59812, USA
- Jack W Roddy
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
- Amelia K Schmidt
  - Division of Biological Sciences, University of Montana, Missoula, MT, 59812, USA
- Patrick R Secor
  - Division of Biological Sciences, University of Montana, Missoula, MT, 59812, USA
- Travis J Wheeler
  - R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
2
Dantas CWD, Martins DT, Nogueira WG, Alegria OVC, Ramos RTJ. Tools and methodology to in silico phage discovery in freshwater environments. Front Microbiol 2024; 15:1390726. [PMID: 38881659 PMCID: PMC11176557 DOI: 10.3389/fmicb.2024.1390726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 05/16/2024] [Indexed: 06/18/2024] Open
Abstract
Freshwater availability is essential, and its maintenance has become an enormous challenge. Due to population growth and climate change, freshwater sources are becoming scarce, imposing the need for strategies for its reuse. Currently, the constant discharge of waste into water bodies from human activities leads to the dissemination of pathogenic bacteria, negatively impacting water quality from the source to the infrastructure required for treatment, such as the accumulation of biofilms. Current water treatment methods cannot keep pace with bacterial evolution, which increasingly exhibits a profile of multidrug resistance to antibiotics. Furthermore, using more powerful disinfectants may affect the balance of aquatic ecosystems. Therefore, there is a need to explore sustainable ways to control the spread of pathogenic bacteria. Bacteriophages can infect bacteria and archaea, hijacking their host machinery to favor their replication. They are widely abundant globally and provide a biological alternative to bacterial treatment with antibiotics. In contrast to common disinfectants and antibiotics, bacteriophages are highly specific, minimizing adverse effects on aquatic microbial communities and offering a lower cost-benefit ratio in production compared to antibiotics. However, because environmental bacteriophages are difficult to cultivate and identify, alternative approaches that combine NGS metagenomics with bioinformatic tools can help identify new bacteriophages that may be useful as an alternative treatment against resistant bacteria. In this review, we discuss advances in exploring the virome of freshwater, as well as current applications of bacteriophages in freshwater treatment, along with current challenges and future perspectives.
Affiliation(s)
- Carlos Willian Dias Dantas
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- David Tavares Martins
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Wylerson Guimarães Nogueira
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Oscar Victor Cardenas Alegria
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Rommel Thiago Jucá Ramos
  - Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
3
Luebbert L, Sullivan DK, Carilli M, Hjörleifsson KE, Winnett AV, Chari T, Pachter L. Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.11.571168. [PMID: 38168363 PMCID: PMC10760059 DOI: 10.1101/2023.12.11.571168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
There are an estimated 300,000 mammalian viruses from which infectious diseases in humans may arise. They inhabit human tissues such as the lungs, blood, and brain and often remain undetected. Efficient and accurate detection of viral infection is vital to understanding its impact on human health and to make accurate predictions to limit adverse effects, such as future epidemics. The increasing use of high-throughput sequencing methods in research, agriculture, and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on highly conserved amino acid domains, which enables the detection of RNA viruses covering up to 10^12 virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We applied our method to identify putative novel viruses in rhesus macaque PBMC data that display cell type specificity and whose presence correlates with altered host gene expression.
Affiliation(s)
- Laura Luebbert
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Delaney K. Sullivan
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Maria Carilli
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Alexander Viloria Winnett
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Tara Chari
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
4
Flores VS, Amgarten DE, Iha BKV, Ryon KA, Danko D, Tierney BT, Mason C, da Silva AM, Setubal JC. Discovery and description of novel phage genomes from urban microbiomes sampled by the MetaSUB consortium. Sci Rep 2024; 14:7913. [PMID: 38575625 PMCID: PMC10994904 DOI: 10.1038/s41598-024-58226-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 03/26/2024] [Indexed: 04/06/2024] Open
Abstract
Bacteriophages are recognized as the most abundant members of microbiomes and have therefore a profound impact on microbial communities through the interactions with their bacterial hosts. The International Metagenomics and Metadesign of Subways and Urban Biomes Consortium (MetaSUB) has sampled mass-transit systems in 60 cities over 3 years using metagenomics, throwing light into these hitherto largely unexplored urban environments. MetaSUB focused primarily on the bacterial community. In this work, we explored MetaSUB metagenomic data in order to recover and analyze bacteriophage genomes. We recovered and analyzed 1714 phage genomes with size at least 40 kbp, from the class Caudoviricetes, the vast majority of which (80%) are novel. The recovered genomes were predicted to belong to temperate (69%) and lytic (31%) phages. Thirty-three of these genomes have more than 200 kbp, and one of them reaches 572 kbp, placing it among the largest phage genomes ever found. In general, the phages tended to be site-specific or nearly so, but 194 genomes could be identified in every city from which phage genomes were retrieved. We predicted hosts for 48% of the phages and observed general agreement between phage abundance and the respective bacterial host abundance, which include the most common nosocomial multidrug-resistant pathogens. A small fraction of the phage genomes are carriers of antibiotic resistance genes, and such genomes tended to be particularly abundant in the sites where they were found. We also detected CRISPR-Cas systems in five phage genomes. This study expands the previously reported MetaSUB results and is a contribution to the knowledge about phage diversity, global distribution, and phage genome content.
Affiliation(s)
- Vinicius S Flores
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, 05508-000, Brazil
- Deyvid E Amgarten
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, 05508-000, Brazil
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
- Bruno Koshin Vázquez Iha
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, 05508-000, Brazil
- Braden T Tierney
  - Weill Cornell Medicine, New York, NY, USA
  - Harvard Medical School, Cambridge, MA, USA
- Aline Maria da Silva
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, 05508-000, Brazil
- João Carlos Setubal
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, 05508-000, Brazil
5
Pinto Y, Chakraborty M, Jain N, Bhatt AS. Phage-inclusive profiling of human gut microbiomes with Phanta. Nat Biotechnol 2024; 42:651-662. [PMID: 37231259 DOI: 10.1038/s41587-023-01799-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 08/04/2022] [Accepted: 04/20/2023] [Indexed: 05/27/2023]
Abstract
Due to technical limitations, most gut microbiome studies have focused on prokaryotes, overlooking viruses. Phanta, a virome-inclusive gut microbiome profiling tool, overcomes the limitations of assembly-based viral profiling methods by using customized k-mer-based classification tools and incorporating recently published catalogs of gut viral genomes. Phanta's optimizations consider the small genome size of viruses, sequence homology with prokaryotes and interactions with other gut microbes. Extensive testing of Phanta on simulated data demonstrates that it quickly and accurately quantifies prokaryotes and viruses. When applied to 245 fecal metagenomes from healthy adults, Phanta identifies ~200 viral species per sample, ~5× more than standard assembly-based methods. We observe a ~2:1 ratio between DNA viruses and bacteria, with higher interindividual variability of the gut virome compared to the gut bacteriome. In another cohort, we observe that Phanta performs equally well on bulk versus virus-enriched metagenomes, making it possible to study prokaryotes and viruses in a single experiment, with a single analysis.
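As a rough illustration of what "k-mer-based classification" means in the abstract above, the following toy sketch assigns a read to whichever reference shares the most unambiguous k-mers with it. This is not Phanta's implementation: the function names, the two reference sequences, and the choice of k = 4 are all hypothetical, and real classifiers use much longer k-mers, compact indexes, and taxonomy-aware resolution of shared k-mers.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k):
    """Label a read by majority vote over k-mers unique to one reference.

    Returns None when no k-mer matches or when the top two labels tie.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer)
        if labels and len(labels) == 1:  # k-mers shared by several references do not vote
            votes[next(iter(labels))] += 1
    top = votes.most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None
    return top[0][0]


# Hypothetical toy references; real databases hold whole genomes.
REFERENCES = {
    "phage": "ATGCGTACGTTAGC",
    "bacterium": "GGGCCCTTTAAACCG",
}
INDEX = build_index(REFERENCES, k=4)

if __name__ == "__main__":
    for read in ("CGTACGTT", "TTTAAACC", "AAAAAAAA"):
        print(read, classify(read, INDEX, k=4))
```

Ignoring k-mers shared between references is only one simple way to handle the sequence homology between viruses and prokaryotes that the abstract mentions; it is a design choice of this sketch, not a description of the tool.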
Affiliation(s)
- Yishay Pinto
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Medicine, Divisions of Hematology and Blood & Marrow Transplantation, Stanford University, Stanford, CA, USA
- Navami Jain
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Medicine, Divisions of Hematology and Blood & Marrow Transplantation, Stanford University, Stanford, CA, USA
- Ami S Bhatt
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Medicine, Divisions of Hematology and Blood & Marrow Transplantation, Stanford University, Stanford, CA, USA
6
Liu X, Liu Y, Liu J, Zhang H, Shan C, Guo Y, Gong X, Cui M, Li X, Tang M. Correlation between the gut microbiome and neurodegenerative diseases: a review of metagenomics evidence. Neural Regen Res 2024; 19:833-845. [PMID: 37843219 PMCID: PMC10664138 DOI: 10.4103/1673-5374.382223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/23/2023] [Revised: 04/19/2023] [Accepted: 06/17/2023] [Indexed: 10/17/2023] Open
Abstract
A growing body of evidence suggests that the gut microbiota contributes to the development of neurodegenerative diseases via the microbiota-gut-brain axis. As a contributing factor, microbiota dysbiosis always occurs in pathological changes of neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis. High-throughput sequencing technology has helped to reveal that the bidirectional communication between the central nervous system and the enteric nervous system is facilitated by the microbiota's diverse microorganisms, and for both neuroimmune and neuroendocrine systems. Here, we summarize the bioinformatics analysis and wet-biology validation for the gut metagenomics in neurodegenerative diseases, with an emphasis on multi-omics studies and the gut virome. The pathogen-associated signaling biomarkers for identifying brain disorders and potential therapeutic targets are also elucidated. Finally, we discuss the role of diet, prebiotics, probiotics, postbiotics and exercise interventions in remodeling the microbiome and reducing the symptoms of neurodegenerative diseases.
Affiliation(s)
- Xiaoyan Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yi Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
  - Institute of Animal Husbandry, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu Province, China
- Junlin Liu
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Hantao Zhang
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Chaofan Shan
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yinglu Guo
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Xun Gong
  - Department of Rheumatology & Immunology, Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu Province, China
- Mengmeng Cui
  - Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Xiubin Li
  - Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
7
Yu X, Cheng L, Yi X, Li B, Li X, Liu X, Liu Z, Kong X. Gut phageome: challenges in research and impact on human microbiota. Front Microbiol 2024; 15:1379382. [PMID: 38585689 PMCID: PMC10995246 DOI: 10.3389/fmicb.2024.1379382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 03/11/2024] [Indexed: 04/09/2024] Open
Abstract
The human gut microbiome plays a critical role in maintaining our health. Fluctuations in the diversity and structure of the gut microbiota have been implicated in the pathogenesis of several metabolic and inflammatory conditions. Dietary patterns, medication, smoking, alcohol consumption, and physical activity can all influence the abundance of different types of microbiota in the gut, which in turn can affect the health of individuals. Intestinal phages are an essential component of the gut microbiome, but most studies predominantly focus on the structure and dynamics of gut bacteria while neglecting the role of phages in shaping the gut microbiome. As bacteria-killing viruses, the distribution of bacteriophages in the intestine, their role in influencing the intestinal microbiota, and their mechanisms of action remain elusive. Herein, we present an overview of the current knowledge of gut phages, their lifestyles, identification, and potential impact on the gut microbiota.
Affiliation(s)
- Xiao Yu
  - NHC Key Laboratory of Pneumoconiosis, Shanxi Key Laboratory of Respiratory Diseases, Department of Pulmonary and Critical Care Medicine, The First Hospital of Shanxi Medical University, Taiyuan, China
- Li Cheng
  - Department of Clinical Laboratory and Pathology, Hospital of Shanxi People’s Armed Police, Taiyuan, China
- Xin Yi
  - Academy of Medical Sciences, Shanxi Medical University, Taiyuan, China
- Bing Li
  - Academy of Medical Sciences, Shanxi Medical University, Taiyuan, China
- Xueqin Li
  - Department of Pulmonary and Critical Care Medicine, The General Hospital of Jincheng Coal Industry Group, Jincheng, China
- Xiang Liu
  - NHC Key Laboratory of Pneumoconiosis, Shanxi Key Laboratory of Respiratory Diseases, Department of Pulmonary and Critical Care Medicine, The First Hospital of Shanxi Medical University, Taiyuan, China
- Zhihong Liu
  - NHC Key Laboratory of Pneumoconiosis, Shanxi Key Laboratory of Respiratory Diseases, Department of Pulmonary and Critical Care Medicine, The First Hospital of Shanxi Medical University, Taiyuan, China
- Xiaomei Kong
  - NHC Key Laboratory of Pneumoconiosis, Shanxi Key Laboratory of Respiratory Diseases, Department of Pulmonary and Critical Care Medicine, The First Hospital of Shanxi Medical University, Taiyuan, China
8
Hegarty B, Riddell V J, Bastien E, Langenfeld K, Lindback M, Saini JS, Wing A, Zhang J, Duhaime M. Benchmarking informatics approaches for virus discovery: caution is needed when combining in silico identification methods. mSystems 2024; 9:e0110523. [PMID: 38376167 PMCID: PMC10949488 DOI: 10.1128/msystems.01105-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 01/24/2024] [Indexed: 02/21/2024] Open
Abstract
Understanding the ecological impacts of viruses on natural and engineered ecosystems relies on the accurate identification of viral sequences from community sequencing data. To maximize viral recovery from metagenomes, researchers frequently combine viral identification tools. However, the effectiveness of this strategy is unknown. Here, we benchmarked combinations of six widely used informatics tools for viral identification and analysis (VirSorter, VirSorter2, VIBRANT, DeepVirFinder, CheckV, and Kaiju), called "rulesets." Rulesets were tested against mock metagenomes composed of taxonomically diverse sequence types and diverse aquatic metagenomes to assess the effects of the degree of viral enrichment and habitat on tool performance. We found that six rulesets achieved equivalent accuracy [Matthews Correlation Coefficient (MCC) = 0.77, Padj ≥ 0.05]. Each contained VirSorter2, and five used our "tuning removal" rule designed to remove non-viral contamination. While DeepVirFinder, VIBRANT, and VirSorter were each found once in these high-accuracy rulesets, they were not found in combination with each other: combining tools does not lead to optimal performance. Our validation suggests that the MCC plateau at 0.77 is partly caused by inaccurate labeling within reference sequence databases. In aquatic metagenomes, our highest MCC ruleset identified more viral sequences in virus-enriched (44%-46%) than in cellular metagenomes (7%-19%). While improved algorithms may lead to more accurate viral identification tools, this should be done in tandem with careful curation of sequence databases. We recommend using the VirSorter2 ruleset and our empirically derived tuning removal rule. Our analysis provides insight into methods for in silico viral identification and will enable more robust viral identification from metagenomic data sets. 
IMPORTANCE The identification of viruses from environmental metagenomes using informatics tools has offered critical insights in microbial ecology. However, it remains difficult for researchers to know which tools optimize viral recovery for their specific study. In an attempt to recover more viruses, studies are increasingly combining the outputs from multiple tools without validating this approach. After benchmarking combinations of six viral identification tools against mock metagenomes and environmental samples, we found that these tools should only be combined cautiously. Two to four tool combinations maximized viral recovery and minimized non-viral contamination compared with either the single-tool or the five- to six-tool ones. By providing a rigorous overview of the behavior of in silico viral identification strategies and a pipeline to replicate our process, our findings guide the use of existing viral identification tools and offer a blueprint for feature engineering of new tools that will lead to higher-confidence viral discovery in microbiome studies.
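The abstract above reports accuracy as a Matthews Correlation Coefficient (MCC). For readers unfamiliar with the metric, the sketch below computes it from confusion-matrix counts using the standard formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the example counts are hypothetical and are not taken from the paper; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def confusion_counts(truth, predicted):
    """Count (tp, tn, fp, fn) for paired boolean labels (True = viral)."""
    tp = tn = fp = fn = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1
        elif not t and not p:
            tn += 1
        elif p:
            fp += 1  # predicted viral, actually non-viral
        else:
            fn += 1  # predicted non-viral, actually viral
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; 1 is perfect, 0 is no better than chance."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an empty row or column of the matrix gives 0.0.
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts for one ruleset on a mock metagenome.
    print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when viral and non-viral sequences are present in very different proportions.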
Affiliation(s)
- Bridget Hegarty
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio, USA
- James Riddell V
  - Department of Microbiology, The Ohio State University, Columbus, Ohio, USA
- Eric Bastien
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Kathryn Langenfeld
  - Department of Civil and Environmental Engineering, Stanford University, Palo Alto, California, USA
- Morgan Lindback
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Jaspreet S. Saini
  - Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Anthony Wing
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Jessica Zhang
  - Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Melissa Duhaime
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
9
Zhong ZP, Du J, Köstlbacher S, Pjevac P, Orlić S, Sullivan MB. Viral potential to modulate microbial methane metabolism varies by habitat. Nat Commun 2024; 15:1857. [PMID: 38424049 PMCID: PMC10904782 DOI: 10.1038/s41467-024-46109-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 02/06/2024] [Indexed: 03/02/2024] Open
Abstract
Methane is a potent greenhouse gas contributing to global warming. Microorganisms largely drive the biogeochemical cycling of methane, yet little is known about viral contributions to methane metabolism (MM). We analyzed 982 publicly available metagenomes from host-associated and environmental habitats containing microbial MM genes, expanding the known MM auxiliary metabolic genes (AMGs) from three to 24, including seven genes exclusive to MM pathways. These AMGs are recovered on 911 viral contigs predicted to infect 14 prokaryotic phyla including Halobacteriota, Methanobacteriota, and Thermoproteota. Of those 24, most were encoded by viruses from rumen (16/24), with substantially fewer by viruses from environmental habitats (0-7/24). To search for additional MM AMGs from an environmental habitat, we generate metagenomes from methane-rich sediments in Vrana Lake, Croatia. Therein, we find diverse viral communities, with most viruses predicted to infect methanogens and methanotrophs and some encoding 13 AMGs that can modulate host metabolisms. However, none of these AMGs directly participate in MM pathways. Together these findings suggest that the extent to which viruses use AMGs to modulate host metabolic processes (e.g., MM) varies depending on the ecological properties of the habitat in which they dwell and is not always predictable by habitat biogeochemical properties.
Affiliation(s)
- Zhi-Ping Zhong
  - Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
  - Department of Microbiology, Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Jingjie Du
  - Department of Microbiology, Ohio State University, Columbus, OH, USA
  - Division of Nutritional Science, Cornell University, Ithaca, NY, USA
- Stephan Köstlbacher
  - Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
  - Doctoral School in Microbiology and Environmental Science, University of Vienna, Vienna, Austria
  - Laboratory of Microbiology, Wageningen University and Research, Wageningen, the Netherlands
- Petra Pjevac
  - Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
  - Joint Microbiome Facility of the Medical University of Vienna and the University of Vienna, Vienna, Austria
- Sandi Orlić
  - Division of Materials Chemistry, Ruđer Bošković Institute, Zagreb, Croatia
  - Center of Excellence for Science and Technology-Integration of Mediterranean Region, Zagreb, Croatia
- Matthew B Sullivan
  - Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
  - Department of Microbiology, Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, Ohio State University, Columbus, OH, USA
  - Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, USA
10
Pirnay JP, Merabishvili M, De Vos D, Verbeken G. Bacteriophage Production in Compliance with Regulatory Requirements. Methods Mol Biol 2024; 2734:89-115. [PMID: 38066364 DOI: 10.1007/978-1-0716-3523-0_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
In this chapter, we discuss production requirements for therapeutic bacteriophage preparations. We review the current regulatory expectations and focus on pragmatic production processes, implementing relevant controls to ensure the quality, safety, and efficacy of the final products. The information disclosed in this chapter can also serve as a basis for discussions with competent authorities regarding the implementation of expedited bacteriophage product development and licensing pathways, taking into account some peculiarities of bacteriophages (as compared to conventional medicines), such as their specificity for, and co-evolution with, their bacterial hosts. To maximize the potential of bacteriophages as natural controllers of bacterial populations, the implemented regulatory frameworks and manufacturing processes should not only cater to defined bacteriophage products but should also facilitate personalized approaches in which bacteriophages are selected ad hoc and even trained to target the patient's infecting bacterial strain(s), whether or not in combination with other antimicrobials such as antibiotics.
Affiliation(s)
- Jean-Paul Pirnay
  - Laboratory for Molecular and Cellular Technology, Queen Astrid Military Hospital, Brussels, Belgium
- Maia Merabishvili
  - Laboratory for Molecular and Cellular Technology, Queen Astrid Military Hospital, Brussels, Belgium
- Daniel De Vos
  - Laboratory for Molecular and Cellular Technology, Queen Astrid Military Hospital, Brussels, Belgium
- Gilbert Verbeken
  - Laboratory for Molecular and Cellular Technology, Queen Astrid Military Hospital, Brussels, Belgium
11
Rossi FPN, Flores VS, Uceda-Campos G, Amgarten DE, Setubal JC, da Silva AM. Comparative Analyses of Bacteriophage Genomes. Methods Mol Biol 2024; 2802:427-453. [PMID: 38819567 DOI: 10.1007/978-1-0716-3838-5_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Bacterial viruses (bacteriophages or phages) are the most abundant and diverse biological entities on Earth. There is a renewed worldwide interest in phage-centered research motivated by their enormous potential as antimicrobials to cope with multidrug-resistant pathogens. An ever-growing number of complete phage genomes are becoming available, derived either from newly isolated phages (cultivated phages) or recovered from metagenomic sequencing data (uncultivated phages). Robust comparative analysis is crucial for a comprehensive understanding of genotypic variations of phages and their related evolutionary processes, and to investigate the interaction mechanisms between phages and their hosts. In this chapter, we present a protocol for phage comparative genomics employing tools selected out of the many currently available, focusing on complete genomes of phages classified in the class Caudoviricetes. This protocol provides accurate identification of similarities, differences, and patterns among new and previously known complete phage genomes as well as phage clustering and taxonomic classification.
Affiliation(s)
- Vinicius Sousa Flores
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
- Guillermo Uceda-Campos
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
- João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
- Aline Maria da Silva
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil.
|
12
|
Ha AD, Aylward FO. Automated classification of giant virus genomes using a random forest model built on trademark protein families. bioRxiv 2023:2023.11.10.566645. [PMID: 38014039 PMCID: PMC10680617 DOI: 10.1101/2023.11.10.566645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Viruses of the phylum Nucleocytoviricota, often referred to as "giant viruses," are prevalent in various environments around the globe and play significant roles in shaping eukaryotic diversity and activities in global ecosystems. Given the extensive phylogenetic diversity within this viral group and the highly complex composition of their genomes, taxonomic classification of giant viruses, particularly incomplete metagenome-assembled genomes (MAGs), can present a considerable challenge. Here we developed TIGTOG (Taxonomic Information of Giant viruses using Trademark Orthologous Groups), a machine learning-based approach to predict the taxonomic classification of novel giant virus MAGs based on profiles of protein family content. We applied a random forest algorithm to a training set of 1,531 quality-checked, phylogenetically diverse Nucleocytoviricota genomes using pre-selected sets of giant virus orthologous groups (GVOGs). The classification models were predictive of viral taxonomic assignments with a cross-validation accuracy of 99.6% to the order level and 97.3% to the family level. We found that no individual GVOGs or genome features significantly influenced the algorithm's performance or the models' predictions, indicating that classification predictions were based on a comprehensive genomic signature, which reduced the necessity of a fixed set of marker genes for taxonomic assignment. Our classification models were validated with an independent test set of 823 giant virus genomes with varied genomic completeness and taxonomy and demonstrated an accuracy of 98.6% and 95.9% to the order and family level, respectively. Our results indicate that protein family profiles can be used to accurately classify large DNA viruses at different taxonomic levels and provide a fast and accurate method for the classification of giant viruses. This approach could easily be adapted to other viral groups.
|
13
|
Fouks B, Harrison MC, Mikhailova AA, Marchal E, English S, Carruthers M, Jennings EC, Chiamaka EL, Frigard RA, Pippel M, Attardo GM, Benoit JB, Bornberg-Bauer E, Tobe SS. Live-bearing cockroach genome reveals convergent evolutionary mechanisms linked to viviparity in insects and beyond. iScience 2023; 26:107832. [PMID: 37829199 PMCID: PMC10565785 DOI: 10.1016/j.isci.2023.107832] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/08/2022] [Revised: 02/13/2023] [Accepted: 09/01/2023] [Indexed: 10/14/2023] Open
Abstract
Live birth (viviparity) has arisen repeatedly and independently among animals. We sequenced the genome and transcriptome of the viviparous Pacific beetle-mimic cockroach and performed comparative analyses with two other viviparous insect lineages, tsetse flies and aphids, to unravel the basis underlying the transition to viviparity in insects. We identified pathways undergoing adaptive evolution for insects, involved in urogenital remodeling, tracheal system, heart development, and nutrient metabolism. Transcriptomic analysis of cockroach and tsetse flies revealed that uterine remodeling and nutrient production are increased and the immune response is altered during pregnancy, facilitating structural and physiological changes to accommodate and nourish the progeny. These patterns of convergent evolution of viviparity among insects, together with similar adaptive mechanisms identified among vertebrates, highlight that the transition to viviparity requires changes in urogenital remodeling, enhanced tracheal and heart development (corresponding to angiogenesis in vertebrates), altered nutrient metabolism, and shifted immunity in animal systems.
Affiliation(s)
- Bertrand Fouks
- University of Münster, Institute for Evolution and Biodiversity, Molecular Evolution and Bioinformatics, Hüfferstrasse 1, 48149 Münster, Germany
- Mark C. Harrison
- University of Münster, Institute for Evolution and Biodiversity, Molecular Evolution and Bioinformatics, Hüfferstrasse 1, 48149 Münster, Germany
- Alina A. Mikhailova
- University of Münster, Institute for Evolution and Biodiversity, Molecular Evolution and Bioinformatics, Hüfferstrasse 1, 48149 Münster, Germany
- Elisabeth Marchal
- Department of Biology, Molecular Developmental Physiology and Signal Transduction Lab., Division of Animal Physiology and Neurobiology, Naamsestraat 59-Box 2465, B-3000 Leuven, Belgium
- Sinead English
- School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
- Emily C. Jennings
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Ezemuoka L. Chiamaka
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Ronja A. Frigard
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Martin Pippel
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
- Geoffrey M. Attardo
- Department of Entomology and Nematology, College of Agriculture and Environmental Sciences, University of California, Davis, Davis, CA, USA
- Joshua B. Benoit
- Department of Biological Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Erich Bornberg-Bauer
- University of Münster, Institute for Evolution and Biodiversity, Molecular Evolution and Bioinformatics, Hüfferstrasse 1, 48149 Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
- Stephen S. Tobe
- Department of Biology, Molecular Developmental Physiology and Signal Transduction Lab., Division of Animal Physiology and Neurobiology, Naamsestraat 59-Box 2465, B-3000 Leuven, Belgium
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
|
14
|
Copeland CJ, Roddy JW, Schmidt AK, Secor PR, Wheeler TJ. VIBES: A Workflow for Annotating and Visualizing Viral Sequences Integrated into Bacterial Genomes. bioRxiv 2023:2023.10.17.562434. [PMID: 37905003 PMCID: PMC10614876 DOI: 10.1101/2023.10.17.562434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Bacteriophages are viruses that infect bacteria. Many bacteriophages integrate their genomes into the bacterial chromosome and become prophages. Prophages may substantially burden or benefit host bacteria fitness, acting in some cases as parasites and in others as mutualists, and have been demonstrated to increase host virulence. The increasing ease of bacterial genome sequencing provides an opportunity to deeply explore prophage prevalence and insertion sites. Here we present VIBES, a workflow intended to automate prophage annotation in complete bacterial genome sequences. VIBES provides additional context to prophage annotations by annotating bacterial genes and viral proteins in user-provided bacterial and viral genomes. The VIBES pipeline is implemented as a Nextflow-driven workflow, providing a simple, unified interface for execution on local, cluster, and cloud computing environments. For each step of the pipeline, a container including all necessary software dependencies is provided. VIBES produces results in simple tab-separated format and generates intuitive and interactive visualizations for data exploration. Despite VIBES' primary emphasis on prophage annotation, its generic alignment-based design allows it to be deployed as a general-purpose sequence similarity search manager. We demonstrate the utility of the VIBES prophage annotation workflow by searching for 178 Pf phage genomes across 1,072 Pseudomonas spp. genomes. VIBES software is available at https://github.com/TravisWheelerLab/VIBES.
Affiliation(s)
- Conner J. Copeland
- Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Jack W. Roddy
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
- Amelia K. Schmidt
- Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Patrick R. Secor
- Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Travis J. Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
|
15
|
Mallawaarachchi V, Roach MJ, Decewicz P, Papudeshi B, Giles SK, Grigson SR, Bouras G, Hesse RD, Inglis LK, Hutton ALK, Dinsdale EA, Edwards RA. Phables: from fragmented assemblies to high-quality bacteriophage genomes. Bioinformatics 2023; 39:btad586. [PMID: 37738590 PMCID: PMC10563150 DOI: 10.1093/bioinformatics/btad586] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/04/2023] [Revised: 07/14/2023] [Accepted: 09/19/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION Microbial communities have a profound impact on both human health and various environments. Viruses infecting bacteria, known as bacteriophages or phages, play a key role in modulating bacterial communities within environments. High-quality phage genome sequences are essential for advancing our understanding of phage biology, enabling comparative genomics studies and developing phage-based diagnostic tools. Most available viral identification tools consider individual sequences to determine whether they are of viral origin. As a result of challenges in viral assembly, fragmentation of genomes can occur, and existing tools may recover incomplete genome fragments. Therefore, the identification and characterization of novel phage genomes remain a challenge, leading to the need for improved approaches for phage genome recovery. RESULTS We introduce Phables, a new computational method to resolve phage genomes from fragmented viral metagenome assemblies. Phables identifies phage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that Phables recovers on average over 49% more high-quality phage genomes compared to existing viral identification tools. Furthermore, Phables can resolve variant phage genomes with over 99% average nucleotide identity, a distinction that existing tools are unable to make. AVAILABILITY AND IMPLEMENTATION Phables is available on GitHub at https://github.com/Vini2/phables.
Affiliation(s)
- Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Michael J Roach
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Przemyslaw Decewicz
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw 02-096, Poland
- Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Sarah K Giles
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Susanna R Grigson
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- George Bouras
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
- The Department of Surgery—Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, South Australia 5000, Australia
- Ryan D Hesse
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Laura K Inglis
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Abbey L K Hutton
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Elizabeth A Dinsdale
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Robert A Edwards
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
|
16
|
Džunková M, Moraru C, Anantharaman K. Editorial: Advances in viromics: new tools, challenges, and data towards characterizing human and environmental viromes. Front Microbiol 2023; 14:1290062. [PMID: 37822741 PMCID: PMC10562684 DOI: 10.3389/fmicb.2023.1290062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 09/14/2023] [Indexed: 10/13/2023] Open
Affiliation(s)
- Mária Džunková
- Institute for Integrative Systems Biology, University of Valencia and Consejo Superior de Investigaciones Científicas (CSIC), Valencia, Spain
- Cristina Moraru
- Environmental Metagenomics, Faculty of Chemistry, Research Center One Health Ruhr of the University Alliance Ruhr, University of Duisburg-Essen, Essen, Germany
- Karthik Anantharaman
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, United States
|
17
|
Mallawaarachchi V, Roach MJ, Decewicz P, Papudeshi B, Giles SK, Grigson SR, Bouras G, Hesse RD, Inglis LK, Hutton ALK, Dinsdale EA, Edwards RA. Phables: from fragmented assemblies to high-quality bacteriophage genomes. bioRxiv 2023:2023.04.04.535632. [PMID: 37066369 PMCID: PMC10104058 DOI: 10.1101/2023.04.04.535632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Microbial communities influence both human health and different environments. Viruses infecting bacteria, known as bacteriophages or phages, play a key role in modulating bacterial communities within environments. High-quality phage genome sequences are essential for advancing our understanding of phage biology, enabling comparative genomics studies, and developing phage-based diagnostic tools. Most available viral identification tools consider individual sequences to determine whether they are of viral origin. As a result of the challenges in viral assembly, fragmentation of genomes can occur, leading to the need for new approaches in viral identification. Therefore, the identification and characterisation of novel phages remain a challenge. We introduce Phables, a new computational method to resolve phage genomes from fragmented viral metagenome assemblies. Phables identifies phage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that Phables recovers on average over 49% more high-quality phage genomes compared to existing viral identification tools. Furthermore, Phables can resolve variant phage genomes with over 99% average nucleotide identity, a distinction that existing tools are unable to make. Phables is available on GitHub at https://github.com/Vini2/phables.
Affiliation(s)
- Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Michael J Roach
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Przemyslaw Decewicz
- Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw 02-096, Poland
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Sarah K Giles
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Susanna R Grigson
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- George Bouras
- Adelaide Medical School, The University of Adelaide, North Tce, Adelaide, SA, 5000, Australia
- Ryan D Hesse
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Laura K Inglis
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Abbey L K Hutton
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Elizabeth A Dinsdale
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Robert A Edwards
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
|
18
|
Vik D, Bolduc B, Roux S, Sun CL, Pratama AA, Krupovic M, Sullivan MB. MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets. ISME Commun 2023; 3:87. [PMID: 37620369 PMCID: PMC10449787 DOI: 10.1038/s43705-023-00295-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2022] [Revised: 08/04/2023] [Accepted: 08/09/2023] [Indexed: 08/26/2023]
Abstract
Our knowledge of viral sequence space has exploded with advancing sequencing technologies and large-scale sampling and analytical efforts. Though archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside of extreme environments is limited. This largely stems from the lack of a robust, high-throughput, and systematic way to distinguish between bacterial and archaeal viruses in datasets of curated viruses. Here we upgrade our prior text-based tool (MArVD) via training and testing a random forest machine learning algorithm against a newly curated dataset of archaeal viruses. After optimization, MArVD2 presented a significant improvement over its predecessor in terms of scalability, usability, and flexibility, and will allow user-defined custom training datasets as archaeal virus discovery progresses. Benchmarking showed that a model trained with viral sequences from the hypersaline, marine, and hot spring environments correctly classified 85% of the archaeal viruses with a false detection rate below 2% using a random forest prediction threshold of 80% in a separate benchmarking dataset from the same habitats.
Affiliation(s)
- Dean Vik
- Department of Microbiology, The Ohio State University, Columbus, OH, 43210, USA.
- Center of Microbiome Science, The Ohio State University, Columbus, OH, USA.
- Benjamin Bolduc
- Department of Microbiology, The Ohio State University, Columbus, OH, 43210, USA
- Center of Microbiome Science, The Ohio State University, Columbus, OH, USA
- Simon Roux
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christine L Sun
- Department of Microbiology, The Ohio State University, Columbus, OH, 43210, USA
- Center of Microbiome Science, The Ohio State University, Columbus, OH, USA
- Akbar Adjie Pratama
- Department of Microbiology, The Ohio State University, Columbus, OH, 43210, USA
- Center of Microbiome Science, The Ohio State University, Columbus, OH, USA
- Mart Krupovic
- Archaeal Virology Unit, Institut Pasteur, Université Paris Cité, CNRS UMR6047, Paris, France
- Matthew B Sullivan
- Department of Microbiology, The Ohio State University, Columbus, OH, 43210, USA.
- Center of Microbiome Science, The Ohio State University, Columbus, OH, USA.
- Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, USA.
|
19
|
Zhong ZP, Vik D, Rapp JZ, Zablocki O, Maughan H, Temperton B, Deming JW, Sullivan MB. Lower viral evolutionary pressure under stable versus fluctuating conditions in subzero Arctic brines. Microbiome 2023; 11:174. [PMID: 37550784 PMCID: PMC10405475 DOI: 10.1186/s40168-023-01619-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 07/12/2023] [Indexed: 08/09/2023]
Abstract
BACKGROUND Climate change threatens Earth's ice-based ecosystems which currently offer archives and eco-evolutionary experiments in the extreme. Arctic cryopeg brine (marine-derived, within permafrost) and sea ice brine, similar in subzero temperature and high salinity but different in temporal stability, are inhabited by microbes adapted to these extreme conditions. However, little is known about their viruses (community composition, diversity, interaction with hosts, or evolution) or how they might respond to geologically stable cryopeg versus fluctuating sea ice conditions. RESULTS We used long- and short-read viromics and metatranscriptomics to study viruses in Arctic cryopeg brine, sea ice brine, and underlying seawater, recovering 11,088 vOTUs (~species-level taxonomic unit), a 4.4-fold increase of known viruses in these brines. More specifically, the long-read-powered viromes doubled the number of longer (≥25 kb) vOTUs generated and recovered more hypervariable regions by >5-fold compared to short-read viromes. Distribution assessment, by comparing to known viruses in public databases, supported that cryopeg brine viruses were of marine origin yet distinct from either sea ice brine or seawater viruses, while 94% of sea ice brine viruses were also present in seawater. A virus-encoded, ecologically important exopolysaccharide biosynthesis gene was identified, and many viruses (~half of metatranscriptome-inferred "active" vOTUs) were predicted as actively infecting the dominant microbial genera Marinobacter and Polaribacter in cryopeg and sea ice brines, respectively. Evolutionarily, microdiversity (intra-species genetic variations) analyses suggested that viruses within the stable cryopeg brine were under significantly lower evolutionary pressures than those in the fluctuating sea ice environment, while many sea ice brine virus-tail genes were under positive selection, indicating virus-host co-evolutionary arms races. 
CONCLUSIONS Our results confirmed the benefits of long-read-powered viromics in understanding the environmental virosphere through significantly improved genomic recovery, expanding viral discovery and the potential for biological inference. Evidence of viruses actively infecting the dominant microbes in subzero brines and modulating host metabolism underscored the potential impact of viruses on these remote and underexplored extreme ecosystems. Microdiversity results shed light on different strategies viruses use to evolve and adapt when extreme conditions are stable versus fluctuating. Together, these findings verify the value of long-read-powered viromics and provide foundational data on viral evolution and virus-microbe interactions in Earth's destabilized and rapidly disappearing cryosphere.
Affiliation(s)
- Zhi-Ping Zhong
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Dean Vik
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Josephine Z Rapp
- Department of Biology, Université Laval, Québec, QC, Canada
- Center for Northern Studies (CEN), Université Laval, Québec, QC, Canada
- Olivier Zablocki
- Department of Microbiology, Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA
- Ben Temperton
- School of Biosciences, University of Exeter, Exeter, Devon, UK
- Jody W Deming
- School of Oceanography and Astrobiology Program, University of Washington, Seattle, WA, USA.
- Matthew B Sullivan
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, USA.
- Department of Microbiology, Ohio State University, Columbus, OH, USA.
- Center of Microbiome Science, Ohio State University, Columbus, OH, USA.
- Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, USA.
|
20
|
Rangel-Pineros G, Almeida A, Beracochea M, Sakharova E, Marz M, Reyes Muñoz A, Hölzer M, Finn RD. VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models. PLoS Comput Biol 2023; 19:e1011422. [PMID: 37639475 PMCID: PMC10491390 DOI: 10.1371/journal.pcbi.1011422] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Revised: 09/08/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open
Abstract
The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.
Affiliation(s)
- Guillermo Rangel-Pineros
- The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogota, Colombia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Alexandre Almeida
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
- Martin Beracochea
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Ekaterina Sakharova
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Manja Marz
- RNA Bioinformatics, Friedrich Schiller University, Jena, Germany
- European Virus Bioinformatics Center, Friedrich Schiller University, Jena, Germany
- Alejandro Reyes Muñoz
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogota, Colombia
- Martin Hölzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- European Virus Bioinformatics Center, Friedrich Schiller University, Jena, Germany
- Methodology and Research Infrastructure, Genome Competence Center (MF1), Robert Koch Institute, Berlin, Germany
- Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
|
21
|
Miao Y, Bian J, Dong G, Dai T. DETIRE: a hybrid deep learning model for identifying viral sequences from metagenomes. Front Microbiol 2023; 14:1169791. [PMID: 37396369 PMCID: PMC10313334 DOI: 10.3389/fmicb.2023.1169791] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/20/2023] [Accepted: 05/18/2023] [Indexed: 07/04/2023] Open
Abstract
A metagenome contains all DNA sequences from an environmental sample, including viruses, bacteria, archaea, and eukaryotes. Since viruses are of huge abundance and have caused vast mortality and morbidity to human society in history as a type of major pathogens, detecting viruses from metagenomes plays a crucial role in analyzing the viral component of samples and is the very first step for clinical diagnosis. However, detecting viral fragments directly from the metagenomes is still a tough issue because of the existence of a huge number of short sequences. In this study a hybrid Deep lEarning model for idenTifying vIral sequences fRom mEtagenomes (DETIRE) is proposed to solve the problem. First, the graph-based nucleotide sequence embedding strategy is utilized to enrich the expression of DNA sequences by training an embedding matrix. Then, the spatial and sequential features are extracted by trained CNN and BiLSTM networks, respectively, to enrich the features of short sequences. Finally, the two sets of features are weighted combined for the final decision. Trained by 220,000 sequences of 500 bp subsampled from the Virus and Host RefSeq genomes, DETIRE identifies more short viral sequences (<1,000 bp) than the three latest methods, such as DeepVirFinder, PPR-Meta, and CHEER. DETIRE is freely available at Github (https://github.com/crazyinter/DETIRE).
|
22
|
Ho SFS, Wheeler NE, Millard AD, van Schaik W. Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data. Microbiome 2023; 11:84. [PMID: 37085924 PMCID: PMC10120246 DOI: 10.1186/s40168-023-01533-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 06/11/2022] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND The prediction of bacteriophage sequences in metagenomic datasets has become a topic of considerable interest, leading to the development of many novel bioinformatic tools. A comparative analysis of ten state-of-the-art phage identification tools was performed to inform their usage in microbiome research. METHODS Artificial contigs generated from complete RefSeq genomes representing phages, plasmids, and chromosomes, and a previously sequenced mock community containing four phage species, were used to evaluate the precision, recall, and F1 scores of the tools. We also generated a dataset of randomly shuffled sequences to quantify false-positive calls. In addition, a set of previously simulated viromes was used to assess diversity bias in each tool's output. RESULTS VIBRANT and VirSorter2 achieved the highest F1 scores (0.93) in the RefSeq artificial contigs dataset, with several other tools also performing well. Kraken2 had the highest F1 score (0.86) in the mock community benchmark by a large margin (0.3 higher than DeepVirFinder in second place), mainly due to its high precision (0.96). Generally, k-mer-based tools performed better than reference similarity tools and gene-based methods. Several tools, most notably PPR-Meta, called a high number of false positives in the randomly shuffled sequences. When analysing the diversity of the genomes that each tool predicted from a virome set, most tools produced a viral genome set that had similar alpha- and beta-diversity patterns to the original population, with Seeker being a notable exception. CONCLUSIONS This study provides key metrics used to assess performance of phage detection tools, offers a framework for further comparison of additional viral discovery tools, and discusses optimal strategies for using these tools. 
We highlight that the choice of tool for identifying phages in metagenomic datasets, as well as its parameters, can bias the results, and we provide pointers for different use case scenarios. We have also made our benchmarking dataset available for download to facilitate future comparisons of phage identification tools.
Affiliation(s)
- Siu Fung Stanley Ho
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
| | - Nicole E. Wheeler
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
| | - Andrew D. Millard
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | - Willem van Schaik
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
| |
|
23
|
Kumar R, Yadav G, Kuddus M, Ashraf GM, Singh R. Unlocking the microbial studies through computational approaches: how far have we reached? Environmental Science and Pollution Research International 2023; 30:48929-48947. [PMID: 36920617 PMCID: PMC10016191 DOI: 10.1007/s11356-023-26220-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 02/24/2023] [Indexed: 04/16/2023]
Abstract
The metagenomics approach accelerated the study of genetic information from uncultured microbes and complex microbial communities. In silico research also facilitated an understanding of protein-DNA interactions, protein-protein interactions, docking between proteins and phyto/biochemicals for drug design, and modeling of the 3D structure of proteins. These in silico approaches provided insight into analyzing pathogenic and nonpathogenic strains that helped in the identification of probable genes for vaccines and antimicrobial agents and comparing whole-genome sequences to microbial evolution. Artificial intelligence, more precisely machine learning (ML) and deep learning (DL), has proven to be a promising approach in the field of microbiology to handle, analyze, and utilize large data that are generated through nucleic acid sequencing and proteomics. This enabled the understanding of the functional and taxonomic diversity of microorganisms. ML and DL have been used in the prediction and forecasting of diseases and applied to trace environmental contaminants and environmental quality. This review presents an in-depth analysis of recent applications of in silico approaches in microbial genomics, proteomics, functional diversity, vaccine development, and drug design.
Affiliation(s)
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Lucknow, Uttar Pradesh, India
- Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, MO, USA
| | - Garima Yadav
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Lucknow, Uttar Pradesh, India
| | - Mohammed Kuddus
- Department of Biochemistry, College of Medicine, University of Hail, Hail, Saudi Arabia
| | - Ghulam Md Ashraf
- Department of Medical Laboratory Sciences, College of Health Sciences, and Sharjah Institute for Medical Research, University of Sharjah, Sharjah, 27272, United Arab Emirates
| | - Rachana Singh
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Lucknow, Uttar Pradesh, India
| |
|
24
|
Schackart KE, Graham JB, Ponsero AJ, Hurwitz BL. Evaluation of computational phage detection tools for metagenomic datasets. Front Microbiol 2023; 14:1078760. [PMID: 36760501 PMCID: PMC9902911 DOI: 10.3389/fmicb.2023.1078760] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/24/2022] [Accepted: 01/09/2023] [Indexed: 01/25/2023] Open
Abstract
Introduction As new computational tools for detecting phage in metagenomes are being rapidly developed, a critical need has emerged to develop systematic benchmarks. Methods In this study, we surveyed 19 metagenomic phage detection tools, 9 of which could be installed and run at scale. Those 9 tools were assessed on several benchmark challenges. Fragmented reference genomes are used to assess the effects of fragment length, low viral content, phage taxonomy, robustness to eukaryotic contamination, and computational resource usage. Simulated metagenomes are used to assess the effects of sequencing and assembly quality on the tool performances. Finally, real human gut metagenomes and viromes are used to assess the differences and similarities in the phage communities predicted by the tools. Results We find that the various tools yield strikingly different results. Generally, tools that use a homology approach (VirSorter, MARVEL, viralVerify, VIBRANT, and VirSorter2) demonstrate low false positive rates and robustness to eukaryotic contamination. Conversely, tools that use a sequence composition approach (VirFinder, DeepVirFinder, Seeker), and MetaPhinder, have higher sensitivity, including to phages with less representation in reference databases. These differences led to widely differing predicted phage communities in human gut metagenomes, with nearly 80% of contigs being marked as phage by at least one tool and a maximum overlap of 38.8% between any two tools. While the results were more consistent among the tools on viromes, the differences in results were still significant, with a maximum overlap of 60.65%. Discussion: Importantly, the benchmark datasets developed in this study are publicly available and reusable to enable the future comparability of new tools developed.
Affiliation(s)
- Kenneth E. Schackart
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
| | - Jessica B. Graham
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
| | - Alise J. Ponsero
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
- Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Bonnie L. Hurwitz
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
| |
|
25
|
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022] Open
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. This review has been divided into the following two sections: general information and computer-aided phage therapy (CAPT). In the case of general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and clinical trials) and the challenges. This review emphasizes CAPT, where we have covered primary phage-associated resources, phage prediction methods and pipelines. This review covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternate treatments for drug-resistant bacteria and this remains a challenging problem. Thus, we compile all phage-associated prediction methods that include the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discussed recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe phage-based therapy's advantages, challenges and opportunities.
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Suchet Aggarwal
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| |
|
26
|
Liu Q, Liu F, Miao Y, He J, Dong T, Hou T, Liu Y. Virsearcher: Identifying Bacteriophages from Metagenomes by Combining Convolutional Neural Network and Gene Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:763-774. [PMID: 35316191 DOI: 10.1109/tcbb.2022.3161135] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/04/2023]
Abstract
Metagenome sequencing provides an unprecedented opportunity for the discovery of unknown microbes and viruses. A large number of phages and prokaryotes are mixed together in metagenomes. To study the influence of phages on human bodies and environments, it is of great significance to isolate phages from metagenomes. However, it is difficult to identify novel phages because of the diversity of their sequences and the frequent presence of short contigs in metagenomes. Here, virSearcher is developed to identify phages from metagenomes by combining a convolutional neural network (CNN) with the gene information of input sequences. First, an input sequence is encoded according to the different functions of its coding and non-coding regions and is then converted into a word embedding code through a word embedding layer placed before a convolutional layer. Meanwhile, the hit ratio of the virus genes is combined with the output of the CNN to further improve the performance of the network. The genes used by virSearcher consist of complete and incomplete genes. Experiments on several metagenomes have shown that, compared with other tools, virSearcher can significantly improve the identification of short sequences while maintaining performance on long ones. The source code of virSearcher is freely available from http://github.com/DrJackson18/virSearcher.
|
27
|
Mangalea MR, Keift K, Duerkop BA, Anantharaman K. Assembly and Annotation of Viral Metagenomes from Short-Read Sequencing Data. Methods Mol Biol 2023; 2649:317-337. [PMID: 37258871 DOI: 10.1007/978-1-0716-3072-3_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Viral metagenomics enables the detection, characterization, and quantification of viral sequences present in shotgun-sequenced datasets of purified virus-like particles and whole metagenomes. Next generation sequencing (Illumina) derived short single or paired-end read runs are a principal platform for metagenomics, and assembly of short reads allows for the identification of distinguishing viral signatures and complex genomic features for taxonomy and functional annotation. Here we describe the identification and characterization of viral genome sequences, bacteriophages, and eukaryotic viruses, from a cohort of human stool samples, using multiple methods. Following the purification of virus-like particles, sequencing, quality refinement, and genome assembly, we begin the protocol with raw short reads deposited in an open-source nucleotide archive. We highlight the use of VIBRANT, an automated computational tool for the characterization of microbial viruses and their viral community function. Finally, we also describe an alternative assembly-free option of mapping reads to established databases of reference genomes and previously characterized metagenome-assembled viral genomes.
Affiliation(s)
- Mihnea R Mangalea
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
| | - Kristopher Keift
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
| | - Breck A Duerkop
- Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
| | | |
|
28
|
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. Frontiers in Bioinformatics 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
| | - Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
- *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
| |
|
29
|
Phage Therapy for Crops: Concepts, Experimental and Bioinformatics Approaches to Direct Its Application. Int J Mol Sci 2022; 24:ijms24010325. [PMID: 36613768 PMCID: PMC9820149 DOI: 10.3390/ijms24010325] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/20/2022] [Revised: 12/14/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022] Open
Abstract
Phage therapy consists of applying bacteriophages, whose natural function is to kill specific bacteria. Bacteriophages are safe, evolve together with their host, and are environmentally friendly. At present, the indiscriminate use of antibiotics and salt minerals (Zn2+ or Cu2+) has caused the emergence of resistant strains that infect crops, causing difficulties and loss of food production. Phage therapy is an alternative that has shown positive results and can improve the treatments available for agriculture. However, the success of phage therapy depends on finding effective bacteriophages. This review focused on describing the potential, up to now, of applying phage therapy as an alternative treatment against bacterial diseases, with sustainable improvement in food production. We described the current isolation techniques, characterization, detection, and selection of lytic phages, highlighting the importance of complementary studies using genome analysis of the phage and its host. Finally, among these studies, we concentrated on the most relevant bacteriophages used for biocontrol of Pseudomonas spp., Xanthomonas spp., Pectobacterium spp., Ralstonia spp., Burkholderia spp., Dickeya spp., Clavibacter michiganensis, and Agrobacterium tumefaciens as agents that cause damage to crops, and affect food production around the world.
|
30
|
Aytan-Aktug D, Grigorjev V, Szarvas J, Clausen PTLC, Munk P, Nguyen M, Davis JJ, Aarestrup FM, Lund O. SourceFinder: a Machine-Learning-Based Tool for Identification of Chromosomal, Plasmid, and Bacteriophage Sequences from Assemblies. Microbiol Spectr 2022; 10:e0264122. [PMID: 36377945 PMCID: PMC9769690 DOI: 10.1128/spectrum.02641-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 11/01/2022] [Indexed: 11/16/2022] Open
Abstract
High-throughput genome sequencing technologies enable the investigation of complex genetic interactions, including the horizontal gene transfer of plasmids and bacteriophages. However, identifying these elements from assembled reads remains challenging due to genome sequence plasticity and the difficulty in assembling complete sequences. In this study, we developed a classifier, using random forest, to identify whether sequences originated from bacterial chromosomes, plasmids, or bacteriophages. The classifier was trained on a diverse collection of 23,211 chromosomal, plasmid, and bacteriophage sequences from hundreds of bacterial species. In order to adapt the classifier to incomplete sequences, each complete sequence was subsampled into 5,000 nucleotide fragments and further subdivided into k-mers. This three-class classifier succeeded in identifying chromosomes, plasmids, and bacteriophages using k-mer distributions of complete and partial genome sequences, including simulated metagenomic scaffolds, with a minimum area under the receiver operating characteristic curve (AUC) of 0.939. This classifier, implemented as SourceFinder, has been made available as an online web service to help the community with predicting the chromosomal, plasmid, and bacteriophage sources of assembled bacterial sequence data (https://cge.food.dtu.dk/services/SourceFinder/). IMPORTANCE Extra-chromosomal genes encoding antimicrobial resistance, metal resistance, and virulence provide selective advantages for bacterial survival under stress conditions and pose serious threats to human and animal health. These accessory genes can impact the composition of microbiomes by providing selective advantages to their hosts. Accurately identifying extra-chromosomal elements in genome sequence data is critical for understanding gene dissemination trajectories and taking preventative measures. Therefore, in this study, we developed a random forest classifier for identifying the source of bacterial chromosomal, plasmid, and bacteriophage sequences.
Affiliation(s)
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Vladislav Grigorjev
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Judit Szarvas
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| | | | - Patrick Munk
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Marcus Nguyen
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, Illinois, USA
| | - James J. Davis
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, Illinois, USA
| | - Frank M. Aarestrup
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Ole Lund
- National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
| |
|
31
|
Greig DR, Bird MT, Chattaway MA, Langridge GC, Waters EV, Ribeca P, Jenkins C, Nair S. Characterization of a P1-bacteriophage-like plasmid (phage-plasmid) harbouring bla CTX-M-15 in Salmonella enterica serovar Typhi. Microb Genom 2022; 8:mgen000913. [PMID: 36748517 PMCID: PMC9837566 DOI: 10.1099/mgen.0.000913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022] Open
Abstract
Antimicrobial-resistance (AMR) genes can be transferred between microbial cells via horizontal gene transfer (HGT), which involves mobile and integrative elements such as plasmids, bacteriophages, transposons, integrons and pathogenicity islands. Bacteriophages are found in abundance in the microbial world, but their role in virulence and AMR has not fully been elucidated in the Enterobacterales. With short-read sequencing paving the way to systematic high-throughput AMR gene detection, long-read sequencing technologies now enable us to establish how such genes are structurally connected into meaningful genomic units, raising questions about how they might cooperate to achieve their biological function. Here, we describe a novel ~98 kbp circular P1-bacteriophage-like plasmid termed ph681355 isolated from a clinical Salmonella enterica serovar Typhi isolate. It carries bla CTX-M-15, an IncY plasmid replicon (repY gene) and the ISEcP1 mobile element and is, to our knowledge, the first reported P1-bacteriophage-like plasmid (phage-plasmid) in S. enterica Typhi. We compared ph681355 to two previously described phage-plasmids, pSJ46 from S. enterica serovar Indiana and pMCR-1-P3 from Escherichia coli, and found high nucleotide similarity across the backbone. However, we saw low ph681355 backbone similarity to plasmid p60006 associated with the extensively drug-resistant S. enterica Typhi outbreak isolate in Pakistan, providing evidence of an alternative route for bla CTX-M-15 transmission. Our discovery highlights the importance of utilizing long-read sequencing in interrogating bacterial genomic architecture to fully understand AMR mechanisms and their clinical relevance. It also raises questions regarding how widespread bacteriophage-mediated HGT might be, suggesting that the resulting genomic plasticity might be higher than previously thought.
Affiliation(s)
- David R. Greig
- National Infection Service, UK Health Security Agency, London NW9 5EQ, UK; NIHR Health Protection Research Unit in Gastrointestinal Pathogens, Liverpool, UK; Division of Infection and Immunity, Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush EH25 9RG, UK
| | - Matthew T. Bird
- National Infection Service, UK Health Security Agency, London NW9 5EQ, UK; NIHR Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Oxford, UK
| | | | | | - Emma V. Waters
- Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
| | - Paolo Ribeca
- National Infection Service, UK Health Security Agency, London NW9 5EQ, UK; NIHR Health Protection Research Unit in Genomics and Enabling Data, Warwick, UK
| | - Claire Jenkins
- National Infection Service, UK Health Security Agency, London NW9 5EQ, UK; NIHR Health Protection Research Unit in Gastrointestinal Pathogens, Liverpool, UK
| | - Satheesh Nair
- National Infection Service, UK Health Security Agency, London NW9 5EQ, UK; *Correspondence: Satheesh Nair
| |
|
32
|
Viruses direct carbon cycling in lake sediments under global change. Proc Natl Acad Sci U S A 2022; 119:e2202261119. [PMID: 36206369 PMCID: PMC9564219 DOI: 10.1073/pnas.2202261119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
Global change is altering the vast amount of carbon cycled by microbes between land and freshwater, but how viruses mediate this process is poorly understood. Here, we show that viruses direct carbon cycling in lake sediments, and these impacts intensify with future changes in water clarity and terrestrial organic matter (tOM) inputs. Using experimental tOM gradients within sediments of a clear and a dark boreal lake, we identified 156 viral operational taxonomic units (vOTUs), of which 21% strongly increased with abundances of key bacteria and archaea, identified via metagenome-assembled genomes (MAGs). MAGs included the most abundant prokaryotes, which were themselves associated with dissolved organic matter (DOM) composition and greenhouse gas (GHG) concentrations. Increased abundances of virus-like particles were separately associated with reduced bacterial metabolism and with shifts in DOM toward amino sugars, likely released by cell lysis rather than higher molecular mass compounds accumulating from reduced tOM degradation. An additional 9.6% of vOTUs harbored auxiliary metabolic genes associated with DOM and GHGs. Taken together, these different effects on host dynamics and metabolism can explain why abundances of vOTUs rather than MAGs were better overall predictors of carbon cycling. Future increases in tOM quantity, but not quality, will change viral composition and function with consequences for DOM pools. Given their importance, viruses must now be explicitly considered in efforts to understand and predict the freshwater carbon cycle and its future under global environmental change.
|
33
|
Ecogenomics reveals viral communities across the Challenger Deep oceanic trench. Commun Biol 2022; 5:1055. [PMID: 36192584 PMCID: PMC9529941 DOI: 10.1038/s42003-022-04027-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/12/2022] [Accepted: 09/23/2022] [Indexed: 11/23/2022] Open
Abstract
Despite the environmental challenges and nutrient scarcity, the geographically isolated Challenger Deep in the Mariana Trench is considered a dynamic hotspot of microbial activity. Hadal viruses are the least explored microorganisms in the Challenger Deep, and their taxonomic and functional diversity and ecological impact on deep-sea biogeochemistry are poorly described. Here, we collect 13 sediment cores from slope and bottom-axis sites across the Challenger Deep (down to ~11 kilometers depth), and identify 1,628 previously undescribed viral operational taxonomic units at species level. Community-wide analyses reveal 1,299 viral genera and distinct viral diversity across the trench, which is significantly higher at the bottom-axis than at the slope sites of the trench. 77% of these viral genera have not been previously identified in soils, deep-sea sediments and other oceanic settings. Key prokaryotes involved in hadal carbon and nitrogen cycling are predicted to be potential hosts infected by these viruses. The detected putative auxiliary metabolic genes suggest that viruses at the Challenger Deep could modulate the carbohydrate and sulfur metabolisms of their potential hosts, and stabilize the hosts' cell membranes under extreme hydrostatic pressures. Our results shed light on hadal viral metabolic capabilities and contribute to understanding deep-sea ecology and the functional adaptations of hadal viruses for future research. Analysis of 13 sediment cores from the Challenger Deep of the Mariana Trench (down to 11 kilometers depth) identified distinct operational taxonomic units and relevant auxiliary metabolic genes, providing further insight into deep-sea viral metabolic capabilities and ecology.
|
34
|
Microbiome-phage interactions in inflammatory bowel disease. Clin Microbiol Infect 2022:S1198-743X(22)00506-7. [PMID: 36191844 DOI: 10.1016/j.cmi.2022.08.027] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/11/2022] [Revised: 08/23/2022] [Accepted: 08/29/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Inflammatory bowel diseases (IBD) constitute a group of auto-inflammatory disorders impacting the gastrointestinal tract and other systemic organs. The gut microbiome contributes to IBD pathology through multiple mechanisms. Bacteriophages (henceforth termed phages) are viruses that specifically infect bacteria. Considered part of the gut microbiome, phages may impact bacterial community structure in various clinical contexts. Additionally, exogenous phage administration may represent a means of suppressing IBD-associated pathobionts, yet phage therapy remains at an early developmental phase.
OBJECTIVES Herein, we summarize the latest advances in understanding endogenous phage impacts on the gut microbiome in health and in IBD. We highlight the prospect of using phages as a targeted mode of pathobiont eradication for preventing and treating IBD manifestations and complications.
SOURCES Selected peer-reviewed publications regarding the role of phages in health and in IBD, published between 2013 and 2022.
CONTENT The human gut microbiome is increasingly suggested to play a significant role in the onset and progression of multiple non-communicable diseases such as IBD. Several studies suggest that this effect may be mediated by discrete disease-contributing commensals. However, eradication of such pathogenic bacteria remains a daunting unmet task. Altered community structure in IBD may be influenced by blooms of phages within the gut bacterial ecosystem. Moreover, combinations of phages specifically targeting disease-contributing pathobiont strain clades may be harnessed as a potential eradication treatment for preventing and treating IBD, with minimal adverse impact on the surrounding bacterial microbiome.
IMPLICATIONS Understanding endogenous phage-gut commensal interactions in health and in IBD may enable phage utilization in precision gut microbiome editing, towards treating IBD and other non-communicable microbiome-associated diseases. Nevertheless, developing phage combination-mediated IBD pathobiont eradication treatment modalities will likely necessitate better strain-level bacterial target identification and resolution of treatment-related challenges, such as phage delivery, off-target effects, and bacterial resistance.
|
35
|
Ataee S, Brochet X, Peña-Reyes CA. Bacteriophage Genetic Edition Using LSTM. Front Bioinform 2022; 2:932319. [PMID: 36353213 PMCID: PMC9639385 DOI: 10.3389/fbinf.2022.932319] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/29/2022] [Accepted: 06/06/2022] [Indexed: 09/16/2023] Open
Abstract
Bacteriophages are gaining increasing interest as antimicrobial tools, largely due to the emergence of multi-antibiotic-resistant bacteria. Although their huge diversity and virulence make them particularly attractive for targeting a wide range of bacterial pathogens, it is difficult to select suitable phages due to their high specificity which limits their host range. In addition, other challenges remain such as structural fragility under certain environmental conditions, immunogenicity of phage therapy, or development of bacterial resistance. The use of genetically engineered phages may reduce characteristics that hinder prophylactic and therapeutic applications of phages. Nowadays, there is no systematic method to modify a given phage genome conferring its sought characteristics. We explore the use of artificial intelligence for this purpose as it has the potential to both guide and accelerate genome modification to generate phage variants with unique properties that overcome the limitations of natural phages. We propose an original architecture composed of two deep learning-driven components: a phage-bacterium interaction predictor and a phage genome-sequence generator. The former is a multi-branch 1-D convolutional neural network (1D-CNN) that analyses phage and bacterial genomes to predict interactions. The latter is a recurrent neural network, more particularly a long short-term memory (LSTM), that performs genomic modifications to a phage to offer substantial host range improvement. For this component, we developed two different architectures composed of one or two stacked LSTM layers with 256 neurons each. These generators are used to modify, more precisely to rewrite, the genome sequence of 42 selected phages, while the predictor is used to estimate the host range of the modified bacteriophages across 46 strains of Pseudomonas aeruginosa. 
The proposed generators, trained with an average accuracy of 96.1%, are able to improve the host range for an average of 18 phages among the 42 under study, increasing both their average host range, by 73.0 and 103.7%, and the maximum host ranges from 21 to 24 and 29, respectively. These promising results showed that the use of deep learning methodologies allows genetic modification of phages to extend, for instance, their host range, confirming the potential of these approaches to guide bacteriophage engineering.
Affiliation(s)
- Shabnam Ataee
  - Institute of Information and Communication Technology (IICT), School of Management and Engineering Vaud (HEIG-VD), Yverdon-les-Bains, Switzerland
  - HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
  - CI4CB—Computational Intelligence for Computational Biology, SIB—Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Xavier Brochet
  - Institute of Information and Communication Technology (IICT), School of Management and Engineering Vaud (HEIG-VD), Yverdon-les-Bains, Switzerland
  - HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
  - CI4CB—Computational Intelligence for Computational Biology, SIB—Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Carlos Andrés Peña-Reyes
  - Institute of Information and Communication Technology (IICT), School of Management and Engineering Vaud (HEIG-VD), Yverdon-les-Bains, Switzerland
  - HES-SO University of Applied Sciences and Arts Western Switzerland, Delémont, Switzerland
  - CI4CB—Computational Intelligence for Computational Biology, SIB—Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
36
|
Chitcharoen S, Sivapornnukul P, Payungporn S. Revolutionized virome research using systems microbiology approaches. Exp Biol Med (Maywood) 2022; 247:1135-1147. [PMID: 35723062 PMCID: PMC9335507 DOI: 10.1177/15353702221102895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023] Open
Abstract
Currently, both pathogenic and commensal viruses are continuously being discovered and acknowledged as ubiquitous components of microbial communities. Advances in systems microbiology approaches have changed the face of virome research. Here, we focus on the viral metagenomic approach to studying viral communities and their interactions with other microbial members as well as with their hosts. This review also summarizes the challenges, limitations, and benefits of current virome approaches. Virome studies could potentially be applied further in various biological and clinical fields.
Affiliation(s)
- Suwalak Chitcharoen
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Pavaret Sivapornnukul
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Sunchai Payungporn
  - Research Unit of Systems Microbiology, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
  - Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
|
37
|
Nishimura L, Fujito N, Sugimoto R, Inoue I. Detection of Ancient Viruses and Long-Term Viral Evolution. Viruses 2022; 14:v14061336. [PMID: 35746807 PMCID: PMC9230872 DOI: 10.3390/v14061336] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/30/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 12/22/2022] Open
Abstract
The COVID-19 outbreak has reminded us of the importance of viral evolutionary studies as regards comprehending complex viral evolution and preventing future pandemics. A unique approach to understanding viral evolution is the use of ancient viral genomes. Ancient viruses are detectable in various archaeological remains, including ancient people's skeletons and mummified tissues. Those specimens have preserved ancient viral DNA and RNA, which have been vigorously analyzed in the last few decades thanks to the development of sequencing technologies. Reconstructed ancient pathogenic viral genomes have been utilized to estimate the past pandemics of pathogenic viruses within the ancient human population and long-term evolutionary events. Recent studies revealed the existence of non-pathogenic viral genomes in ancient people's bodies. These ancient non-pathogenic viruses might be informative for inferring their relationships with ancient people's diets and lifestyles. Here, we reviewed the past and ongoing studies on ancient pathogenic and non-pathogenic viruses and the usage of ancient viral genomes to understand their long-term viral evolution.
Affiliation(s)
- Luca Nishimura
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
- Naoko Fujito
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
- Ryota Sugimoto
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
- Ituro Inoue
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
  - Correspondence: Tel.: +81-55-981-6795
|
38
|
Andrade-Martínez JS, Camelo Valera LC, Chica Cárdenas LA, Forero-Junco L, López-Leal G, Moreno-Gallego JL, Rangel-Pineros G, Reyes A. Computational Tools for the Analysis of Uncultivated Phage Genomes. Microbiol Mol Biol Rev 2022; 86:e0000421. [PMID: 35311574 PMCID: PMC9199400 DOI: 10.1128/mmbr.00004-21] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022] Open
Abstract
Over a century of bacteriophage research has uncovered a plethora of fundamental aspects of their biology, ecology, and evolution. Furthermore, the introduction of community-level studies through metagenomics has revealed unprecedented insights on the impact that phages have on a range of ecological and physiological processes. It was not until the introduction of viral metagenomics that we began to grasp the astonishing breadth of genetic diversity encompassed by phage genomes. Novel phage genomes have been reported from a diverse range of biomes at an increasing rate, which has prompted the development of computational tools that support the multilevel characterization of these novel phages based solely on their genome sequences. The impact of these technologies has been so large that, together with MAGs (Metagenomic Assembled Genomes), we now have UViGs (Uncultivated Viral Genomes), which are now officially recognized by the International Committee for the Taxonomy of Viruses (ICTV), and new taxonomic groups can now be created based exclusively on genomic sequence information. Even though the available tools have immensely contributed to our knowledge of phage diversity and ecology, the ongoing surge in software programs makes it challenging to keep up with them and the purpose each one is designed for. Therefore, in this review, we describe a comprehensive set of currently available computational tools designed for the characterization of phage genome sequences, focusing on five specific analyses: (i) assembly and identification of phage and prophage sequences, (ii) phage genome annotation, (iii) phage taxonomic classification, (iv) phage-host interaction analysis, and (v) phage microdiversity.
Affiliation(s)
- Juan Sebastián Andrade-Martínez
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Laura Carolina Camelo Valera
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Luis Alberto Chica Cárdenas
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Laura Forero-Junco
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - Department of Plant and Environmental Science, University of Copenhagen, Frederiksberg, Denmark
- Gamaliel López-Leal
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- J. Leonardo Moreno-Gallego
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Guillermo Rangel-Pineros
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Alejandro Reyes
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, USA
|
39
|
Jiang Y, Luo J, Huang D, Liu Y, Li DD. Machine Learning Advances in Microbiology: A Review of Methods and Applications. Front Microbiol 2022; 13:925454. [PMID: 35711777 PMCID: PMC9196628 DOI: 10.3389/fmicb.2022.925454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/21/2022] [Accepted: 05/09/2022] [Indexed: 12/18/2022] Open
Abstract
Microorganisms play an important role in natural material and elemental cycles. Many common and general biology research techniques rely on microorganisms. Machine learning has been gradually integrated with multiple fields of study. Machine learning, including deep learning, aims to use mathematical insights to optimize variational functions to aid microbiology using various types of available data to help humans organize and apply collective knowledge of various research objects in a systematic and scaled manner. Classification and prediction have become the main achievements in the development of microbial community research in the direction of computational biology. This review summarizes the application and development of machine learning and deep learning in the field of microbiology and shows and compares the advantages and disadvantages of different algorithm tools in four fields: microbiome and taxonomy, microbial ecology, pathogen and epidemiology, and drug discovery.
|
40
|
Liu F, Miao Y, Liu Y, Hou T. RNN-VirSeeker: A Deep Learning Method for Identification of Short Viral Sequences From Metagenomes. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1840-1849. [PMID: 33315571 DOI: 10.1109/tcbb.2020.3044575] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
Viruses are the most abundant biological entities on earth and play vital roles in many aspects of microbial communities. As major human pathogens, viruses have caused huge mortality and morbidity throughout human history. Metagenomic sequencing methods can capture all microorganisms in a microbiota, with viral sequences mixed with those of other species. It is therefore necessary to identify viral sequences in metagenomes. However, existing methods perform poorly at identifying short viral sequences. To solve this problem, a deep learning-based method, RNN-VirSeeker, is proposed in this paper. RNN-VirSeeker was trained on sequences of 500 bp sampled from known Virus and Host RefSeq genomes. Experimental results on the testing set showed that RNN-VirSeeker achieved an AUROC of 0.9175, recall of 0.8640 and precision of 0.9211 for sequences of 500 bp, and outperformed three widely used methods, VirSorter, VirFinder, and DeepVirFinder, at identifying short viral sequences. RNN-VirSeeker was also used to identify viral sequences in a CAMI dataset and a human gut metagenome. Compared with DeepVirFinder, RNN-VirSeeker identified more viral sequences in these metagenomes and achieved greater values of AUPRC and AUROC. RNN-VirSeeker is freely available at https://github.com/crazyinter/RNN-VirSeeker.
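The precision and recall quoted in this abstract can be combined into a single F1 score, which the abstract itself does not report. The short Python sketch below is only a worked calculation from the two published figures; the function name `f1_score` is illustrative and is not taken from the RNN-VirSeeker code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for 500 bp sequences.
precision, recall = 0.9211, 0.8640

# Derived value, not a figure published by the authors.
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")
```

With these inputs the derived F1 is roughly 0.89, sitting between the reported recall and precision as a harmonic mean must.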
|
41
|
Kieft K, Anantharaman K. Virus genomics: what is being overlooked? Curr Opin Virol 2022; 53:101200. [DOI: 10.1016/j.coviro.2022.101200] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/17/2021] [Revised: 12/21/2021] [Accepted: 01/03/2022] [Indexed: 01/05/2023]
|
42
|
Abstract
Modern sequencing technologies have provided insight into the genetic diversity of numerous species, including the human pathogen Pseudomonas aeruginosa. Bacterial genomes often harbor bacteriophage genomes (prophages), which can account for upwards of 20% of the genome. Prior studies have found P. aeruginosa prophages that contribute to their host’s pathogenicity and fitness. These advantages come in many different forms, including the production of toxins, promotion of biofilm formation, and displacement of other P. aeruginosa strains. While several different genera and species of P. aeruginosa prophages have been studied, there has not been a comprehensive study of the overall diversity of P. aeruginosa-infecting prophages. Here, we present the results of just such an analysis. A total of 6,852 high-confidence prophages were identified from 5,383 P. aeruginosa genomes from strains isolated from the human body and other environments. In total, 3,201 unique prophage sequences were identified. While 53.1% of these prophage sequences displayed sequence similarity to publicly available phage genomes, novel and highly mosaic prophages were discovered. Among these prophages, there is extensive diversity, including diversity within the functionally conserved integrase and C repressor coding regions, two genes responsible for prophage entering and persisting through the lysogenic life cycle. Analysis of integrase, C repressor, and terminase coding regions revealed extensive reassortment among P. aeruginosa prophages. This catalog of P. aeruginosa prophages provides a resource for future studies into the evolution of the species. IMPORTANCE Prophages play a critical role in the evolution of their host species and can also contribute to the virulence and fitness of pathogenic species. Here, we conducted a comprehensive investigation of prophage sequences from 5,383 publicly available Pseudomonas aeruginosa genomes from human as well as environmental isolates. 
We identified a diverse population of prophages, including tailed phages, inoviruses, and microviruses; 46.9% of the prophage sequences found share no significant sequence similarity with characterized phages, representing a vast array of novel P. aeruginosa-infecting phages. Our investigation into these prophages found substantial evidence of reassortment. In producing this, the first catalog of P. aeruginosa prophages, we uncovered both novel prophages as well as genetic content that have yet to be explored.
|
43
|
Miao Y, Liu F, Hou T, Liu Y. Virtifier: a deep learning-based identifier for viral sequences from metagenomes. Bioinformatics 2022; 38:1216-1222. [PMID: 34908121 DOI: 10.1093/bioinformatics/btab845] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/02/2020] [Revised: 11/13/2021] [Accepted: 12/13/2021] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Viruses, the most abundant biological entities on earth, are important components of microbial communities, and as major human pathogens, they are responsible for human mortality and morbidity. The identification of viral sequences from metagenomes is critical for viral analysis. As massive quantities of short sequences are generated by next-generation sequencing, most methods utilize discrete and sparse one-hot vectors to encode nucleotide sequences, which are usually ineffective in viral identification.
RESULTS In this article, Virtifier, a deep learning-based viral identifier for sequences from metagenomic data, is proposed. It includes a meaningful nucleotide sequence encoding method named Seq2Vec and a variant viral sequence predictor with an attention-based long short-term memory (LSTM) network. By utilizing a fully trained embedding matrix to encode codons, Seq2Vec can efficiently extract the relationships among those codons in a nucleotide sequence. Combined with an attention layer, the LSTM neural network can further analyze the codon relationships and sift the parts that contribute to the final features. Experimental results on three datasets have shown that Virtifier can accurately identify short viral sequences (<500 bp) from metagenomes, surpassing three widely used methods, VirFinder, DeepVirFinder and PPR-Meta. Meanwhile, a comparable performance was achieved by Virtifier at longer lengths (>5000 bp).
AVAILABILITY AND IMPLEMENTATION A Python implementation of Virtifier and the Python code developed for this study have been provided on GitHub at https://github.com/crazyinter/Seq2Vec. The RefSeq genomes in this article are available in VirFinder at https://dx.doi.org/10.1186/s40168-017-0283-5. The CAMI Challenge Dataset 3 CAMI_high dataset in this article is available in CAMI at https://data.cami-challenge.org/participate. The real human gut metagenomes in this article are available at https://dx.doi.org/10.1101/gr.142315.112.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
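The codon-level encoding described in this abstract can be illustrated with a small Python sketch. It is not the Seq2Vec implementation: the names `CODON_TO_ID`, `UNKNOWN_ID` and `codon_tokens`, the non-overlapping stride of 3, and the use of id 0 for ambiguous codons are all assumptions made for illustration. The idea shown is only that each codon becomes one integer id, which a trained embedding matrix could then map to a dense vector in place of a sparse one-hot encoding.

```python
from itertools import product
from typing import List

# All 64 DNA codons in a fixed order, mapped to ids 1..64 (hypothetical scheme).
CODON_TO_ID = {
    "".join(bases): idx
    for idx, bases in enumerate(product("ACGT", repeat=3), start=1)
}
# Id reserved here for codons containing ambiguous bases such as N.
UNKNOWN_ID = 0


def codon_tokens(seq: str, stride: int = 3) -> List[int]:
    """Split a nucleotide sequence into codons and return their integer ids.

    A stride of 3 gives non-overlapping codons; a stride of 1 would give
    overlapping 3-mers. Trailing bases that do not fill a codon are dropped.
    """
    seq = seq.upper()
    return [
        CODON_TO_ID.get(seq[i:i + 3], UNKNOWN_ID)
        for i in range(0, len(seq) - 2, stride)
    ]


# Each id would index one row of an embedding matrix in a downstream model.
tokens = codon_tokens("AAAACGTTTANA")
print(tokens)
```

A 500 bp read becomes a sequence of 166 ids under this scheme, which is the kind of compact input an LSTM with an embedding layer would consume.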
Affiliation(s)
- Yan Miao
  - College of Communication Engineering, Jilin University, Changchun 130022, China
- Fu Liu
  - College of Communication Engineering, Jilin University, Changchun 130022, China
- Tao Hou
  - College of Communication Engineering, Jilin University, Changchun 130022, China
- Yun Liu
  - College of Communication Engineering, Jilin University, Changchun 130022, China
|
44
|
Xu G, Zhang L, Liu X, Guan F, Xu Y, Yue H, Huang JQ, Chen J, Wu N, Tian J. Combined assembly of long and short sequencing reads improve the efficiency of exploring the soil metagenome. BMC Genomics 2022; 23:37. [PMID: 34996356 PMCID: PMC8742384 DOI: 10.1186/s12864-021-08260-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/14/2021] [Accepted: 12/13/2021] [Indexed: 12/22/2022] Open
Abstract
Background Advances in DNA sequencing technologies have transformed our capacity to perform life science research, decipher the dynamics of complex soil microbial communities and exploit them for plant disease management. However, soil is a complex conglomerate, which makes functional metagenomics studies very challenging.
Results Metagenomes assembled from long-read (PacBio, PB), short-read (Illumina, IL), and mixed PB and IL (PI) sequencing of soil DNA samples were compared. Ortholog analyses and functional annotation revealed that the PI approach significantly increased the contig length of the metagenomic sequences compared to IL and enlarged the gene pool compared to PB. The PI approach also offered comparable or higher species abundance than either PB or IL alone, and showed significant advantages for studying natural product biosynthetic genes in the soil microbiomes.
Conclusion Our results provide an effective strategy for combining long- and short-read DNA sequencing data to explore and distill the maximum information out of soil metagenomics.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08260-3.
Affiliation(s)
- Guoshun Xu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South Street, Beijing, 100081, People's Republic of China
- Liwen Zhang
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South Street, Beijing, 100081, People's Republic of China
- Xiaoqing Liu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South Street, Beijing, 100081, People's Republic of China
- Feifei Guan
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South Street, Beijing, 100081, People's Republic of China
- Yuquan Xu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South Street, Beijing, 100081, People's Republic of China
- Haitao Yue
  - Department of Biology and Biotechnology, Xinjiang University, 666 Shengli Road, Urumqi, 830046, People's Republic of China
- Jin-Qun Huang
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Jieyin Chen
  - State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Ningfeng Wu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South Street, Beijing, 100081, People's Republic of China
- Jian Tian
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, No.12 Zhongguancun South Street, Beijing, 100081, People's Republic of China
|
45
|
Bolduc B, Zablocki O, Guo J, Zayed AA, Vik D, Dehal P, Wood-Charlson EM, Arkin A, Merchant N, Pett-Ridge J, Roux S, Vaughn M, Sullivan MB. iVirus 2.0: Cyberinfrastructure-supported tools and data to power DNA virus ecology. ISME Commun 2021; 1:77. [PMID: 36765102 PMCID: PMC9723767 DOI: 10.1038/s43705-021-00083-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/20/2021] [Revised: 11/24/2021] [Accepted: 11/29/2021] [Indexed: 11/09/2022]
Abstract
Microbes drive myriad ecosystem processes, but under strong influence from viruses. Because studying viruses in complex systems requires different tools than those for microbes, they remain underexplored. To combat this, we previously aggregated double-stranded DNA (dsDNA) virus analysis capabilities and resources into 'iVirus' on the CyVerse collaborative cyberinfrastructure. Here we substantially expand iVirus's functionality and accessibility, to iVirus 2.0, as follows. First, core iVirus apps were integrated into the Department of Energy's Systems Biology KnowledgeBase (KBase) to provide an additional analytical platform. Second, at CyVerse, 20 software tools (apps) were upgraded or added as new tools and capabilities. Third, nearly 20-fold more sequence reads were aggregated to capture new data and environments. Finally, documentation, as "live" protocols, was updated to maximize user interaction with and contribution to infrastructure development. Together, iVirus 2.0 serves as a uniquely central and accessible analytical platform for studying how viruses, particularly dsDNA viruses, impact diverse microbial ecosystems.
Affiliation(s)
- Benjamin Bolduc
- Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Center of Microbiome Science, Columbus, OH, USA
- EMERGE Biology Integration Institute, Columbus, OH, USA
- Olivier Zablocki
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, Columbus, OH, USA
  - EMERGE Biology Integration Institute, Columbus, OH, USA
- Jiarong Guo
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, Columbus, OH, USA
  - EMERGE Biology Integration Institute, Columbus, OH, USA
- Ahmed A Zayed
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, Columbus, OH, USA
  - EMERGE Biology Integration Institute, Columbus, OH, USA
- Dean Vik
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Paramvir Dehal
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Elisha M Wood-Charlson
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Adam Arkin
  - Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Department of Bioengineering, University of California, Berkeley, CA, USA
- Jennifer Pett-Ridge
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
  - Life & Environmental Sciences Department, University of California Merced, Merced, CA, 95343, USA
- Simon Roux
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew Vaughn
  - Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, USA
- Matthew B Sullivan
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, Columbus, OH, USA
  - EMERGE Biology Integration Institute, Columbus, OH, USA
  - Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, OH, USA
46
Jurasz H, Pawłowski T, Perlejewski K. Contamination Issue in Viral Metagenomics: Problems, Solutions, and Clinical Perspectives. Front Microbiol 2021; 12:745076. [PMID: 34745046 PMCID: PMC8564396 DOI: 10.3389/fmicb.2021.745076] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/22/2021] [Accepted: 09/17/2021] [Indexed: 12/16/2022]
Abstract
We describe the most common internal and external sources and types of contamination encountered in viral metagenomic studies and discuss their negative impact on sequencing results, particularly for low-biomass samples and clinical applications. We also propose some basic recommendations for reducing the background noise in viral shotgun metagenomic (SM) studies, which would limit the bias introduced by various classes of contaminants. Regardless of the specific viral SM protocol, contamination cannot be totally avoided; in particular, the issue of reagent contamination should always be addressed with high priority. There is an urgent need for the development and validation of standards for viral metagenomic studies especially if viral SM protocols will be more widely applied in diagnostics.
Affiliation(s)
- Henryk Jurasz
  - Department of Immunopathology of Infectious and Parasitic Diseases, Medical University of Warsaw, Warsaw, Poland
- Tomasz Pawłowski
  - Division of Psychotherapy and Psychosomatic Medicine, Department of Psychiatry, Wrocław Medical University, Wrocław, Poland
- Karol Perlejewski
  - Department of Immunopathology of Infectious and Parasitic Diseases, Medical University of Warsaw, Warsaw, Poland
47
Ponsero AJ, Hurwitz BL, Magain N, Miadlikowska J, Lutzoni F, U'Ren JM. Cyanolichen microbiome contains novel viruses that encode genes to promote microbial metabolism. ISME Communications 2021; 1:56. [PMID: 37938275 PMCID: PMC9723557 DOI: 10.1038/s43705-021-00060-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2021] [Revised: 09/23/2021] [Accepted: 09/27/2021] [Indexed: 11/09/2023]
Abstract
Lichen thalli are formed through the symbiotic association of a filamentous fungus and photosynthetic green alga and/or cyanobacterium. Recent studies have revealed lichens also host highly diverse communities of secondary fungal and bacterial symbionts, yet few studies have examined the viral component within these complex symbioses. Here, we describe viral biodiversity and functions in cyanolichens collected from across North America and Europe. As current machine-learning viral-detection tools are not trained on complex eukaryotic metagenomes, we first developed efficient methods to remove eukaryotic reads prior to viral detection and a custom pipeline to validate viral contigs predicted with three machine-learning methods. Our resulting high-quality viral data illustrate that every cyanolichen thallus contains diverse viruses that are distinct from viruses in other terrestrial ecosystems. In addition to cyanobacteria, predicted viral hosts include other lichen-associated bacterial lineages and algae, although a large fraction of viral contigs had no host prediction. Functional annotation of cyanolichen viral sequences predicts numerous viral-encoded auxiliary metabolic genes (AMGs) involved in amino acid, nucleotide, and carbohydrate metabolism, including AMGs for secondary metabolism (antibiotics and antimicrobials) and fatty acid biosynthesis. Overall, the diversity of cyanolichen AMGs suggests that viruses may alter microbial interactions within these complex symbiotic assemblages.
Affiliation(s)
- Alise J Ponsero
  - BIO5 Institute and Department of Biosystems Engineering, University of Arizona, Tucson, AZ, 85721, USA
  - Department of Medicine, University of Helsinki, Helsinki, Finland
- Bonnie L Hurwitz
  - BIO5 Institute and Department of Biosystems Engineering, University of Arizona, Tucson, AZ, 85721, USA
- Nicolas Magain
  - Department of Biology, Duke University, Durham, NC, 27708, USA
  - Evolution and Conservation Biology, InBioS, University of Liège, Liège, Belgium
- Jana M U'Ren
  - BIO5 Institute and Department of Biosystems Engineering, University of Arizona, Tucson, AZ, 85721, USA
48
Utilizing the VirIdAl Pipeline to Search for Viruses in the Metagenomic Data of Bat Samples. Viruses 2021; 13:v13102006. [PMID: 34696436 PMCID: PMC8541124 DOI: 10.3390/v13102006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/30/2021] [Revised: 09/30/2021] [Accepted: 10/02/2021] [Indexed: 12/27/2022]
Abstract
According to various estimates, only a small percentage of existing viruses have been discovered, naturally much less being represented in the genomic databases. High-throughput sequencing technologies develop rapidly, empowering large-scale screening of various biological samples for the presence of pathogen-associated nucleotide sequences, but many organisms are yet to be attributed specific loci for identification. This problem particularly impedes viral screening, due to vast heterogeneity in viral genomes. In this paper, we present a new bioinformatic pipeline, VirIdAl, for detecting and identifying viral pathogens in sequencing data. We also demonstrate the utility of the new software by applying it to viral screening of the feces of bats collected in the Moscow region, which revealed a significant variety of viruses associated with bats, insects, plants, and protozoa. The presence of alpha and beta coronavirus reads, including the MERS-like bat virus, deserves a special mention, as it once again indicates that bats are indeed reservoirs for many viral pathogens. In addition, it was shown that alignment-based methods were unable to identify the taxon for a large proportion of reads, and we additionally applied other approaches, showing that they can further reveal the presence of viral agents in sequencing data. However, the incompleteness of viral databases remains a significant problem in the studies of viral diversity, and therefore necessitates the use of combined approaches, including those based on machine learning methods.
49
Garneau JR, Legrand V, Marbouty M, Press MO, Vik DR, Fortier LC, Sullivan MB, Bikard D, Monot M. High-throughput identification of viral termini and packaging mechanisms in virome datasets using PhageTermVirome. Sci Rep 2021; 11:18319. [PMID: 34526611 PMCID: PMC8443750 DOI: 10.1038/s41598-021-97867-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/31/2021] [Accepted: 08/27/2021] [Indexed: 11/13/2022]
Abstract
Viruses that infect bacteria (phages) are increasingly recognized for their importance in diverse ecosystems but identifying and annotating them in large-scale sequence datasets is still challenging. Although efficient scalable virus identification tools are emerging, defining the exact ends (termini) of phage genomes is still particularly difficult. The proper identification of termini is crucial, as it helps in characterizing the packaging mechanism of bacteriophages and provides information on various aspects of phage biology. Here, we introduce PhageTermVirome (PTV) as a tool for the easy and rapid high-throughput determination of phage termini and packaging mechanisms using modern large-scale metagenomics datasets. We successfully tested the PTV algorithm on a mock virome dataset and then used it on two real virome datasets to achieve the rapid identification of more than 100 phage termini and packaging mechanisms, with just a few hours of computing time. Because PTV allows the identification of free fully formed viral particles (by recognition of termini present only in encapsidated DNA), it can also complement other virus identification softwares to predict the true viral origin of contigs in viral metagenomics datasets. PTV is a novel and unique tool for high-throughput characterization of phage genomes, including phage termini identification and characterization of genome packaging mechanisms. This software should help researchers better visualize, map and study the virosphere. PTV is freely available for downloading and installation at https://gitlab.pasteur.fr/vlegrand/ptv.
Affiliation(s)
- Véronique Legrand
  - Infrastructure et Ingénierie Scientifique, Institut Pasteur, 75015, Paris, France
- Martial Marbouty
  - Institut Pasteur, Unité Régulation Spatiale des Génomes, UMR 3525, CNRS, 75015, Paris, France
- Dean R Vik
  - Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA
- Louis-Charles Fortier
  - Faculty of Medicine and Health Sciences, Department of Microbiology and Infectious Diseases, Université de Sherbrooke, Sherbrooke, QC, J1E 4K8, Canada
- Matthew B Sullivan
  - Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA
- David Bikard
  - Département de Microbiologie, Institut Pasteur, Groupe Biologie de Synthèse, 75015, Paris, France
- Marc Monot
  - Biomics Platform, C2RT, Institut Pasteur, 75015, Paris, France
50
Wu S, Fang Z, Tan J, Li M, Wang C, Guo Q, Xu C, Jiang X, Zhu H. DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach. Gigascience 2021; 10:giab056. [PMID: 34498685 PMCID: PMC8427542 DOI: 10.1093/gigascience/giab056] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/02/2022]
Abstract
BACKGROUND: Prokaryotic viruses referred to as phages can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage-derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and regulation of microbial communities. However, there is no experimental or computational approach to effectively classify their sequences in culture-independent metavirome. We present a new computational method, DeePhage, which can directly and rapidly judge each read or contig as a virulent or temperate phage-derived fragment.
FINDINGS: DeePhage uses a "one-hot" encoding form to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage on 5-fold cross-validation reaches as high as 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS. On real metavirome, DeePhage correctly predicts the highest proportion of contigs when using BLAST as annotation, without apparent preferences. Besides, DeePhage reduces running time vs PhagePred and PHACTS by 245 and 810 times, respectively, under the same computational configuration. By direct detection of the temperate viral fragments from metagenome and metavirome, we furthermore propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides us a new insight into the potential treatment for human disease.
CONCLUSIONS: DeePhage is a novel tool developed to rapidly and efficiently identify 2 kinds of phage fragments especially for metagenomics analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
Affiliation(s)
- Shufang Wu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Zhencheng Fang
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Jie Tan
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Mo Li
  - Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
- Chunhui Wang
  - Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
- Qian Guo
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
- Congmin Xu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
- Xiaoqing Jiang
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Huaiqiu Zhu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
  - Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, Beijing, China
|