1
Rahimian M, Panahi B. Metagenome sequence data mining for viral interaction studies: Review on progress and prospects. Virus Res 2024; 349:199450. [PMID: 39151562 DOI: 10.1016/j.virusres.2024.199450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2024] [Revised: 08/11/2024] [Accepted: 08/13/2024] [Indexed: 08/19/2024]
Abstract
Metagenomics has been greatly accelerated by the development of next-generation sequencing (NGS) technologies, which allow scientists to discover and describe novel microorganisms without the need for conventional culture techniques. Examining integrative bioinformatics methods used in viral interaction research, this study highlights metagenomic data from various contexts. Accurate viral identification depends on high-purity genetic material extraction, appropriate NGS platform selection, and sophisticated bioinformatics tools like VirPipe and VirFinder. The efficiency and precision of metagenomic analysis are further improved with the advent of AI-based techniques. The diversity and dynamics of viral communities are demonstrated by case studies from a variety of environments, emphasizing the seasonal and geographical variations that influence viral populations. In addition to speeding up the discovery of new viruses, metagenomics offers thorough understanding of virus-host interactions and their ecological effects. This review provides a promising framework for comprehending the complexity of viral communities and their interactions with hosts, highlighting the transformational potential of metagenomics and bioinformatics in viral research.
Affiliation(s)
- Mohammadreza Rahimian
- Department of Biology, Faculty of Basic Sciences, University of Maragheh, Maragheh, Iran
- Bahman Panahi
- Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran.
2
Ridgway R, Lu H, Blower TR, Evans NJ, Ainsworth S. Genomic and taxonomic evaluation of 38 Treponema prophage sequences. BMC Genomics 2024; 25:549. [PMID: 38824509 PMCID: PMC11144348 DOI: 10.1186/s12864-024-10461-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 05/28/2024] [Indexed: 06/03/2024] Open
Abstract
BACKGROUND Despite Spirochetales being a ubiquitous and medically important order of bacteria infecting both humans and animals, there is extremely limited information regarding their bacteriophages. Of the genus Treponema, there is just a single reported characterised prophage. RESULTS We applied a bioinformatic approach on 24 previously published Treponema genomes to identify and characterise putative treponemal prophages. Thirteen of the genomes did not contain any detectable prophage regions. The remaining eleven contained 38 prophage sequences, with between one and eight putative prophages in each bacterial genome. The prophage regions ranged from 12.4 to 75.1 kb, with between 27 and 171 protein coding sequences. Phylogenetic analysis revealed that 24 of the prophages formed three distinct sequence clusters, identifying putative myoviral and siphoviral morphology. ViPTree analysis demonstrated that the identified sequences were novel when compared to known double stranded DNA bacteriophage genomes. CONCLUSIONS In this study, we have started to address the knowledge gap on treponeme bacteriophages by characterising 38 prophage sequences in 24 treponeme genomes. Using bioinformatic approaches, we have been able to identify and compare the prophage-like elements with respect to other bacteriophages, their gene content, and their potential to be a functional and inducible bacteriophage, which in turn can help focus our attention on specific prophages to investigate further.
Affiliation(s)
- Rachel Ridgway
- Department of Infection Biology and Microbiomes, University of Liverpool, Leahurst Campus, Chester High Road, Neston, Cheshire, CH64 7TE, UK.
- Hanshuo Lu
- Department of Infection Biology and Microbiomes, University of Liverpool, Biosciences Building, Crown Street, Liverpool, L69 7BE, UK
- Tim R Blower
- Department of Biosciences, Durham University, Stockton Road, Durham, DH1 3LE, UK
- Nicholas James Evans
- Department of Infection Biology and Microbiomes, University of Liverpool, Leahurst Campus, Chester High Road, Neston, Cheshire, CH64 7TE, UK
- Stuart Ainsworth
- Department of Infection Biology and Microbiomes, University of Liverpool, Liverpool Science Park IC2, 146 Brownlow Hill, Liverpool, L3 5RF, UK
3
Dantas CWD, Martins DT, Nogueira WG, Alegria OVC, Ramos RTJ. Tools and methodology to in silico phage discovery in freshwater environments. Front Microbiol 2024; 15:1390726. [PMID: 38881659 PMCID: PMC11176557 DOI: 10.3389/fmicb.2024.1390726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 05/16/2024] [Indexed: 06/18/2024] Open
Abstract
Freshwater availability is essential, and its maintenance has become an enormous challenge. Due to population growth and climate changes, freshwater sources are becoming scarce, imposing the need for strategies for its reuse. Currently, the constant discharge of waste into water bodies from human activities leads to the dissemination of pathogenic bacteria, negatively impacting water quality from the source to the infrastructure required for treatment, such as the accumulation of biofilms. Current water treatment methods cannot keep pace with bacterial evolution, which increasingly exhibits a profile of multidrug resistance to antibiotics. Furthermore, using more powerful disinfectants may affect the balance of aquatic ecosystems. Therefore, there is a need to explore sustainable ways to control the spreading of pathogenic bacteria. Bacteriophages can infect bacteria and archaea, hijacking their host machinery to favor their replication. They are widely abundant globally and provide a biological alternative to bacterial treatment with antibiotics. In contrast to common disinfectants and antibiotics, bacteriophages are highly specific, minimizing adverse effects on aquatic microbial communities and offering a lower cost-benefit ratio in production compared to antibiotics. However, due to the difficulty involving cultivating and identifying environmental bacteriophages, alternative approaches using NGS metagenomics in combination with some bioinformatic tools can help identify new bacteriophages that can be useful as an alternative treatment against resistant bacteria. In this review, we discuss advances in exploring the virome of freshwater, as well as current applications of bacteriophages in freshwater treatment, along with current challenges and future perspectives.
Affiliation(s)
- Carlos Willian Dias Dantas
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- David Tavares Martins
- Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Wylerson Guimarães Nogueira
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Oscar Victor Cardenas Alegria
- Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Rommel Thiago Jucá Ramos
- Laboratory of Simulation and Computational Biology - SIMBIC, High Performance Computing Center - CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
4
Wu LY, Wijesekara Y, Piedade GJ, Pappas N, Brussaard CPD, Dutilh BE. Benchmarking bioinformatic virus identification tools using real-world metagenomic data across biomes. Genome Biol 2024; 25:97. [PMID: 38622738 PMCID: PMC11020464 DOI: 10.1186/s13059-024-03236-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 04/01/2024] [Indexed: 04/17/2024] Open
Abstract
BACKGROUND As most viruses remain uncultivated, metagenomics is currently the main method for virus discovery. Detecting viruses in metagenomic data is not trivial. In the past few years, many bioinformatic virus identification tools have been developed for this task, making it challenging to choose the right tools, parameters, and cutoffs. As all these tools measure different biological signals, and use different algorithms and training and reference databases, it is imperative to conduct an independent benchmarking to give users objective guidance. RESULTS We compare the performance of nine state-of-the-art virus identification tools in thirteen modes on eight paired viral and microbial datasets from three distinct biomes, including a new complex dataset from Antarctic coastal waters. The tools have highly variable true positive rates (0-97%) and false positive rates (0-30%). PPR-Meta best distinguishes viral from microbial contigs, followed by DeepVirFinder, VirSorter2, and VIBRANT. Different tools identify different subsets of the benchmarking data and all tools, except for Sourmash, find unique viral contigs. Performance of tools improved with adjusted parameter cutoffs, indicating that adjustment of parameter cutoffs before usage should be considered. CONCLUSIONS Together, our independent benchmarking facilitates selecting choices of bioinformatic virus identification tools and gives suggestions for parameter adjustments to viromics researchers.
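The effect of adjusting a parameter cutoff on true and false positive rates can be illustrated with a minimal sketch. It assumes each contig has a single score from some tool and a known viral/microbial label; the function name `rates_at_cutoff`, the scores, and the labels are all hypothetical and are not taken from the benchmark.

```python
from typing import Sequence, Tuple


def rates_at_cutoff(
    scores: Sequence[float], is_viral: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) when every
    contig scoring >= cutoff is called viral."""
    tp = fp = fn = tn = 0
    for score, truth in zip(scores, is_viral):
        called_viral = score >= cutoff
        if called_viral and truth:
            tp += 1
        elif called_viral and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical scores for four viral and four microbial contigs.
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
is_viral = [True, True, True, True, False, False, False, False]

for cutoff in (0.5, 0.75):
    tpr, fpr = rates_at_cutoff(scores, is_viral, cutoff)
    print(f"cutoff={cutoff}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Raising the cutoff in this toy data lowers the false positive rate at the cost of a lower true positive rate, which is the kind of trade-off a user would weigh when tuning a tool.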
Affiliation(s)
- Ling-Yi Wu
- Theoretical Biology and Bioinformatics, Science4Life, Utrecht University, Padualaan 8, Utrecht, 3584 CH, The Netherlands
- Yasas Wijesekara
- Institute of Bioinformatics, University Medicine Greifswald, Felix Hausdorff Str. 8, 17475, Greifswald, Germany
- Gonçalo J Piedade
- Department Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, Den Burg, PO Box 59, Texel, 1790 AB, The Netherlands
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Nikolaos Pappas
- Theoretical Biology and Bioinformatics, Science4Life, Utrecht University, Padualaan 8, Utrecht, 3584 CH, The Netherlands
- Corina P D Brussaard
- Department Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, Den Burg, PO Box 59, Texel, 1790 AB, The Netherlands
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Bas E Dutilh
- Theoretical Biology and Bioinformatics, Science4Life, Utrecht University, Padualaan 8, Utrecht, 3584 CH, The Netherlands.
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany.
5
Hegarty B, Riddell V J, Bastien E, Langenfeld K, Lindback M, Saini JS, Wing A, Zhang J, Duhaime M. Benchmarking informatics approaches for virus discovery: caution is needed when combining in silico identification methods. mSystems 2024; 9:e0110523. [PMID: 38376167 PMCID: PMC10949488 DOI: 10.1128/msystems.01105-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 01/24/2024] [Indexed: 02/21/2024] Open
Abstract
Understanding the ecological impacts of viruses on natural and engineered ecosystems relies on the accurate identification of viral sequences from community sequencing data. To maximize viral recovery from metagenomes, researchers frequently combine viral identification tools. However, the effectiveness of this strategy is unknown. Here, we benchmarked combinations of six widely used informatics tools for viral identification and analysis (VirSorter, VirSorter2, VIBRANT, DeepVirFinder, CheckV, and Kaiju), called "rulesets." Rulesets were tested against mock metagenomes composed of taxonomically diverse sequence types and diverse aquatic metagenomes to assess the effects of the degree of viral enrichment and habitat on tool performance. We found that six rulesets achieved equivalent accuracy [Matthews Correlation Coefficient (MCC) = 0.77, Padj ≥ 0.05]. Each contained VirSorter2, and five used our "tuning removal" rule designed to remove non-viral contamination. While DeepVirFinder, VIBRANT, and VirSorter were each found once in these high-accuracy rulesets, they were not found in combination with each other: combining tools does not lead to optimal performance. Our validation suggests that the MCC plateau at 0.77 is partly caused by inaccurate labeling within reference sequence databases. In aquatic metagenomes, our highest MCC ruleset identified more viral sequences in virus-enriched (44%-46%) than in cellular metagenomes (7%-19%). While improved algorithms may lead to more accurate viral identification tools, this should be done in tandem with careful curation of sequence databases. We recommend using the VirSorter2 ruleset and our empirically derived tuning removal rule. Our analysis provides insight into methods for in silico viral identification and will enable more robust viral identification from metagenomic data sets. 
IMPORTANCE The identification of viruses from environmental metagenomes using informatics tools has offered critical insights in microbial ecology. However, it remains difficult for researchers to know which tools optimize viral recovery for their specific study. In an attempt to recover more viruses, studies are increasingly combining the outputs from multiple tools without validating this approach. After benchmarking combinations of six viral identification tools against mock metagenomes and environmental samples, we found that these tools should only be combined cautiously. Two to four tool combinations maximized viral recovery and minimized non-viral contamination compared with either the single-tool or the five- to six-tool ones. By providing a rigorous overview of the behavior of in silico viral identification strategies and a pipeline to replicate our process, our findings guide the use of existing viral identification tools and offer a blueprint for feature engineering of new tools that will lead to higher-confidence viral discovery in microbiome studies.
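The accuracy metric named in this abstract, the Matthews Correlation Coefficient, is computed from the four cells of a confusion matrix. The sketch below uses the standard MCC definition; the function name `mcc` and the example counts are hypothetical and do not reproduce the study's data or pipeline.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 100 viral and 100 non-viral sequences,
# with 10 errors in each class.
print(round(mcc(tp=90, fp=10, tn=90, fn=10), 3))
```

Unlike plain accuracy, MCC uses all four cells, so it stays informative when viral sequences are a small minority of a metagenome.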
Affiliation(s)
- Bridget Hegarty
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio, USA
- James Riddell V
- Department of Microbiology, The Ohio State University, Columbus, Ohio, USA
- Eric Bastien
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Kathryn Langenfeld
- Department of Civil and Environmental Engineering, Stanford University, Palo Alto, California, USA
- Morgan Lindback
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Jaspreet S. Saini
- Laboratory for Environmental Biotechnology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Anthony Wing
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
- Jessica Zhang
- Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Melissa Duhaime
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, USA
6
Liu GY, Yu D, Fan MM, Zhang X, Jin ZY, Tang C, Liu XF. Antimicrobial resistance crisis: could artificial intelligence be the solution? Mil Med Res 2024; 11:7. [PMID: 38254241 PMCID: PMC10804841 DOI: 10.1186/s40779-024-00510-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 01/08/2024] [Indexed: 01/24/2024] Open
Abstract
Antimicrobial resistance is a global public health threat, and the World Health Organization (WHO) has announced a priority list of the most threatening pathogens against which novel antibiotics need to be developed. The discovery and introduction of novel antibiotics are time-consuming and expensive. According to WHO's report of antibacterial agents in clinical development, only 18 novel antibiotics have been approved since 2014. Therefore, novel antibiotics are critically needed. Artificial intelligence (AI) has been rapidly applied to drug development since its recent technical breakthrough and has dramatically improved the efficiency of the discovery of novel antibiotics. Here, we first summarized recently marketed novel antibiotics, and antibiotic candidates in clinical development. In addition, we systematically reviewed the involvement of AI in antibacterial drug development and utilization, including small molecules, antimicrobial peptides, phage therapy, essential oils, as well as resistance mechanism prediction, and antibiotic stewardship.
Affiliation(s)
- Guang-Yu Liu
- Department of Immunology and Pathogen Biology, School of Basic Medical Sciences, Hangzhou Normal University, Key Laboratory of Aging and Cancer Biology of Zhejiang Province, Key Laboratory of Inflammation and Immunoregulation of Hangzhou, Hangzhou Normal University, Hangzhou, 311121, China
- Dan Yu
- National Key Discipline of Pediatrics Key Laboratory of Major Diseases in Children Ministry of Education, Laboratory of Dermatology, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
- Mei-Mei Fan
- Department of Immunology and Pathogen Biology, School of Basic Medical Sciences, Hangzhou Normal University, Key Laboratory of Aging and Cancer Biology of Zhejiang Province, Key Laboratory of Inflammation and Immunoregulation of Hangzhou, Hangzhou Normal University, Hangzhou, 311121, China
- Xu Zhang
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, 55905, USA
- Ze-Yu Jin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Christoph Tang
- Sir William Dunn School of Pathology, University of Oxford, Oxford, OX1 3RE, UK.
- Xiao-Fen Liu
- Institute of Antibiotics, Huashan Hospital, Fudan University, Key Laboratory of Clinical Pharmacology of Antibiotics, National Health Commission of the People's Republic of China, National Clinical Research Centre for Aging and Medicine, Huashan Hospital, Fudan University, Shanghai, 200040, China.
7
Miao Y, Sun Z, Ma C, Lin C, Wang G, Yang C. VirGrapher: a graph-based viral identifier for long sequences from metagenomes. Brief Bioinform 2024; 25:bbae036. [PMID: 38343326 PMCID: PMC10859693 DOI: 10.1093/bib/bbae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/15/2024] [Accepted: 01/18/2024] [Indexed: 02/15/2024] Open
Abstract
Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains the genetic material of all microorganisms in an environmental sample. Correctly identifying viruses from these mixed sequences is critical in viral analyses. It is common to identify long viral sequences that have already passed through assembly and binning pipelines. Existing deep learning-based methods divide these long sequences into short subsequences and identify them separately. This ignores the relationships between the subsequences, leading to poor performance in identifying long viral sequences. In this paper, VirGrapher is proposed to improve the identification of long viral sequences by constructing relationships among the short subsequences derived from long ones. VirGrapher treats a long sequence as a graph and uses a Graph Convolutional Network (GCN) model to learn multilayer connections between nodes from sequences after a GCN-based node embedding model. VirGrapher achieves a better AUC value and accuracy on the validation set than three benchmark methods.
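The general idea of treating a long sequence as a graph of short subsequences can be sketched as follows. This is an illustration only and not VirGrapher's actual construction: the fixed window size, the 2-mer frequency node features, the chain-style edges between neighbouring windows, and all function names are assumptions made for this example.

```python
from itertools import product
from typing import List, Tuple


def split_sequence(seq: str, window: int) -> List[str]:
    """Cut a long sequence into consecutive, non-overlapping windows."""
    return [seq[i:i + window] for i in range(0, len(seq), window)]


def kmer_profile(subseq: str, k: int = 2) -> List[float]:
    """Relative frequency of every A/C/G/T k-mer in one subsequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(subseq) - k + 1):
        kmer = subseq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def build_chain_graph(
    seq: str, window: int, k: int = 2
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Nodes are windows (with k-mer features); edges link neighbours."""
    nodes = split_sequence(seq, window)
    features = [kmer_profile(node, k) for node in nodes]
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return features, edges


# Toy 16-base sequence split into four 4-base nodes.
features, edges = build_chain_graph("ACGTACGTGGCCAATT", window=4)
print(len(features), edges)
```

A graph neural network would then take `features` as node inputs and `edges` as the adjacency structure, so that each subsequence is classified with information from its neighbours rather than in isolation.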
Affiliation(s)
- Yan Miao
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Zhenyuan Sun
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Chenjing Ma
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Chen Lin
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiangannan Road, 361104, Fujian Province, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
- Chunxue Yang
- College of Landscape Architecture, Northeast Forestry University, Hexing Road, 150040, Heilongjiang Province, China
8
Zhang H, Zhang H, Du H, Yu X, Xu Y. The insights into the phage communities of fermented foods in the age of viral metagenomics. Crit Rev Food Sci Nutr 2024:1-13. [PMID: 38214674 DOI: 10.1080/10408398.2023.2299323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]
Abstract
Phages play a critical role in the assembly and regulation of fermented food microbiome through lysis and lysogenic lifestyle, which in turn affects the yield and quality of fermented foods. Therefore, it is important to investigate and characterize the diversity and function of phages under complex microbial communities and nutrient substrate conditions to provide novel insights into the regulation of traditional spontaneous fermentation. Viral metagenomics has gradually garnered increasing attention in fermented food research to elucidate phage functions and characterize the interactions between phages and the microbial community. Advances in this technology have uncovered a wide range of phages associated with the production of traditional fermented foods and beverages. This paper reviews the common methods of viral metagenomics applied in fermented food research, and summarizes the ecological functions of phages in traditional fermented foods. In the future, combining viral metagenomics with culturable methods and metagenomics will broaden the scope of research on fermented food systems, revealing the complex role of phages and intricate phage-bacterium interactions.
Affiliation(s)
- Huadong Zhang
- Laboratory of Brewing Microbiology and Applied Enzymology, The Key Laboratory of Industrial Biotechnology, Ministry of Education, State Key Laboratory of Food Science and Technology, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Hongxia Zhang
- College of Life Sciences, Shanxi Normal University, Taiyuan, Shanxi, China
- Hai Du
- Laboratory of Brewing Microbiology and Applied Enzymology, The Key Laboratory of Industrial Biotechnology, Ministry of Education, State Key Laboratory of Food Science and Technology, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Xiaowei Yu
- Laboratory of Brewing Microbiology and Applied Enzymology, The Key Laboratory of Industrial Biotechnology, Ministry of Education, State Key Laboratory of Food Science and Technology, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Yan Xu
- Laboratory of Brewing Microbiology and Applied Enzymology, The Key Laboratory of Industrial Biotechnology, Ministry of Education, State Key Laboratory of Food Science and Technology, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
9
Owens LA, Friant S, Martorelli Di Genova B, Knoll LJ, Contreras M, Noya-Alarcon O, Dominguez-Bello MG, Goldberg TL. VESPA: an optimized protocol for accurate metabarcoding-based characterization of vertebrate eukaryotic endosymbiont and parasite assemblages. Nat Commun 2024; 15:402. [PMID: 38195557 PMCID: PMC10776621 DOI: 10.1038/s41467-023-44521-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 12/15/2023] [Indexed: 01/11/2024] Open
Abstract
Protocols for characterizing taxonomic assemblages by deep sequencing of short DNA barcode regions (metabarcoding) have revolutionized our understanding of microbial communities and are standardized for bacteria, archaea, and fungi. Unfortunately, comparable methods for host-associated eukaryotes have lagged due to technical challenges. Despite 54 published studies, issues remain with primer complementarity, off-target amplification, and lack of external validation. Here, we present VESPA (Vertebrate Eukaryotic endoSymbiont and Parasite Analysis) primers and optimized metabarcoding protocol for host-associated eukaryotic community analysis. Using in silico prediction, panel PCR, engineered mock community standards, and clinical samples, we demonstrate VESPA to be more effective at resolving host-associated eukaryotic assemblages than previously published methods and to minimize off-target amplification. When applied to human and non-human primate samples, VESPA enables reconstruction of host-associated eukaryotic endosymbiont communities more accurately and at finer taxonomic resolution than microscopy. VESPA has the potential to advance basic and translational science on vertebrate eukaryotic endosymbiont communities, similar to achievements made for bacterial, archaeal, and fungal microbiomes.
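Primer complementarity, one of the issues raised here, is often checked in silico by counting mismatches between a primer and its target site. The sketch below is a generic illustration using the standard IUPAC degenerate-base codes; it is not the VESPA workflow, and the primer and target strings are made up for the example.

```python
# Standard IUPAC nucleotide codes mapped to the bases they can match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where the target base is not allowed by the
    (possibly degenerate) primer base. Both strings must be the same
    length and written in the same orientation."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(
        1
        for p, s in zip(primer.upper(), site.upper())
        if s not in IUPAC.get(p, "")
    )


# Hypothetical degenerate primer tested against two hypothetical targets.
primer = "ACGTRYN"
print(count_mismatches(primer, "ACGTACA"))  # fully compatible target
print(count_mismatches(primer, "ACGTCAA"))  # two incompatible positions
```

Running such a count across reference sequences from target and non-target taxa gives a rough first view of which groups a primer pair is likely to amplify.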
Affiliation(s)
- Leah A Owens
- Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, USA.
- Sagan Friant
- Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Department of Anthropology, The Pennsylvania State University, University Park, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Bruno Martorelli Di Genova
- Department of Medical Microbiology and Immunology, University of Wisconsin-Madison, Madison, WI, USA
- Department of Microbiology and Molecular Genetics, Larner College of Medicine, The University of Vermont, Burlington, VT, USA
- Laura J Knoll
- Department of Medical Microbiology and Immunology, University of Wisconsin-Madison, Madison, WI, USA
- Monica Contreras
- Center for Biophysics and Biochemistry, Venezuelan Institute of Scientific Research (IVIC), Caracas, Venezuela
- Oscar Noya-Alarcon
- Centro Amazónico de Investigación y Control de Enfermedades Tropicales-CAICET, Puerto Ayacucho, Amazonas, Venezuela
- Maria G Dominguez-Bello
- Department of Biochemistry and Microbiology, Rutgers University-New Brunswick, New Brunswick, NJ, USA
- Department of Anthropology, Rutgers University, New Brunswick, NJ, USA
- Institute for Food, Nutrition and Health, Rutgers University, New Brunswick, NJ, USA
- Canadian Institute for Advanced Research (CIFAR), Toronto, ON, Canada
- Tony L Goldberg
- Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, USA.
10
Kerkvliet JJ, Bossers A, Kers JG, Meneses R, Willems R, Schürch AC. Metagenomic assembly is the main bottleneck in the identification of mobile genetic elements. PeerJ 2024; 12:e16695. [PMID: 38188174 PMCID: PMC10771768 DOI: 10.7717/peerj.16695] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/04/2023] [Accepted: 11/28/2023] [Indexed: 01/09/2024] Open
Abstract
Antimicrobial resistance genes (ARG) are commonly found on acquired mobile genetic elements (MGEs) such as plasmids or transposons. Understanding the spread of resistance genes associated with mobile elements (mARGs) across different hosts and environments requires linking ARGs to the existing mobile reservoir within bacterial communities. However, reconstructing mARGs in metagenomic data from diverse ecosystems poses computational challenges, including genome fragment reconstruction (assembly), high-throughput annotation of MGEs, and identification of their association with ARGs. Recently, several bioinformatics tools have been developed to identify assembled fragments of plasmids, phages, and insertion sequence (IS) elements in metagenomic data. These methods can help in understanding the dissemination of mARGs. To streamline the process of identifying mARGs in multiple samples, we combined these tools in an automated high-throughput open-source pipeline, MetaMobilePicker, that identifies ARGs associated with plasmids, IS elements and phages, starting from short metagenomic sequencing reads. This pipeline was used to identify these three elements on a simplified simulated metagenome dataset, comprising whole genome sequences from seven clinically relevant bacterial species containing 55 ARGs, nine plasmids and five phages. The results demonstrated moderate precision for the identification of plasmids (0.57) and phages (0.71), and moderate sensitivity of identification of IS elements (0.58) and ARGs (0.70). In this study, we aim to assess the main causes of this moderate performance of the MGE prediction tools in a comprehensive manner. We conducted a systematic benchmark, considering metagenomic read coverage, contig length cutoffs and investigating the performance of the classification algorithms. 
Our analysis revealed that the metagenomic assembly process is the primary bottleneck when linking ARGs to identified MGEs in short-read metagenomics sequencing experiments rather than ARGs and MGEs identification by the different tools.
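The precision and sensitivity figures quoted in this abstract, and the role of a contig length cutoff, can be made concrete with a small sketch. It assumes predictions and ground truth are available as sets of contig names; the function names, contig names, and lengths are hypothetical and unrelated to the MetaMobilePicker dataset.

```python
from typing import Dict, Set, Tuple


def precision_and_sensitivity(
    predicted: Set[str], truth: Set[str]
) -> Tuple[float, float]:
    """Precision = TP / predicted; sensitivity (recall) = TP / truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


def contigs_at_least(lengths: Dict[str, int], min_length: int) -> Set[str]:
    """Names of contigs that pass a minimum length cutoff."""
    return {name for name, length in lengths.items() if length >= min_length}


# Hypothetical contigs: length in bp, tool predictions, and ground truth.
lengths = {"c1": 5000, "c2": 800, "c3": 12000, "c4": 1500}
predicted_plasmid = {"c1", "c2", "c3"}
true_plasmid = {"c1", "c3", "c4"}

print(precision_and_sensitivity(predicted_plasmid, true_plasmid))

# Re-evaluate using only contigs of at least 1000 bp.
kept = contigs_at_least(lengths, 1000)
print(precision_and_sensitivity(predicted_plasmid & kept, true_plasmid & kept))
```

In this toy case the length cutoff removes a short false positive and raises precision, while sensitivity stays limited by a true element that was never predicted. It does not model assembly itself, which the abstract identifies as the main bottleneck.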
Affiliation(s)
- Jesse J. Kerkvliet
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
- Alex Bossers
- Utrecht University, Institute for Risk Assessment Sciences, Utrecht, The Netherlands
- Wageningen University, Wageningen Bioveterinary Research, Lelystad, The Netherlands
| | - Jannigje G. Kers
- Utrecht University, Institute for Risk Assessment Sciences, Utrecht, The Netherlands
| | - Rodrigo Meneses
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
| | - Rob Willems
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
| | - Anita C. Schürch
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
| |
|
11
|
Roach MJ, Beecroft SJ, Mihindukulasuriya KA, Wang L, Paredes A, Cárdenas LAC, Henry-Cocks K, Lima LFO, Dinsdale EA, Edwards RA, Handley SA. Hecatomb: an integrated software platform for viral metagenomics. Gigascience 2024; 13:giae020. [PMID: 38832467 PMCID: PMC11148595 DOI: 10.1093/gigascience/giae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/18/2024] [Accepted: 04/08/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND: Modern sequencing technologies offer extraordinary opportunities for virus discovery and virome analysis. Annotation of viral sequences from metagenomic data requires a complex series of steps to ensure accurate annotation of individual reads and assembled contigs. In addition, varying study designs will require project-specific statistical analyses. FINDINGS: Here we introduce Hecatomb, a bioinformatic platform coordinating commonly used tasks required for virome analysis. Hecatomb means "a great sacrifice." In this setting, Hecatomb is "sacrificing" false-positive viral annotations using extensive quality control and tiered-database searches. Hecatomb processes metagenomic data obtained from both short- and long-read sequencing technologies, providing annotations to individual sequences and assembled contigs. Results are provided in commonly used data formats useful for downstream analysis. Here we demonstrate the functionality of Hecatomb through the reanalysis of a primate enteric and a novel coral reef virome. CONCLUSION: Hecatomb provides an integrated platform to manage many commonly used steps for virome characterization, including rigorous quality control, host removal, and both read- and contig-based analysis. Each step is managed using the Snakemake workflow manager with dependency management using Conda. Hecatomb outputs several tables properly formatted for immediate use within popular data analysis and visualization tools, enabling effective data interpretation for a variety of study designs. Hecatomb is hosted on GitHub (github.com/shandley/hecatomb) and is available for installation from Bioconda and PyPI.
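The idea of discarding false-positive viral annotations through tiered database searches can be illustrated with a short sketch. The abstract does not describe the tiers in detail, so everything here is an assumption: the function `tiered_filter`, the two hit tables, and the rule that a sequence is kept only if its best hit in a broader database is still viral are hypothetical and do not represent Hecatomb's actual implementation or API.

```python
from typing import Dict, Iterable, List


def tiered_filter(
    queries: Iterable[str],
    viral_hits: Dict[str, str],
    broad_hits: Dict[str, str],
) -> List[str]:
    """Return query IDs that survive two hypothetical search tiers.

    viral_hits: query ID -> viral taxon found in a virus-only database (tier 1).
    broad_hits: query ID -> top-level label of the best hit in a broader,
                multi-kingdom database (tier 2).
    """
    kept = []
    for query in queries:
        # Tier 1: sequences with no viral hit are not candidates at all.
        if query not in viral_hits:
            continue
        # Tier 2: a better non-viral match marks the tier-1 hit as a likely
        # false positive. Assumption: no tier-2 hit means the call is kept.
        if broad_hits.get(query, "Viruses") != "Viruses":
            continue
        kept.append(query)
    return kept


# Hypothetical example data.
queries = ["read1", "read2", "read3"]
viral_hits = {"read1": "Microviridae", "read2": "Siphoviridae"}
broad_hits = {"read1": "Viruses", "read2": "Bacteria"}
print(tiered_filter(queries, viral_hits, broad_hits))  # ['read1']
```

In this toy example, read2 is dropped because its best hit in the broader database is bacterial, and read3 is dropped because it never matched the viral database.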
Affiliation(s)
- Michael J Roach
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
- Adelaide Centre for Epigenetics, University of Adelaide, Adelaide, SA, 5005, Australia
- South Australian Immunogenomics Cancer Institute, University of Adelaide, Adelaide, SA, 5005, Australia
| | - Sarah J Beecroft
- Harry Perkins Institute of Medical Research, Perth, WA, 6009, Australia
| | - Kathie A Mihindukulasuriya
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Leran Wang
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Anne Paredes
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Luis Alberto Chica Cárdenas
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Kara Henry-Cocks
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
| | | | - Elizabeth A Dinsdale
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
| | - Robert A Edwards
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, SA, Australia
| | - Scott A Handley
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
- The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| |
|
12
|
Young GR, Nelson A, Stewart CJ, Smith DL. Bacteriophage communities are a reservoir of unexplored microbial diversity in neonatal health and disease. Curr Opin Microbiol 2023; 75:102379. [PMID: 37647765 DOI: 10.1016/j.mib.2023.102379] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/02/2023] [Revised: 07/30/2023] [Accepted: 08/02/2023] [Indexed: 09/01/2023]
Abstract
Acquisition and development of the gut microbiome are vital for immune education in neonates, especially those born preterm. As such, microbial communities have been extensively studied in the context of postnatal health and disease. Bacterial communities have been the focus of research in this area due to the relative ease of targeted bacterial sequencing and the availability of databases to align and validate sequencing data. Recent increases in the accessibility of high-throughput metagenomic sequencing have facilitated research into bacteriophages within the context of neonatal gut microbial communities. This research, focusing on unexplored viral diversity, has identified novel bacteriophage species and previously uncharacterised viral diversity. In doing so, studies have highlighted links between bacteriophages and bacterial community structure in the context of health and disease. However, much remains unknown about the complex relationships between bacteriophages, the bacteria they infect and their human host. With a particular focus on preterm infants, this review highlights opportunities to explore the influence of bacteriophages on developing microbial communities and the tripartite relationships between bacteriophages, bacteria and the neonatal human host. We suggest a focus on expanding collections of isolated bacteriophages that will further our understanding of the growing numbers of bacteriophages identified in metagenomes.
Affiliation(s)
- Gregory R Young
- Applied Sciences, Health and Life Sciences, Northumbria University, Newcastle, UK
| | - Andrew Nelson
- Applied Sciences, Health and Life Sciences, Northumbria University, Newcastle, UK
| | | | - Darren L Smith
- Applied Sciences, Health and Life Sciences, Northumbria University, Newcastle, UK.
| |
|
13
|
Rangel-Pineros G, Almeida A, Beracochea M, Sakharova E, Marz M, Reyes Muñoz A, Hölzer M, Finn RD. VIRify: An integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models. PLoS Comput Biol 2023; 19:e1011422. [PMID: 37639475 PMCID: PMC10491390 DOI: 10.1371/journal.pcbi.1011422] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Revised: 09/08/2023] [Accepted: 08/09/2023] [Indexed: 08/31/2023] Open
Abstract
The study of viral communities has revealed the enormous diversity and impact these biological entities have on various ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterisation of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterisation of viral communities. VIRify identifies viral contigs and prophages from metagenomic assemblies and annotates them using a collection of viral profile hidden Markov models (HMMs). These include our manually-curated profile HMMs, which serve as specific taxonomic markers for a wide range of prokaryotic and eukaryotic viral taxa and are thus used to reliably classify viral contigs. We tested VIRify on assemblies from two microbial mock communities, a large metagenomics study, and a collection of publicly available viral genomic sequences from the human gut. The results showed that VIRify could identify sequences from both prokaryotic and eukaryotic viruses, and provided taxonomic classifications from the genus to the family rank with an average accuracy of 86.6%. In addition, VIRify allowed the detection and taxonomic classification of a range of prokaryotic and eukaryotic viruses present in 243 marine metagenomic assemblies. Finally, the use of VIRify led to a large expansion in the number of taxonomically classified human gut viral sequences and the improvement of outdated and shallow taxonomic classifications. Overall, we demonstrate that VIRify is a novel and powerful resource that offers an enhanced capability to detect a broad range of viral contigs and taxonomically classify them.
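One way that taxon-specific marker hits could be turned into a contig-level classification is a simple majority vote, sketched below. The abstract states only that the curated profile HMMs serve as taxonomic markers; the voting rule, the `min_fraction` threshold and the function name `classify_contig` are assumptions made for illustration and are not VIRify's documented procedure.

```python
from collections import Counter
from typing import List, Optional


def classify_contig(marker_taxa: List[str], min_fraction: float = 0.6) -> Optional[str]:
    """Assign a contig to the taxon backed by most of its marker hits.

    marker_taxa: one taxon label per marker-HMM hit on the contig's proteins.
    min_fraction: hypothetical minimum share of hits the winning taxon needs.
    Returns None when there are no hits or no taxon reaches the threshold.
    """
    if not marker_taxa:
        return None
    taxon, count = Counter(marker_taxa).most_common(1)[0]
    if count / len(marker_taxa) >= min_fraction:
        return taxon
    return None


# Hypothetical example: two of three marker hits point to the same family.
print(classify_contig(["Microviridae", "Microviridae", "Inoviridae"]))  # Microviridae
# An even split does not reach the threshold, so the contig stays unclassified.
print(classify_contig(["Microviridae", "Inoviridae"]))  # None
```

Leaving ambiguous contigs unclassified rather than forcing a label trades coverage for reliability, which is the usual design choice when marker hits conflict.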
Affiliation(s)
- Guillermo Rangel-Pineros
- The Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogota, Colombia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - Alexandre Almeida
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
| | - Martin Beracochea
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - Ekaterina Sakharova
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - Manja Marz
- RNA Bioinformatics, Friedrich Schiller University, Jena, Germany
- European Virus Bioinformatics Center, Friedrich Schiller University, Jena, Germany
| | - Alejandro Reyes Muñoz
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogota, Colombia
| | - Martin Hölzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- European Virus Bioinformatics Center, Friedrich Schiller University, Jena, Germany
- Methodology and Research Infrastructure, Genome Competence Center (MF1), Robert Koch Institute, Berlin, Germany
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| |
|