1
Pu L, Shamir R. 4CAC: 4-class classifier of metagenome contigs using machine learning and assembly graphs. Nucleic Acids Res 2024:gkae799. [PMID: 39287139 DOI: 10.1093/nar/gkae799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 07/13/2024] [Accepted: 09/02/2024] [Indexed: 09/19/2024]
Abstract
Microbial communities usually harbor a mix of bacteria, archaea, plasmids, viruses and microeukaryotes. Within these communities, viruses, plasmids, and microeukaryotes coexist in relatively low abundance, yet they engage in intricate interactions with bacteria. Moreover, viruses and plasmids, as mobile genetic elements, play important roles in horizontal gene transfer and the development of antibiotic resistance within microbial populations. However, due to the difficulty of identifying viruses, plasmids, and microeukaryotes in microbial communities, our understanding of these minor classes lags behind that of bacteria and archaea. Recently, several classifiers have been developed to separate one or more minor classes from bacteria and archaea in metagenome assemblies. However, these classifiers often overlook the issue of class imbalance, leading to low precision in identifying the minor classes. Here, we developed a classifier called 4CAC that is able to identify viruses, plasmids, microeukaryotes, and prokaryotes simultaneously from metagenome assemblies. 4CAC generates an initial four-way classification using several sequence length-adjusted XGBoost models and further improves the classification using the assembly graph. Evaluation on simulated and real metagenome datasets demonstrates that 4CAC substantially outperforms existing classifiers and combinations thereof on short reads. On long reads, it also shows an advantage unless the abundance of the minor classes is very low. 4CAC runs 1-2 orders of magnitude faster than the other classifiers. The 4CAC software is available at https://github.com/Shamir-Lab/4CAC.
Affiliation(s)
- Lianrong Pu
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
  - School of Computer Science and Technology, Shandong University, Qingdao, China
- Ron Shamir
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
2
Mallawaarachchi V, Wickramarachchi A, Xue H, Papudeshi B, Grigson SR, Bouras G, Prahl RE, Kaphle A, Verich A, Talamantes-Becerra B, Dinsdale EA, Edwards RA. Solving genomic puzzles: computational methods for metagenomic binning. Brief Bioinform 2024; 25:bbae372. [PMID: 39082646 PMCID: PMC11289683 DOI: 10.1093/bib/bbae372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 06/05/2024] [Accepted: 07/15/2024] [Indexed: 08/03/2024]
Abstract
Metagenomics involves the study of genetic material obtained directly from communities of microorganisms living in natural environments. The field of metagenomics has provided valuable insights into the structure, diversity and ecology of microbial communities. Once an environmental sample is sequenced and processed, metagenomic binning clusters the sequences into bins representing different taxonomic groups such as species, genera, or higher levels. Several computational tools have been developed to automate the process of metagenomic binning. These tools have enabled the recovery of novel draft genomes of microorganisms allowing us to study their behaviors and functions within microbial communities. This review classifies and analyzes different approaches of metagenomic binning and different refinement, visualization, and evaluation techniques used by these methods. Furthermore, the review highlights the current challenges and areas of improvement present within the field of research.
Affiliation(s)
- Vijini Mallawaarachchi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Anuradha Wickramarachchi
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Hansheng Xue
  - School of Computing, National University of Singapore, Singapore 119077, Singapore
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Susanna R Grigson
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
  - The Department of Surgery—Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, SA 5011, Australia
- Rosa E Prahl
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Anubhav Kaphle
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Andrey Verich
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
  - The Kirby Institute, The University of New South Wales, Randwick, Sydney, NSW 2052, Australia
- Berenice Talamantes-Becerra
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Elizabeth A Dinsdale
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Robert A Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
3
Darabi A, Sobhani S, Aghdam R, Eslahchi C. AFITbin: a metagenomic contig binning method using aggregate l-mer frequency based on initial and terminal nucleotides. BMC Bioinformatics 2024; 25:241. [PMID: 39014300 PMCID: PMC11253361 DOI: 10.1186/s12859-024-05859-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Accepted: 07/09/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND: Using next-generation sequencing technologies, scientists can sequence complex microbial communities directly from the environment. Significant insights into the structure, diversity, and ecology of microbial communities have resulted from the study of metagenomics. The assembly of reads into longer contigs, which are then binned into groups of contigs that correspond to different species in the metagenomic sample, is a crucial step in the analysis of metagenomics. It is necessary to organize these contigs into operational taxonomic units (OTUs) for further taxonomic profiling and functional analysis. For binning, which is synonymous with the clustering of OTUs, the tetra-nucleotide frequency (TNF) is typically utilized as a compositional feature for each OTU.
RESULTS: In this paper, we present AFIT, a new l-mer statistic vector for each contig, and AFITBin, a novel method for metagenomic binning based on AFIT and a matrix factorization method. To evaluate the performance of the AFIT vector, the t-SNE algorithm is used to compare species clustering based on AFIT and TNF information. In addition, the efficacy of AFITBin is demonstrated on both simulated and real datasets in comparison to state-of-the-art binning methods such as MetaBAT 2, MaxBin 2.0, CONCOCT, MetaCon, SolidBin, BusyBee Web, and MetaBinner. To further analyze the performance of the proposed AFIT vector, we compare the barcodes of the AFIT vector and the TNF vector.
CONCLUSION: The results demonstrate that AFITBin shows superior performance in taxonomic identification compared to existing methods, leveraging the AFIT vector for improved results in metagenomic binning. This approach holds promise for advancing the analysis of metagenomic data, providing more reliable insights into microbial community composition and function.
AVAILABILITY: A Python package is available at: https://github.com/SayehSobhani/AFITBin
Affiliation(s)
- Amin Darabi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Sayeh Sobhani
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Rosa Aghdam
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
4
Feng X, Li H. Evaluating and improving the representation of bacterial contents in long-read metagenome assemblies. Genome Biol 2024; 25:92. [PMID: 38605401 PMCID: PMC11007910 DOI: 10.1186/s13059-024-03234-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 03/29/2024] [Indexed: 04/13/2024]
Abstract
BACKGROUND: In the metagenomic assembly of a microbial community, abundant species are often thought to assemble well given their deeper sequencing coverage. This conjecture is rarely tested or evaluated in practice. We often do not know how many abundant species are missing and do not have an approach to recover them.
RESULTS: Here, we propose k-mer based and 16S RNA based methods to measure the completeness of metagenome assembly. We show that even with PacBio high-fidelity (HiFi) reads, abundant species are often not assembled, as high strain diversity may lead to fragmented contigs. We develop a novel reference-free algorithm to recover abundant metagenome-assembled genomes (MAGs) by identifying circular assembly subgraphs. Complemented with a reference-free genome binning heuristic based on dimension reduction, the proposed method rescues many abundant species that would be missing with existing methods and produces competitive results compared to state-of-the-art binners in terms of the total number of near-complete genome bins.
CONCLUSIONS: Our work emphasizes the importance of metagenome completeness, which has often been overlooked. Our algorithm generates more circular MAGs and moves a step closer to the complete representation of microbial communities.
Affiliation(s)
- Xiaowen Feng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, USA
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, USA
5
Roy G, Prifti E, Belda E, Zucker JD. Deep learning methods in metagenomics: a review. Microb Genom 2024; 10:001231. [PMID: 38630611 PMCID: PMC11092122 DOI: 10.1099/mgen.0.001231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/27/2024] [Indexed: 04/19/2024]
Abstract
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome's key role in our health.
Affiliation(s)
- Gaspar Roy
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
- Edi Prifti
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Eugeni Belda
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
- Jean-Daniel Zucker
  - IRD, Sorbonne University, UMMISCO, 32 avenue Henry Varagnat, Bondy Cedex, France
  - Sorbonne University, INSERM, Nutriomics, 91 bvd de l’hopital, 75013 Paris, France
6
Zulfiqar M, Singh V, Steinbeck C, Sorokina M. Review on computer-assisted biosynthetic capacities elucidation to assess metabolic interactions and communication within microbial communities. Crit Rev Microbiol 2024:1-40. [PMID: 38270170 DOI: 10.1080/1040841x.2024.2306465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 01/12/2024] [Indexed: 01/26/2024]
Abstract
Microbial communities thrive through interactions and communication, which are challenging to study as most microorganisms are not cultivable. To address this challenge, researchers focus on the extracellular space where communication events occur. Exometabolomics and interactome analysis provide insights into the molecules involved in communication and the dynamics of their interactions. Advances in sequencing technologies and computational methods enable the reconstruction of taxonomic and functional profiles of microbial communities using high-throughput multi-omics data. Network-based approaches, including community flux balance analysis, aim to model molecular interactions within and between communities. Despite these advances, challenges remain in computer-assisted biosynthetic capacities elucidation, requiring continued innovation and collaboration among diverse scientists. This review provides insights into the current state and future directions of computer-assisted biosynthetic capacities elucidation in studying microbial communities.
Affiliation(s)
- Mahnoor Zulfiqar
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
- Vinay Singh
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Christoph Steinbeck
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
- Maria Sorokina
  - Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
  - Data Science and Artificial Intelligence, Research and Development, Pharmaceuticals, Bayer, Berlin, Germany
7
Mallawaarachchi V, Roach MJ, Decewicz P, Papudeshi B, Giles SK, Grigson SR, Bouras G, Hesse RD, Inglis LK, Hutton ALK, Dinsdale EA, Edwards RA. Phables: from fragmented assemblies to high-quality bacteriophage genomes. Bioinformatics 2023; 39:btad586. [PMID: 37738590 PMCID: PMC10563150 DOI: 10.1093/bioinformatics/btad586] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/04/2023] [Revised: 07/14/2023] [Accepted: 09/19/2023] [Indexed: 09/24/2023]
Abstract
MOTIVATION: Microbial communities have a profound impact on both human health and various environments. Viruses infecting bacteria, known as bacteriophages or phages, play a key role in modulating bacterial communities within environments. High-quality phage genome sequences are essential for advancing our understanding of phage biology, enabling comparative genomics studies and developing phage-based diagnostic tools. Most available viral identification tools consider individual sequences to determine whether they are of viral origin. As a result of challenges in viral assembly, fragmentation of genomes can occur, and existing tools may recover incomplete genome fragments. Therefore, the identification and characterization of novel phage genomes remain a challenge, leading to the need for improved approaches for phage genome recovery.
RESULTS: We introduce Phables, a new computational method to resolve phage genomes from fragmented viral metagenome assemblies. Phables identifies phage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that Phables recovers on average over 49% more high-quality phage genomes compared to existing viral identification tools. Furthermore, Phables can resolve variant phage genomes with over 99% average nucleotide identity, a distinction that existing tools are unable to make.
AVAILABILITY AND IMPLEMENTATION: Phables is available on GitHub at https://github.com/Vini2/phables.
Affiliation(s)
- Vijini Mallawaarachchi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Michael J Roach
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Przemyslaw Decewicz
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
  - Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw 02-096, Poland
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Sarah K Giles
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Susanna R Grigson
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
  - The Department of Surgery—Otolaryngology Head and Neck Surgery, Central Adelaide Local Health Network, Adelaide, South Australia 5000, Australia
- Ryan D Hesse
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Laura K Inglis
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Abbey L K Hutton
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Elizabeth A Dinsdale
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Robert A Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
8
Mallawaarachchi V, Roach MJ, Decewicz P, Papudeshi B, Giles SK, Grigson SR, Bouras G, Hesse RD, Inglis LK, Hutton ALK, Dinsdale EA, Edwards RA. Phables: from fragmented assemblies to high-quality bacteriophage genomes. bioRxiv 2023:2023.04.04.535632. [PMID: 37066369 PMCID: PMC10104058 DOI: 10.1101/2023.04.04.535632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Microbial communities influence both human health and different environments. Viruses infecting bacteria, known as bacteriophages or phages, play a key role in modulating bacterial communities within environments. High-quality phage genome sequences are essential for advancing our understanding of phage biology, enabling comparative genomics studies, and developing phage-based diagnostic tools. Most available viral identification tools consider individual sequences to determine whether they are of viral origin. As a result of the challenges in viral assembly, fragmentation of genomes can occur, leading to the need for new approaches in viral identification. Therefore, the identification and characterisation of novel phages remain a challenge. We introduce Phables, a new computational method to resolve phage genomes from fragmented viral metagenome assemblies. Phables identifies phage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that Phables recovers on average over 49% more high-quality phage genomes compared to existing viral identification tools. Furthermore, Phables can resolve variant phage genomes with over 99% average nucleotide identity, a distinction that existing tools are unable to make. Phables is available on GitHub at https://github.com/Vini2/phables.
Affiliation(s)
- Vijini Mallawaarachchi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Michael J Roach
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Przemyslaw Decewicz
  - Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw 02-096, Poland
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Sarah K Giles
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Susanna R Grigson
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- George Bouras
  - Adelaide Medical School, The University of Adelaide, North Tce, Adelaide, SA, 5000, Australia
- Ryan D Hesse
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Laura K Inglis
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Abbey L K Hutton
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Elizabeth A Dinsdale
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Robert A Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
9
Papudeshi B, Vega AA, Souza C, Giles SK, Mallawaarachchi V, Roach MJ, An M, Jacobson N, McNair K, Fernanda Mora M, Pastrana K, Boling L, Leigh C, Harker C, Plewa WS, Grigson SR, Bouras G, Decewicz P, Luque A, Droit L, Handley SA, Wang D, Segall AM, Dinsdale EA, Edwards RA. Host interactions of novel Crassvirales species belonging to multiple families infecting bacterial host, Bacteroides cellulosilyticus WH2. Microb Genom 2023; 9:001100. [PMID: 37665209 PMCID: PMC10569736 DOI: 10.1099/mgen.0.001100] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/14/2023] [Accepted: 08/10/2023] [Indexed: 09/05/2023]
Abstract
Bacteroides, the prominent bacteria in the human gut, play a crucial role in degrading complex polysaccharides. Their abundance is influenced by phages belonging to the Crassvirales order. Despite identifying over 600 Crassvirales genomes computationally, only a few have been successfully isolated. Continued efforts in isolation of more Crassvirales genomes can provide insights into phage-host evolution and infection mechanisms. We focused on wastewater samples as potential sources of phages infecting various Bacteroides hosts. Sequencing, assembly, and characterization of isolated phages revealed 14 complete genomes belonging to three novel Crassvirales species infecting Bacteroides cellulosilyticus WH2. These species, Kehishuvirus sp. 'tikkala' strain Bc01, Kolpuevirus sp. 'frurule' strain Bc03, and 'Rudgehvirus jaberico' strain Bc11, spanned two families and three genera, displaying a broad range of virion productions. Upon testing all successfully cultured Crassvirales species and their respective bacterial hosts, we discovered that they do not exhibit co-evolutionary patterns with their bacterial hosts. Furthermore, we observed variations in gene similarity, with greater shared similarity observed within genera. However, despite belonging to different genera, the three novel species shared a unique structural gene that encodes the tail spike protein. When investigating the relationship between this gene and host interaction, we discovered evidence of purifying selection, indicating its functional importance. Moreover, our analysis demonstrated that this tail spike protein binds to the TonB-dependent receptors present on the bacterial host surface. Combining these observations, our findings provide insights into phage-host interactions and present three Crassvirales species as an ideal system for controlled infectivity experiments on one of the most dominant members of the human enteric virome.
Affiliation(s)
- Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - Alejandro A. Vega
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
- David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
| | - Cole Souza
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Sarah K. Giles
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - Michael J. Roach
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - Michelle An
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Nicole Jacobson
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Katelyn McNair
- Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA, 992182, USA
| | - Maria Fernanda Mora
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Karina Pastrana
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Lance Boling
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Christopher Leigh
- Adelaide Microscopy, University of Adelaide, Adelaide, SA, 5005, Australia
| | - Clarice Harker
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - Will S. Plewa
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - Susanna R. Grigson
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - George Bouras
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, 5005, Australia
| | - Przemysław Decewicz
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
- Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Miecznikowa 1, Warsaw, 02-096, Poland
| | - Antoni Luque
- Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA, 992182, USA
- Department of Mathematics and Statistics, San Diego State University, 5500 Campanile Drive, San Diego, CA, 992182, USA
- Present address: Department of Biology, University of Miami, Coral Gables, Florida, USA
| | - Lindsay Droit
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Scott A. Handley
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - David Wang
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Anca M. Segall
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Elizabeth A. Dinsdale
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
| | - Robert A. Edwards
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide SA, 5042, Australia
10
Papudeshi B, Vega AA, Souza C, Giles SK, Mallawaarachchi V, Roach MJ, An M, Jacobson N, McNair K, Mora MF, Pastrana K, Boling L, Leigh C, Harker C, Plewa WS, Grigson SR, Bouras G, Decewicz P, Luque A, Droit L, Handley SA, Wang D, Segall AM, Dinsdale EA, Edwards RA. Host interactions of novel Crassvirales species belonging to multiple families infecting bacterial host, Bacteroides cellulosilyticus WH2. bioRxiv 2023:2023.03.05.531146. [PMID: 36945541 PMCID: PMC10028833 DOI: 10.1101/2023.03.05.531146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
Bacteroides, the prominent bacteria in the human gut, play a crucial role in degrading complex polysaccharides. Their abundance is influenced by phages belonging to the Crassvirales order. Despite the computational identification of over 600 Crassvirales genomes, only a few have been successfully isolated. Continued efforts to isolate more Crassvirales genomes can provide insights into phage-host evolution and infection mechanisms. We focused on wastewater samples as potential sources of phages infecting various Bacteroides hosts. Sequencing, assembly, and characterization of isolated phages revealed 14 complete genomes belonging to three novel Crassvirales species infecting Bacteroides cellulosilyticus WH2. These species, Kehishuvirus sp. 'tikkala' strain Bc01, Kolpuevirus sp. 'frurule' strain Bc03, and 'Rudgehvirus jaberico' strain Bc11, spanned two families and three genera, displaying a broad range of virion productions. Upon testing all successfully cultured Crassvirales species and their respective bacterial hosts, we discovered that they do not exhibit co-evolutionary patterns with their bacterial hosts. Furthermore, we observed variations in gene similarity, with greater shared similarity observed within genera. However, despite belonging to different genera, the three novel species shared a unique structural gene that encodes the tail spike protein. When investigating the relationship between this gene and host interaction, we discovered evidence of purifying selection, indicating its functional importance. Moreover, our analysis demonstrated that this tail spike protein binds to the TonB-dependent receptors present on the bacterial host surface. Combining these observations, our findings provide insights into phage-host interactions and present three Crassvirales species as an ideal system for controlled infectivity experiments on one of the most dominant members of the human enteric virome.
Impact statement
Bacteriophages play a crucial role in shaping microbial communities within the human gut. Among the most dominant bacteriophages in the human gut microbiome are Crassvirales phages, which infect Bacteroides. Despite being widely distributed, only a few Crassvirales genomes have been isolated, leading to a limited understanding of their biology, ecology, and evolution. This study isolated and characterized three novel Crassvirales genomes belonging to two different families and three genera, but infecting one bacterial host, Bacteroides cellulosilyticus WH2. Notably, the observations confirmed that the phages are not co-evolving with their bacterial hosts, but rather share an ability to exploit similar features in their bacterial host. Additionally, the identification of a critical viral protein undergoing purifying selection and interacting with the bacterial receptors opens doors to targeted therapies against bacterial infections. Given the role of Bacteroides in polysaccharide degradation in the human gut, our findings advance our understanding of phage-host interactions and could have important implications for the development of phage-based therapies. These discoveries may hold implications for improving gut health and metabolism to support overall well-being.
Data summary
The genomes used in this research are available on the Sequence Read Archive (SRA) within the project PRJNA737576. Bacteroides cellulosilyticus WH2, Kehishuvirus sp. 'tikkala' strain Bc01, Kolpuevirus sp. 'frurule' strain Bc03, and 'Rudgehvirus jaberico' strain Bc11 are all available on GenBank with accessions NZ_CP072251.1 (B. cellulosilyticus WH2), QQ198717 (Bc01), QQ198718 (Bc03), and QQ198719 (Bc11), and we are working on making the strains available through ATCC. The 3D protein structures for the three Crassvirales genomes are available to download at doi.org/10.25451/flinders.21946034.
Affiliation(s)
- Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - Alejandro A. Vega
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
- David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
| | - Cole Souza
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Sarah K. Giles
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - Michael J. Roach
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - Michelle An
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Nicole Jacobson
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Katelyn McNair
- Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Maria Fernanda Mora
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Karina Pastrana
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Lance Boling
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Christopher Leigh
- Adelaide Microscopy, University of Adelaide, Adelaide, SA, 5005, Australia
| | - Clarice Harker
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - Will S. Plewa
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - Susanna R. Grigson
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - George Bouras
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, 5005, Australia
| | - Przemysław Decewicz
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
- Department of Environmental Microbiology and Biotechnology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Miecznikowa 1, Warsaw, 02-096, Poland
| | - Antoni Luque
- Department of Mathematics and Statistics, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
- Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Lindsay Droit
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Scott A. Handley
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - David Wang
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, 63110, USA
| | - Anca M. Segall
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182, USA
| | - Elizabeth A. Dinsdale
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
| | - Robert A. Edwards
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, Adelaide, SA, 5042, Australia
11
Cavattoni M, Comin M. ClassGraph: Improving Metagenomic Read Classification with Overlap Graphs. J Comput Biol 2023. [PMID: 37023405 DOI: 10.1089/cmb.2022.0208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/08/2023]
Abstract
Current technologies allow the sequencing of microbial communities directly from the environment without prior culturing. One of the major problems when analyzing a microbial sample is to taxonomically annotate its reads to identify the species it contains. Most methods that are currently available focus on the classification of reads using a set of reference genomes and their k-mers. While in terms of precision these methods have reached percentages of correctness close to perfection, in terms of sensitivity (the actual number of classified reads), the performance is often poor. One reason is that the reads in a sample can be very different from the corresponding reference genomes; for example, viral genomes are usually highly mutated. To address this issue, in this article, we propose ClassGraph, a new taxonomic classification method that makes use of the read overlap graph and applies a label propagation algorithm to refine the results of existing tools. We evaluated its performance on simulated and real datasets with several taxonomic classification tools, and the results showed an improved sensitivity and F-measure, while maintaining high precision. ClassGraph is capable of improving the classification accuracy, especially in difficult cases such as viral and real datasets, where traditional tools can classify <40% of reads.
Affiliation(s)
| | - Matteo Comin
- Department of Information Engineering, University of Padova, Padova, Italy
12
Ibañez-Lligoña M, Colomer-Castell S, González-Sánchez A, Gregori J, Campos C, Garcia-Cehic D, Andrés C, Piñana M, Pumarola T, Rodríguez-Frias F, Antón A, Quer J. Bioinformatic Tools for NGS-Based Metagenomics to Improve the Clinical Diagnosis of Emerging, Re-Emerging and New Viruses. Viruses 2023; 15:v15020587. [PMID: 36851800 PMCID: PMC9965957 DOI: 10.3390/v15020587] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/13/2023] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023]
Abstract
Epidemics and pandemics have occurred since the beginning of time, resulting in millions of deaths. Many such disease outbreaks are caused by viruses. Some viruses, particularly RNA viruses, are characterized by their high genetic variability, and this can affect certain phenotypic features: tropism, antigenicity, and susceptibility to antiviral drugs, vaccines, and the host immune response. The best strategy to face the emergence of new infectious genomes is prompt identification. However, currently available diagnostic tests are often limited for detecting new agents. High-throughput next-generation sequencing technologies based on metagenomics may be the solution to detect new infectious genomes and properly diagnose certain diseases. Metagenomic techniques enable the identification and characterization of disease-causing agents, but they require a large amount of genetic material and involve complex bioinformatic analyses. A wide variety of analytical tools can be used in the quality control and pre-processing of metagenomic data, filtering of untargeted sequences, assembly and quality control of reads, and taxonomic profiling of sequences to identify new viruses and ones that have been sequenced and uploaded to dedicated databases. Although there have been huge advances in the field of metagenomics, there is still a lack of consensus about which of the various approaches should be used for specific data analysis tasks. In this review, we provide some background on the study of viral infections, describe the contribution of metagenomics to this field, and place special emphasis on the bioinformatic tools (with their capabilities and limitations) available for use in metagenomic analyses of viral pathogens.
Affiliation(s)
- Marta Ibañez-Lligoña
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Sergi Colomer-Castell
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Alejandra González-Sánchez
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Josep Gregori
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Carolina Campos
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Damir Garcia-Cehic
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
| | - Cristina Andrés
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Maria Piñana
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Tomàs Pumarola
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Francisco Rodríguez-Frias
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Department of Basic Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès, 08195 Barcelona, Spain
| | - Andrés Antón
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Josep Quer
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
13
Xiang B, Zhao L, Zhang M. Unitig level assembly graph based metagenome-assembled genome refiner (UGMAGrefiner): A tool to increase completeness and resolution of metagenome-assembled genomes. Comput Struct Biotechnol J 2023; 21:2394-2404. [PMID: 37066122 PMCID: PMC10091015 DOI: 10.1016/j.csbj.2023.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 03/16/2023] [Accepted: 03/16/2023] [Indexed: 04/03/2023]
Abstract
De novo assembly of next-generation metagenomic reads is widely used to provide taxonomic and functional information on genomes in a microbial community. As strains are functionally specific, recovery of strain-resolved genomes is important but still a challenge. Unitigs and assembly graphs are intermediate products generated during the assembly of reads into contigs, and they provide higher resolution for sequence connection information. In this study, we propose a new approach, UGMAGrefiner (a unitig level assembly graph-based metagenome-assembled genome refiner), which uses the connection and coverage information from unitig level assembly graphs to recruit unbinned unitigs to MAGs, adjust binning results, and infer unitigs shared by multiple MAGs. In two simulated datasets (Simdata and CAMI data) and one real dataset (GD02), it outperforms two state-of-the-art assembly graph-based binning refinement tools in the refinement of MAGs' quality by stably increasing the completeness of genomes. UGMAGrefiner can identify genome-specific clusters of genomes with below 99% average nucleotide identity for homologous sequences. For MAGs mixed with 99% similarity genome clusters, it could distinguish 8 out of 9 genomes in Simdata and 8 out of 12 genomes in CAMI data. In GD02 data, it could identify 16 new unitig clusters representing genome-specific regions of mixed genomes and 4 unitig clusters representing new genomes from a total of 135 MAGs for further functional analysis. UGMAGrefiner provides an efficient way to obtain more complete MAGs and study genome-specific functions. It will be useful to improve taxonomic and functional information of genomes after de novo assembly.
14
Kerr EN, Papudeshi B, Haggerty M, Wild N, Goodman AZ, Lima LFO, Hesse RD, Skye A, Mallawaarachchi V, Johri S, Parker S, Dinsdale EA. Stingray epidermal microbiomes are species-specific with local adaptations. Front Microbiol 2023; 14:1031711. [PMID: 36937279 PMCID: PMC10017458 DOI: 10.3389/fmicb.2023.1031711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 02/07/2023] [Indexed: 03/06/2023]
Abstract
Marine host-associated microbiomes are affected by a combination of species-specific (e.g., host ancestry, genotype) and habitat-specific features (e.g., environmental physiochemistry and microbial biogeography). The stingray epidermis provides a gradient of characteristics from high dermal denticle coverage with low mucus to reduced dermal denticles and high levels of mucus. Here we investigate the effects of host phylogeny and habitat by comparing the epidermal microbiomes of Myliobatis californica (bat rays), with a mucus-rich epidermis, and Urobatis halleri (round rays), with a mucus-reduced epidermis, from two locations, Los Angeles and San Diego, California (a 150 km distance). We found that host microbiomes are species-specific and distinct from the water column; however, the composition of M. californica microbiomes showed more variability between individuals compared to U. halleri. The variability in the microbiome of M. californica caused the microbial taxa to be similar across locations, while U. halleri microbiomes were distinct across locations. Despite taxonomic differences, Shannon diversity is the same across the two locations in U. halleri microbiomes, suggesting that the taxonomic composition is locally adapted but diversity is maintained by the host. Myliobatis californica and U. halleri microbiomes maintain functional similarity across Los Angeles and San Diego, and each ray showed several unique functional genes. Myliobatis californica has a greater relative abundance of RNA Polymerase III-like genes in the microbiome than U. halleri, suggesting specific adaptations to a heavy mucus environment. Construction of Metagenome Assembled Genomes (MAGs) identified novel microbial species within Rhodobacteraceae, Moraxellaceae, Caulobacteraceae, Alcanivoracaceae and Gammaproteobacteria. All MAGs had a high abundance of active RNA processing genes, heavy metal, and antibiotic resistance genes, suggesting the stingray mucus supports high microbial growth rates, which may drive high levels of competition within the microbiomes, increasing the antimicrobial properties of the microbes.
Affiliation(s)
- Emma N. Kerr
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
- *Correspondence: Emma N. Kerr
| | - Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| | - Miranda Haggerty
- California Department of Fish and Wildlife, San Diego, CA, United States
| | - Natasha Wild
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| | - Asha Z. Goodman
- Department of Biology, San Diego State University, San Diego, CA, United States
| | - Lais F. O. Lima
- Department of Biology, San Diego State University, San Diego, CA, United States
| | - Ryan D. Hesse
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| | - Amber Skye
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| | - Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
| | - Shaili Johri
- Hopkins Marine Station, Stanford University, Stanford, CA, United States
| | - Sophia Parker
- Department of Biology, San Diego State University, San Diego, CA, United States
| | - Elizabeth A. Dinsdale
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
- *Correspondence: Elizabeth A. Dinsdale
15
Mallawaarachchi V, Lin Y. Accurate Binning of Metagenomic Contigs Using Composition, Coverage, and Assembly Graphs. J Comput Biol 2022; 29:1357-1376. [DOI: 10.1089/cmb.2022.0262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Affiliation(s)
- Vijini Mallawaarachchi
- School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, Australia
| | - Yu Lin
- School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, Australia
16
Wu Z, Wang Y, Zeng J, Zhou Y. Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking. BMC Genomics 2022; 23:746. [PMID: 36352370 PMCID: PMC9647946 DOI: 10.1186/s12864-022-08967-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 10/25/2022] [Indexed: 11/11/2022]
Abstract
BACKGROUND: Many binning approaches have been developed for untangling metagenome-assembled genomes (MAGs), and they are evaluated by two main strategies. Comparison to known genomes prevails over the other strategy, which uses single-copy genes. However, there is still no dataset with all known genomes for a real (not simulated) bacterial consortium.
RESULTS: Here, we continue investigating the real bacterial consortium F1RT, which we previously enriched and sequenced, since its low complexity makes it highly likely that all MAGs can be unearthed. The improved F1RT metagenome, reassembled here by metaSPAdes, utilizes about 98.62% of reads, and a series of analyses of the remaining reads suggests that the possibility of F1RT containing other low-abundance organisms is very low, demonstrating that almost all MAGs are successfully assembled. Then, 4 isolates are obtained and individually sequenced. Based on the 4 isolate genomes and the entire metagenome, an elaborate in-house pipeline is developed to construct all F1RT MAGs. A series of assessments extensively supports the high reliability of this reconstruction. Next, our findings further show that this dataset harbors several properties that are challenging for binning and thus is suitable for comparing currently available advanced binning tools or benchmarking novel binners. Using this dataset, 8 advanced binning algorithms are assessed, giving useful insights for developing novel approaches. In addition, compared with our previous study, two novel MAGs termed FC8 and FC9 are discovered here, and 7 MAGs are solidly unearthed for species without any available genomes.
CONCLUSION: To our knowledge, this is the first dataset constructed with almost all known MAGs for a non-simulated consortium. We hope that this dataset will be used as a routine toolkit to complement mock datasets for evaluating binning methods to further facilitate binning and metagenomic studies in the future.
Affiliation(s)
- Ziyao Wu
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China
| | - Yuxiao Wang
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China
| | - Jiaqi Zeng
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China
- Institute of Pathogeny Biology, School of Basic Medicine, Guilin Medical University, Guilin, 541199, Guangxi, China
| | - Yizhuang Zhou
- Guangxi Key Laboratory of Environmental Exposomics and Entire Lifecycle Health, School of Public Health, Guilin Medical University, Guilin, 541199, Guangxi, China
17
Pu L, Shamir R. 3CAC: improving the classification of phages and plasmids in metagenomic assemblies using assembly graphs. Bioinformatics 2022; 38:ii56-ii61. [PMID: 36124804 DOI: 10.1093/bioinformatics/btac468] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Abstract
MOTIVATION: Bacteriophages and plasmids usually coexist with their host bacteria in microbial communities and play important roles in microbial evolution. Accurately identifying sequence contigs as phages, plasmids and bacterial chromosomes in mixed metagenomic assemblies is critical for further unraveling their functions. Many classification tools have been developed for identifying either phages or plasmids in metagenomic assemblies. However, only two classifiers, PPR-Meta and viralVerify, were proposed to simultaneously identify phages and plasmids in mixed metagenomic assemblies. Due to the very high fraction of chromosome contigs in the assemblies, both tools achieve high precision in the classification of chromosomes but perform poorly in classifying phages and plasmids. Short contigs in these assemblies are often wrongly classified or classified as uncertain.
RESULTS: Here we present 3CAC, a new three-class classifier that improves the precision of phage and plasmid classification. 3CAC starts with an initial three-class classification generated by existing classifiers and improves the classification of short contigs and contigs with low confidence classification by using proximity in the assembly graph. Evaluation on simulated metagenomes and on real human gut microbiome samples showed that 3CAC outperformed PPR-Meta and viralVerify in both precision and recall, and increased F1-score by 10-60 percentage points.
AVAILABILITY AND IMPLEMENTATION: The 3CAC software is available on https://github.com/Shamir-Lab/3CAC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lianrong Pu
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
18
Lamurias A, Sereika M, Albertsen M, Hose K, Nielsen TD. Metagenomic binning with assembly graph embeddings. Bioinformatics 2022; 38:4481-4487. [PMID: 35972375 PMCID: PMC9525014 DOI: 10.1093/bioinformatics/btac557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Revised: 08/02/2022] [Accepted: 08/12/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning. RESULTS We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities, and compare the performance with other binners in terms of the number of High Quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets, and obtain more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins when compared with state-of-the-art binners and 13.7% when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning. AVAILABILITY AND IMPLEMENTATION GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mantas Sereika
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, 9000 Aalborg, Denmark
19
Kieft K, Adams A, Salamzade R, Kalan L, Anantharaman K. vRhyme enables binning of viral genomes from metagenomes. Nucleic Acids Res 2022; 50:e83. [PMID: 35544285 PMCID: PMC9371927 DOI: 10.1093/nar/gkac341] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 01/05/2022] [Revised: 04/17/2022] [Accepted: 04/22/2022] [Indexed: 01/11/2023]
Abstract
Genome binning has been essential for characterization of bacteria, archaea, and even eukaryotes from metagenomes. Yet, few approaches exist for viruses. We developed vRhyme, a fast and precise software for construction of viral metagenome-assembled genomes (vMAGs). vRhyme utilizes single- or multi-sample coverage effect size comparisons between scaffolds and employs supervised machine learning to identify nucleotide feature similarities, which are compiled into iterations of weighted networks and refined bins. To refine bins, vRhyme utilizes unique features of viral genomes, namely a protein redundancy scoring mechanism based on the observation that viruses seldom encode redundant genes. Using simulated viromes, we displayed superior performance of vRhyme compared to available binning tools in constructing more complete and uncontaminated vMAGs. When applied to 10,601 viral scaffolds from human skin, vRhyme advanced our understanding of resident viruses, highlighted by identification of a Herelleviridae vMAG comprised of 22 scaffolds, and another vMAG encoding a nitrate reductase metabolic gene, representing near-complete genomes post-binning. vRhyme will enable a convention of binning uncultivated viral genomes and has the potential to transform metagenome-based viral ecology.
Affiliation(s)
- Kristopher Kieft
  - Department of Bacteriology, University of Wisconsin–Madison, Madison, WI, USA
  - Microbiology Doctoral Training Program, University of Wisconsin–Madison, Madison, WI, USA
- Alyssa Adams
  - Department of Bacteriology, University of Wisconsin–Madison, Madison, WI, USA
  - Computation and Informatics in Biology and Medicine, University of Wisconsin–Madison, Madison, WI, USA
- Rauf Salamzade
  - Microbiology Doctoral Training Program, University of Wisconsin–Madison, Madison, WI, USA
  - Department of Medical Microbiology and Immunology, University of Wisconsin–Madison, Madison, WI, USA
- Lindsay Kalan
  - Department of Medical Microbiology and Immunology, University of Wisconsin–Madison, Madison, WI, USA
  - Department of Medicine, University of Wisconsin–Madison, Madison, WI, USA
20
Metagenomic methylation patterns resolve bacterial genomes of unusual size and structural complexity. ISME J 2022; 16:1921-1931. [PMID: 35459792 PMCID: PMC9296519 DOI: 10.1038/s41396-022-01242-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/18/2021] [Revised: 04/05/2022] [Accepted: 04/08/2022] [Indexed: 01/01/2023]
Abstract
The plasticity of bacterial and archaeal genomes makes examining their ecological and evolutionary dynamics both exciting and challenging. The same mechanisms that enable rapid genomic change and adaptation confound current approaches for recovering complete genomes from metagenomes. Here, we use strain-specific patterns of DNA methylation to resolve complex bacterial genomes from long-read metagenomic data of a marine microbial consortium, the “pink berries” of the Sippewissett Marsh (USA). Unique combinations of restriction-modification (RM) systems encoded by the bacteria produced distinctive methylation profiles that were used to accurately bin and classify metagenomic sequences. Using this approach, we finished the largest and most complex circularized bacterial genome ever recovered from a metagenome (7.9 Mb with >600 transposons): the genome of Thiohalocapsa sp. PB-PSB1, the dominant bacterium in the consortium. From genomes binned by methylation patterns, we identified instances of horizontal gene transfer between sulfur-cycling symbionts (Thiohalocapsa sp. PB-PSB1 and Desulfofustis sp. PB-SRB1), phage infection, and strain-level structural variation. We also linked the methylation patterns of each metagenome-assembled genome with encoded DNA methyltransferases and discovered new RM defense systems, including novel associations of RM systems with RNase toxins.
21
Chandrasiri S, Perera T, Dilhara A, Perera I, Mallawaarachchi V. CH-Bin: A Convex Hull Based Approach for Binning Metagenomic Contigs. Comput Biol Chem 2022; 100:107734. [DOI: 10.1016/j.compbiolchem.2022.107734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
22
Haryono MAS, Law YY, Arumugam K, Liew LCW, Nguyen TQN, Drautz-Moses DI, Schuster SC, Wuertz S, Williams RBH. Recovery of High Quality Metagenome-Assembled Genomes From Full-Scale Activated Sludge Microbial Communities in a Tropical Climate Using Longitudinal Metagenome Sampling. Front Microbiol 2022; 13:869135. [PMID: 35756038 PMCID: PMC9230771 DOI: 10.3389/fmicb.2022.869135] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/03/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023]
Abstract
The analysis of metagenome data based on the recovery of draft genomes (so-called metagenome-assembled genomes, or MAGs) has assumed an increasingly central role in microbiome research in recent years. Microbial communities underpinning the operation of wastewater treatment plants are particularly challenging targets for MAG analysis due to their high ecological complexity, and remain important, albeit understudied, microbial communities that play a key role in mediating interactions between human and natural ecosystems. Here we consider strategies for recovery of MAG sequences from time series metagenome surveys of full-scale activated sludge microbial communities. We generate MAG catalogs from this set of data using several different strategies, including the use of multiple individual sample assemblies, two variations on multi-sample co-assembly and a recently published MAG recovery workflow using deep learning. We obtain a total of just under 9,100 draft genomes, which collapse to around 3,100 non-redundant genomic clusters. We examine the strengths and weaknesses of these approaches in relation to MAG yield and quality, showing that co-assembly may offer advantages over single-sample assembly in the case of metagenome data obtained from closely sampled longitudinal study designs. Around 1,000 MAGs were candidates for being considered high quality, based on single-copy marker gene occurrence statistics; however, only 58 MAGs formally meet the MIMAG criteria for being high quality draft genomes. These findings carry broader implications for performing genome-resolved metagenomics on highly complex communities, the design and implementation of genome recoverability strategies, MAG decontamination and the search for better binning methodology.
Affiliation(s)
- Mindia A S Haryono
  - Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
- Ying Yu Law
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- Krithika Arumugam
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- Larry C-W Liew
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- Thi Quynh Ngoc Nguyen
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- Daniela I Drautz-Moses
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- Stephan C Schuster
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Stefan Wuertz
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
  - School of Civil and Environmental Engineering, Nanyang Technological University, Singapore, Singapore
- Rohan B H Williams
  - Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
23
Du Y, Sun F. HiFine: integrating Hi-C-based and shotgun-based methods to reFine binning of metagenomic contigs. Bioinformatics 2022; 38:2973-2979. [PMID: 35482530 PMCID: PMC9154269 DOI: 10.1093/bioinformatics/btac295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 03/28/2022] [Accepted: 04/21/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs' composition and abundance profiles and are impaired by a paucity of samples from which to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using only composition information. As an alternative binning approach, Hi-C-based binning employs the metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link contigs from different genomes, weakening the purity of the final draft genomic bins. Therefore, it is imperative to develop a binning pipeline to overcome the shortcomings of both types of binning methods on a single sample. RESULTS: We develop HiFine, a novel binning pipeline to refine the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine designs a strategy of fragmentation for the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of initial bins, followed by merging fragmented bins and recruiting unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs. AVAILABILITY: HiFine is available at https://github.com/dyxstat/HiFine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuxuan Du
  - Department of Quantitative and Computational Biology, University of Southern California, USA
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, USA
24
Quantifying and Cataloguing Unknown Sequences within Human Microbiomes. mSystems 2022; 7:e0146821. [PMID: 35258340 PMCID: PMC9052204 DOI: 10.1128/msystems.01468-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to an exponential growth in the raw sequence data that are deposited in online repositories. Metagenomic and metatranscriptomic data sets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these data sets include a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called "dark matter" is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to 40 distinct studies, comprising 963 samples, and covering 10 different human microbiomes including fecal, oral, lung, skin, and circulatory system microbiomes. We found that while the human microbiome is one of the most extensively studied, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among different microbiomes and was as high as 25% for skin and oral microbiomes that have more interactions with the environment. From the taxonomically unknown sequences discovered in this study, we calculated a taxonomic characterization rate of 1.64% of unknown sequences per month. A cross-study comparison led to the identification of similar unknown sequences in different samples and/or microbiomes. Both our computational framework and the novel unknown sequences produced are publicly available for future cross-referencing. Our approach led to the discovery of several novel viral genomes that bear no similarity to sequences in the public databases. Some of these are widespread as they have been found in different microbiomes and studies. Hence, our study illustrates how the systematic characterization of unknown sequences can help the discovery of novel microbes, and we call on the research community to systematically collate and share the unknown sequences from metagenomic studies to increase the rate at which the unknown sequence space can be classified.
25
Inferring Species Compositions of Complex Fungal Communities from Long- and Short-Read Sequence Data. mBio 2022; 13:e0244421. [PMID: 35404122 PMCID: PMC9040722 DOI: 10.1128/mbio.02444-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Our study is unique in that it provides an in-depth comparative study of a real-life complex fungal community analyzed with multiple long- and short-read sequencing approaches. These technologies and their application are currently of great interest to diverse biologists as they seek to characterize the community compositions of microbiomes.
26
Breusing C, Genetti M, Russell SL, Corbett-Detig RB, Beinart RA. Horizontal transmission enables flexible associations with locally adapted symbiont strains in deep-sea hydrothermal vent symbioses. Proc Natl Acad Sci U S A 2022; 119:e2115608119. [PMID: 35349333 PMCID: PMC9168483 DOI: 10.1073/pnas.2115608119] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/24/2021] [Accepted: 03/02/2022] [Indexed: 12/11/2022]
Abstract
Significance: In marine ecosystems, transmission of microbial symbionts between host generations occurs predominantly through the environment. Yet, it remains largely unknown how host genetics, symbiont competition, environmental conditions, and geography shape the composition of symbionts acquired by individual hosts. To address this question, we applied population genomic approaches to four species of deep-sea hydrothermal vent snails that live in association with chemosynthetic bacteria. Our analyses show that environment is more important to strain-level symbiont composition than host genetics and that symbiont strains show genetic variation indicative of adaptation to the distinct geochemical conditions at each vent site. This corroborates a long-standing hypothesis that hydrothermal vent invertebrates affiliate with locally adapted symbiont strains to cope with the variable conditions characterizing their habitats.
Affiliation(s)
- Corinna Breusing
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI 02882
- Maximilian Genetti
  - Jack Baskin School of Engineering, University of California, Santa Cruz, CA 95064
- Shelbi L. Russell
  - Department of Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, CA 95064
- Roxanne A. Beinart
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI 02882
27
Dufault‐Thompson K, Jiang X. Applications of de Bruijn graphs in microbiome research. iMeta 2022; 1:e4. [PMID: 38867733 PMCID: PMC10989854 DOI: 10.1002/imt2.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Revised: 01/24/2022] [Accepted: 01/24/2022] [Indexed: 06/14/2024]
Abstract
High-throughput sequencing has become an increasingly central component of microbiome research. The development of de Bruijn graph-based methods for assembling high-throughput sequencing data has been an important part of the broader adoption of sequencing as part of biological studies. Recent advances in the construction and representation of de Bruijn graphs have led to new approaches that utilize the de Bruijn graph data structure to aid in different biological analyses. One type of application of these methods has been in alternative approaches to the assembly of sequencing data like gene-targeted assembly, where only gene sequences are assembled out of larger metagenomes, and differential assembly, where sequences that are differentially present between two samples are assembled. de Bruijn graphs have also been applied for comparative genomics where they can be used to represent large sets of multiple genomes or metagenomes where structural features in the graphs can be used to identify variants, indels, and homologous regions in sequences. These de Bruijn graph-based representations of sequencing data have even begun to be applied to whole sequencing databases for large-scale searches and experiment discovery. de Bruijn graphs have played a central role in how high-throughput sequencing data is worked with, and the rapid development of new tools that rely on these data structures suggests that they will continue to play an important role in biology in the future.
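For readers unfamiliar with the data structure this abstract refers to, the following is a minimal sketch of a de Bruijn graph built from reads. It is a generic textbook construction, not code from the cited review; the function name `build_de_bruijn`, the toy reads, and the choice of k = 3 are illustrative assumptions, and real assemblers additionally handle reverse complements, sequencing errors, and k-mer abundance.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Hypothetical helper for illustration only. Each k-mer found in a read
    contributes one edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
    Duplicate edges are collapsed by storing successors in a set.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two toy overlapping reads (assumed data, not from any cited study).
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy case the two overlapping reads collapse into a single path through the graph, which is the property that assembly and the graph-based analyses mentioned in the abstract rely on.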
Affiliation(s)
- Keith Dufault‐Thompson
  - Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Xiaofang Jiang
  - Intramural Research Program, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
28
Wickramarachchi A, Lin Y. GraphPlas: Refined Classification of Plasmid Sequences Using Assembly Graphs. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:57-67. [PMID: 34029192 DOI: 10.1109/tcbb.2021.3082915] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Plasmids are extra-chromosomal genetic materials with important markers that affect the function and behaviour of the microorganisms supporting their environmental adaptations. Hence the identification and recovery of such plasmid sequences from assemblies is a crucial task in metagenomics analysis. In the past, machine learning approaches have been developed to separate chromosomes and plasmids. However, there is always a compromise between precision and recall in the existing classification approaches. The similarity of compositions between chromosomes and their plasmids makes it difficult to separate plasmids and chromosomes with high accuracy. However, high confidence classifications are accurate with a significant compromise of recall, and vice versa. Hence, the requirement exists to have more sophisticated approaches to separate plasmids and chromosomes accurately while retaining an acceptable trade-off between precision and recall. We present GraphPlas, a novel approach for plasmid recovery using coverage, composition and assembly graph topology. We evaluated GraphPlas on simulated and real short read assemblies with varying compositions of plasmids and chromosomes. Our experiments show that GraphPlas is able to significantly improve accuracy in detecting plasmid and chromosomal contigs on top of popular state-of-the-art plasmid detection tools. The source code is freely available at: https://github.com/anuradhawick/GraphPlas.
29
Schmartz GP, Hirsch P, Amand J, Dastbaz J, Fehlmann T, Kern F, Müller R, Keller A. BusyBee Web: towards comprehensive and differential composition-based metagenomic binning. Nucleic Acids Res 2022; 50:W132-W137. [PMID: 35489067 PMCID: PMC9252796 DOI: 10.1093/nar/gkac298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Revised: 04/07/2022] [Accepted: 04/14/2022] [Indexed: 11/13/2022]
Abstract
Despite recent methodology and reference database improvements for taxonomic profiling tools, metagenomic assembly and genomic binning remain important pillars of metagenomic analysis workflows. In case reference information is lacking, genomic binning is considered to be a state-of-the-art method in mixed culture metagenomic data analysis. In this light, our previously published tool BusyBee Web implements a composition-based binning method efficient enough to function as a rapid online utility. Handling assembled contigs and long nanopore generated reads alike, the webserver provides a wide range of supplementary annotations and visualizations. Half a decade after the initial publication, we revisited existing functionality, added comprehensive visualizations, and increased the number of data analysis customization options for further experimentation. The webserver now allows for visualization-supported differential analysis of samples, which is computationally expensive and typically only performed in coverage-based binning methods. Further, users may now optionally check their uploaded samples for plasmid sequences using PLSDB as a reference database. Lastly, a new application programming interface with a supporting python package was implemented, to allow power users fully automated access to the resource and integration into existing workflows. The webserver is freely available under: https://www.ccb.uni-saarland.de/busybee.
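As a rough illustration of what "composition-based" means here, the sketch below turns a contig into a normalized k-mer frequency vector, the kind of feature that composition-based binners commonly cluster. This is an assumption-laden example and not BusyBee Web's implementation: the function name `kmer_frequencies` and the toy sequence are hypothetical, and practical tools usually also merge reverse-complement k-mers and apply further transformations before clustering.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return the normalized frequency of every DNA k-mer in `sequence`.

    Hypothetical helper for illustration only. The output has 4**k entries
    ordered lexicographically over the alphabet A, C, G, T. Windows that
    contain other characters (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Toy contig (assumed data); k=2 keeps the vector short enough to inspect.
vec = kmer_frequencies("ACGTACGT", k=2)
print(len(vec), round(sum(vec), 6))
```

Contigs from the same genome tend to have similar vectors, so a distance or embedding computed on them can group contigs without any coverage information.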
Affiliation(s)
- Georges P Schmartz
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Pascal Hirsch
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Clinical Bioinformatics (CLIB), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Jérémy Amand
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Clinical Bioinformatics (CLIB), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Jan Dastbaz
  - Microbial Natural Products (MINS), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
  - Deutsches Zentrum für Infektionsforschung (DZIF), Standort Hannover-Braunschweig, 38124 Braunschweig, Germany
- Tobias Fehlmann
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Fabian Kern
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Clinical Bioinformatics (CLIB), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Rolf Müller
  - Microbial Natural Products (MINS), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
  - Deutsches Zentrum für Infektionsforschung (DZIF), Standort Hannover-Braunschweig, 38124 Braunschweig, Germany
- Andreas Keller
  - To whom correspondence should be addressed. Tel: +49 681 30268611; Fax: +49 681 30268610
30
Bornemann TLV, Adam PS, Probst AJ. Reconstruction of Archaeal Genomes from Short-Read Metagenomes. Methods Mol Biol 2022; 2522:487-527. [PMID: 36125772 DOI: 10.1007/978-1-0716-2445-6_33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
As the majority of biological diversity remains unexplored and uncultured, investigating it requires culture-independent approaches. Archaea in particular suffer from a multitude of issues that make their culturing problematic, from them being frequently members of the rare biosphere, to low growth rates, to them thriving under very specific and often extreme environmental and community conditions that are difficult to replicate. OMICs techniques are state of the art approaches that allow direct high-throughput investigations of environmental samples at all levels from nucleic acids to proteins, lipids, and secondary metabolites. Metagenomics, as the foundation for other OMICs techniques, facilitates the identification and functional characterization of the microbial community members and can be combined with other methods to provide insights into the microbial activities, both on the RNA and protein levels. In this chapter, we provide a step-by-step workflow for the recovery of archaeal genomes from metagenomes, starting from raw short-read sequences. This workflow can be applied to recover bacterial genomes as well.
Affiliation(s)
- Till L V Bornemann
  - Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Essen, Germany
- Panagiotis S Adam
  - Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Essen, Germany
- Alexander J Probst
  - Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Essen, Germany
  - Centre of Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
31
Choudhari J, Choubey J, Verma M, Chatterjee T, Sahariah B. Metagenomics: the boon for microbial world knowledge and current challenges. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00022-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
32
Mao G, Wu Y, Zhang Y, Wang X, Zhu Y, Liu B, Wang Y, Li J. DRBin: Metagenomic binning based on deep representation learning. J Genet Genomics 2021; 49:681-684. [PMID: 34979323 DOI: 10.1016/j.jgg.2021.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Revised: 12/14/2021] [Accepted: 12/23/2021] [Indexed: 10/19/2022]
Affiliation(s)
- Gang Mao
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yulin Wu
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yang Zhang
  - College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Xuan Wang
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yan Zhu
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Bo Liu
  - Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
  - Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Junyi Li
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
33
Yang C, Chowdhury D, Zhang Z, Cheung WK, Lu A, Bian Z, Zhang L. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J 2021; 19:6301-6314. [PMID: 34900140 PMCID: PMC8640167 DOI: 10.1016/j.csbj.2021.11.028] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 08/30/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 12/16/2022]
Abstract
Metagenomic sequencing provides a culture-independent avenue to investigate complex microbial communities by constructing metagenome-assembled genomes (MAGs). A MAG represents a microbial genome by a group of sequences from genome assembly with similar characteristics. It enables us to identify novel species and understand their potential functions in a dynamic ecosystem. Many computational tools have been developed to construct and annotate MAGs from metagenomic sequencing; however, a comprehensive introduction to their background and practical performance is still lacking. In this paper, we have thoroughly investigated the computational tools designed for both upstream and downstream analyses, including metagenome assembly, metagenome binning, gene prediction, functional annotation, taxonomic classification, and profiling. We have categorized the commonly used tools into unique groups based on their functional background and introduced the underlying core algorithms and associated information to demonstrate a comparative outlook. Furthermore, we have emphasized the computational requirements and offered guidance to users on selecting the most efficient tools. Finally, we have indicated current limitations, potential solutions, and future perspectives for further improving the tools of MAG construction and annotation. We believe that our work provides a consolidated resource for the current stage of MAG studies and sheds light on the future development of more effective MAG analysis tools for metagenomic sequencing.
Key Words
- CNN, convolutional neural network
- DBG, De Bruijn graph
- GTDB, Genome Taxonomy Database
- Gene functional annotation
- Gene prediction
- Genome assembly
- HMM, Hidden Markov Model
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- LCA, lowest common ancestor
- LPA, label propagation algorithm
- MAGs, metagenome-assembled genomes
- Metagenome binning
- Metagenome-assembled genomes
- Metagenomic sequencing
- Microbial abundance profiling
- OLC, overlap-layout consensus
- ONT, Oxford Nanopore Technologies
- ORFs, open reading frames
- PacBio, Pacific Biosciences
- QC, quality control
- SLR, synthetic long reads
- TNFs, tetranucleotide frequencies
- Taxonomic classification
Affiliation(s)
- Chao Yang: Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Debajyoti Chowdhury: Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region; Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Zhenmiao Zhang: Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- William K. Cheung: Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Aiping Lu: Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region; Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Zhaoxiang Bian: Institute of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region; Chinese Medicine Clinical Study Center, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Lu Zhang: Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region; Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
34
Andreace F, Pizzi C, Comin M. MetaProb 2: Metagenomic Reads Binning Based on Assembly Using Minimizers and K-Mers Statistics. J Comput Biol 2021; 28:1052-1062. [PMID: 34448593 DOI: 10.1089/cmb.2021.0270] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Current technologies allow the sequencing of microbial communities directly from the environment without prior culturing. One of the major problems when analyzing a microbial sample is to taxonomically annotate its reads to identify the species it contains. The major difficulties of taxonomic analysis are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species, and sequencing errors. Microbial communities can be studied with read clustering, a process referred to as genome binning. In this study, we present MetaProb 2, an unsupervised genome binning method based on read assembly and probabilistic k-mer statistics. The novelties of MetaProb 2 are the use of minimizers to efficiently assemble reads into unitigs and a community detection algorithm based on graph modularity to cluster unitigs and to detect representative unitigs. The effectiveness of MetaProb 2 is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning tools such as MetaProb, AbundanceBin, Bimeta, and MetaCluster. On real datasets, it is the only one capable of producing promising results while being parsimonious with computational resources.
Affiliation(s)
- Francesco Andreace: Department of Information Engineering, University of Padova, Padova, Italy
- Cinzia Pizzi: Department of Information Engineering, University of Padova, Padova, Italy
- Matteo Comin: Department of Information Engineering, University of Padova, Padova, Italy
35
Zhang Z, Zhang L. METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs. BMC Bioinformatics 2021; 22:378. [PMID: 34294039 PMCID: PMC8296540 DOI: 10.1186/s12859-021-04284-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 12/14/2022]
Abstract
Background: Due to the complexity of microbial communities, de novo assembly of next-generation sequencing data is commonly unable to produce complete microbial genomes. Metagenome assembly binning becomes an essential step that groups the fragmented contigs into clusters representing microbial genomes, based on the contigs’ nucleotide compositions and read depths. These features work well on long contigs, but are not stable for short ones. Contigs can be linked by sequence overlap (assembly graph) or by the paired-end reads aligned to them (PE graph), where linked contigs have a high chance of being derived from the same cluster.
Results: We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both assembly and PE graphs. It could strikingly rescue the short contigs and correct the binning errors from dead ends. METAMVGL learns the two graphs’ weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed that METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin, on metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples.
Conclusions: Our findings demonstrate that METAMVGL outstandingly improves short contig binning and outperforms the other existing contig binning tools on metagenomic sequencing data from simulation, mock communities and infant fecal samples.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04284-4.
Affiliation(s)
- Zhenmiao Zhang: Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China
- Lu Zhang: Department of Computer Science, Hong Kong Baptist University, Hong Kong, SAR, China
36
Mallawaarachchi VG, Wickramarachchi AS, Lin Y. Improving metagenomic binning results with overlapped bins using assembly graphs. Algorithms Mol Biol 2021; 16:3. [PMID: 33947431 PMCID: PMC8097841 DOI: 10.1186/s13015-021-00185-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/04/2021] [Accepted: 04/20/2021] [Indexed: 11/18/2022]
Abstract
Background: Metagenomic sequencing allows us to study the structure, diversity and ecology of microbial communities without the necessity of obtaining pure cultures. In many metagenomics studies, the reads obtained from metagenomic sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which the contigs are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for binning contigs only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species).
Results: In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves the binning results of existing tools but also supports assigning contigs to multiple bins.
Conclusion: GraphBin2 incorporates the coverage information into the assembly graph to refine the binning results obtained from existing binning tools. GraphBin2 also enables the detection of contigs that may belong to multiple species. We show that GraphBin2 outperforms its predecessor GraphBin on both simulated and real datasets. GraphBin2 is freely available at https://github.com/Vini2/GraphBin2.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13015-021-00185-6.
37
Muralidharan HS, Shah N, Meisel JS, Pop M. Binnacle: Using Scaffolds to Improve the Contiguity and Quality of Metagenomic Bins. Front Microbiol 2021; 12:638561. [PMID: 33717033 PMCID: PMC7945042 DOI: 10.3389/fmicb.2021.638561] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/07/2020] [Accepted: 02/04/2021] [Indexed: 01/03/2023]
Abstract
High-throughput sequencing has revolutionized the field of microbiology; however, reconstructing complete genomes of organisms from whole metagenomic shotgun sequencing data remains a challenge. Recovered genomes are often highly fragmented, due to uneven abundances of organisms, repeats within and across genomes, sequencing errors, and strain-level variation. To address the fragmented nature of metagenomic assemblies, scientists rely on a process called binning, which clusters together contigs inferred to originate from the same organism. Existing binning algorithms use oligonucleotide frequencies and contig abundance (coverage) within and across samples to group together contigs from the same organism. However, these algorithms often miss short contigs and contigs from regions with unusual coverage or DNA composition characteristics, such as mobile elements. Here, we propose that information from assembly graphs can assist current strategies for metagenomic binning. We use MetaCarvel, a metagenomic scaffolding tool, to construct assembly graphs where contigs are nodes and edges are inferred based on paired-end reads. We developed a tool, Binnacle, that extracts information from the assembly graphs and clusters scaffolds into comprehensive bins. Binnacle also provides wrapper scripts to integrate with existing binning methods. The Binnacle pipeline can be found on GitHub (https://github.com/marbl/binnacle). We show that binning graph-based scaffolds, rather than contigs, improves the contiguity and quality of the resulting bins, and captures a broader set of the genes of the organisms being reconstructed.
Affiliation(s)
- Harihara Subrahmaniam Muralidharan: Pop Lab, Department of Computer Science, Center for Bioinformatics and Computational Biology, UMIACS, University of Maryland, College Park, MD, United States
- Nidhi Shah: Pop Lab, Department of Computer Science, Center for Bioinformatics and Computational Biology, UMIACS, University of Maryland, College Park, MD, United States
- Jacquelyn S Meisel: Pop Lab, Department of Computer Science, Center for Bioinformatics and Computational Biology, UMIACS, University of Maryland, College Park, MD, United States
- Mihai Pop: Pop Lab, Department of Computer Science, Center for Bioinformatics and Computational Biology, UMIACS, University of Maryland, College Park, MD, United States
38
Gwak HJ, Lee SJ, Rho M. Application of computational approaches to analyze metagenomic data. J Microbiol 2021; 59:233-241. [DOI: 10.1007/s12275-021-0632-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/02/2020] [Revised: 01/18/2021] [Accepted: 01/19/2021] [Indexed: 01/04/2023]
39
Cormier A, Chebbi MA, Giraud I, Wattier R, Teixeira M, Gilbert C, Rigaud T, Cordaux R. Comparative Genomics of Strictly Vertically Transmitted, Feminizing Microsporidia Endosymbionts of Amphipod Crustaceans. Genome Biol Evol 2020; 13:5995313. [PMID: 33216144 DOI: 10.1093/gbe/evaa245] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 11/17/2020] [Indexed: 12/19/2022]
Abstract
Microsporidia are obligate intracellular eukaryotic parasites of vertebrates and invertebrates. Microsporidia are usually pathogenic and undergo horizontal transmission or a mix of horizontal and vertical transmission. However, cases of nonpathogenic microsporidia, strictly vertically transmitted from mother to offspring, have been reported in amphipod crustaceans. Some of them further evolved the ability to feminize their nontransmitting male hosts into transmitting females. However, our understanding of the evolution of feminization in microsporidia is hindered by a lack of genomic resources. We report the sequencing and analysis of three strictly vertically transmitted microsporidia species for which feminization induction has been demonstrated (Nosema granulosis) or is strongly suspected (Dictyocoela muelleri and Dictyocoela roeselum), along with a draft genome assembly of their host Gammarus roeselii. Unlike horizontally transmitted microsporidia, which form environmental spores that can be purified, feminizing microsporidia cannot be easily isolated from their host cells. Therefore, we cosequenced symbiont and host genomic DNA and devised a computational strategy to obtain genome assemblies for the different partners. Genomic comparison with feminizing Wolbachia bacterial endosymbionts of isopod crustaceans indicated independent evolution of feminization in microsporidia and Wolbachia at the molecular genetic level. Feminization thus represents a remarkable evolutionary convergence of eukaryotic and prokaryotic microorganisms. Furthermore, a comparative genomics analysis of microsporidia allowed us to identify several candidate genes for feminization, involving functions such as DNA binding and membrane fusion. The genomic resources we generated contribute to establishing Gammarus roeselii and its microsporidia symbionts as a new model to study the evolution of symbiont-mediated feminization.
Affiliation(s)
- Alexandre Cormier: Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Université de Poitiers, UMR CNRS 7267, France
- Mohamed Amine Chebbi: Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Université de Poitiers, UMR CNRS 7267, France
- Isabelle Giraud: Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Université de Poitiers, UMR CNRS 7267, France
- Rémi Wattier: Laboratoire Biogéosciences, Université Bourgogne Franche-Comté, UMR CNRS 6282, Dijon, France
- Maria Teixeira: Laboratoire Biogéosciences, Université Bourgogne Franche-Comté, UMR CNRS 6282, Dijon, France
- Clément Gilbert: Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, 91198 Gif-sur-Yvette, France
- Thierry Rigaud: Laboratoire Biogéosciences, Université Bourgogne Franche-Comté, UMR CNRS 6282, Dijon, France
- Richard Cordaux: Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Université de Poitiers, UMR CNRS 7267, France