1
|
Mallawaarachchi V, Wickramarachchi A, Xue H, Papudeshi B, Grigson SR, Bouras G, Prahl RE, Kaphle A, Verich A, Talamantes-Becerra B, Dinsdale EA, Edwards RA. Solving genomic puzzles: computational methods for metagenomic binning. Brief Bioinform 2024; 25:bbae372. [PMID: 39082646 PMCID: PMC11289683 DOI: 10.1093/bib/bbae372] [Received: 04/18/2024] [Revised: 06/05/2024] [Accepted: 07/15/2024] [Indexed: 08/03/2024] Open
Abstract
Metagenomics involves the study of genetic material obtained directly from communities of microorganisms living in natural environments. The field of metagenomics has provided valuable insights into the structure, diversity and ecology of microbial communities. Once an environmental sample is sequenced and processed, metagenomic binning clusters the sequences into bins representing different taxonomic groups such as species, genera, or higher levels. Several computational tools have been developed to automate the process of metagenomic binning. These tools have enabled the recovery of novel draft genomes of microorganisms allowing us to study their behaviors and functions within microbial communities. This review classifies and analyzes different approaches of metagenomic binning and different refinement, visualization, and evaluation techniques used by these methods. Furthermore, the review highlights the current challenges and areas of improvement present within the field of research.
Affiliation(s)
- Vijini Mallawaarachchi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Anuradha Wickramarachchi
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Hansheng Xue
  - School of Computing, National University of Singapore, Singapore 119077, Singapore
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Susanna R Grigson
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- George Bouras
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
  - The Department of Surgery—Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, SA 5011, Australia
- Rosa E Prahl
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Anubhav Kaphle
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Andrey Verich
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
  - The Kirby Institute, The University of New South Wales, Randwick, Sydney, NSW 2052, Australia
- Berenice Talamantes-Becerra
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Elizabeth A Dinsdale
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Robert A Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
|
2
|
Gnimpieba EZ, Hartman TW, Do T, Zylla J, Aryal S, Haas SJ, Agany DDM, Gurung BDS, Doe V, Yosufzai Z, Pan D, Campbell R, Huber VC, Sani R, Gadhamshetty V, Lushbough C. Biofilm marker discovery with cloud-based dockerized metagenomics analysis of microbial communities. Brief Bioinform 2024; 25:bbae429. [PMID: 39266450 PMCID: PMC11392556 DOI: 10.1093/bib/bbae429] [Received: 12/01/2023] [Revised: 08/04/2024] [Accepted: 08/16/2024] [Indexed: 09/14/2024] Open
Abstract
In an environment, microbes often work in communities to achieve most of their essential functions, including the production of essential nutrients. Microbial biofilms are communities of microbes that attach to a nonliving or living surface by embedding themselves into a self-secreted matrix of extracellular polymeric substances. These communities work together to enhance their colonization of surfaces, produce essential nutrients, and achieve their essential functions for growth and survival. They often consist of diverse microbes including bacteria, viruses, and fungi. Biofilms play a critical role in influencing plant phenotypes and human microbial infections. Understanding how these biofilms impact plant health, human health, and the environment is important for analyzing genotype-phenotype-driven rule-of-life functions. Such fundamental knowledge can be used to precisely control the growth of biofilms on a given surface. Metagenomics is a powerful tool for analyzing biofilm genomes through function-based gene and protein sequence identification (functional metagenomics) and sequence-based function identification (sequence metagenomics). Metagenomic sequencing enables a comprehensive sampling of all genes in all organisms present within a biofilm sample. However, the complexity of biofilm metagenomic studies increases the need to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) Guiding Principles for scientific data management. This will ensure that scientific findings can be more easily validated by the research community. This study proposes a dockerized, self-learning bioinformatics workflow to increase the community adoption of metagenomics toolkits in a metagenomics and meta-transcriptomics investigation. Our biofilm metagenomics workflow self-learning module includes integrated learning resources with an interactive dockerized workflow.
This module will allow learners to analyze resources that are beneficial for aggregating knowledge about biofilm marker genes, proteins, and metabolic pathways as they define the composition of specific microbial communities. Cloud and dockerized technology can allow novice learners, even those with minimal knowledge of computer science, to use complicated bioinformatics tools. Our cloud-based, dockerized workflow splits biofilm microbiome metagenomics analyses into four easy-to-follow submodules. A variety of tools are built into each submodule. As students navigate these submodules, they learn about each tool used to accomplish the task. The downstream analysis is conducted using processed data obtained from online resources or raw data processed via Nextflow pipelines. This analysis takes place within Vertex AI's Jupyter notebook instance with R and Python kernels. Subsequently, results are stored and visualized in Google Cloud storage buckets, alleviating the computational burden on local resources. The result is a comprehensive tutorial that guides bioinformaticians of any skill level through the entire workflow. It enables them to comprehend and implement the necessary processes involved in this integrated workflow from start to finish. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.
Affiliation(s)
- Etienne Z Gnimpieba
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Timothy W Hartman
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Tuyen Do
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Jessica Zylla
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Shiva Aryal
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Samuel J Haas
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Diing D M Agany
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Bichar Dip Shrestha Gurung
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
- Valena Doe
  - Google Cloud, 1900 Reston Metro Plaza, Reston, Virginia, 20190, United States
- Zelaikha Yosufzai
  - Health Data and AI, Deloitte Consulting LLP, 1919 N Lynn St., Suite 1500, Arlington, Virginia, 22209, United States
- Daniel Pan
  - Health Data and AI, Deloitte Consulting LLP, 1919 N Lynn St., Suite 1500, Arlington, Virginia, 22209, United States
- Ross Campbell
  - Health Data and AI, Deloitte Consulting LLP, 1919 N Lynn St., Suite 1500, Arlington, Virginia, 22209, United States
- Victor C Huber
  - Basic Biomedical Sciences Division, University of South Dakota, 414 E. Clark St, Vermillion, South Dakota, 57069, United States
- Rajesh Sani
  - South Dakota School of Mines & Technology, 501 E. Saint Joseph St., Rapid City, South Dakota, 57701, United States
- Venkataramana Gadhamshetty
  - South Dakota School of Mines & Technology, 501 E. Saint Joseph St., Rapid City, South Dakota, 57701, United States
- Carol Lushbough
  - Biomedical Engineering Department, University of South Dakota, 4800 N. Career Ave., Suite 221, Sioux Falls, South Dakota, 57107, United States
|
3
|
Darabi A, Sobhani S, Aghdam R, Eslahchi C. AFITbin: a metagenomic contig binning method using aggregate l-mer frequency based on initial and terminal nucleotides. BMC Bioinformatics 2024; 25:241. [PMID: 39014300 PMCID: PMC11253361 DOI: 10.1186/s12859-024-05859-7] [Received: 08/27/2023] [Accepted: 07/09/2024] [Indexed: 07/18/2024] Open
Abstract
BACKGROUND Using next-generation sequencing technologies, scientists can sequence complex microbial communities directly from the environment. Significant insights into the structure, diversity, and ecology of microbial communities have resulted from the study of metagenomics. The assembly of reads into longer contigs, which are then binned into groups of contigs that correspond to different species in the metagenomic sample, is a crucial step in the analysis of metagenomics. It is necessary to organize these contigs into operational taxonomic units (OTUs) for further taxonomic profiling and functional analysis. For binning, which is synonymous with the clustering of OTUs, the tetra-nucleotide frequency (TNF) is typically utilized as a compositional feature for each OTU. RESULTS In this paper, we present AFIT, a new l-mer statistic vector for each contig, and AFITBin, a novel method for metagenomic binning based on AFIT and a matrix factorization method. To evaluate the performance of the AFIT vector, the t-SNE algorithm is used to compare species clustering based on AFIT and TNF information. In addition, the efficacy of AFITBin is demonstrated on both simulated and real datasets in comparison to state-of-the-art binning methods such as MetaBAT 2, MaxBin 2.0, CONCOCT, MetaCon, SolidBin, BusyBee Web, and MetaBinner. To further analyze the performance of the proposed AFIT vector, we compare the barcodes of the AFIT vector and the TNF vector. CONCLUSION The results demonstrate that AFITBin shows superior performance in taxonomic identification compared to existing methods, leveraging the AFIT vector for improved results in metagenomic binning. This approach holds promise for advancing the analysis of metagenomic data, providing more reliable insights into microbial community composition and function. AVAILABILITY A Python package is available at: https://github.com/SayehSobhani/AFITBin .
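The tetra-nucleotide frequency (TNF) baseline mentioned in this abstract can be illustrated with a minimal sketch. This is not the AFIT statistic and not code from AFITBin; the function name `tetranucleotide_frequency` is hypothetical, and the sketch counts 4-mers on one strand only, whereas binning tools often collapse reverse-complement k-mers.

```python
from itertools import product


def tetranucleotide_frequency(seq: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 possible 4-mers in seq.

    Assumption: single-strand counting, no reverse-complement merging.
    """
    counts = {"".join(p): 0 for p in product("ACGT", repeat=4)}
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other codes are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    # Normalize so that contigs of different lengths are comparable.
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


tnf = tetranucleotide_frequency("ACGTACGT")
```

Each contig is thereby mapped to a fixed-length vector, which is what lets a clustering method compare contigs of different lengths.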
Affiliation(s)
- Amin Darabi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Sayeh Sobhani
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Rosa Aghdam
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran.
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
|
4
|
Reynolds G, Mumey B, Strnadova‐Neeley V, Lachowiec J. Hijacking a rapid and scalable metagenomic method reveals subgenome dynamics and evolution in polyploid plants. APPLICATIONS IN PLANT SCIENCES 2024; 12:e11581. [PMID: 39184200 PMCID: PMC11342227 DOI: 10.1002/aps3.11581] [Received: 09/07/2023] [Revised: 11/26/2023] [Accepted: 12/20/2023] [Indexed: 08/27/2024]
Abstract
Premise The genomes of polyploid plants archive the evolutionary events leading to their present forms. However, plant polyploid genomes present numerous hurdles to the genome comparison algorithms for classification of polyploid types and exploring genome dynamics. Methods Here, the problem of intra- and inter-genome comparison for examining polyploid genomes is reframed as a metagenomic problem, enabling the use of the rapid and scalable MinHashing approach. To determine how types of polyploidy are described by this metagenomic approach, plant genomes were examined from across the polyploid spectrum for both k-mer composition and frequency with a range of k-mer sizes. In this approach, no subgenome-specific k-mers are identified; rather, whole-chromosome k-mer subspaces were utilized. Results Given chromosome-scale genome assemblies with sufficient subgenome-specific repetitive element content, literature-verified subgenomic and genomic evolutionary relationships were revealed, including distinguishing auto- from allopolyploidy and putative progenitor genome assignment. The sequences responsible were the rapidly evolving landscape of transposable elements. An investigation into the MinHashing parameters revealed that the downsampled k-mer space (genomic signatures) produced excellent approximations of sequence similarity. Furthermore, the clustering approach used for comparison of the genomic signatures is scrutinized to ensure applicability of the metagenomics-based method. Discussion The easily implementable and highly computationally efficient MinHashing-based sequence comparison strategy enables comparative subgenomics and genomics for large and complex polyploid plant genomes. Such comparisons provide evidence for polyploidy-type subgenomic assignments. 
In cases where the subgenome-specific repeat signal may not be adequate given a chromosome's global k-mer profile, alternative methods that are more specific but more computationally complex outperform this approach.
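The MinHashing idea referred to in this abstract, comparing sequences through a downsampled k-mer space, can be sketched as follows. This is a generic bottom-k MinHash estimate of Jaccard similarity, not the authors' pipeline; the function names, the default k-mer size, the sketch size, and the choice of BLAKE2b as the hash are all assumptions made for illustration.

```python
import hashlib


def kmer_hashes(seq: str, k: int) -> set[int]:
    """Hash every distinct k-mer of seq to a 64-bit integer."""
    seq = seq.upper()
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers
    }


def sketch(seq: str, k: int = 21, size: int = 1000) -> list[int]:
    """Keep only the `size` smallest k-mer hashes (the 'signature')."""
    return sorted(kmer_hashes(seq, k))[:size]


def jaccard_estimate(sketch_a: list[int], sketch_b: list[int], size: int = 1000) -> float:
    """Estimate Jaccard similarity from two bottom-k sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The smallest hashes of the union act as a random sample of all k-mers.
    union_sample = sorted(set_a | set_b)[:size]
    if not union_sample:
        return 0.0
    shared = sum(1 for h in union_sample if h in set_a and h in set_b)
    return shared / len(union_sample)


a = sketch("ACGTACGTGG", k=4)
b = sketch("ACGTACGTCC", k=4)
similarity = jaccard_estimate(a, b)
```

When the sketch size is at least the number of distinct k-mers, as in this toy input, the estimate equals the exact Jaccard similarity; for whole chromosomes the sketch is far smaller than the k-mer set, which is where the speed-up comes from.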
Affiliation(s)
- Gillian Reynolds
  - Plant Sciences and Plant Pathology Department, Montana State University, Bozeman 59717, Montana, USA
  - Gianforte School of Computing, Montana State University, Bozeman 59717, Montana, USA
- Brendan Mumey
  - Gianforte School of Computing, Montana State University, Bozeman 59717, Montana, USA
- Jennifer Lachowiec
  - Plant Sciences and Plant Pathology Department, Montana State University, Bozeman 59717, Montana, USA
|
5
|
Hou S, Tang T, Cheng S, Liu Y, Xia T, Chen T, Fuhrman J, Sun F. DeepMicroClass sorts metagenomic contigs into prokaryotes, eukaryotes and viruses. NAR Genom Bioinform 2024; 6:lqae044. [PMID: 38711860 PMCID: PMC11071121 DOI: 10.1093/nargab/lqae044] [Received: 09/07/2023] [Revised: 03/18/2024] [Accepted: 04/18/2024] [Indexed: 05/08/2024] Open
Abstract
Sequence classification facilitates a fundamental understanding of the structure of microbial communities. Binary metagenomic sequence classifiers are insufficient because environmental metagenomes are typically derived from multiple sequence sources. Here we introduce a deep learning-based sequence classifier, DeepMicroClass, that classifies metagenomic contigs into five sequence classes, i.e. viruses infecting prokaryotic or eukaryotic hosts, eukaryotic or prokaryotic chromosomes, and prokaryotic plasmids. DeepMicroClass achieved high performance for all sequence classes at various tested sequence lengths ranging from 500 bp to 100 kbp. By benchmarking on a synthetic dataset with variable sequence class composition, we showed that DeepMicroClass obtained better performance for eukaryotic, plasmid and viral contig classification than other state-of-the-art predictors. DeepMicroClass achieved comparable performance on viral sequence classification with geNomad and VirSorter2 when benchmarked on the CAMI II marine dataset. Using a coastal daily time-series metagenomic dataset as a case study, we showed that microbial eukaryotes and prokaryotic viruses are integral to microbial communities. By analyzing monthly metagenomes collected at HOT and BATS, we found relatively higher viral read proportions in the subsurface layer in late summer, consistent with the seasonal viral infection patterns prevalent in these areas. We expect DeepMicroClass will promote metagenomic studies of under-appreciated sequence types.
Affiliation(s)
- Shengwei Hou
  - Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
  - Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Tianqi Tang
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Siliangyu Cheng
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Yuanhao Liu
  - Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Tian Xia
  - Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Ting Chen
  - Department of Computer Science and Technology, Institute of Artificial Intelligence & BNRist, Tsinghua University, Beijing 100084, China
- Jed A Fuhrman
  - Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
|
6
|
Trecarten S, Fongang B, Liss M. Current Trends and Challenges of Microbiome Research in Prostate Cancer. Curr Oncol Rep 2024; 26:477-487. [PMID: 38573440 DOI: 10.1007/s11912-024-01520-x] [Accepted: 03/18/2024] [Indexed: 04/05/2024]
Abstract
PURPOSE OF REVIEW The role of the gut microbiome in prostate cancer is an emerging area of research interest. However, no single causative organism has yet been identified. The goal of this paper is to examine the role of the microbiome in prostate cancer and summarize the challenges relating to methodology in specimen collection, sequencing technology, and interpretation of results. RECENT FINDINGS Significant heterogeneity still exists in methodology for stool sampling/storage, preservative options, DNA extraction, and sequencing database selection/in silico processing. Debate persists over primer choice in amplicon sequencing as well as optimal methods for data normalization. Statistical methods for longitudinal microbiome analysis continue to undergo refinement. While standardization of methodology may help yield more consistent results for organism identification in prostate cancer, this is a difficult task due to considerable procedural variation at each step in the process. Further reproducibility and methodology research is required.
Affiliation(s)
- Shaun Trecarten
  - Department of Urology, UT Health San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229, USA
- Bernard Fongang
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, UT Health San Antonio, San Antonio, TX, USA
  - Department of Biochemistry and Structural Biology, UT Health San Antonio, San Antonio, TX, USA
  - Department of Population Health Sciences, UT Health San Antonio, San Antonio, TX, USA
- Michael Liss
  - Department of Urology, UT Health San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229, USA.
|
7
|
Kim C, Pongpanich M, Porntaveetus T. Unraveling metagenomics through long-read sequencing: a comprehensive review. J Transl Med 2024; 22:111. [PMID: 38282030 PMCID: PMC10823668 DOI: 10.1186/s12967-024-04917-1] [Received: 10/15/2023] [Accepted: 01/21/2024] [Indexed: 01/30/2024] Open
Abstract
The study of microbial communities has undergone significant advancements, starting from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomics sequencing, including sample preparation (sample collection, sample extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic annotation and functional annotation). Each section provides a concise outline of the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, each section introduces a range of tools that are compatible with LRS and can be utilized to execute the LRS process. This review aims to present the workflow of metagenomics, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
Affiliation(s)
- Chankyung Kim
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Bioinformatics and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Monnat Pongpanich
  - Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence for Cancer and Inflammation, Chulalongkorn University, Bangkok, Thailand
- Thantrira Porntaveetus
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand.
  - Graduate Program in Geriatric and Special Patients Care, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand.
|
8
|
Wang Z, You R, Han H, Liu W, Sun F, Zhu S. Effective binning of metagenomic contigs using contrastive multi-view representation learning. Nat Commun 2024; 15:585. [PMID: 38233391 PMCID: PMC10794208 DOI: 10.1038/s41467-023-44290-z] [Received: 06/28/2023] [Accepted: 12/07/2023] [Indexed: 01/19/2024] Open
Abstract
Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs).
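The data augmentation step described in this abstract, generating multiple fragments (views) of each contig, might look roughly like the sketch below. It is an illustration under stated assumptions, not COMEBin's implementation: the function name `make_views`, the uniform random choice of fragment length and position, and the `min_len` threshold are all hypothetical.

```python
import random


def make_views(contig: str, n_views: int = 5, min_len: int = 1000, seed: int = 0) -> list[str]:
    """Return n_views random sub-fragments of contig, each at least min_len long.

    Assumption: contigs no longer than min_len are returned unchanged as every view.
    """
    rng = random.Random(seed)  # seeded so the views are reproducible
    views = []
    for _ in range(n_views):
        if len(contig) <= min_len:
            views.append(contig)
            continue
        length = rng.randint(min_len, len(contig))       # inclusive bounds
        start = rng.randint(0, len(contig) - length)     # fragment stays inside the contig
        views.append(contig[start:start + length])
    return views


views = make_views("ACGT" * 500, n_views=4, min_len=1000, seed=1)
```

In a contrastive setup, views drawn from the same contig would be treated as positive pairs and views from different contigs as negatives, so that the learned embedding places fragments of one genome close together.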
Affiliation(s)
- Ziye Wang
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Ronghui You
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Haitao Han
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Wei Liu
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
  - Shanghai Qi Zhi Institute, Shanghai, China.
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China.
  - Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, China.
  - Zhangjiang Fudan International Innovation Center, Shanghai, China.
|
9
|
Feng T, Wu S, Zhou H, Fang Z. MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model. Gigascience 2024; 13:giae047. [PMID: 39101782 PMCID: PMC11299106 DOI: 10.1093/gigascience/giae047] [Received: 02/28/2024] [Revised: 05/31/2024] [Accepted: 06/24/2024] [Indexed: 08/06/2024] Open
Abstract
BACKGROUND Mobilization typing (MOB) is a classification scheme for plasmid genomes based on their relaxase gene. The host ranges of plasmids of different MOB categories are diverse, and MOB is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging due to the highly fragmented characteristics of metagenomic contigs. RESULTS We developed MOBFinder, an 11-class classifier, for categorizing plasmid fragments into 10 MOB types and a nonmobilizable category. We first performed MOB typing to classify complete plasmid genomes according to relaxase information and then constructed an artificial benchmark dataset of plasmid metagenomic fragments (PMFs) from those complete plasmid genomes whose MOB types are well annotated. Next, based on natural language models, we used word vectors to characterize the PMFs. Several random forest classification models were trained and integrated to predict fragments of different lengths. Evaluating the tool using the benchmark dataset, we found that MOBFinder outperforms previous tools such as MOBscan and MOB-suite, with an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean, and F1-score reached up to 99% for some MOB types. When applied to a cohort of patients with type 2 diabetes (T2D), MOBFinder offered insights suggesting that the MOBF type plasmid, which is widely present in Escherichia and Klebsiella, and the MOBQ type plasmid might accelerate antibiotic resistance transmission in patients with T2D. CONCLUSIONS To the best of our knowledge, MOBFinder is the first tool for MOB typing of PMFs. The tool is freely available at https://github.com/FengTaoSMU/MOBFinder.
Affiliation(s)
- Tao Feng
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Shufang Wu
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
|
10
|
Kleikamp HBC, Grouzdev D, Schaasberg P, van Valderen R, van der Zwaan R, Wijgaart RVD, Lin Y, Abbas B, Pronk M, van Loosdrecht MCM, Pabst M. Metaproteomics, metagenomics and 16S rRNA sequencing provide different perspectives on the aerobic granular sludge microbiome. WATER RESEARCH 2023; 246:120700. [PMID: 37866247 DOI: 10.1016/j.watres.2023.120700] [Received: 06/14/2023] [Revised: 09/29/2023] [Accepted: 10/04/2023] [Indexed: 10/24/2023]
Abstract
The tremendous progress in sequencing technologies has made DNA sequencing routine for microbiome studies. Additionally, advances in mass spectrometric techniques have extended conventional proteomics into the field of microbial ecology. However, systematic studies that provide a better understanding of the complementary nature of these 'omics' approaches, particularly for complex environments such as wastewater treatment sludge, are urgently needed. Here, we describe a comparative metaomics study on aerobic granular sludge from three different wastewater treatment plants. For this, we employed metaproteomics, whole metagenome, and 16S rRNA amplicon sequencing to study the same granule material with uniform size. We furthermore compare the taxonomic profiles using the Genome Taxonomy Database (GTDB) to enhance the comparability between the different approaches. Though the major taxonomies were consistently identified in the different aerobic granular sludge samples, the taxonomic composition obtained by the different omics techniques varied significantly at the lower taxonomic levels, which impacts the interpretation of the nutrient removal processes. Nevertheless, as demonstrated by metaproteomics, the genera that were consistently identified in all techniques cover the majority of the protein biomass. The established metaomics data and the contig classification pipeline are publicly available, which provides a valuable resource for further studies on metabolic processes in aerobic granular sludge.
Affiliation(s)
- Hugo B C Kleikamp
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Pim Schaasberg
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ramon van Valderen
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ramon van der Zwaan
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Roel van de Wijgaart
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Yuemei Lin
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ben Abbas
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Mario Pronk
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Martin Pabst
- Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
11
Walsh LH, Coakley M, Walsh AM, O'Toole PW, Cotter PD. Bioinformatic approaches for studying the microbiome of fermented food. Crit Rev Microbiol 2023; 49:693-725. [PMID: 36287644 DOI: 10.1080/1040841x.2022.2132850] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/11/2022] [Revised: 08/11/2022] [Accepted: 09/28/2022] [Indexed: 11/03/2022]
Abstract
High-throughput DNA sequencing-based approaches continue to revolutionise our understanding of microbial ecosystems, including those associated with fermented foods. Metagenomic and metatranscriptomic approaches are state-of-the-art biological profiling methods and are employed to investigate a wide variety of characteristics of microbial communities, such as taxonomic membership, gene content and the range and level at which these genes are expressed. Individual groups and consortia of researchers are utilising these approaches to produce increasingly large and complex datasets, representing vast populations of microorganisms. There is a corresponding requirement for the development and application of appropriate bioinformatic tools and pipelines to interpret this data. This review critically analyses the tools and pipelines that have been used or that could be applied to the analysis of metagenomic and metatranscriptomic data from fermented foods. In addition, we critically analyse a number of studies of fermented foods in which these tools have previously been applied, to highlight the insights that these approaches can provide.
Affiliation(s)
- Liam H Walsh
- Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
- School of Microbiology, University College Cork, Ireland
- Mairéad Coakley
- Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
- Aaron M Walsh
- Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
- Paul W O'Toole
- School of Microbiology, University College Cork, Ireland
- APC Microbiome Ireland, University College Cork, Ireland
- Paul D Cotter
- Teagasc Food Research Centre, Moorepark, Fermoy, Cork, Ireland
- APC Microbiome Ireland, University College Cork, Ireland
- VistaMilk SFI Research Centre, Teagasc, Moorepark, Fermoy, Cork, Ireland
12
Khan MA, Rahman AU, Khan B, Al-Mijalli SH, Alswat AS, Amin A, Eid RA, Zaki MSA, Butt S, Ahmad J, Fayad E, Ullah A. Antibiotic Resistance Profiling and Phylogenicity of Uropathogenic Bacteria Isolated from Patients with Urinary Tract Infections. Antibiotics (Basel) 2023; 12:1508. [PMID: 37887209 PMCID: PMC10603882 DOI: 10.3390/antibiotics12101508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 09/16/2023] [Accepted: 09/20/2023] [Indexed: 10/28/2023] Open
Abstract
Urinary tract infections (UTIs) are healthcare problems that commonly involve bacterial and, in some rare instances, fungal or viral infections. The irrational prescription and use of antibiotics in UTI treatment have led to an increase in antibiotic resistance. Urine samples (145) were collected from male and female patients from Lower Dir, Khyber Pakhtunkhwa (KP), Pakistan. Biochemical analyses were carried out to identify uropathogens. Molecular analysis for the identification of 16S ribosomal RNA in samples was performed via Sanger sequencing. Evolutionary linkage was determined using Molecular Evolutionary Genetics Analysis-7 (MEGA-7). The study observed significant growth in 52% of the samples (83/145). Gram-negative bacteria were identified in 85.5% of samples, while Gram-positive bacteria were reported in 14.5%. The UTI prevalence was 67.5% in females and 32.5% in males. The most prevalent uropathogenic bacteria were Klebsiella pneumoniae (39.7%, 33/83), followed by Escherichia coli (27.7%, 23/83), Pseudomonas aeruginosa (10.8%, 9/83), Staphylococcus aureus (9.6%, 8/83), Proteus mirabilis (7.2%, 6/83) and Staphylococcus saprophyticus (4.8%, 4/83). Phylogenetic analysis was performed using the neighbor-joining method, further confirming the relation of the isolates in our study with previously reported uropathogenic isolates. Antibiotic susceptibility tests identified K. pneumoniae as being sensitive to imipenem (100%) and fosfomycin (78.7%) and resistant to cefuroxime (100%) and ciprofloxacin (94%). Similarly, E. coli showed high susceptibility to imipenem (100%), fosfomycin (78.2%) and nitrofurantoin (78.2%), and resistance to ciprofloxacin (100%) and cefuroxime (100%). Imipenem was identified as the most effective antibiotic, while cefuroxime and ciprofloxacin were the least effective. The phylogenetic tree analysis indicated that K. pneumoniae, E. coli, P. aeruginosa, S. aureus and P. mirabilis clustered with each other and the reference sequences, indicating high similarity (based on 16S rRNA sequencing). It can be concluded that genetically varied uropathogenic organisms are commonly present within the KP population. Our findings demonstrate the need to optimize antibiotic use in treating UTIs and the prevention of antibiotic resistance in the KP population.
Affiliation(s)
- Muhammad Ajmal Khan
- Centre for Biotechnology and Microbiology, University of Peshawar, Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
- Atta Ur Rahman
- Leprosy Laboratory, Department of Parasite Biology, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro 21040-360, Brazil
- Bakhtawar Khan
- Institute of Brain Disorders, Department of Physiology, Dalian Medical University, Dalian 116044, China
- Samiah Hamad Al-Mijalli
- Department of Biology, College of Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Amal S. Alswat
- Department of Biotechnology, College of Sciences, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Aftab Amin
- Center for Cancer Research, and State Key Lab of Molecular Neuroscience, Division of Life Science, Hong Kong University of Science and Technology, Hong Kong, China
- Refaat A. Eid
- Department of Pathology, College of Medicine, King Khalid University, P.O. Box 62529, Abha 12573, Saudi Arabia
- Mohamed Samir A. Zaki
- Anatomy Department, College of Medicine, King Khalid University, P.O. Box 62529, Abha 61413, Saudi Arabia
- Sadia Butt
- Department of Microbiology, Shaheed Benazir Bhutto Women University Peshawar, Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
- Jamshaid Ahmad
- Centre for Biotechnology and Microbiology, University of Peshawar, Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
- Eman Fayad
- Department of Biotechnology, College of Sciences, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Amin Ullah
- Department of Health & Biological Sciences, Abasyn University Peshawar, Peshawar 25000, Khyber Pakhtunkhwa, Pakistan
13
Kishore D, Birzu G, Hu Z, DeLisi C, Korolev KS, Segrè D. Inferring microbial co-occurrence networks from amplicon data: a systematic evaluation. mSystems 2023; 8:e0096122. [PMID: 37338270 PMCID: PMC10469762 DOI: 10.1128/msystems.00961-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 04/14/2023] [Indexed: 06/21/2023] Open
Abstract
Microbes commonly organize into communities consisting of hundreds of species involved in complex interactions with each other. 16S ribosomal RNA (16S rRNA) amplicon profiling provides snapshots that reveal the phylogenies and abundance profiles of these microbial communities. These snapshots, when collected from multiple samples, can reveal the co-occurrence of microbes, providing a glimpse into the network of associations in these communities. However, the inference of networks from 16S data involves numerous steps, each requiring specific tools and parameter choices. Moreover, the extent to which these steps affect the final network is still unclear. In this study, we perform a meticulous analysis of each step of a pipeline that can convert 16S sequencing data into a network of microbial associations. Through this process, we map how different choices of algorithms and parameters affect the co-occurrence network and identify the steps that contribute substantially to the variance. We further determine the tools and parameters that generate robust co-occurrence networks and develop consensus network algorithms based on benchmarks with mock and synthetic data sets. The Microbial Co-occurrence Network Explorer, or MiCoNE (available at https://github.com/segrelab/MiCoNE), follows these default tools and parameters and can help explore the outcome of these combinations of choices on the inferred networks. We envisage that this pipeline could be used for integrating multiple data sets and generating comparative analyses and consensus networks that can guide our understanding of microbial community assembly in different biomes.
IMPORTANCE Mapping the interrelationships between different species in a microbial community is important for understanding and controlling their structure and function. The surge in the high-throughput sequencing of microbial communities has led to the creation of thousands of data sets containing information about microbial abundances. These abundances can be transformed into co-occurrence networks, providing a glimpse into the associations within microbiomes. However, processing these data sets to obtain co-occurrence information relies on several complex steps, each of which involves numerous choices of tools and corresponding parameters. These multiple options pose questions about the robustness and uniqueness of the inferred networks. In this study, we address this workflow and provide a systematic analysis of how these choices of tools affect the final network and guidelines on appropriate tool selection for a particular data set. We also develop a consensus network algorithm that helps generate more robust co-occurrence networks based on benchmark synthetic data sets.
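One simple way to picture a consensus network is as a vote across several inferred networks: an association is kept only if enough methods report it. The sketch below illustrates that idea under stated assumptions; it is not MiCoNE's algorithm, and the function name `consensus_edges`, the `min_support` parameter, and the toy OTU names are all hypothetical. Edges are treated as undirected and self-loops are assumed absent.

```python
from collections import Counter


def consensus_edges(networks, min_support):
    """Keep undirected edges reported by at least `min_support` networks.

    `networks` is a list of edge lists; each edge is a (node, node) pair.
    """
    support = Counter()
    for edges in networks:
        # frozenset makes (a, b) and (b, a) count as the same edge;
        # the set comprehension counts each edge once per network.
        support.update({frozenset(edge) for edge in edges})
    return {tuple(sorted(edge)) for edge, n in support.items()
            if n >= min_support}


# Toy networks, invented for illustration only.
net_a = [("otu1", "otu2"), ("otu2", "otu3")]
net_b = [("otu2", "otu1"), ("otu3", "otu4")]
net_c = [("otu1", "otu2"), ("otu2", "otu3"), ("otu3", "otu4")]

majority = consensus_edges([net_a, net_b, net_c], min_support=2)
unanimous = consensus_edges([net_a, net_b, net_c], min_support=3)
```

Raising `min_support` trades recall for robustness: the unanimous network keeps only associations that every method agrees on.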
Affiliation(s)
- Dileep Kishore
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Biological Design Center, Boston University, Boston, Massachusetts, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Gabriel Birzu
- Department of Physics, Boston University, Boston, Massachusetts, USA
- Department of Applied Physics, Stanford University, Stanford, California, USA
- Zhenjun Hu
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Charles DeLisi
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Department of Physics, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Kirill S. Korolev
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Biological Design Center, Boston University, Boston, Massachusetts, USA
- Department of Physics, Boston University, Boston, Massachusetts, USA
- Daniel Segrè
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Biological Design Center, Boston University, Boston, Massachusetts, USA
- Department of Physics, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Biology, Boston University, Boston, Massachusetts, USA
14
Pavia MJ, Chede A, Wu Z, Cadillo-Quiroz H, Zhu Q. BinaRena: a dedicated interactive platform for human-guided exploration and binning of metagenomes. Microbiome 2023; 11:186. [PMID: 37596696 PMCID: PMC10439608 DOI: 10.1186/s40168-023-01625-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 07/16/2023] [Indexed: 08/20/2023]
Abstract
BACKGROUND: Exploring metagenomic contigs and "binning" them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process, however, is expertise demanding and labor intensive, and it deserves to be supported by software infrastructure. RESULTS: We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time. In demonstration of BinaRena's usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset. CONCLUSIONS: BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.
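The adjusted Rand index named in this abstract scores how similarly two binning plans group the same contigs, correcting for agreement expected by chance. Below is a minimal standard-library sketch of the usual pair-counting formula; it is not BinaRena's code, and the function name, the dictionary layout (contig to bin label), and the toy plans are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(plan_a, plan_b):
    """ARI between two binning plans given as {contig: bin_label} dicts.

    Only contigs present in both plans are compared (an assumption).
    """
    contigs = sorted(set(plan_a) & set(plan_b))
    total_pairs = comb(len(contigs), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two shared contigs: nothing to disagree on

    # Contingency counts: how many contigs fall in (bin in A, bin in B).
    joint = Counter((plan_a[c], plan_b[c]) for c in contigs)
    sizes_a = Counter(plan_a[c] for c in contigs)
    sizes_b = Counter(plan_b[c] for c in contigs)

    same_in_both = sum(comb(n, 2) for n in joint.values())
    same_in_a = sum(comb(n, 2) for n in sizes_a.values())
    same_in_b = sum(comb(n, 2) for n in sizes_b.values())

    expected = same_in_a * same_in_b / total_pairs
    maximum = (same_in_a + same_in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both plans use a single bin
    return (same_in_both - expected) / (maximum - expected)


# Toy plans, invented for illustration only.
manual = {"c1": "bin1", "c2": "bin1", "c3": "bin2", "c4": "bin2"}
relabeled = {"c1": "X", "c2": "X", "c3": "Y", "c4": "Y"}
scrambled = {"c1": "X", "c2": "Y", "c3": "X", "c4": "Y"}

ari_same = adjusted_rand_index(manual, relabeled)
ari_diff = adjusted_rand_index(manual, scrambled)
```

The index depends only on which contigs are grouped together, not on bin names, so a relabeled but otherwise identical plan scores 1, while a plan that splits every original bin scores below zero.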
Affiliation(s)
- Michael J Pavia
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Biodesign Swette Center for Environmental Biotechnology, Arizona State University, Tempe, AZ, USA
- Abhinav Chede
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Zijun Wu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Hinsby Cadillo-Quiroz
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
- Biodesign Swette Center for Environmental Biotechnology, Arizona State University, Tempe, AZ, USA
- Qiyun Zhu
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
15
Wang J, Xu S, Zhao K, Song G, Zhao S, Liu R. Risk control of antibiotics, antibiotic resistance genes (ARGs) and antibiotic resistant bacteria (ARB) during sewage sludge treatment and disposal: A review. The Science of the Total Environment 2023; 877:162772. [PMID: 36933744 DOI: 10.1016/j.scitotenv.2023.162772] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 11/29/2022] [Revised: 02/14/2023] [Accepted: 03/06/2023] [Indexed: 05/06/2023]
Abstract
Sewage sludge is an important reservoir of antibiotics, antibiotic resistance genes (ARGs), and antibiotic resistant bacteria (ARB) in wastewater treatment plants (WWTPs), and the reclamation of sewage sludge potentially threatens human health and environmental safety. Sludge treatment and disposal are expected to control these risks, and this review summarizes the fate and control efficiency of antibiotics, ARGs, and ARB in sludge across different processes, i.e., disintegration, anaerobic digestion, aerobic composting, drying, pyrolysis, constructed wetland, and land application. Additionally, the analysis and characterization methods for antibiotics, ARGs, and ARB in complex sludge are reviewed, and the quantitative risk assessment approaches involved in land application are comprehensively discussed. This review benefits process optimization of sludge treatment and disposal with regard to environmental risk control of antibiotics, ARGs, and ARB in sludge. Furthermore, current research limitations and gaps, e.g., antibiotic resistance risk assessment in sludge-amended soil, are identified to advance future studies.
Affiliation(s)
- Jiaqi Wang
- Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; Yangtze Eco-Environment Engineering Research Center, China Three Gorges Corporation, Beijing 100038, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Siqi Xu
- Center for Water and Ecology, State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China
- Kai Zhao
- Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Ge Song
- Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shunan Zhao
- Center for Water and Ecology, State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China
- Ruiping Liu
- Key Laboratory of Drinking Water Science and Technology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China; Center for Water and Ecology, State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China; University of Chinese Academy of Sciences, Beijing 100049, China
16
Pu J, Yang J, Lu S, Jin D, Luo X, Xiong Y, Bai X, Zhu W, Huang Y, Wu S, Niu L, Liu L, Xu J. Species-Level Taxonomic Characterization of Uncultured Core Gut Microbiota of Plateau Pika. Microbiol Spectr 2023; 11:e0349522. [PMID: 37067438 PMCID: PMC10269723 DOI: 10.1128/spectrum.03495-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 02/13/2023] [Indexed: 04/18/2023] Open
Abstract
Rarely has the vast diversity of bacteria on Earth been profiled, particularly on inaccessible plateaus. These uncultured microbes, which are also known as "microbial dark matter," may play crucial roles in maintaining the ecosystem and are linked to human health, regarding pathogenicity and prebioticity. The plateau pika (Ochotona curzoniae) is a small burrowing steppe lagomorph that is endemic to the Qinghai-Tibetan Plateau and is a keystone species in the maintenance of ecological balance. We used a combination of full-length 16S rRNA amplicon sequencing, shotgun metagenomics, and metabolomics to elucidate the species-level community structure and the metabolic potential of the gut microbiota of the plateau pika. Using a full-length 16S rRNA metataxonomic approach, we clustered 618 (166 ± 35 per sample) operational phylogenetic units (OPUs) from 105 plateau pika samples and assigned them to 215 known species, 226 potentially new species, and 177 higher hierarchical taxa. Notably, 39 abundant OPUs (over 60% total relative abundance) are found in over 90% of the samples, thereby representing a "core microbiota." They are all classified as novel microbial lineages, from the class to the species level. Using metagenomic reads, we independently assembled and binned 109 high-quality, species-level genome bins (SGBs). Then, a precise taxonomic assignment was performed to clarify the phylogenetic consistency of the SGBs and the 16S rRNA amplicons. Thus, the majority of the core microbes have corresponding genomes. SGBs belonging to the genus Treponema, the families Muribaculaceae, Lachnospiraceae, and Oscillospiraceae, and the order Eubacteriales are abundant in the metagenomic samples. In addition, multiple CAZymes are detected in these SGBs, indicating their efficient utilization of plant biomass. As the most widely connected metabolite with the core microbiota, tryptophan may relate to host environmental adaptation. Our investigation allows for a greater comprehension of the composition and functional capacity of the gut microbiota of the plateau pika.
IMPORTANCE The great majority of microbial species remain uncultured, severely limiting their taxonomic characterization and biological understanding. The plateau pika (Ochotona curzoniae) is a small burrowing steppe lagomorph that is endemic to the Qinghai-Tibetan Plateau and is considered to be the keystone species in the maintenance of ecological stability. We comprehensively investigated the gut microbiota of the plateau pika via a multiomics endeavor. Combining full-length 16S rRNA metataxonomics, shotgun metagenomics, and metabolomics, we elucidated the species-level taxonomic assignment of the core uncultured intestinal microbiota of the plateau pika and revealed their correlation to host nutritional metabolism and adaptation. Our findings provide insights into the microbial diversity and biological significance of alpine animals.
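The "core microbiota" criterion in this abstract is a prevalence rule: a taxon counts as core when it is found in over 90% of the samples. A minimal sketch of that rule follows. It is not the authors' pipeline and it ignores the relative-abundance part of their criterion; the function name `core_taxa`, the input layout (sample name to set of detected OPUs), and the toy data are hypothetical.

```python
from collections import Counter


def core_taxa(samples, min_prevalence=0.9):
    """Return {taxon: prevalence} for taxa present in more than
    `min_prevalence` of the samples.

    `samples` maps a sample name to the set of taxa detected in it.
    """
    n_samples = len(samples)
    if n_samples == 0:
        return {}
    presence = Counter()
    for taxa in samples.values():
        presence.update(set(taxa))  # count each taxon once per sample
    return {taxon: count / n_samples
            for taxon, count in presence.items()
            if count / n_samples > min_prevalence}


# Toy data, invented for illustration: OPU_A in 10/10 samples,
# OPU_B in 9/10, OPU_C in 3/10.
toy_samples = {
    f"sample{i}": {"OPU_A"}
    | ({"OPU_B"} if i < 9 else set())
    | ({"OPU_C"} if i < 3 else set())
    for i in range(10)
}
core = core_taxa(toy_samples)
```

With a strict "over 90%" threshold, a taxon found in exactly 9 of 10 samples is excluded, so only `OPU_A` is returned here.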
Affiliation(s)
- Ji Pu
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Jing Yang
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing, China
- Shan Lu
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing, China
- Dong Jin
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing, China
- Xuelian Luo
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Yanwen Xiong
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Xiangning Bai
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Wentao Zhu
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Yuyuan Huang
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Shusheng Wu
- Yushu Prefecture Center for Disease Control and Prevention, Yushu, China
- Lina Niu
- Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou, China
- Liyun Liu
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Jianguo Xu
- State Key Laboratory of Infectious Disease Prevention and Control and National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Research Units of Discovery of Unknown Bacteria and Function, Chinese Academy of Medical Sciences, Beijing, China
- Institute of Public Health, Nankai University, Tianjin, China
17
Tadrent N, Dedeine F, Hervé V. SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes. F1000Res 2022; 11:1522. [PMID: 36875992 PMCID: PMC9978240 DOI: 10.12688/f1000research.128091.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/23/2023] [Indexed: 03/02/2023] Open
Abstract
Background: Over the last decade, we have observed in microbial ecology a transition from gene-centric to genome-centric analyses. Indeed, the advent of metagenomics combined with binning methods, single-cell genome sequencing as well as high-throughput cultivation methods have contributed to the continuing and exponential increase of available prokaryotic genomes, which in turn has favored the exploration of microbial metabolisms. In the case of metagenomics, data processing, from raw reads to genome reconstruction, involves various steps and software which can represent a major technical obstacle. Methods: To overcome this challenge, we developed SnakeMAGs, a simple workflow that can process Illumina data, from raw reads to metagenome-assembled genomes (MAGs) classification and relative abundance estimate. It integrates state-of-the-art bioinformatic tools to sequentially perform: quality control of the reads (illumina-utils, Trimmomatic), host sequence removal (optional step, using Bowtie2), assembly (MEGAHIT), binning (MetaBAT2), quality filtering of the bins (CheckM, GUNC), classification of the MAGs (GTDB-Tk) and estimate of their relative abundance (CoverM). Developed with the popular Snakemake workflow management system, it can be deployed on various architectures, from single to multicore and from workstation to computer clusters and grids. It is also flexible since users can easily change parameters and/or add new rules. Results: Using termite gut metagenomic datasets, we showed that SnakeMAGs is slower but allowed the recovery of more MAGs encompassing more diverse phyla compared to another similar workflow named ATLAS. Importantly, these additional MAGs showed no significant difference compared to the other ones in terms of completeness, contamination, genome size or relative abundance. Conclusions: Overall, it should make the reconstruction of MAGs more accessible to microbiologists. SnakeMAGs as well as test files and an extended tutorial are available at https://github.com/Nachida08/SnakeMAGs.
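The "quality filtering of the bins" step in this abstract amounts to keeping only bins whose estimated completeness is high enough and whose estimated contamination is low enough. The sketch below shows that filter in isolation. It is not SnakeMAGs code and does not call CheckM or GUNC; the `BinQuality` class, the `filter_bins` function, and the threshold defaults (50% completeness, 10% contamination) are illustrative assumptions, not values taken from the workflow.

```python
from dataclasses import dataclass


@dataclass
class BinQuality:
    """Quality estimates for one bin (hypothetical record layout)."""
    name: str
    completeness: float   # percent, 0-100
    contamination: float  # percent


def filter_bins(bins, min_completeness=50.0, max_contamination=10.0):
    """Keep bins that pass both thresholds (illustrative defaults)."""
    return [
        b for b in bins
        if b.completeness >= min_completeness
        and b.contamination <= max_contamination
    ]


# Toy estimates, invented for illustration only.
candidates = [
    BinQuality("bin.1", completeness=96.2, contamination=1.1),
    BinQuality("bin.2", completeness=42.0, contamination=0.5),   # too incomplete
    BinQuality("bin.3", completeness=88.4, contamination=14.7),  # too contaminated
]
kept = [b.name for b in filter_bins(candidates)]
```

In a workflow, the thresholds would normally be exposed as user-editable parameters so that stricter or looser cut-offs can be applied without changing the code.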
Affiliation(s)
- Nachida Tadrent
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
- Franck Dedeine
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
- Vincent Hervé
- Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
- Université Paris-Saclay, INRAE, AgroParisTech, UMR SayFood, Palaiseau, 91120, France
18
Tadrent N, Dedeine F, Hervé V. SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes. F1000Res 2022; 11:1522. [PMID: 36875992 PMCID: PMC9978240 DOI: 10.12688/f1000research.128091.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/01/2022] [Indexed: 01/05/2024] Open
Abstract
Background: Over the last decade, we have observed in microbial ecology a transition from gene-centric to genome-centric analyses. Indeed, the advent of metagenomics combined with binning methods, single-cell genome sequencing as well as high-throughput cultivation methods have contributed to the continuing and exponential increase of available prokaryotic genomes, which in turn has favored the exploration of microbial metabolisms. In the case of metagenomics, data processing, from raw reads to genome reconstruction, involves various steps and software which can represent a major technical obstacle. Methods: To overcome this challenge, we developed SnakeMAGs, a simple workflow that can process Illumina data, from raw reads to metagenome-assembled genomes (MAGs) classification and relative abundance estimate. It integrates state-of-the-art bioinformatic tools to sequentially perform: quality control of the reads (illumina-utils, Trimmomatic), host sequence removal (optional step, using Bowtie2), assembly (MEGAHIT), binning (MetaBAT2), quality filtering of the bins (CheckM), classification of the MAGs (GTDB-Tk) and estimate of their relative abundance (CoverM). Developed with the popular Snakemake workflow management system, it can be deployed on various architectures, from single to multicore and from workstation to computer clusters and grids. It is also flexible since users can easily change parameters and/or add new rules. Results: Using termite gut metagenomic datasets, we showed that SnakeMAGs is slower but allowed the recovery of more MAGs encompassing more diverse phyla compared to another similar workflow named ATLAS. Conclusions: Overall, it should make the reconstruction of MAGs more accessible to microbiologists. SnakeMAGs as well as test files and an extended tutorial are available at https://github.com/Nachida08/SnakeMAGs.
Affiliation(s)
- Nachida Tadrent
  - Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
- Franck Dedeine
  - Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
- Vincent Hervé
  - Institut de Recherche sur la Biologie de l'Insecte, UMR 7261, CNRS-Université de Tours, Tours, 37200, France
  - Université Paris-Saclay, INRAE, AgroParisTech, UMR SayFood, Palaiseau, 91120, France
|
19
|
Mallawaarachchi V, Lin Y. Accurate Binning of Metagenomic Contigs Using Composition, Coverage, and Assembly Graphs. J Comput Biol 2022; 29:1357-1376. [DOI: 10.1089/cmb.2022.0262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Affiliation(s)
- Vijini Mallawaarachchi
  - School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, Australia
- Yu Lin
  - School of Computing, College of Engineering and Computer Science, Australian National University, Canberra, Australia
|
20
|
Vollmers J, Wiegand S, Lenk F, Kaster AK. How clear is our current view on microbial dark matter? (Re-)assessing public MAG & SAG datasets with MDMcleaner. Nucleic Acids Res 2022; 50:e76. [PMID: 35536293 PMCID: PMC9303271 DOI: 10.1093/nar/gkac294] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/31/2022] [Revised: 04/11/2022] [Accepted: 04/13/2022] [Indexed: 11/12/2022]
Abstract
As of today, the majority of environmental microorganisms remain uncultured and are therefore referred to as 'microbial dark matter' (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell- and metagenomics. However, without access to cultured representatives for verifying correct taxon-assignments, MDM genomes may cause potentially misleading conclusions based on misclassified or contaminant contigs, thereby obfuscating our view on the uncultured microbial majority. Moreover, gradual database contaminations by past genome submissions can cause error propagations which affect present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity and the de facto gold standard genome assessment tool, checkM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and corresponding open-source python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contaminations overlooked by current screening approaches and sensitively detects misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view on 'microbial dark matter'.
Affiliation(s)
- John Vollmers
  - Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
- Sandra Wiegand
  - Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
- Florian Lenk
  - Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
- Anne-Kristin Kaster
  - Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
|
21
|
Chandrasiri S, Perera T, Dilhara A, Perera I, Mallawaarachchi V. CH-Bin: A Convex Hull Based Approach for Binning Metagenomic Contigs. Comput Biol Chem 2022; 100:107734. [DOI: 10.1016/j.compbiolchem.2022.107734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
|
22
|
Nishimura L, Fujito N, Sugimoto R, Inoue I. Detection of Ancient Viruses and Long-Term Viral Evolution. Viruses 2022; 14:v14061336. [PMID: 35746807 PMCID: PMC9230872 DOI: 10.3390/v14061336] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/30/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 12/22/2022]
Abstract
The COVID-19 outbreak has reminded us of the importance of viral evolutionary studies as regards comprehending complex viral evolution and preventing future pandemics. A unique approach to understanding viral evolution is the use of ancient viral genomes. Ancient viruses are detectable in various archaeological remains, including ancient people's skeletons and mummified tissues. Those specimens have preserved ancient viral DNA and RNA, which have been vigorously analyzed in the last few decades thanks to the development of sequencing technologies. Reconstructed ancient pathogenic viral genomes have been utilized to estimate the past pandemics of pathogenic viruses within the ancient human population and long-term evolutionary events. Recent studies revealed the existence of non-pathogenic viral genomes in ancient people's bodies. These ancient non-pathogenic viruses might be informative for inferring their relationships with ancient people's diets and lifestyles. Here, we reviewed the past and ongoing studies on ancient pathogenic and non-pathogenic viruses and the usage of ancient viral genomes to understand their long-term viral evolution.
Affiliation(s)
- Luca Nishimura
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
- Naoko Fujito
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
- Ryota Sugimoto
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
- Ituro Inoue
  - Human Genetics Laboratory, National Institute of Genetics, Mishima 411-8540, Japan
  - Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima 411-8540, Japan
  - Correspondence: Tel.: +81-55-981-6795
|
23
|
Sinha D, Sharma A, Mishra DC, Rai A, Lal SB, Kumar S, Farooqi MS, Chaturvedi KK. MetaConClust - Unsupervised Binning of Metagenomics Data using Consensus Clustering. Curr Genomics 2022; 23:137-146. [PMID: 36778980 PMCID: PMC9878838 DOI: 10.2174/1389202923666220413114659] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/22/2021] [Revised: 01/18/2022] [Accepted: 02/21/2022] [Indexed: 11/22/2022]
Abstract
Background: Binning of metagenomic reads is an active area of research, and many unsupervised machine learning-based techniques have been used for taxonomy-independent binning of metagenomic reads. Objective: It is important to find the optimum number of clusters as well as to develop an efficient pipeline for deciphering the complexity of the microbial genome. Methods: Applying unsupervised clustering techniques for binning requires finding the optimal number of clusters beforehand, which is observed to be a difficult task. This paper describes a novel method, MetaConClust, that uses coverage information for grouping contigs and automatically finds the optimal number of clusters for binning metagenomics data using a consensus-based clustering approach. The coverage of contigs in a metagenomics sample has been observed to be directly proportional to the abundance of species in the sample and is used for grouping of data in the first phase by MetaConClust. The Partitioning Around Medoids (PAM) method is used for clustering in the second phase for generating bins, with the initial number of clusters determined automatically through a consensus-based method. Results: Finally, the quality of the obtained bins is tested using the silhouette index, Rand index, recall, precision, and accuracy. The performance of MetaConClust is compared with recent methods and tools using benchmarked low-complexity simulated and real metagenomic datasets and is found to be better than unsupervised methods and comparable to hybrid methods. Conclusion: This suggests that the consensus-based clustering approach is a promising method for automatically finding the number of bins for metagenomics data.
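As an illustration of one of the evaluation metrics named in the abstract above, the Rand index of a binning can be computed from a ground-truth genome label and an assigned bin label per contig. The following is a minimal sketch under stated assumptions, not code from MetaConClust: the function name `rand_index` and the toy labels are hypothetical, and the exact metric definitions used in the paper are not given here.

```python
from itertools import combinations


def rand_index(true_labels, bin_labels):
    """Fraction of contig pairs on which the two partitions agree.

    A pair "agrees" when both contigs share a genome and a bin,
    or when they differ in both genome and bin.
    """
    if len(true_labels) != len(bin_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        raise ValueError("need at least two contigs")
    agree = 0
    for i, j in pairs:
        same_genome = true_labels[i] == true_labels[j]
        same_bin = bin_labels[i] == bin_labels[j]
        if same_genome == same_bin:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy data: six contigs from two genomes, placed in two bins.
truth = ["A", "A", "A", "B", "B", "B"]
bins = [0, 0, 1, 1, 1, 1]
print(rand_index(truth, bins))
```

In this toy case one contig of genome A is placed in the wrong bin, so 10 of the 15 contig pairs agree. The pairwise loop is quadratic in the number of contigs, which is acceptable for a sketch but not for large assemblies.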
Affiliation(s)
- Dipro Sinha
  - These authors contributed equally to this work
- Anu Sharma
  - Address correspondence to this author at the Division of Agriculture Bioinformatics, ICAR-IASRI, New Delhi 110012, India
|
24
|
Ko KKK, Chng KR, Nagarajan N. Metagenomics-enabled microbial surveillance. Nat Microbiol 2022; 7:486-496. [PMID: 35365786 DOI: 10.1038/s41564-022-01089-w] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Received: 08/29/2021] [Accepted: 02/22/2022] [Indexed: 12/13/2022]
Abstract
Lessons learnt from the COVID-19 pandemic include increased awareness of the potential for zoonoses and emerging infectious diseases that can adversely affect human health. Although emergent viruses are currently in the spotlight, we must not forget the ongoing toll of morbidity and mortality owing to antimicrobial resistance in bacterial pathogens and to vector-borne, foodborne and waterborne diseases. Population growth, planetary change, international travel and medical tourism all contribute to the increasing frequency of infectious disease outbreaks. Surveillance is therefore of crucial importance, but the diversity of microbial pathogens, coupled with resource-intensive methods, compromises our ability to scale-up such efforts. Innovative technologies that are both easy to use and able to simultaneously identify diverse microorganisms (viral, bacterial or fungal) with precision are necessary to enable informed public health decisions. Metagenomics-enabled surveillance methods offer the opportunity to improve detection of both known and yet-to-emerge pathogens.
Affiliation(s)
- Karrie K K Ko
  - Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore
  - Department of Microbiology, Singapore General Hospital, Singapore, Singapore
  - Department of Molecular Pathology, Singapore General Hospital, Singapore, Singapore
  - Duke-NUS Medical School, Singapore, Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Kern Rei Chng
  - Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore
  - National Centre for Food Science, Singapore Food Agency, Singapore, Singapore
- Niranjan Nagarajan
  - Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
|
25
|
Assessment of Hydrocarbon Degradation Potential in Microbial Communities in Arctic Sea Ice. Microorganisms 2022; 10:microorganisms10020328. [PMID: 35208784 PMCID: PMC8879337 DOI: 10.3390/microorganisms10020328] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/23/2021] [Revised: 01/27/2022] [Accepted: 01/28/2022] [Indexed: 02/04/2023]
Abstract
The anthropogenic release of oil hydrocarbons into the cold marine environment is an increasing concern due to the elevated usage of sea routes and the exploration of new oil drilling sites in Arctic areas. The aim of this study was to evaluate prokaryotic community structures and the genetic potential of hydrocarbon degradation in the metagenomes of seawater, sea ice, and crude oil encapsulating the sea ice of the Norwegian fjord, Ofotfjorden. Although the results indicated substantial differences between the structure of prokaryotic communities in seawater and sea ice, the crude oil encapsulating sea ice (SIO) showed increased abundances of many genera containing hydrocarbon-degrading organisms, including Bermanella, Colwellia, and Glaciecola. Although the metagenome of seawater was rich in a variety of hydrocarbon degradation-related functional genes (HDGs) associated with the metabolism of n-alkanes, and mono- and polyaromatic hydrocarbons, most of the normalized gene counts were highest in the clean sea ice metagenome, whereas in SIO, these counts were the lowest. The long-chain alkane degradation gene almA was detected from all the studied metagenomes and its counts exceeded ladA and alkB counts in both sea ice metagenomes. In addition, almA was related to the most diverse group of prokaryotic genera. Almost all 18 good- and high-quality metagenome-assembled genomes (MAGs) had diverse HDGs profiles. The MAGs recovered from the SIO metagenome belonged to the abundant taxa, such as Glaciecola, Bermanella, and Rhodobacteraceae, in this environment. The genera associated with HDGs were often previously known as hydrocarbon-degrading genera. However, a substantial number of new associations, either between already known hydrocarbon-degrading genera and new HDGs or between genera not known to contain hydrocarbon degraders and multiple HDGs, were found. The superimposition of the results of comparing HDG associations with taxonomy, the HDG profiles of MAGs, and the full genomes of organisms in the KEGG database suggests that the found relationships need further investigation and verification.
|
26
|
Boeri L, Donnaloja F, Campanile M, Sardelli L, Tunesi M, Fusco F, Giordano C, Albani D. Using integrated meta-omics to appreciate the role of the gut microbiota in epilepsy. Neurobiol Dis 2022; 164:105614. [PMID: 35017031 DOI: 10.1016/j.nbd.2022.105614] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/04/2021] [Revised: 12/31/2021] [Accepted: 01/05/2022] [Indexed: 12/16/2022]
Abstract
The way the human microbiota may modulate neurological pathologies is a fascinating matter of research. Epilepsy is a common neurological disorder, which has been largely investigated in correlation with microbiota health and function. However, the mechanisms that regulate this apparent connection are scarcely defined, and extensive effort has been conducted to understand the role of microbiota in preventing and reducing epileptic seizures. Intestinal bacteria seem to modulate the seizure frequency mainly by releasing neurotransmitters and inflammatory mediators. In order to elucidate the complex microbial contribution to epilepsy pathophysiology, integrated meta-omics could be pivotal. In fact, the combination of two or more meta-omics approaches allows a multifactorial study of microbial activity within the frame of disease or drug treatments. In this review, we provide information depicting and supporting the use of multi-omics to study the microbiota-epilepsy connection. We described different meta-omics analyses (metagenomics, metatranscriptomics, metaproteomics and metabolomics), focusing on current technical challenges in stool collection procedures, sample extraction methods and data processing. We further discussed the current advantages and limitations of using the integrative approach of multi-omics in epilepsy investigations.
Affiliation(s)
- Lucia Boeri
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Francesca Donnaloja
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Marzia Campanile
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Lorenzo Sardelli
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Marta Tunesi
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Federica Fusco
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Carmen Giordano
  - Department of Chemistry, Materials and Chemical Engineering "Giulio Natta", Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
- Diego Albani
  - Department of Neuroscience, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, via Mario Negri 2, 20156 Milan, Italy
|
27
|
Choudhari J, Choubey J, Verma M, Chatterjee T, Sahariah B. Metagenomics: the boon for microbial world knowledge and current challenges. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00022-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
28
|
Voigt B, Fischer O, Krumnow C, Herta C, Dabrowski PW. NGS read classification using AI. PLoS One 2021; 16:e0261548. [PMID: 34936673 PMCID: PMC8694450 DOI: 10.1371/journal.pone.0261548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 12/03/2021] [Indexed: 11/19/2022]
Abstract
Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient's sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, due to this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during the sequencing itself and the diagnosis only takes place at the data analysis stage where relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the used databases, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: How to determine whether truly no evidence of a pathogen is present in the data or whether the pathogen's genome is simply absent from the database and the sequences in the dataset could thus not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on the sequences of coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that can not be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.
Affiliation(s)
- Benjamin Voigt
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Oliver Fischer
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Krumnow
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Christian Herta
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
- Piotr Wojciech Dabrowski
  - Center for Bio-Medical image and Information processing (CBMI), HTW University of Applied Sciences, Berlin, Germany
|
29
|
Wan XH. Artificial intelligence reveals roles of gut microbiota in driving human colorectal cancer evolution. Artif Intell Cancer 2021; 2:69-78. [DOI: 10.35713/aic.v2.i5.69] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2021] [Revised: 10/24/2021] [Accepted: 10/27/2021] [Indexed: 02/06/2023]
Abstract
With the rapid development of high-throughput sequencing and artificial intelligence (AI) techniques, the gut mucosal microbiota is beginning to be recognized as a critical driver of human colorectal cancer (CRC). Various AI approaches have been designed to obtain effective information from the enormous numbers of microbial cells residing in the gut mucosa, as well as from cancer cells. These mainly include detection of microbial markers for early clinical diagnosis of stage-specific CRC, characterization of pathogenic bacterial activities via genomic and transcriptomic analyses, and prediction of interplay between bacterial drivers and host immune systems. Here I review the current progress of AI applications in profiling gut microbiomes linked to CRC initiation and development. I further look forward to future AI research for improving our understanding of the roles of gut microbiota in CRC evolution.
Affiliation(s)
- Xue-Hua Wan
  - TEDA Institute of Biological Sciences and Biotechnology, Nankai University, Tianjin 300457, China
|
30
|
Mining the Microbiome and Microbiota-Derived Molecules in Inflammatory Bowel Disease. Int J Mol Sci 2021; 22:ijms222011243. [PMID: 34681902 PMCID: PMC8540913 DOI: 10.3390/ijms222011243] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/28/2021] [Revised: 10/12/2021] [Accepted: 10/13/2021] [Indexed: 12/12/2022]
Abstract
The intestinal microbiota is a complex community that consists of an ecosystem with a dynamic interplay between bacteria, fungi, archaea, and viruses. Recent advances in model systems have revealed that the gut microbiome is critical for maintaining homeostasis through metabolic digestive function, immune regulation, and intestinal barrier integrity. Taxonomic shifts in the intestinal microbiota are strongly correlated with a multitude of human diseases, including inflammatory bowel disease (IBD). However, many of these studies have been descriptive, and thus the understanding of the cause and effect relationship often remains unclear. Using non-human experimental model systems such as gnotobiotic mice, probiotic mono-colonization, or prebiotic supplementation, researchers have defined numerous species-level functions of the intestinal microbiota that have produced therapeutic candidates for IBD. Despite these advances, the molecular mechanisms responsible for the function of much of the microbiota and the interplay with host cellular processes remain areas of tremendous research potential. In particular, future research will need to unlock the functional molecular units of the microbiota in order to utilize this untapped resource of bioactive molecules for therapy. This review will highlight the advances and remaining challenges of microbiota-based functional studies and therapeutic discovery, specifically in IBD. One of the limiting factors for reviewing this topic is the nascent development of this area with information on some drug candidates still under early commercial development. We will also highlight the current and evolving strategies, including in the biotech industry, used for the discovery of microbiota-derived bioactive molecules in health and disease.
|
31
|
Liu L, Wang Y, Yang Y, Wang D, Cheng SH, Zheng C, Zhang T. Charting the complexity of the activated sludge microbiome through a hybrid sequencing strategy. Microbiome 2021; 9:205. [PMID: 34649602 PMCID: PMC8518188 DOI: 10.1186/s40168-021-01155-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 05/12/2021] [Accepted: 09/01/2021] [Indexed: 06/01/2023]
Abstract
Background: Long-read sequencing has shown its tremendous potential to address genome assembly challenges, e.g., achieving the first telomere-to-telomere assembly of a gapless human chromosome. However, many issues remain unresolved when leveraging error-prone long reads to characterize high-complexity metagenomes, for instance, complete/high-quality genome reconstruction from highly complex systems. Results: Here, we developed an iterative haplotype-resolved hierarchical clustering-based hybrid assembly (HCBHA) approach that capitalizes on a hybrid (error-prone long reads and high-accuracy short reads) sequencing strategy to reconstruct (near-) complete genomes from highly complex metagenomes. Using the HCBHA approach, we first phase short and long reads from the highly complex metagenomic dataset into different candidate bacterial haplotypes, then perform hybrid assembly of each bacterial genome individually. We reconstructed 557 metagenome-assembled genomes (MAGs) with an average N50 of 574 Kb from a deeply sequenced, highly complex activated sludge (AS) metagenome. These high-contiguity MAGs contained 14 closed genomes and 111 high-quality (HQ) MAGs including full-length rRNA operons, which accounted for 61.1% of the microbial community. Leveraging the near-complete genomes, we also profiled the metabolic potential of the AS microbiome and identified 2153 biosynthetic gene clusters (BGCs) encoded within the recovered AS MAGs. Conclusion: Our results established the feasibility of an iterative haplotype-resolved HCBHA approach to reconstruct (near-) complete genomes from highly complex ecosystems, providing new insights into "complete metagenomics". The retrieved high-contiguity MAGs illustrated that various biosynthetic gene clusters (BGCs) were harbored in the AS microbiome. The high diversity of BGCs highlights the potential to discover new natural products biosynthesized by the AS microbial community, aside from the traditional function (e.g., organic carbon and nitrogen removal) in wastewater treatment.
Affiliation(s)
- Lei Liu
  - Environmental Microbiome Engineering and Biotechnology Laboratory, The University of Hong Kong, Hong Kong SAR, China
  - State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
- Yulin Wang
  - Environmental Microbiome Engineering and Biotechnology Laboratory, The University of Hong Kong, Hong Kong SAR, China
- Yu Yang
  - Environmental Microbiome Engineering and Biotechnology Laboratory, The University of Hong Kong, Hong Kong SAR, China
- Depeng Wang
  - Nextomics Biosciences Institute, Wuhan, China
- Suk Hang Cheng
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Chunmiao Zheng
  - State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
- Tong Zhang
  - Environmental Microbiome Engineering and Biotechnology Laboratory, The University of Hong Kong, Hong Kong SAR, China
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
|
32
|
Dextro RB, Delbaje E, Cotta SR, Zehr JP, Fiore MF. Trends in Free-access Genomic Data Accelerate Advances in Cyanobacteria Taxonomy. J Phycol 2021; 57:1392-1402. [PMID: 34291461 DOI: 10.1111/jpy.13200] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/23/2021] [Accepted: 07/16/2021] [Indexed: 06/13/2023]
Abstract
Free access databases of DNA sequences containing microbial genetic information have changed the way scientists look at the microbial world. Currently, the NCBI database includes about 516 distinct search results for Cyanobacterial genomes distributed in a taxonomy based on a polyphasic approach. While their classification and taxonomic relationships are widely used as is, recent proposals to alter their grouping include further exploring the relationship between Cyanobacteria and Melainabacteria. Nowadays, most cyanobacteria still are named under the Botanical Code; however, there is a proposal made by the Genome Taxonomy Database (GTDB) to harmonize cyanobacteria nomenclature with the other bacteria, an initiative to standardize microbial taxonomy based on genome phylogeny, in order to contribute to an overall better phylogenetic resolution of microbiota. Furthermore, the assembly level of the genomes and their geographical origin demonstrates some trends of cyanobacteria genomics on the scientific community, such as low availability of complete genomes and underexplored sampling locations. By describing how available cyanobacterial genomes from free-access databases fit within different taxonomic classifications, this mini-review provides a holistic view of the current knowledge of cyanobacteria and indicates some steps towards improving our efforts to create a more cohesive and inclusive classifying system, which can be greatly improved by using large-scale sequencing and metagenomic techniques.
Affiliation(s)
- Rafael B Dextro
- Center for Nuclear Energy in Agriculture, University of São Paulo, Avenida Centenário 303, 13416-000, Piracicaba, SP, Brazil
| | - Endrews Delbaje
- Center for Nuclear Energy in Agriculture, University of São Paulo, Avenida Centenário 303, 13416-000, Piracicaba, SP, Brazil
| | - Simone R Cotta
- Center for Nuclear Energy in Agriculture, University of São Paulo, Avenida Centenário 303, 13416-000, Piracicaba, SP, Brazil
| | - Jonathan P Zehr
- Ocean Sciences Department, University of California, 1156 High Street, Santa Cruz, California, 95064, USA
| | - Marli F Fiore
- Center for Nuclear Energy in Agriculture, University of São Paulo, Avenida Centenário 303, 13416-000, Piracicaba, SP, Brazil
|
33
|
Zacho CM, Bager MA, Margaryan A, Gravlund P, Galatius A, Rasmussen AR, Allentoft ME. Uncovering the genomic and metagenomic research potential in old ethanol-preserved snakes. PLoS One 2021; 16:e0256353. [PMID: 34424926 PMCID: PMC8382189 DOI: 10.1371/journal.pone.0256353] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/16/2021] [Accepted: 08/04/2021] [Indexed: 11/19/2022] Open
Abstract
Natural history museum collections worldwide represent a tremendous resource of information on past and present biodiversity. Fish, reptiles, amphibians and many invertebrate collections have often been preserved in ethanol for decades or centuries and our knowledge on the genomic and metagenomic research potential of such material is limited. Here, we use ancient DNA protocols, combined with shotgun sequencing to test the molecular preservation in liver, skin and bone tissue from five old (1842 to 1964) museum specimens of the common garter snake (Thamnophis sirtalis). When mapping reads to a T. sirtalis reference genome, we find that the DNA molecules are highly damaged with short average sequence lengths (38-64 bp) and high C-T deamination, ranging from 9% to 21% at the first position. Despite this, the samples displayed relatively high endogenous DNA content, ranging from 26% to 56%, revealing that genome-scale analyses are indeed possible from all specimens and tissues included here. Of the three tested types of tissue, bone shows marginally but significantly higher DNA quality in these metrics. Though at least one of the snakes had been exposed to formalin, neither the concentration nor the quality of the obtained DNA was affected. Lastly, we demonstrate that these specimens display a diverse and tissue-specific microbial genetic profile, thus offering authentic metagenomic data despite being submerged in ethanol for many years. Our results emphasize that historical museum collections continue to offer an invaluable source of information in the era of genomics.
Affiliation(s)
- Claus M. Zacho
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Martina A. Bager
- Section for EvoGenomics, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Ashot Margaryan
- Section for EvoGenomics, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Center for Evolutionary Hologenomics, University of Copenhagen, Copenhagen, Denmark
| | | | - Anders Galatius
- Department of Bioscience, Aarhus University, Roskilde, Denmark
| | - Arne R. Rasmussen
- Institute of Conservation, Royal Danish Academy—Architecture, Design, Conservation, Copenhagen, Denmark
| | - Morten E. Allentoft
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Perth, Australia
|
34
|
Cahn JKB, Piel J. Applications of Single-Cell Methods in Microbial Natural Products Research [Anwendungen von Einzelzellmethoden in der mikrobiellen Naturstoffforschung]. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.201900532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Jackson K. B. Cahn
- Institute of Microbiology, Eidgenössische Technische Hochschule Zürich (ETH), 8093 Zürich, Switzerland
| | - Jörn Piel
- Institute of Microbiology, Eidgenössische Technische Hochschule Zürich (ETH), 8093 Zürich, Switzerland
|
35
|
Stevens BR, Roesch L, Thiago P, Russell JT, Pepine CJ, Holbert RC, Raizada MK, Triplett EW. Depression phenotype identified by using single nucleotide exact amplicon sequence variants of the human gut microbiome. Mol Psychiatry 2021; 26:4277-4287. [PMID: 31988436 DOI: 10.1038/s41380-020-0652-5] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/25/2019] [Revised: 01/13/2020] [Accepted: 01/16/2020] [Indexed: 12/15/2022]
Abstract
Single nucleotide exact amplicon sequence variants (ASV) of the human gut microbiome were used to evaluate whether individuals with a depression phenotype (DEPR) could be distinguished from healthy reference subjects (NODEP). Microbial DNA in stool samples obtained from 40 subjects was characterized using high-throughput microbiome sequence data processed via DADA2 error correction combined with PIME machine-learning de-noising and taxa binning/parsing of prevalent ASVs at the single nucleotide level of resolution. ALDEx2 differential abundance analysis with assessed effect sizes and stringent PICRUSt2 prediction of metabolic pathways were then applied. This multivariate machine-learning approach significantly differentiated DEPR (n = 20) vs. NODEP (n = 20) (PERMANOVA P < 0.001) based on microbiome taxa clustering and neurocircuit-relevant metabolic pathway network analysis for GABA, butyrate, glutamate, monoamines, monosaturated fatty acids, and inflammasome components. Gut microbiome dysbiosis assessed using ASV prevalence data may offer diagnostic potential, with human metaorganism biomarkers used to identify individuals with a depression phenotype.
Affiliation(s)
- Bruce R Stevens
- Department of Physiology and Functional Genomics, University of Florida College of Medicine, Gainesville, FL, USA.
- Department of Psychiatry, University of Florida College of Medicine, Gainesville, FL, USA.
- Division of Gastroenterology, Department of Medicine, University of Florida College of Medicine, Gainesville, FL, USA.
| | - Luiz Roesch
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Centro Interdisciplinar de Pesquisas em Biotecnologia-CIP-Biotec, Universidade Federal do Pampa, São Gabriel, Bagé, Brazil
| | - Priscila Thiago
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
| | - Jordan T Russell
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
| | - Carl J Pepine
- Division of Cardiovascular Medicine, Department of Medicine, University of Florida College of Medicine, Gainesville, FL, USA
| | - Richard C Holbert
- Department of Psychiatry, University of Florida College of Medicine, Gainesville, FL, USA
| | - Mohan K Raizada
- Department of Physiology and Functional Genomics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Eric W Triplett
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
|
36
|
Nathani NM, Dave KJ, Vatsa PP, Mahajan MS, Sharma P, Mootapally C. 309 metagenome assembled microbial genomes from deep sediment samples in the Gulfs of Kathiawar Peninsula. Sci Data 2021; 8:194. [PMID: 34321485 PMCID: PMC8319310 DOI: 10.1038/s41597-021-00957-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/12/2020] [Accepted: 05/24/2021] [Indexed: 11/23/2022] Open
Abstract
Prokaryoplankton genomes from deep marine sediments are less explored than those from shallow shore sediments. The Gulfs of the Kathiawar peninsula experience varied currents and inputs from different on-shore activities. Any perturbation would directly influence the microbiome and its normal homeostasis. Advancements in reconstructing genomes from metagenomes allow us to understand the role of individual unculturable microbes in ecological niches like the Gulf sediments. Here, we report 309 bacterial and archaeal genomes assembled from metagenomic data of deep sediments from sites in the Gulf of Khambhat and the Gulf of Kutch, as well as a sample from the Arabian Sea. Phylogenomics classified them into 5 archaeal and 18 bacterial phyla. The genomes will facilitate understanding of the physiology and adaptation of deep sediment microbes and of the impact of on-shore anthropogenic activities on them.
Affiliation(s)
- Neelam M Nathani
- Department of Life Sciences, Maharaja Krishnakumarsinhji Bhavnagar University, Bhavnagar, 364001, Gujarat, India
| | - Kaushambee J Dave
- Department of Molecular Cell Biology and Immunology, University of Tübingen, Geschwister-Scholl-Platz, 72074, Tübingen, Germany
| | - Priyanka P Vatsa
- Department of Biotechnology, National Institute of Pharmaceutical Education and Research (NIPER), Gandhinagar, 382355, Gujarat, India
| | - Mayur S Mahajan
- Microbiology Division, Regional Centre, Lokhandwala Road, Four Bungalows, Andheri (West), CSIR - National Institute of Oceanography (CSIR-NIO), Mumbai, 400053, Maharashtra, India
| | | | - Chandrashekar Mootapally
- Department of Marine Science, Maharaja Krishnakumarsinhji Bhavnagar University, Bhavnagar, 364001, Gujarat, India.
|
37
|
Mallawaarachchi VG, Wickramarachchi AS, Lin Y. Improving metagenomic binning results with overlapped bins using assembly graphs. Algorithms Mol Biol 2021; 16:3. [PMID: 33947431 PMCID: PMC8097841 DOI: 10.1186/s13015-021-00185-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/04/2021] [Accepted: 04/20/2021] [Indexed: 11/18/2022] Open
Abstract
Background: Metagenomic sequencing allows us to study the structure, diversity and ecology of microbial communities without the need to obtain pure cultures. In many metagenomics studies, the reads obtained from metagenomic sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which all contigs are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for binning contigs only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species). Results: In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves the binning results of existing tools but also supports assigning contigs to multiple bins. Conclusion: GraphBin2 incorporates coverage information into the assembly graph to refine the binning results obtained from existing binning tools. GraphBin2 also enables the detection of contigs that may belong to multiple species. We show that GraphBin2 outperforms its predecessor GraphBin on both simulated and real datasets. GraphBin2 is freely available at https://github.com/Vini2/GraphBin2. Supplementary Information: The online version contains supplementary material available at 10.1186/s13015-021-00185-6.
|
38
|
Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat Protoc 2021; 16:2520-2541. [PMID: 33864056 DOI: 10.1038/s41596-021-00508-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/24/2020] [Accepted: 01/12/2021] [Indexed: 02/02/2023]
Abstract
Recovering genomes from shotgun metagenomic sequence data allows detailed taxonomic and functional characterization of individual species or strains in a microbial community. Retrieving these metagenome-assembled genomes (MAGs) involves seven stages. First, low-quality bases, along with adapter and host sequences, are removed. Second, overlapping sequences are assembled to create longer contiguous fragments. Third, these fragments are clustered based on sequence composition and abundance. Fourth, these sequence clusters, or bins, undergo rounds of quality assessment and refinement to yield MAGs. The optional fifth stage is dereplication of MAGs to select representatives. Next, each MAG is taxonomically classified. The optional seventh stage is assessing the fraction of diversity that has been recovered. The output of this protocol is draft genomes, which can provide invaluable clues about uncultured organisms. This protocol takes ~1 week to run, depending on computational resources available, and requires prior experience with high-performance computing, shell script programming and Python.
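The third stage above clusters assembled fragments by sequence composition and abundance. As a minimal sketch of what such features can look like, the Python below builds a normalized tetranucleotide frequency vector for a contig and appends its mean read coverage. The function names `kmer_profile` and `contig_features`, the choice of k = 4, and the idea of concatenating both signals into one vector are illustrative assumptions, not the protocol's actual implementation.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Return the normalized k-mer frequency vector of seq over A, C, G, T."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing N or other symbols are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def contig_features(seq: str, mean_coverage: float, k: int = 4) -> list[float]:
    """Hypothetical feature vector: composition profile plus one abundance value."""
    return kmer_profile(seq, k) + [mean_coverage]


if __name__ == "__main__":
    features = contig_features("ACGTACGTTTGACCA", mean_coverage=12.5)
    print(len(features))  # 4**4 composition entries plus one coverage entry
```

Vectors of this kind could then be passed to any clustering method; real binners typically also normalize for contig length and use coverage across several samples.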
|
39
|
Cahn JKB, Piel J. Opening up the Single-Cell Toolbox for Microbial Natural Products Research. Angew Chem Int Ed Engl 2021; 60:18412-18428. [PMID: 30748086 DOI: 10.1002/anie.201900532] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/14/2019] [Indexed: 02/06/2023]
Abstract
The diverse microbes that produce natural products represent an important source of novel therapeutics, drug leads, and scientific tools. However, the vast majority have not been grown in axenic culture and are members of complex communities. While meta-'omic methods such as metagenomics, -transcriptomics, and -proteomics reveal collective molecular features of this "microbial dark matter", the study of individual microbiome members can be challenging. To address these limits, a number of techniques with single-bacterial resolution have been developed in the last decade and a half. While several of these are embraced by microbial ecologists, there has been less use by researchers interested in mining microbes for natural products. In this review, we discuss the available and emerging techniques for targeted single-cell analysis with a particular focus on applications to the discovery and study of natural products.
Affiliation(s)
- Jackson K B Cahn
- Institute of Microbiology, Eidgenössische Technische Hochschule Zürich (ETH), 8093, Zurich, Switzerland
| | - Jörn Piel
- Institute of Microbiology, Eidgenössische Technische Hochschule Zürich (ETH), 8093, Zurich, Switzerland
|
40
|
Borderes M, Gasc C, Prestat E, Galvão Ferrarini M, Vinga S, Boucinha L, Sagot MF. A comprehensive evaluation of binning methods to recover human gut microbial species from a non-redundant reference gene catalog. NAR Genom Bioinform 2021; 3:lqab009. [PMID: 33709074 PMCID: PMC7936653 DOI: 10.1093/nargab/lqab009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/31/2020] [Revised: 01/18/2021] [Accepted: 01/29/2021] [Indexed: 01/19/2023] Open
Abstract
The human gut microbiota performs functions that are essential for the maintenance of host physiology. However, characterizing the functioning of microbial communities in relation to the host remains challenging in reference-based metagenomic analyses. Indeed, as taxonomic and functional analyses are performed independently, the link between genes and species remains unclear. Although a first set of species-level bins was built by clustering co-abundant genes, no reference bin set has been established for the most widely used gut microbiota catalog, the Integrated Gene Catalog (IGC). With the aim of identifying the most suitable method to group the IGC genes, we benchmarked nine taxonomy-independent binners implementing abundance-based, hybrid and integrative approaches. For this purpose, we designed a simulated non-redundant gene catalog (SGC) and computed adapted assessment metrics. Overall, the best trade-off between the main metrics is reached by an integrative binner. For each approach, we then compared the results of the best-performing binner with our expected community structures and applied the method to the IGC. The three approaches are distinguished by specific advantages, and by inherent or scalability limitations. Hybrid and integrative binners show promising and potentially complementary results but require improvements before they can be used on the IGC to recover human gut microbial species.
Affiliation(s)
- Marianne Borderes
- MaaT Pharma, 317 Avenue Jean Jaurès, 69007 Lyon, France
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
- Erable team, INRIA Grenoble Rhône-Alpes, 655 Avenue de l’Europe 38330 Montbonnot-Saint-Martin, France
| | - Cyrielle Gasc
- MaaT Pharma, 317 Avenue Jean Jaurès, 69007 Lyon, France
| | | | - Mariana Galvão Ferrarini
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
- INSA-Lyon, INRA, BF2i, UMR0203, F-69621 Villeurbanne, France
| | - Susana Vinga
- INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, 1000-029 Lisbon, Portugal
| | - Lilia Boucinha
- MaaT Pharma, 317 Avenue Jean Jaurès, 69007 Lyon, France
- EVOTEC ID (Lyon), 40 Avenue Tony Garnier, 69007 Lyon, France
| | - Marie-France Sagot
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
- Erable team, INRIA Grenoble Rhône-Alpes, 655 Avenue de l’Europe 38330 Montbonnot-Saint-Martin, France
|
41
|
Bharti R, Grimm DG. Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform 2021; 22:178-193. [PMID: 31848574 PMCID: PMC7820839 DOI: 10.1093/bib/bbz155] [Citation(s) in RCA: 227] [Impact Index Per Article: 75.7] [Received: 09/24/2019] [Revised: 10/23/2019] [Accepted: 11/06/2019] [Indexed: 12/15/2022] Open
Abstract
Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and by computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols has led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and for shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).
Affiliation(s)
- Richa Bharti
- Weihenstephan-Triesdorf University of Applied Sciences and Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Straubing, Germany
| | - Dominik G Grimm
- Weihenstephan-Triesdorf University of Applied Sciences and Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Straubing, Germany
|
42
|
Zhao F, Zhang D, Ge C, Zhang L, Reinach PS, Tian X, Tao C, Zhao Z, Zhao C, Fu W, Zeng C, Chen W. Metagenomic Profiling of Ocular Surface Microbiome Changes in Meibomian Gland Dysfunction. Invest Ophthalmol Vis Sci 2021; 61:22. [PMID: 32673387 PMCID: PMC7425691 DOI: 10.1167/iovs.61.8.22] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/24/2022] Open
Abstract
Purpose: Ocular surface microbiome changes can affect meibomian gland dysfunction (MGD) development. This study aimed to delineate differences among the microbiomes of eyelid skin, conjunctiva, and meibum in healthy controls (HCs) and patients afflicted with MGD. Methods: Shotgun metagenomic analysis was used to determine whether there are differences between the microbial communities in ocular sites surrounding the meibomian gland in healthy individuals and patients afflicted with MGD. Results: The bacterial content of the meibum microbiome differed between the two groups. Almost all of the most significant taxonomic changes in the meibum microbiome of individuals with MGD were also present in their eyelid skin, but not in the conjunctiva. Such site-specific microbe pattern changes accompany increases in the gene expression levels controlling carbohydrate and lipid metabolism. Most of the microbiomes in patients with MGD possess a microbe population capable of metabolizing benzoate. Pathogens known to underlie ocular infection were evident in these individuals. MGD meibum contained an abundance of the pathogens Campylobacter coli, Campylobacter jejuni, and Enterococcus faecium, which were almost absent from HCs. Functional annotation indicated that the capability of the MGD meibum microbiomes to undergo chemotaxis, display immune evasive virulence, and mediate type IV secretion differed from that of the microbiomes of meibum isolated from HCs. Conclusions: MGD meibum contains distinct microbiota whose immune evasive virulence is much stronger than that in the HCs. Profiling differences between the meibum microbiome makeup in HCs and patients with MGD characterizes changes of microbial communities associated with the disease status.
|
43
|
Laso-Jadart R, Ambroise C, Peterlongo P, Madoui MA. metaVaR: Introducing metavariant species models for reference-free metagenomic-based population genomics. PLoS One 2020; 15:e0244637. [PMID: 33378381 PMCID: PMC7773188 DOI: 10.1371/journal.pone.0244637] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/18/2020] [Accepted: 12/14/2020] [Indexed: 11/18/2022] Open
Abstract
The availability of large metagenomic data offers great opportunities for the population genomic analysis of uncultured organisms, which represent a large part of the unexplored biosphere and play a key ecological role. However, the majority of these organisms lack a reference genome or transcriptome, which constitutes a technical obstacle for classical population genomic analyses. We introduce the metavariant species (MVS) model, in which a species is represented only by intra-species nucleotide polymorphism. We designed a method combining reference-free variant calling, multiple density-based clustering and maximum-weighted independent set algorithms to cluster intra-species variants into MVSs directly from multisample metagenomic raw reads without a reference genome or read assembly. The frequencies of the MVS variants are then used to compute population genomic statistics such as FST, in order to estimate genomic differentiation between populations and to identify loci under natural selection. The MVS construction was tested on simulated and real metagenomic data. MVSs showed the required quality for robust population genomics and allowed an accurate estimation of genomic differentiation (ΔFST < 0.0001 and <0.03 on simulated and real data respectively). Loci predicted under natural selection on real data were all detected by MVSs. MVSs represent a new paradigm that may simplify and enhance holistic approaches for population genomics and the evolution of microorganisms.
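The abstract above uses variant frequencies to compute FST as a measure of genomic differentiation between populations. As a hedged illustration only, the sketch below applies the textbook Wright definition, FST = (H_T - H_S) / H_T, to a single biallelic variant in two equally weighted populations. The function name `fst_biallelic` and the equal weighting are assumptions for this example; the estimator actually used by metaVaR is not stated in the abstract and may differ.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_total == 0:
        return 0.0  # locus is monomorphic overall, so there is no differentiation to measure
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies for three variants in two samples.
    for p1, p2 in [(0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
        print(p1, p2, round(fst_biallelic(p1, p2), 3))
```

Identical frequencies give 0 and fixed differences give 1, which is why unusually high per-locus values are commonly examined as candidates for selection.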
Affiliation(s)
- Romuald Laso-Jadart
- Institut François Jacob, CEA, CNRS, Génomique Métabolique - UMR 8030, Univ Evry, Université Paris-Saclay, Evry, France
| | | | | | - Mohammed-Amin Madoui
- Institut François Jacob, CEA, CNRS, Génomique Métabolique - UMR 8030, Univ Evry, Université Paris-Saclay, Evry, France
|
44
|
Mallawaarachchi V, Wickramarachchi A, Lin Y. GraphBin: refined binning of metagenomic contigs using assembly graphs. Bioinformatics 2020; 36:3307-3313. [PMID: 32167528 DOI: 10.1093/bioinformatics/btaa180] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 04/30/2019] [Revised: 02/18/2020] [Accepted: 03/10/2020] [Indexed: 12/17/2022] Open
Abstract
Motivation: The field of metagenomics has provided valuable insights into the structure, diversity and ecology within microbial communities. One key step in metagenomics analysis is to assemble reads into longer contigs which are then binned into groups of contigs that belong to different species present in the metagenomic sample. Binning of contigs plays an important role in metagenomics and most available binning algorithms bin contigs using genomic features such as oligonucleotide/k-mer composition and contig coverage. As metagenomic contigs are derived from the assembly process, they are output from the underlying assembly graph which contains valuable connectivity information between contigs that can be used for binning. Results: We propose GraphBin, a new binning method that makes use of the assembly graph and applies a label propagation algorithm to refine the binning result of existing tools. We show that GraphBin can make use of the assembly graphs constructed from both the de Bruijn graph and the overlap-layout-consensus approach. Moreover, we demonstrate improved experimental results from GraphBin in terms of identifying mis-binned contigs and binning of contigs discarded by existing binning tools. To the best of our knowledge, this is the first time that the information from the assembly graph has been used in a tool for the binning of metagenomic contigs. Availability and implementation: The source code of GraphBin is available at https://github.com/Vini2/GraphBin. Contact: vijini.mallawaarachchi@anu.edu.au or yu.lin@anu.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
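To make the idea of label propagation on an assembly graph concrete, here is a deliberately simplified sketch: contigs already binned by another tool keep their labels, and each unbinned contig repeatedly takes the strict majority label of its already-labeled neighbors. The function name `propagate_labels`, the synchronous update rule, and the tie handling are assumptions chosen for illustration; this is not GraphBin's actual algorithm, which the abstract does not specify in detail.

```python
from collections import Counter


def propagate_labels(edges, initial_bins, max_rounds=100):
    """Spread bin labels from binned contigs to unbinned neighbors.

    edges: iterable of (contig, contig) pairs from the assembly graph.
    initial_bins: dict mapping some contigs to a bin label.
    Contigs whose labeled neighbors tie, or that have none, stay unlabeled.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    bins = dict(initial_bins)
    for _ in range(max_rounds):
        updates = {}
        for contig, adjacent in neighbors.items():
            if contig in bins:
                continue  # labels from the initial binning are kept as they are
            votes = Counter(bins[n] for n in adjacent if n in bins)
            ranked = votes.most_common(2)
            # Accept only a strict majority so that ties stay unresolved.
            if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
                updates[contig] = ranked[0][0]
        if not updates:
            break  # nothing changed in this round, so the labeling is stable
        bins.update(updates)
    return bins


if __name__ == "__main__":
    # Hypothetical contig graph: c1 and c4 were binned by an upstream tool.
    edges = [("c1", "c2"), ("c4", "c5"), ("c2", "c6"), ("c5", "c6"), ("c2", "c3")]
    print(propagate_labels(edges, {"c1": "A", "c4": "B"}))
```

In this toy graph the contig adjacent to both bins is left unassigned, which mirrors why ambiguous contigs need extra evidence such as coverage.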
Affiliation(s)
- Vijini Mallawaarachchi
- Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra ACT 0200, Australia
| | - Anuradha Wickramarachchi
- Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra ACT 0200, Australia
| | - Yu Lin
- Research School of Computer Science, College of Engineering and Computer Science, Australian National University, Canberra ACT 0200, Australia
|
45
|
Pérez-Cobas AE, Gomez-Valero L, Buchrieser C. Metagenomic approaches in microbial ecology: an update on whole-genome and marker gene sequencing analyses. Microb Genom 2020; 6:mgen000409. [PMID: 32706331 PMCID: PMC7641418 DOI: 10.1099/mgen.0.000409] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 12/18/2019] [Accepted: 06/30/2020] [Indexed: 12/23/2022] Open
Abstract
Metagenomics and marker gene approaches, coupled with high-throughput sequencing technologies, have revolutionized the field of microbial ecology. Metagenomics is a culture-independent method that allows the identification and characterization of organisms from all kinds of samples. Whole-genome shotgun sequencing analyses the total DNA of a chosen sample to determine the presence of micro-organisms from all domains of life and their genomic content. Importantly, the whole-genome shotgun sequencing approach reveals the genomic diversity present, but can also give insights into the functional potential of the micro-organisms identified. The marker gene approach is based on the sequencing of a specific gene region. It allows one to describe the microbial composition based on the taxonomic groups present in the sample. It is frequently used to analyse the biodiversity of microbial ecosystems. Despite its importance, the analysis of metagenomic sequencing and marker gene data is quite a challenge. Here we review the primary workflows and software used for both approaches and discuss the current challenges in the field.
Affiliation(s)
- Ana Elena Pérez-Cobas
- Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France and CNRS UMR 3525, 675724, Paris, France
| | - Laura Gomez-Valero
- Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France and CNRS UMR 3525, 675724, Paris, France
| | - Carmen Buchrieser
- Institut Pasteur, Biologie des Bactéries Intracellulaires, Paris, France and CNRS UMR 3525, 675724, Paris, France
|
46
|
Yue Y, Huang H, Qi Z, Dou HM, Liu XY, Han TF, Chen Y, Song XJ, Zhang YH, Tu J. Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinformatics 2020; 21:334. [PMID: 32723290 PMCID: PMC7469296 DOI: 10.1186/s12859-020-03667-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 11/02/2019] [Accepted: 07/16/2020] [Indexed: 12/13/2022] Open
Abstract
Background Shotgun metagenomics, based on untargeted sequencing, can explore the taxonomic profile and the function of unknown microorganisms in samples and compensate for the shortcomings of amplicon sequencing. Binning assembled sequences into individual groups that represent microbial genomes is the key step and a major challenge in metagenomic research. Both supervised and unsupervised machine learning methods have been employed in binning. Genome binning, an unsupervised approach, clusters contigs into individual genome bins with machine learning methods and without the assistance of any reference database. Many genome binning tools have emerged so far, and evaluating them is of great significance to microbiological research. In this study, we evaluate 15 genome binning tools (12 original binning tools and 3 refining binning tools) by comparing their performance on chicken gut metagenomic datasets and the first CAMI challenge datasets. Results For the chicken gut metagenomic datasets, the original genome binners MetaBat, Groopm2 and Autometa performed better than the other original binners, and MetaWrap, which combined their binning results, generated the most high-quality genome bins. For the CAMI datasets, Groopm2 achieved the highest purity (> 0.9) with good completeness (> 0.8) and reconstructed the most high-quality genome bins among the original genome binners. Compared with Groopm2, MetaBat2 had similar performance, with higher completeness and lower purity. The refining binner DASTool predicted the most high-quality genome bins among all genome binners. Most genome binners performed well for unique strains. Nonetheless, reconstructing common strains remains a substantial challenge for all genome binners.
Conclusions We tested a set of currently available, state-of-the-art metagenomics hybrid binning tools and provide a guide for selecting tools for metagenomic binning by comparing the range of purity, completeness, adjusted Rand index, and the number of high-quality reconstructed bins. Furthermore, we summarize the available information relevant to future binning strategies.
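The purity and completeness figures quoted in this abstract can be illustrated with a small sketch. It assumes a base-pair-weighted definition in which each bin is scored against the genome that contributes most of its bases; the abstract does not state the exact formulas used, so the function name `bin_purity_completeness` and the toy inputs are illustrative assumptions only.

```python
from collections import defaultdict


def bin_purity_completeness(assignments, truth, lengths):
    """Return {bin_id: (majority_genome, purity, completeness)}.

    assignments: contig -> predicted bin
    truth:       contig -> true genome (only known for benchmark data)
    lengths:     contig -> contig length in base pairs
    """
    # Total size of every reference genome, summed over all its contigs.
    genome_size = defaultdict(int)
    for contig, genome in truth.items():
        genome_size[genome] += lengths[contig]

    # Base pairs of each genome that ended up in each bin.
    per_bin = defaultdict(lambda: defaultdict(int))
    for contig, bin_id in assignments.items():
        per_bin[bin_id][truth[contig]] += lengths[contig]

    report = {}
    for bin_id, counts in per_bin.items():
        majority = max(counts, key=counts.get)
        bin_size = sum(counts.values())
        purity = counts[majority] / bin_size                      # precision
        completeness = counts[majority] / genome_size[majority]   # recall
        report[bin_id] = (majority, purity, completeness)
    return report


if __name__ == "__main__":
    # Hypothetical toy data: two genomes (A, B) split over four contigs.
    lengths = {"c1": 100, "c2": 300, "c3": 100, "c4": 500}
    truth = {"c1": "A", "c2": "A", "c3": "B", "c4": "B"}
    assignments = {"c1": "bin1", "c2": "bin1", "c3": "bin1", "c4": "bin2"}
    for bin_id, row in bin_purity_completeness(assignments, truth, lengths).items():
        print(bin_id, row)
```

In this toy case the contaminated bin is complete but impure, while the clean bin is pure but incomplete, which is the trade-off the abstract describes between Groopm2 and MetaBat2.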
Affiliation(s)
- Yi Yue
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Information & Computer, Anhui Agricultural University, Hefei, 230036, China.,School of Life Sciences, Anhui Agricultural University, Hefei, 230036, China.
| | - Hao Huang
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Life Sciences, Anhui Agricultural University, Hefei, 230036, China.,School of Animal Science and Technology, Anhui Agricultural University, Hefei, 230036, China
| | - Zhao Qi
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Information & Computer, Anhui Agricultural University, Hefei, 230036, China
| | - Hui-Min Dou
- School of Information & Computer, Anhui Agricultural University, Hefei, 230036, China
| | - Xin-Yi Liu
- School of Information & Computer, Anhui Agricultural University, Hefei, 230036, China
| | - Tian-Fei Han
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Animal Science and Technology, Anhui Agricultural University, Hefei, 230036, China
| | - Yue Chen
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Animal Science and Technology, Anhui Agricultural University, Hefei, 230036, China
| | - Xiang-Jun Song
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Animal Science and Technology, Anhui Agricultural University, Hefei, 230036, China
| | - You-Hua Zhang
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Information & Computer, Anhui Agricultural University, Hefei, 230036, China.,School of Life Sciences, Anhui Agricultural University, Hefei, 230036, China.
| | - Jian Tu
- Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, Anhui Agricultural University, Hefei, 230036, China.,School of Information & Computer, Anhui Agricultural University, Hefei, 230036, China.,School of Animal Science and Technology, Anhui Agricultural University, Hefei, 230036, China.
| |
|
47
|
Wang Z, Wang Z, Lu YY, Sun F, Zhu S. SolidBin: improving metagenome binning with semi-supervised normalized cut. Bioinformatics 2020; 35:4229-4238. [PMID: 30977806 DOI: 10.1093/bioinformatics/btz253] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 12/16/2018] [Revised: 03/14/2019] [Accepted: 04/05/2019] [Indexed: 12/19/2022]
Abstract
MOTIVATION Metagenomic contig binning is an important computational problem in metagenomic research, which aims to cluster contigs from the same genome into the same group. Unlike the classical clustering problem, contig binning can utilize known relationships among some of the contigs or the taxonomic identity of some contigs. However, the current state-of-the-art contig binning methods make little use of biological information beyond the coverage and sequence composition of the contigs. RESULTS We developed a novel contig binning method, Semi-supervised Spectral Normalized Cut for Binning (SolidBin), based on semi-supervised spectral clustering. Using sequence feature similarity and/or additional biological information, such as the reliable taxonomy assignments of some contigs, SolidBin constructs two types of prior information: must-link and cannot-link constraints. A must-link constraint means that a pair of contigs should be clustered into the same group, while a cannot-link constraint means that a pair of contigs should be clustered into different groups. These constraints are then integrated into a classical spectral clustering approach, normalized cut, for improved contig binning. The performance of SolidBin is compared with five state-of-the-art genome binners, CONCOCT, COCACOLA, MaxBin, MetaBAT and BMC3C, on five next-generation sequencing benchmark datasets including simulated multi- and single-sample datasets and real multi-sample datasets. The experimental results show that SolidBin achieved the best performance in terms of F-score, Adjusted Rand Index and Normalized Mutual Information, especially on the real datasets and the single-sample dataset. AVAILABILITY AND IMPLEMENTATION https://github.com/sufforest/SolidBin. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
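To make the role of must-link and cannot-link constraints concrete, the sketch below overwrites entries of a contig affinity matrix (1 for must-link pairs, 0 for cannot-link pairs) and then evaluates the normalized cut of a candidate two-way partition. This direct overwrite is a simple, generic way to encode pairwise constraints and is an assumption made here for illustration; it is not SolidBin's actual formulation, and the Gaussian kernel, the function names, and the one-dimensional toy features are all hypothetical.

```python
import math


def affinity(features, sigma=1.0):
    """Gaussian-kernel affinity between feature vectors (zero diagonal)."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def apply_constraints(w, must_link=(), cannot_link=()):
    """Return a copy of w with pairwise constraints written into it."""
    w = [row[:] for row in w]
    for i, j in must_link:
        w[i][j] = w[j][i] = 1.0   # force maximal similarity
    for i, j in cannot_link:
        w[i][j] = w[j][i] = 0.0   # remove any similarity
    return w


def normalized_cut(w, group_a):
    """Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)."""
    n = len(w)
    a = set(group_a)
    b = set(range(n)) - a
    cut = sum(w[i][j] for i in a for j in b)
    assoc_a = sum(w[i][j] for i in a for j in range(n))
    assoc_b = sum(w[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return math.inf  # degenerate partition
    return cut / assoc_a + cut / assoc_b


if __name__ == "__main__":
    # Four hypothetical contigs described by a single composition feature.
    w = affinity([(0.0,), (1.0,), (2.0,), (3.0,)])
    constrained = apply_constraints(w, must_link=[(0, 1)], cannot_link=[(0, 2)])
    print(normalized_cut(w, {0, 1}), normalized_cut(constrained, {0, 1}))
```

A partition that respects the constraints gets a lower normalized-cut value on the constrained matrix than on the original one, so a minimizer of the cut is pushed toward groupings consistent with the prior information.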
Affiliation(s)
- Ziye Wang
- Centre for Computational Systems Biology, School of Mathematical Sciences, Shanghai, China.,School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
| | - Zhengyang Wang
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
| | - Yang Young Lu
- Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
| | - Fengzhu Sun
- Centre for Computational Systems Biology, School of Mathematical Sciences, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.,Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
| | - Shanfeng Zhu
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, China
| |
|
48
|
Linard B, Swenson K, Pardi F. Rapid alignment-free phylogenetic identification of metagenomic sequences. Bioinformatics 2020; 35:3303-3312. [PMID: 30698645 DOI: 10.1093/bioinformatics/btz068] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/27/2018] [Revised: 01/18/2019] [Accepted: 01/29/2019] [Indexed: 12/20/2022]
Abstract
MOTIVATION Taxonomic classification is at the core of environmental DNA analysis. When a phylogenetic tree can be built as a prior hypothesis to such classification, phylogenetic placement (PP) provides the most informative type of classification because each query sequence is assigned to its putative origin in the tree. This is useful whenever precision is sought (e.g. in diagnostics). However, likelihood-based PP algorithms struggle to scale with the ever-increasing throughput of DNA sequencing. RESULTS We have developed RAPPAS (Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences) which uses an alignment-free approach, removing the hurdle of query sequence alignment as a preliminary step to PP. Our approach relies on the precomputation of a database of k-mers that may be present with non-negligible probability in relatives of the reference sequences. The placement is performed by inspecting the stored phylogenetic origins of the k-mers in the query, and their probabilities. The database can be reused for the analysis of several different metagenomes. Experiments show that the first implementation of RAPPAS is already faster than competing likelihood-based PP algorithms, while keeping similar accuracy for short reads. RAPPAS scales PP for the era of routine metagenomic diagnostics. AVAILABILITY AND IMPLEMENTATION Program and sources freely available for download at https://github.com/blinard-BIOINFO/RAPPAS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
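The placement step described in this abstract, looking up each query k-mer in a precomputed database of phylogenetic origins and probabilities, can be sketched as follows. The database here is a hand-written toy dictionary, the score is a sum of log probabilities with a small floor value for k-mers not stored for a branch, and the names `place`, `floor`, `b1` and `b2` are assumptions for illustration; RAPPAS builds its database from ancestral sequence reconstruction and its exact scoring is not given in the abstract.

```python
import math


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def place(query, db, branches, k, floor=1e-3):
    """Score every branch for a query and return (best_branch, scores).

    db maps k-mer -> {branch: probability}. A k-mer that is not stored
    for a branch contributes log(floor) to that branch's score.
    """
    scores = {branch: 0.0 for branch in branches}
    for kmer in kmers(query, k):
        hits = db.get(kmer, {})
        for branch in branches:
            scores[branch] += math.log(hits.get(branch, floor))
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical precomputed database for a tree with two branches.
    db = {
        "ACG": {"b1": 0.9, "b2": 0.05},
        "CGT": {"b1": 0.8},
        "GTT": {"b2": 0.7},
    }
    best, scores = place("ACGT", db, ["b1", "b2"], k=3)
    print(best, scores)
```

Because the query is never aligned, the cost per read is one dictionary lookup per k-mer, and the same database can be reused across metagenomes, as the abstract notes.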
Affiliation(s)
- Benjamin Linard
- LIRMM, University of Montpellier, CNRS, Montpellier, France.,ISEM, University of Montpellier, CNRS, IRD, EPHE, CIRAD, INRAP, Montpellier, France.,AGAP, University of Montpellier, CIRAD, INRA, Montpellier Supagro, Montpellier, France
| | - Krister Swenson
- LIRMM, University of Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle, Montpellier, France
| | - Fabio Pardi
- LIRMM, University of Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle, Montpellier, France
| |
|
49
|
Shang J, Sun Y. CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning. Methods 2020; 189:95-103. [PMID: 32454212 PMCID: PMC7255349 DOI: 10.1016/j.ymeth.2020.05.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/03/2020] [Revised: 05/05/2020] [Accepted: 05/17/2020] [Indexed: 02/07/2023]
Abstract
The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read size, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels to reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER.
Affiliation(s)
- Jiayu Shang
- Electrical Engineering Dept., City University of Hong Kong, Kowloon, Hong Kong Special Administrative Region
| | - Yanni Sun
- Electrical Engineering Dept., City University of Hong Kong, Kowloon, Hong Kong Special Administrative Region.
| |
|
50
|
Levy Karin E, Mirdita M, Söding J. MetaEuk-sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome 2020; 8:48. [PMID: 32245390 PMCID: PMC7126354 DOI: 10.1186/s40168-020-00808-x] [Citation(s) in RCA: 103] [Impact Index Per Article: 25.8] [Received: 11/15/2019] [Accepted: 02/14/2020] [Indexed: 05/10/2023]
Abstract
BACKGROUND Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes, allowing us to investigate by direct sequencing a tremendous diversity of organisms without the need for prior cultivation. Unicellular eukaryotes play essential roles in most microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts, and parasites to plants and animals. Investigating their roles is therefore of great interest to ecology, biotechnology, human health, and evolution. However, the generally lower sequencing coverage, their more complex gene and genome architectures, and a lack of eukaryote-specific experimental and computational procedures have kept them on the sidelines of metagenomics. RESULTS MetaEuk is a toolkit for high-throughput, reference-based discovery and annotation of protein-coding genes in eukaryotic metagenomic contigs. It performs fast searches with 6-frame-translated fragments covering all possible exons and optimally combines matches into multi-exon proteins. We used a benchmark of seven diverse, annotated genomes to show that MetaEuk is highly sensitive even under conditions of low sequence similarity to the reference database. To demonstrate MetaEuk's power to discover novel eukaryotic proteins in large-scale metagenomic data, we assembled contigs from 912 samples of the Tara Oceans project. MetaEuk predicted >12,000,000 protein-coding genes in 8 days on ten 16-core servers. Most of the discovered proteins are highly diverged from known proteins and originate from very sparsely sampled eukaryotic supergroups. CONCLUSION The open-source (GPLv3) MetaEuk software (https://github.com/soedinglab/metaeuk) enables large-scale eukaryotic metagenomics through reference-based, sensitive taxonomic and functional annotation.
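The "6-frame-translated fragments" mentioned in this abstract can be illustrated with a minimal sketch: translate a contig in three offsets on each strand with the standard genetic code and split each translation at stop codons. This is only the generic six-frame idea, not MetaEuk's implementation; the function names and the `min_len` cutoff are assumptions, and codons containing characters other than A, C, G, T are rendered as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_fragments(dna, min_len=2):
    """Return (strand, offset, peptide) for stop-free stretches in all frames."""
    dna = dna.upper()
    fragments = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for peptide in protein.split("*"):
                if len(peptide) >= min_len:
                    fragments.append((strand, offset, peptide))
    return fragments


if __name__ == "__main__":
    for fragment in six_frame_fragments("ATGGCCTAA"):
        print(fragment)
```

Each stop-free peptide is a candidate exon fragment that could then be searched against a protein reference database, which is the step the abstract describes before matches are combined into multi-exon proteins.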
Affiliation(s)
- Eli Levy Karin
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, 37077, Göttingen, Germany.
| | - Milot Mirdita
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, 37077, Göttingen, Germany
| | - Johannes Söding
- Quantitative and Computational Biology, Max-Planck Institute for Biophysical Chemistry, 37077, Göttingen, Germany.
| |
|