1
Gtari M, Maaoui R, Ghodhbane-Gtari F, Ben Slama K, Sbissi I. MAGs-centric crack: how long will, spore-positive Frankia and most Protofrankia, microsymbionts remain recalcitrant to axenic growth? Front Microbiol 2024; 15:1367490. [PMID: 39144212 PMCID: PMC11323853 DOI: 10.3389/fmicb.2024.1367490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 07/04/2024] [Indexed: 08/16/2024] Open
Abstract
Nearly 50 years after the ground-breaking isolation of the primary Comptonia peregrina microsymbiont under axenic conditions, efforts to isolate a substantial number of Protofrankia and Frankia strains continue with enduring challenges and complexities. This study aimed to streamline genomic insights through comparative and predictive tools to extract traits crucial for isolating specific Frankia in axenic conditions. Pangenome analysis unveiled significant genetic diversity, suggesting untapped potential for cultivation strategies. Shared metabolic strategies in cellular components, central metabolic pathways, and resource acquisition traits offered promising avenues for cultivation. Ecological trait extraction indicated that most uncultured strains exhibit no apparent barriers to axenic growth. Despite ongoing challenges, potential caveats, and errors that could bias predictive analyses, this study provides a nuanced perspective. It highlights potential breakthroughs and guides refined cultivation strategies for these yet-uncultured strains. We advocate for tailored media formulations enriched with simple carbon sources in aerobic environments, with atmospheric nitrogen optionally sufficient to minimize contamination risks. Temperature adjustments should align with strain preferences (28-29°C for Frankia and 32-35°C for Protofrankia) while maintaining an alkaline pH. Given potential extended incubation periods (predicted doubling times ranging from 3.26 to 9.60 days, possibly up to 21.98 days), patience and rigorous contamination monitoring are crucial for optimizing cultivation conditions.
Affiliation(s)
- Maher Gtari
- Department of Biological and Chemical Engineering, USCR Molecular Bacteriology and Genomics, National Institute of Applied Sciences and Technology, University of Carthage, Tunis, Tunisia
- Radhi Maaoui
- Department of Biological and Chemical Engineering, USCR Molecular Bacteriology and Genomics, National Institute of Applied Sciences and Technology, University of Carthage, Tunis, Tunisia
- Faten Ghodhbane-Gtari
- Department of Biological and Chemical Engineering, USCR Molecular Bacteriology and Genomics, National Institute of Applied Sciences and Technology, University of Carthage, Tunis, Tunisia
- Higher Institute of Biotechnology Sidi Thabet, University of La Manouba, Tunisia
- Karim Ben Slama
- LR Bioresources, Environment, and Biotechnology (LR22ES04), Higher Institute of Applied Biological Sciences of Tunis, University of Tunis El Manar, Tunis, Tunisia
- Imed Sbissi
- LR Pastoral Ecology, Arid Regions Institute, University of Gabes, Medenine, Tunisia
2
Ghielmetti G, Kerr TJ, Bernitz N, Mhlophe SK, Streicher E, Loxton AG, Warren RM, Miller MA, Goosen WJ. Insights into mycobacteriome composition in Mycobacterium bovis-infected African buffalo (Syncerus caffer) tissue samples. Sci Rep 2024; 14:17537. [PMID: 39080347 PMCID: PMC11289279 DOI: 10.1038/s41598-024-68189-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 07/22/2024] [Indexed: 08/02/2024] Open
Abstract
Animal tuberculosis significantly challenges global health, agriculture, and wildlife conservation efforts. Mycobacterial cultures are resource-intensive, time-consuming, and challenged by heterogeneous populations. In this study, we employed a culture-independent approach, using targeted long-read-based next-generation sequencing (tNGS), to investigate the mycobacterial composition in 60 DNA samples extracted from culture-confirmed, Mycobacterium bovis-infected African buffalo tissue. We detected mycobacterial DNA in 93.3% of the samples and the sensitivity for detecting Mycobacterium tuberculosis complex (MTBC) was 91.7%, demonstrating a high concordance of our culture-independent tNGS approach with mycobacterial culture results. In five samples, we identified heterogeneous mycobacterial populations with various non-tuberculous mycobacteria, including members of the Mycobacterium avium complex (MAC), M. smegmatis, and M. komaniense. The latter Mycobacterium species was described in South Africa from bovine nasal swabs and environmental samples from the Hluhluwe-iMfolozi Park, which was the origin of the buffalo samples in the present study. This finding suggests that exposure to environmental mycobacteria may confound detection of MTBC in wildlife. In conclusion, our approach represents a promising alternative to conventional methods for detecting mycobacterial DNA. This high-throughput technique enables rapid differentiation of heterogeneous mycobacterial populations, which will contribute valuable insights into the epidemiology, pathogenesis, and microbial synergy during mycobacterial infections.
Affiliation(s)
- Giovanni Ghielmetti
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa
- Section of Veterinary Bacteriology, Institute for Food Safety and Hygiene, Vetsuisse Faculty, University of Zurich, Winterthurerstrasse 270, 8057, Zurich, Switzerland
- Tanya J Kerr
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa
- Netanya Bernitz
- Cryptosporidiosis Lab, The Francis Crick Institute, London, UK
- Sinegugu K Mhlophe
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa
- Elizma Streicher
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa
- Andre G Loxton
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa
- Robin M Warren
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa
- Michele A Miller
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa
- Wynand J Goosen
- South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, PO Box 241, Cape Town, 8000, South Africa.
3
Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 DOI: 10.1038/s41592-024-02262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are not only due to improvements in sequencing accuracy, but also happening across rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Affiliation(s)
- Daniel P Agustinho
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
- Vipin K Menon
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
- Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
4
Cook R, Telatin A, Hsieh SY, Newberry F, Tariq MA, Baker DJ, Carding SR, Adriaenssens EM. Nanopore and Illumina sequencing reveal different viral populations from human gut samples. Microb Genom 2024; 10. [PMID: 38683195 DOI: 10.1099/mgen.0.001236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024] Open
Abstract
The advent of viral metagenomics, or viromics, has improved our knowledge and understanding of global viral diversity. High-throughput sequencing technologies enable explorations of the ecological roles, contributions to host metabolism, and the influence of viruses in various environments, including the human intestinal microbiome. However, bacterial metagenomic studies frequently have the advantage. The adoption of advanced technologies like long-read sequencing has the potential to be transformative in refining viromics and metagenomics. Here, we examined the effectiveness of long-read and hybrid sequencing by comparing Illumina short-read and Oxford Nanopore Technology (ONT) long-read sequencing technologies and different assembly strategies on recovering viral genomes from human faecal samples. Our findings showed that if a single sequencing technology is to be chosen for virome analysis, Illumina is preferable due to its superior ability to recover fully resolved viral genomes and minimise erroneous genomes. While ONT assemblies were effective in recovering viral diversity, the challenges related to input requirements and the necessity for amplification made it less ideal as a standalone solution. However, using a combined, hybrid approach enabled a more authentic representation of viral diversity to be obtained within samples.
Affiliation(s)
- Ryan Cook
- Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Fiona Newberry
- Department of Biosciences, Nottingham Trent University, Nottingham, NG11 8NS, UK
- Mohammad A Tariq
- Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
- Dave J Baker
- Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Simon R Carding
- Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
5
Hernandez JM, Almeida GBS, Portela ACR, Cardoso JF, Junior ECS, Lucena MSS, Nunes MRT, Gabbay YB, Silva LD. Microbial Diversity in Children with Gastroenteritis in the Amazon Region of Brazil: Development and Validation of a Molecular Method for Complete Sequencing of Viral Genomes. J Genomics 2024; 12:47-54. [PMID: 38638167 PMCID: PMC11024607 DOI: 10.7150/jgen.94116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 03/08/2024] [Indexed: 04/20/2024] Open
Abstract
INTRODUCTION: Metagenomic sequencing is a powerful tool that is widely used in laboratories worldwide for taxonomic characterization of microorganisms in clinical and environmental samples. In this study, we used metagenomics to comprehensively investigate the microbial diversity in fecal samples of children over a four-year period. Our methods were carefully designed to ensure accurate and reliable results. MATERIAL AND METHODS: We validated and analyzed metagenomic data obtained from sequencing 27 fecal samples from children under 10 years old with gastroenteritis over a four-year period (2012-2016). The fecal specimens were collected from patients who received care at public health facilities in the northern region of Brazil. Sequencing libraries were prepared from cDNA and sequenced on the Illumina HiSeq. Kraken-2 was utilized to classify bacterial taxonomy based on the 16S rRNA gene, using the Silva rRNA database. Additionally, the Diamond program was used for mapping to the non-redundant protein database (NR database). Phylogenomic analyses were conducted using Geneious R10 and MEGA X software, and Bayesian estimation of phylogeny was performed using the MrBayes program. The results indicate significant heterogeneity among norovirus strains, with evidence of recombination and point mutations. This study presents the first complete genome of parechovirus 8 in the region. Additionally, it describes the bacterial populations and bacteriophages present in feces, with a high abundance of Firmicutes and Proteobacteria, including an increased proportion of the Enterobacteriaceae family. The presented data demonstrate the genetic diversity of microbial populations and provide a comprehensive report on viral molecular characterization. These findings are relevant for genomic studies in gastrointestinal infections. The metagenomic approach is a powerful tool for investigating microbial diversity in children with gastroenteritis. However, further studies are imperative to conduct genomic analysis of identified bacterial strains and thoroughly analyze antimicrobial resistance genes.
Affiliation(s)
- Juliana Merces Hernandez
- Postgraduate Program in Biology of Infectious and Parasitic Agents, Federal University of Pará, Belém, Pará, Brazil
6
Hiralal A, Geelhoed JS, Hidalgo-Martinez S, Smets B, van Dijk JR, Meysman FJR. Closing the genome of unculturable cable bacteria using a combined metagenomic assembly of long and short sequencing reads. Microb Genom 2024; 10:001197. [PMID: 38376381 PMCID: PMC10926707 DOI: 10.1099/mgen.0.001197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 01/23/2024] [Indexed: 02/21/2024] Open
Abstract
Many environmentally relevant micro-organisms cannot be cultured, and even with the latest metagenomic approaches, achieving complete genomes for specific target organisms of interest remains a challenge. Cable bacteria provide a prominent example of a microbial ecosystem engineer that is currently unculturable. They occur in low abundance in natural sediments, but due to their capability for long-distance electron transport, they exert a disproportionately large impact on the biogeochemistry of their environment. Current available genomes of marine cable bacteria are highly fragmented and incomplete, hampering the elucidation of their unique electrogenic physiology. Here, we present a metagenomic pipeline that combines Nanopore long-read and Illumina short-read shotgun sequencing. Starting from a clonal enrichment of a cable bacterium, we recovered a circular metagenome-assembled genome (5.09 Mbp in size), which represents a novel cable bacterium species with the proposed name Candidatus Electrothrix scaldis. The closed genome contains 1109 novel identified genes, including key metabolic enzymes not previously described in incomplete genomes of cable bacteria. We examined in detail the factors leading to genome closure. Foremost, native, non-amplified long reads are crucial to resolve the many repetitive regions within the genome of cable bacteria, and by analysing the whole metagenomic assembly, we found that low strain diversity is key for achieving genome closure. The insights and approaches presented here could help achieve genome closure for other keystone micro-organisms present in complex environmental samples at low abundance.
Affiliation(s)
- Anwar Hiralal
- Geobiology Research Group, University of Antwerp, Antwerp, Belgium
- Bent Smets
- Geobiology Research Group, University of Antwerp, Antwerp, Belgium
- Filip J. R. Meysman
- Geobiology Research Group, University of Antwerp, Antwerp, Belgium
- Department of Biotechnology, Delft University of Technology, Delft, Netherlands
7
Qi W, Xue MY, Jia MH, Zhang S, Yan Q, Sun HZ. - Invited Review - Understanding the functionality of the rumen microbiota: searching for better opportunities for rumen microbial manipulation. Anim Biosci 2024; 37:370-384. [PMID: 38186256 PMCID: PMC10838668 DOI: 10.5713/ab.23.0308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 11/03/2023] [Indexed: 01/09/2024] Open
Abstract
Rumen microbiota play a central role in the digestive process of ruminants. They have a remarkable ability to break down complex plant fibers and proteins, converting them into essential organic compounds that provide animals with energy and nutrition. Research on rumen microbiota not only contributes to improving animal production performance and enhancing feed utilization efficiency but also holds the potential to reduce methane emissions and environmental impact. Nevertheless, studies on rumen microbiota face numerous challenges, including complexity, difficulties in cultivation, and obstacles in functional analysis. This review provides an overview of microbial species involved in the degradation of macromolecules, the fermentation processes, and methane production in the rumen, all based on cultivation methods. Additionally, the review introduces the applications, advantages, and limitations of emerging omics technologies such as metagenomics, metatranscriptomics, metaproteomics, and metabolomics, in investigating the functionality of rumen microbiota. Finally, the article offers a forward-looking perspective on the new horizons and technologies in the field of rumen microbiota functional research. These emerging technologies, with continuous refinement and mutual complementation, have deepened our understanding of rumen microbiota functionality, thereby enabling effective manipulation of the rumen microbial community.
Affiliation(s)
- Wenlingli Qi
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Ming-Yuan Xue
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Ming-Hui Jia
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
- Shuxian Zhang
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China
- Qiongxian Yan
- CAS Key Laboratory of Agro-Ecological Processes in Subtropical Region, Hunan Provincial Key Laboratory of Animal Nutritional Physiology and Metabolic Process, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, China
- Hui-Zeng Sun
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Sciences, Zhejiang University, Hangzhou 310058, China
8
Kerkvliet JJ, Bossers A, Kers JG, Meneses R, Willems R, Schürch AC. Metagenomic assembly is the main bottleneck in the identification of mobile genetic elements. PeerJ 2024; 12:e16695. [PMID: 38188174 PMCID: PMC10771768 DOI: 10.7717/peerj.16695] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/04/2023] [Accepted: 11/28/2023] [Indexed: 01/09/2024] Open
Abstract
Antimicrobial resistance genes (ARG) are commonly found on acquired mobile genetic elements (MGEs) such as plasmids or transposons. Understanding the spread of resistance genes associated with mobile elements (mARGs) across different hosts and environments requires linking ARGs to the existing mobile reservoir within bacterial communities. However, reconstructing mARGs in metagenomic data from diverse ecosystems poses computational challenges, including genome fragment reconstruction (assembly), high-throughput annotation of MGEs, and identification of their association with ARGs. Recently, several bioinformatics tools have been developed to identify assembled fragments of plasmids, phages, and insertion sequence (IS) elements in metagenomic data. These methods can help in understanding the dissemination of mARGs. To streamline the process of identifying mARGs in multiple samples, we combined these tools in an automated high-throughput open-source pipeline, MetaMobilePicker, that identifies ARGs associated with plasmids, IS elements and phages, starting from short metagenomic sequencing reads. This pipeline was used to identify these three elements on a simplified simulated metagenome dataset, comprising whole genome sequences from seven clinically relevant bacterial species containing 55 ARGs, nine plasmids and five phages. The results demonstrated moderate precision for the identification of plasmids (0.57) and phages (0.71), and moderate sensitivity of identification of IS elements (0.58) and ARGs (0.70). In this study, we aim to assess the main causes of this moderate performance of the MGE prediction tools in a comprehensive manner. We conducted a systematic benchmark, considering metagenomic read coverage, contig length cutoffs and investigating the performance of the classification algorithms. 
Our analysis revealed that the metagenomic assembly process is the primary bottleneck when linking ARGs to identified MGEs in short-read metagenomics sequencing experiments rather than ARGs and MGEs identification by the different tools.
Affiliation(s)
- Jesse J. Kerkvliet
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
- Alex Bossers
- Utrecht University, Institute for Risk Assessment Sciences, Utrecht, The Netherlands
- Wageningen University, Wageningen Bioveterinary Research, Lelystad, The Netherlands
- Jannigje G. Kers
- Utrecht University, Institute for Risk Assessment Sciences, Utrecht, The Netherlands
- Rodrigo Meneses
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
- Rob Willems
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
- Anita C. Schürch
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
9
Qi Q, Ghaly TM, Rajabal V, Gillings MR, Tetu SG. Dissecting molecular evolution of class 1 integron gene cassettes and identifying their bacterial hosts in suburban creeks via epicPCR. J Antimicrob Chemother 2024; 79:100-111. [PMID: 37962091 DOI: 10.1093/jac/dkad353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
OBJECTIVES: Our study aimed to sequence class 1 integrons in uncultured environmental bacterial cells in freshwater from suburban creeks and uncover the taxonomy of their bacterial hosts. We also aimed to characterize integron gene cassettes with altered DNA sequences relative to those from databases or literature and identify key signatures of their molecular evolution. METHODS: We applied a single-cell fusion PCR-based technique (emulsion, paired isolation and concatenation PCR; epicPCR) to link class 1 integron gene cassette arrays to the phylogenetic markers of their bacterial hosts. The levels of streptomycin resistance conferred by the WT and altered aadA5 and aadA11 gene cassettes that encode aminoglycoside (3″) adenylyltransferases were experimentally quantified in an Escherichia coli host. RESULTS: Class 1 integron gene cassette arrays were detected in Alphaproteobacteria and Gammaproteobacteria hosts. A subset of three gene cassettes displayed signatures of molecular evolution, namely the gain of a regulatory 5'-untranslated region (5'-UTR), the loss of attC recombination sites between adjacent gene cassettes, and the invasion of a 5'-UTR by an IS element. Notably, our experimental testing of a novel variant of the aadA11 gene cassette demonstrated that gaining the observed 5'-UTR contributed to a 3-fold increase in the MIC of streptomycin relative to the ancestral reference gene cassette in E. coli. CONCLUSIONS: Dissecting the observed signatures of molecular evolution of class 1 integrons allowed us to explain their effects on antibiotic resistance phenotypes, while identifying their bacterial hosts enabled us to make better inferences on the likely origins of novel gene cassettes and IS that invade known gene cassettes.
Affiliation(s)
- Qin Qi
- School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Timothy M Ghaly
- School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Vaheesan Rajabal
- ARC Centre of Excellence for Synthetic Biology, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Michael R Gillings
- School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- ARC Centre of Excellence for Synthetic Biology, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- Sasha G Tetu
- School of Natural Sciences, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
- ARC Centre of Excellence for Synthetic Biology, 14 Eastern Road, Macquarie University, Sydney, NSW, Australia
10
Sapoval N, Tanevski M, Treangen TJ. KombOver: Efficient k-core and K-truss based characterization of perturbations within the human gut microbiome. Pacific Symposium on Biocomputing 2024; 29:506-520. [PMID: 38160303 PMCID: PMC10764071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Abstract
The microbes present in the human gastrointestinal tract are regularly linked to human health and disease outcomes. Thanks to technological and methodological advances in recent years, metagenomic sequencing data, and computational methods designed to analyze metagenomic data, have contributed to improved understanding of the link between the human gut microbiome and disease. However, while numerous methods have been recently developed to extract quantitative and qualitative results from host-associated microbiome data, improved computational tools are still needed to track microbiome dynamics with short-read sequencing data. Previously we have proposed KOMB as a de novo tool for identifying copy number variations in metagenomes for characterizing microbial genome dynamics in response to perturbations. In this work, we present KombOver (KO), which includes four key contributions with respect to our previous work: (i) it scales to large microbiome study cohorts, (ii) it includes both k-core and K-truss based analysis, (iii) we provide the foundation of a theoretical understanding of the relation between various graph-based metagenome representations, and (iv) we provide an improved user experience with easier-to-run code and more descriptive outputs/results. To highlight the aforementioned benefits, we applied KO to nearly 1000 human microbiome samples, requiring less than 10 minutes and 10 GB RAM per sample to process these data. Furthermore, we highlight how graph-based approaches such as k-core and K-truss can be informative for pinpointing microbial community dynamics within a myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) cohort. KO is open source and available for download/use at: https://github.com/treangenlab/komb.
Affiliation(s)
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX 77005, USA
11
Afonso CL, Afonso AM. Next-Generation Sequencing for the Detection of Microbial Agents in Avian Clinical Samples. Vet Sci 2023; 10:690. [PMID: 38133241 PMCID: PMC10747646 DOI: 10.3390/vetsci10120690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 11/24/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023] Open
Abstract
Direct-targeted next-generation sequencing (tNGS), with its undoubtedly superior diagnostic capacity over real-time PCR (RT-PCR), and direct-non-targeted NGS (ntNGS), with its higher capacity to identify and characterize multiple agents, are both likely to become diagnostic methods of choice in the future. tNGS is a rapid and sensitive method for precise characterization of suspected agents. ntNGS, also known as agnostic diagnosis, does not require a hypothesis and has been used to identify unsuspected infections in clinical samples. Implemented in the form of multiplexed total DNA metagenomics or as total RNA sequencing, the approach produces comprehensive and actionable reports that allow semi-quantitative identification of most of the agents present in respiratory, cloacal, and tissue samples. The diagnostic benefits of the use of direct tNGS and ntNGS are high specificity, compatibility with different types of clinical samples (fresh, frozen, FTA cards, and paraffin-embedded), production of nearly complete infection profiles (viruses, bacteria, fungus, and parasites), production of "semi-quantitative" information, direct agent genotyping, and infectious agent mutational information. The achievements of NGS in terms of diagnosing poultry problems are described here, along with future applications. Multiplexing, development of standard operating procedures, robotics, sequencing kits, automated bioinformatics, cloud computing, and artificial intelligence (AI) are disciplines converging toward the use of this technology for active surveillance in poultry farms. Other advances in human and veterinary NGS sequencing are likely to be adaptable to avian species in the future.
|
12
|
Kling JD, Lee MD, Walworth NG, Webb EA, Coelho JT, Wilburn P, Anderson SI, Zhou Q, Wang C, Phan MD, Fu F, Kremer CT, Litchman E, Rynearson TA, Hutchins DA. Dual thermal ecotypes coexist within a nearly genetically identical population of the unicellular marine cyanobacterium Synechococcus. Proc Natl Acad Sci U S A 2023; 120:e2315701120. [PMID: 37972069 PMCID: PMC10665897 DOI: 10.1073/pnas.2315701120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 10/11/2023] [Indexed: 11/19/2023] Open
Abstract
The extent and ecological significance of intraspecific functional diversity within marine microbial populations is still poorly understood, and it remains unclear if such strain-level microdiversity will affect fitness and persistence in a rapidly changing ocean environment. In this study, we cultured 11 sympatric strains of the ubiquitous marine picocyanobacterium Synechococcus isolated from a Narragansett Bay (RI) phytoplankton community thermal selection experiment. Thermal performance curves revealed selection at cool and warm temperatures had subdivided the initial population into thermotypes with pronounced differences in maximum growth temperatures. Curiously, the genomes of all 11 isolates were almost identical (average nucleotide identities of >99.99%, with >99% of the genome aligning) and no differences in gene content or single nucleotide variants were associated with either cool or warm temperature phenotypes. Despite a very high level of genomic similarity, sequenced epigenomes for two strains showed differences in methylation on genes associated with photosynthesis. These corresponded to measured differences in photophysiology, suggesting a potential pathway for future mechanistic research into thermal microdiversity. Our study demonstrates that present-day marine microbial populations can harbor cryptic but environmentally relevant thermotypes which may increase their resilience to future rising temperatures.
Affiliation(s)
- Joshua D. Kling
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90007
- Michael D. Lee
  - ZOLL Medical Corporation, Chelmsford, MA 01824
  - Blue Marble Space Institute of Science, Seattle, WA 98154
- Nathan G. Walworth
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90007
- Eric A. Webb
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90007
- Jordan T. Coelho
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90007
- Paul Wilburn
  - ZOLL Medical Corporation, Chelmsford, MA 01824
  - Kellogg Biological Station, College of Natural Science, Michigan State University, Hickory Corners, MI 49060
- Stephanie I. Anderson
  - Graduate School of Oceanography, The University of Rhode Island, Narragansett, RI 02882
  - Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
- Qianqian Zhou
  - Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian 361005, China
- Chunguang Wang
  - Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, Fujian 361005, China
- Megan D. Phan
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90007
- Feixue Fu
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90007
- Colin T. Kremer
  - Kellogg Biological Station, College of Natural Science, Michigan State University, Hickory Corners, MI 49060
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269
- Elena Litchman
  - Kellogg Biological Station, College of Natural Science, Michigan State University, Hickory Corners, MI 49060
  - Department of Global Ecology, Carnegie Institution, Stanford University, Palo Alto, CA 94305
- Tatiana A. Rynearson
  - Graduate School of Oceanography, The University of Rhode Island, Narragansett, RI 02882
- David A. Hutchins
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA 90007
|
13
|
Kleikamp HBC, Grouzdev D, Schaasberg P, van Valderen R, van der Zwaan R, Wijgaart RVD, Lin Y, Abbas B, Pronk M, van Loosdrecht MCM, Pabst M. Metaproteomics, metagenomics and 16S rRNA sequencing provide different perspectives on the aerobic granular sludge microbiome. Water Res 2023; 246:120700. [PMID: 37866247 DOI: 10.1016/j.watres.2023.120700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 09/29/2023] [Accepted: 10/04/2023] [Indexed: 10/24/2023]
Abstract
The tremendous progress in sequencing technologies has made DNA sequencing routine for microbiome studies. Additionally, advances in mass spectrometric techniques have extended conventional proteomics into the field of microbial ecology. However, systematic studies that provide a better understanding of the complementary nature of these 'omics' approaches, particularly for complex environments such as wastewater treatment sludge, are urgently needed. Here, we describe a comparative metaomics study on aerobic granular sludge from three different wastewater treatment plants. For this, we employed metaproteomics, whole metagenome, and 16S rRNA amplicon sequencing to study the same granule material with uniform size. We furthermore compare the taxonomic profiles using the Genome Taxonomy Database (GTDB) to enhance the comparability between the different approaches. Though the major taxonomies were consistently identified in the different aerobic granular sludge samples, the taxonomic composition obtained by the different omics techniques varied significantly at the lower taxonomic levels, which impacts the interpretation of the nutrient removal processes. Nevertheless, as demonstrated by metaproteomics, the genera that were consistently identified in all techniques cover the majority of the protein biomass. The established metaomics data and the contig classification pipeline are publicly available, which provides a valuable resource for further studies on metabolic processes in aerobic granular sludge.
Affiliation(s)
- Hugo B C Kleikamp
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Pim Schaasberg
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ramon van Valderen
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ramon van der Zwaan
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Roel van de Wijgaart
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Yuemei Lin
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ben Abbas
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Mario Pronk
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Martin Pabst
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
|
14
|
Renzi S, Nenciarini S, Bacci G, Cavalieri D. Yeast metagenomics: analytical challenges in the analysis of the eukaryotic microbiome. Microbiome Res Rep 2023; 3:2. [PMID: 38455081 PMCID: PMC10917621 DOI: 10.20517/mrr.2023.27] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/14/2023] [Revised: 10/09/2023] [Accepted: 10/17/2023] [Indexed: 03/09/2024]
Abstract
Even if their impact is often underestimated, yeasts and yeast-like fungi represent the most prevalent eukaryotic members of microbial communities on Earth. They play numerous roles in natural ecosystems and in association with their hosts. They are involved in the food industry and pharmaceutical production, but they can also cause diseases in other organisms, making the understanding of their biology mandatory. The ongoing loss of biodiversity due to overexploitation of environmental resources is a growing concern in many countries. Therefore, it becomes crucial to understand the ecology and evolutionary history of these organisms to systematically classify them. To achieve this, it is essential that our knowledge of the mycobiota reaches a level similar to that of the bacterial communities. To overcome the existing challenges in the study of fungal communities, the first step should be the establishment of standardized techniques for the correct identification of species, even from complex matrices, both in wet lab practices and in bioinformatic tools.
Affiliation(s)
- Duccio Cavalieri
  - Correspondence to: Prof. Duccio Cavalieri, Department of Biology, University of Florence, Via Madonna del Piano 6, Sesto Fiorentino 50019, Italy. E-mail:
|
15
|
Rodrigues Jardim B, Tran-Nguyen LTT, Gambley C, Al-Sadi AM, Al-Subhi AM, Foissac X, Salar P, Cai H, Yang JY, Davis R, Jones L, Rodoni B, Constable FE. The observation of taxonomic boundaries for the 16SrII and 16SrXXV phytoplasmas using genome-based delimitation. Int J Syst Evol Microbiol 2023; 73. [PMID: 37486824 DOI: 10.1099/ijsem.0.005977] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/26/2023] Open
Abstract
Within the 16SrII phytoplasma group, subgroups A-X have been classified based on restriction fragment length polymorphism of their 16S rRNA gene, and two species have been described, namely 'Candidatus Phytoplasma aurantifolia' and 'Ca. Phytoplasma australasia'. Strains of 16SrII phytoplasmas are detected across a broad geographic range within Africa, Asia, Australia, Europe and North and South America. Historically, all members of the 16SrII group share ≥97.5 % nucleotide sequence identity of their 16S rRNA gene. In this study, we used whole genome sequences to identify the species boundaries within the 16SrII group. Whole genome analyses were done using 42 phytoplasma strains classified into seven 16SrII subgroups, five 16SrII taxa without official 16Sr subgroup classifications, and one 16SrXXV-A phytoplasma strain used as an outgroup taxon. Based on phylogenomic analyses as well as whole genome average nucleotide and average amino acid identity (ANI and AAI), eight distinct 16SrII taxa equivalent to species were identified, six of which are novel descriptions. Strains within the same species had ANI and AAI values of >97 %, and shared ≥80 % of their genomic segments based on the ANI analysis. Species also had distinct biological and/or ecological features. A 16SrII subgroup often represented a distinct species, e.g., the 16SrII-B subgroup members. Members classified within the 16SrII-A, 16SrII-D, and 16SrII-V subgroups as well as strains classified as sweet potato little leaf phytoplasmas fulfilled criteria to be included as members of a single species, but with subspecies-level relationships with each other. The 16SrXXV-A taxon was also described as a novel phytoplasma species and, based on criteria used for other bacterial families, provided evidence that it could be classified as a distinct genus from the 16SrII phytoplasmas. 
As more phytoplasma genome sequences become available, the classification system of these bacteria can be further refined at the genus, species, and subspecies taxonomic ranks.
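The identity values reported in this abstract can be turned into a simple pairwise check. The following is a minimal sketch, not the authors' pipeline: it assumes that ANI, AAI, and the percentage of shared genomic segments have already been computed by an external tool, and it treats the values the abstract reports for strains of the same species (ANI and AAI >97 %, ≥80 % shared segments) as decision thresholds, which is an assumption. The names `PairwiseComparison` and `same_species` are hypothetical.

```python
from dataclasses import dataclass

# Values taken from the abstract; using them as hard cut-offs is an assumption.
ANI_MIN = 97.0      # percent, strict lower bound
AAI_MIN = 97.0      # percent, strict lower bound
SHARED_MIN = 80.0   # percent of genomic segments shared, inclusive


@dataclass(frozen=True)
class PairwiseComparison:
    """Precomputed whole-genome metrics for one pair of strains (hypothetical container)."""
    ani: float              # average nucleotide identity, percent
    aai: float              # average amino acid identity, percent
    shared_segments: float  # percent of genomic segments shared in the ANI analysis


def same_species(cmp: PairwiseComparison) -> bool:
    """Return True if the pair meets all three values reported for conspecific strains."""
    return (
        cmp.ani > ANI_MIN
        and cmp.aai > AAI_MIN
        and cmp.shared_segments >= SHARED_MIN
    )
```

A pair with ANI 98.5 %, AAI 98.0 %, and 85 % shared segments would pass this check. A pair at exactly 97.0 % ANI would not, because the abstract states the bound as strictly greater than 97 %.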
Affiliation(s)
- Bianca Rodrigues Jardim
  - School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, Australia
  - Agriculture Victoria Research, Department of Energy, Environment and Climate Action, AgriBio, Bundoora, Victoria, Australia
- Cherie Gambley
  - Horticulture and Forestry Science, Department of Agriculture and Fisheries Maroochy Research Facility, Nambour, Queensland, Australia
- Abdullah M Al-Sadi
  - Department of Plant Sciences, College of Agricultural and Marine Sciences, Sultan Qaboos University, Muscat, Oman
- Ali M Al-Subhi
  - Department of Plant Sciences, College of Agricultural and Marine Sciences, Sultan Qaboos University, Muscat, Oman
- Xavier Foissac
  - University of Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, 33140, Bordeaux, Villenave d'Ornon, France
- Pascal Salar
  - University of Bordeaux, INRAE, Biologie du Fruit et Pathologie, UMR 1332, 33140, Bordeaux, Villenave d'Ornon, France
- Hong Cai
  - The Key Laboratory for Plant Pathology, Yunnan Agricultural University, Kunming 650201, PR China
- Jun-Yi Yang
  - Institute of Biochemistry, National Chung Hsing University, Taichung 402, Taiwan, ROC
  - Advanced Plant Biotechnology Center, National Chung Hsing University, Taichung 402, Taiwan, ROC
- Richard Davis
  - Northern Australia Quarantine Strategy, Department of Agriculture, Fisheries and Forestry, Canberra, Australian Capital Territory 2601, Australia
- Lynne Jones
  - Northern Australia Quarantine Strategy, Department of Agriculture, Fisheries and Forestry, Canberra, Australian Capital Territory 2601, Australia
- Brendan Rodoni
  - School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, Australia
  - Agriculture Victoria Research, Department of Energy, Environment and Climate Action, AgriBio, Bundoora, Victoria, Australia
- Fiona E Constable
  - School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, Australia
  - Agriculture Victoria Research, Department of Energy, Environment and Climate Action, AgriBio, Bundoora, Victoria, Australia
|
16
|
Padilla-Vaca F, de la Mora J, García-Contreras R, Ramírez-Prado JH, Alva-Murillo N, Fonseca-Yepez S, Serna-Gutiérrez I, Moreno-Galván CL, Montufar-Rodríguez JM, Vicente-Gómez M, Rangel-Serrano Á, Vargas-Maya NI, Franco B. Two-Component System Sensor Kinases from Asgardian Archaea May Be Witnesses to Eukaryotic Cell Evolution. Molecules 2023; 28:5042. [PMID: 37446705 DOI: 10.3390/molecules28135042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/22/2023] [Accepted: 06/25/2023] [Indexed: 07/15/2023] Open
Abstract
The signal transduction paradigm in bacteria involves two-component systems (TCSs). Asgardarchaeota are archaea that may have given rise to current eukaryotic lifeforms. Most research on these archaea has focused on eukaryotic-like features, such as genes involved in phagocytosis, cytoskeleton structure, and vesicle trafficking. However, little attention has been given to specific prokaryotic features. Here, the sequence and predicted structural features of TCS sensor kinases analyzed from two metagenome assemblies and a genomic assembly from cultured Asgardian archaea are presented. The homology of the sensor kinases suggests the grouping of Lokiarchaeum closer to bacterial homologs. In contrast, one group from a Lokiarchaeum and a metagenome assembly from Candidatus Heimdallarchaeum suggest the presence of a set of kinases separated from the typical bacterial TCS sensor kinases. AtoS and ArcB homologs were found in metagenome assemblies along with defined domains for other well-characterized sensor kinases, suggesting the close link between these organisms and bacteria that may have resulted in the metabolic link to the establishment of symbiosis. Several kinases are predicted to be cytoplasmic; some contain several PAS domains. The data shown here suggest that TCS kinases in Asgardian archaea are witnesses to the transition from bacteria to eukaryotic organisms.
Affiliation(s)
- Felipe Padilla-Vaca
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Javier de la Mora
  - Departamento de Genética Molecular, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior s/n, Mexico City 04510, Mexico
- Rodolfo García-Contreras
  - Departamento de Microbiología y Parasitología, Facultad de Medicina, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Nayeli Alva-Murillo
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Sofia Fonseca-Yepez
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Isaac Serna-Gutiérrez
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Carolina Lisette Moreno-Galván
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- José Manolo Montufar-Rodríguez
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Marcos Vicente-Gómez
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Ángeles Rangel-Serrano
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Naurú Idalia Vargas-Maya
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
- Bernardo Franco
  - Departamento de Biología, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta s/n, Guanajuato 36050, Mexico
|
17
|
Sanders JG, Sprockett DD, Li Y, Mjungu D, Lonsdorf EV, Ndjango JBN, Georgiev AV, Hart JA, Sanz CM, Morgan DB, Peeters M, Hahn BH, Moeller AH. Widespread extinctions of co-diversified primate gut bacterial symbionts from humans. Nat Microbiol 2023; 8:1039-1050. [PMID: 37169918 PMCID: PMC10860671 DOI: 10.1038/s41564-023-01388-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/02/2022] [Accepted: 04/19/2023] [Indexed: 05/13/2023]
Abstract
Humans and other primates harbour complex gut bacterial communities that influence health and disease, but the evolutionary histories of these symbioses remain unclear. This is partly due to limited information about the microbiota of ancestral primates. Here, using phylogenetic analyses of metagenome-assembled genomes (MAGs), we show that hundreds of gut bacterial clades diversified in parallel (that is, co-diversified) with primate species over millions of years, but that humans have experienced widespread losses of these ancestral symbionts. Analyses of 9,460 human and non-human primate MAGs, including newly generated MAGs from chimpanzees and bonobos, revealed significant co-diversification within ten gut bacterial phyla, including Firmicutes, Actinobacteriota and Bacteroidota. Strikingly, ~44% of the co-diversifying clades detected in African apes were absent from available metagenomic data from humans and ~54% were absent from industrialized human populations. In contrast, only ~3% of non-co-diversifying clades detected in African apes were absent from humans. Co-diversifying clades present in both humans and chimpanzees displayed consistent genomic signatures of natural selection between the two host species but differed in functional content from co-diversifying clades lost from humans, consistent with selection against certain functions. This study discovers host-species-specific bacterial symbionts that predate hominid diversification, many of which have undergone accelerated extinctions from human populations.
Affiliation(s)
- Jon G Sanders
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
- Daniel D Sprockett
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
- Yingying Li
  - Departments of Medicine and Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Deus Mjungu
  - Gombe Stream Research Center, Kigoma, Tanzania
- Elizabeth V Lonsdorf
  - Department of Psychology and Biological Foundations of Behavior Program, Franklin and Marshall College, Lancaster, PA, USA
  - Department of Anthropology, Emory University, Atlanta, GA, USA
- Jean-Bosco N Ndjango
  - Department of Ecology and Management of Plant and Animal Resources, Faculty of Sciences, University of Kisangani, Kisangani, Democratic Republic of the Congo
- Alexander V Georgiev
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - School of Natural Sciences, Bangor University, Bangor, UK
- John A Hart
  - Lukuru Wildlife Research Foundation, Tshuapa-Lomami-Lualaba Project, Kinshasa, Democratic Republic of the Congo
- Crickette M Sanz
  - Department of Anthropology, Washington University in St Louis, Saint Louis, MO, USA
  - Wildlife Conservation Society, Congo Program, Brazzaville, Republic of Congo
- David B Morgan
  - Lester E. Fisher Center for the Study and Conservation of Apes, Lincoln Park Zoo, Chicago, IL, USA
- Martine Peeters
  - Recherche Translationnelle Appliquée au VIH et aux Maladies Infectieuses, Institut de Recherche pour le Développement, University of Montpellier, INSERM, Montpellier, France
- Beatrice H Hahn
  - Departments of Medicine and Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Andrew H Moeller
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
|
18
|
Luan T, Muralidharan HS, Alshehri M, Mittra I, Pop M. SCRAPT: an iterative algorithm for clustering large 16S rRNA gene data sets. Nucleic Acids Res 2023; 51:e46. [PMID: 36912074 PMCID: PMC10164572 DOI: 10.1093/nar/gkad158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 02/01/2023] [Accepted: 02/28/2023] [Indexed: 03/14/2023] Open
Abstract
16S rRNA gene sequence clustering is an important tool in characterizing the diversity of microbial communities. As 16S rRNA gene data sets are growing in size, existing sequence clustering algorithms increasingly become an analytical bottleneck. Part of this bottleneck is due to the substantial computational cost expended on small clusters and singleton sequences. We propose an iterative sampling-based 16S rRNA gene sequence clustering approach that targets the largest clusters in the data set, allowing users to stop the clustering process when sufficient clusters are available for the specific analysis being targeted. We describe a probabilistic analysis of the iterative clustering process that supports the intuition that the clustering process identifies the larger clusters in the data set first. Using real data sets of 16S rRNA gene sequences, we show that the iterative algorithm, coupled with an adaptive sampling process and a mode-shifting strategy for identifying cluster representatives, substantially speeds up the clustering process while being effective at capturing the large clusters in the data set. The experiments also show that SCRAPT (Sample, Cluster, Recruit, AdaPt and iTerate) is able to produce operational taxonomic units that are less fragmented than popular tools: UCLUST, CD-HIT and DNACLUST. The algorithm is implemented in the open-source package SCRAPT. The source code used to generate the results presented in this paper is available at https://github.com/hsmurali/SCRAPT.
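The sample, cluster, recruit, iterate idea described in this abstract can be illustrated with a toy example. The sketch below is not the SCRAPT implementation and does not use its code or API. It assumes pre-aligned, equal-length sequences, uses a simple per-position identity in place of a real alignment-based similarity, and omits the adaptive sampling and mode-shifting steps. All function and parameter names are hypothetical.

```python
import random


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Greedy centre-based clustering: each sequence joins the first centre it matches."""
    clusters = []  # list of (centre, members)
    for s in seqs:
        for centre, members in clusters:
            if identity(centre, s) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters


def iterative_cluster(seqs, threshold=0.97, sample_size=50,
                      min_sample_cluster=2, max_rounds=10, seed=0):
    """Repeatedly sample, cluster the sample, and recruit the full data set to the
    centres of clusters that are non-trivial in the sample.

    Returns (clusters, remaining): clusters is a list of (centre, members);
    remaining holds sequences never recruited (small clusters and singletons).
    """
    rng = random.Random(seed)
    remaining = list(seqs)
    result = []
    for _ in range(max_rounds):
        if not remaining:
            break
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        # Clusters that are large in a random sample are likely large overall.
        centres = [c for c, m in greedy_cluster(sample, threshold)
                   if len(m) >= min_sample_cluster]
        if not centres:
            break  # only small clusters are left; the caller may stop here
        recruited = {c: [] for c in centres}
        leftover = []
        for s in remaining:
            for c in centres:
                if identity(c, s) >= threshold:
                    recruited[c].append(s)
                    break
            else:
                leftover.append(s)
        result.extend(recruited.items())
        remaining = leftover
    return result, remaining
```

Because only clusters with at least `min_sample_cluster` members in the sample produce centres, no work is spent building clusters around singletons. They stay in `remaining` when the loop stops, which mirrors the abstract's point that users can stop once the large clusters have been captured.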
Affiliation(s)
- Tu Luan
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Harihara Subrahmaniam Muralidharan
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Marwan Alshehri
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Ipsa Mittra
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Mihai Pop
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
|
19
|
Jia L, Wu Y, Dong Y, Chen J, Chen WH, Zhao XM. A survey on computational strategies for genome-resolved gut metagenomics. Brief Bioinform 2023; 24:7145904. [PMID: 37114640 DOI: 10.1093/bib/bbad162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 03/20/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023] Open
Abstract
Recovering high-quality metagenome-assembled genomes (HQ-MAGs) is critical for exploring microbial compositions and microbe-phenotype associations. However, multiple sequencing platforms and computational tools for this purpose may confuse researchers and thus call for extensive evaluation. Here, we systematically evaluated a total of 40 combinations of popular computational tools and sequencing platforms (i.e. strategies), involving eight assemblers, eight metagenomic binners and four sequencing technologies, including short-, long-read and metaHiC sequencing. We identified the best tools for the individual tasks (e.g. the assembly and binning) and combinations (e.g. generating more HQ-MAGs) depending on the availability of the sequencing data. We found that the combination of the hybrid assemblies and metaHiC-based binning performed best, followed by the hybrid and long-read assemblies. More importantly, both long-read and metaHiC sequencing link more mobile elements and antibiotic resistance genes to bacterial hosts and improve the quality of public human gut reference genomes with 32% (34/105) HQ-MAGs that were either of better quality than those in the Unified Human Gastrointestinal Genome catalog version 2 or novel.
Affiliation(s)
- Longhao Jia
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Yingjian Wu
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yanqi Dong
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Jingchao Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Wei-Hua Chen
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
  - Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai 264003, China
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai 200433, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
|
20
|
Esquerra-Ruvira B, Baquedano I, Ruiz R, Fernandez A, Montoliu L, Mojica FJM. Identification of the EH CRISPR-Cas9 system on a metagenome and its application to genome engineering. Microb Biotechnol 2023. [PMID: 37097160 DOI: 10.1111/1751-7915.14266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 04/04/2023] [Accepted: 04/12/2023] [Indexed: 04/26/2023] Open
Abstract
Non-coding RNAs (crRNAs) produced from clustered regularly interspaced short palindromic repeats (CRISPR) loci and CRISPR-associated (Cas) proteins of the prokaryotic CRISPR-Cas systems form complexes that interfere with the spread of transmissible genetic elements through Cas-catalysed cleavage of foreign genetic material matching the guide crRNA sequences. The easily programmable targeting of nucleic acids enabled by these ribonucleoproteins has facilitated the implementation of CRISPR-based molecular biology tools for in vivo and in vitro modification of DNA and RNA targets. Despite the diversity of DNA-targeting Cas nucleases so far identified, native and engineered derivatives of the Streptococcus pyogenes SpCas9 are the most widely used for genome engineering, at least in part due to their catalytic robustness and the requirement of an exceptionally short motif (5'-NGG-3' PAM) flanking the target sequence. However, the large size of the SpCas9 variants impairs the delivery of the tool to eukaryotic cells and smaller alternatives are desirable. Here, we identify in a metagenome a new CRISPR-Cas9 system associated with a smaller Cas9 protein (EHCas9) that targets DNA sequences flanked by 5'-NGG-3' PAMs. We develop a simplified EHCas9 tool that specifically cleaves DNA targets and is functional for genome editing applications in prokaryotes and eukaryotic cells.
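The 5'-NGG-3' PAM requirement mentioned in this abstract amounts to a simple sequence scan. As a minimal sketch, the function below lists every candidate target on both strands of a DNA string where a protospacer is immediately followed by NGG. The 20-nt protospacer length is an assumption borrowed from common SpCas9 usage, since the abstract does not state the value for EHCas9. The function names are hypothetical, and nothing here models cleavage efficiency or off-target effects.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_targets(seq: str, protospacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for every site followed by 5'-NGG-3'.

    For the "-" strand, start is a 0-based offset into the reverse-complemented
    sequence, not into the input string.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits
```

Calling `find_ngg_targets("ACGTACGTACGTACGTACGTTGG")` returns a single forward-strand hit at offset 0 with PAM `TGG`. The reverse complement of that input ends in `CGT`, so it yields no hit on the other strand.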
Affiliation(s)
- Belen Esquerra-Ruvira
  - Department of Physiology, Genetics and Microbiology, University of Alicante, Alicante, Spain
- Ignacio Baquedano
  - Department of Physiology, Genetics and Microbiology, University of Alicante, Alicante, Spain
- Raul Ruiz
  - Department of Physiology, Genetics and Microbiology, University of Alicante, Alicante, Spain
- Almudena Fernandez
  - Department of Molecular and Cellular Biology, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
  - Centre for Biomedical Network Research on Rare Diseases (CIBERER-ISCIII), Madrid, Spain
- Lluis Montoliu
  - Department of Molecular and Cellular Biology, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
  - Centre for Biomedical Network Research on Rare Diseases (CIBERER-ISCIII), Madrid, Spain
- Francisco J M Mojica
  - Department of Physiology, Genetics and Microbiology, University of Alicante, Alicante, Spain
  - Multidisciplinary Institute for Environmental Studies "Ramón Margalef", University of Alicante, Alicante, Spain
|
21
|
Ibañez-Lligoña M, Colomer-Castell S, González-Sánchez A, Gregori J, Campos C, Garcia-Cehic D, Andrés C, Piñana M, Pumarola T, Rodríguez-Frias F, Antón A, Quer J. Bioinformatic Tools for NGS-Based Metagenomics to Improve the Clinical Diagnosis of Emerging, Re-Emerging and New Viruses. Viruses 2023; 15:587. [PMID: 36851800 PMCID: PMC9965957 DOI: 10.3390/v15020587] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/13/2023] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023] Open
Abstract
Epidemics and pandemics have occurred since the beginning of time, resulting in millions of deaths. Many such disease outbreaks are caused by viruses. Some viruses, particularly RNA viruses, are characterized by their high genetic variability, and this can affect certain phenotypic features: tropism, antigenicity, and susceptibility to antiviral drugs, vaccines, and the host immune response. The best strategy to face the emergence of new infectious genomes is prompt identification. However, currently available diagnostic tests are often limited for detecting new agents. High-throughput next-generation sequencing technologies based on metagenomics may be the solution to detect new infectious genomes and properly diagnose certain diseases. Metagenomic techniques enable the identification and characterization of disease-causing agents, but they require a large amount of genetic material and involve complex bioinformatic analyses. A wide variety of analytical tools can be used in the quality control and pre-processing of metagenomic data, filtering of untargeted sequences, assembly and quality control of reads, and taxonomic profiling of sequences to identify new viruses and ones that have been sequenced and uploaded to dedicated databases. Although there have been huge advances in the field of metagenomics, there is still a lack of consensus about which of the various approaches should be used for specific data analysis tasks. In this review, we provide some background on the study of viral infections, describe the contribution of metagenomics to this field, and place special emphasis on the bioinformatic tools (with their capabilities and limitations) available for use in metagenomic analyses of viral pathogens.
Affiliation(s)
- Marta Ibañez-Lligoña
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Sergi Colomer-Castell
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Alejandra González-Sánchez
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Josep Gregori
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Carolina Campos
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Damir Garcia-Cehic
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Cristina Andrés
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Maria Piñana
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Tomàs Pumarola
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Francisco Rodríguez-Frias
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Department of Basic Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès, 08195 Barcelona, Spain
- Andrés Antón
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Josep Quer
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
22
Zhou J, Song W, Tu Q. To assemble or not to assemble: metagenomic profiling of microbially mediated biogeochemical pathways in complex communities. Brief Bioinform 2023; 24:bbac594. [PMID: 36575570 DOI: 10.1093/bib/bbac594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/22/2022] [Accepted: 12/04/2022] [Indexed: 12/29/2022] Open
Abstract
High-throughput profiling of microbial functional traits involved in various biogeochemical cycling pathways using shotgun metagenomic sequencing has been routinely applied in microbial ecology and environmental science. Multiple bioinformatics data processing approaches are available, including assembly-based (single-sample assembly and multi-sample assembly) and read-based (merged reads and raw data). However, it remains unclear how these approaches differ in data analysis and how they affect the interpretation of results. In this study, using two typical shotgun metagenome datasets recovered from geographically distant coastal sediments, the performance of different data processing approaches was compared from both technical and biological/ecological perspectives. Microbially mediated biogeochemical cycling pathways, including nitrogen cycling, sulfur cycling and B12 biosynthesis, were analyzed. Multi-sample assembly provided the largest amount of usable information for targeted functional traits, at a high cost in computational resources and running time. Single-sample assembly and read-based analysis were comparable in obtaining usable information, but the former was much more time- and resource-consuming. Critically, the choice of approach introduced much stronger variation in microbial profiles than biological differences did. However, community-level differences between the two sampling sites were consistently observed regardless of the approach used. In choosing an appropriate approach, researchers should balance the trade-offs between multiple factors, including the scientific question, the amount of usable information, computational resources and time cost. This study is expected to provide valuable technical insights and guidelines for the various approaches used for metagenomic data analysis.
Affiliation(s)
- Jiayin Zhou
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Wen Song
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Qichao Tu
- Institute of Marine Science and Technology, Shandong University, Qingdao, China
- Joint Lab for Ocean Research and Education at Dalhousie University, Shandong University and Xiamen University, Qingdao, China
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangzhou, China
23
Bonin N, Doster E, Worley H, Pinnell LJ, Bravo JE, Ferm P, Marini S, Prosperi M, Noyes N, Morley PS, Boucher C. MEGARes and AMR++, v3.0: an updated comprehensive database of antimicrobial resistance determinants and an improved software pipeline for classification using high-throughput sequencing. Nucleic Acids Res 2023; 51:D744-D752. [PMID: 36382407 PMCID: PMC9825433 DOI: 10.1093/nar/gkac1047] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 09/15/2022] [Revised: 10/14/2022] [Accepted: 10/24/2022] [Indexed: 11/17/2022] Open
Abstract
Antimicrobial resistance (AMR) is considered a critical threat to public health, and genomic/metagenomic investigations featuring high-throughput analysis of sequence data are increasingly common and important. We previously introduced MEGARes, a comprehensive AMR database with an acyclic hierarchical annotation structure that facilitates high-throughput computational analysis, as well as AMR++, a customized bioinformatic pipeline specifically designed to use MEGARes in high-throughput analysis for characterizing AMR genes (ARGs) in metagenomic sequence data. Here, we present MEGARes v3.0, a comprehensive database of published ARG sequences for antimicrobial drugs, biocides, and metals, and AMR++ v3.0, an update to our customized bioinformatic pipeline for high-throughput analysis of metagenomic data (available at MEGLab.org). Database annotations have been expanded to include information regarding specific genomic locations for single-nucleotide polymorphisms (SNPs) and insertions and/or deletions (indels) when required by specific ARGs for resistance expression, and the updated AMR++ pipeline uses this information to check for presence of resistance-conferring genetic variants in metagenomic sequenced reads. This new information encompasses 337 ARGs, whose resistance-conferring variants could not previously be confirmed in such a manner. In MEGARes 3.0, the nodes of the acyclic hierarchical ontology include 4 antimicrobial compound types, 59 resistance classes, 233 mechanisms and 1448 gene groups that classify the 8733 accessions.
Affiliation(s)
- Nathalie Bonin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Enrique Doster
- VERO Program, Veterinary Medicine and Biomedical Sciences, Texas A&M University, Canyon, TX, USA
- Hannah Worley
- Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
- Lee J Pinnell
- VERO Program, Veterinary Medicine and Biomedical Sciences, Texas A&M University, Canyon, TX, USA
- Jonathan E Bravo
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
- Peter Ferm
- Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
- Simone Marini
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
- Mattia Prosperi
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
- Noelle Noyes
- Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
- Paul S Morley
- VERO Program, Veterinary Medicine and Biomedical Sciences, Texas A&M University, Canyon, TX, USA
- Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
24
Mendes CI, Vila-Cerqueira P, Motro Y, Moran-Gilad J, Carriço JA, Ramirez M. LMAS: evaluating metagenomic short de novo assembly methods through defined communities. Gigascience 2022; 12:giac122. [PMID: 36576131 PMCID: PMC9795473 DOI: 10.1093/gigascience/giac122] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/04/2022] [Revised: 09/26/2022] [Accepted: 11/16/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: The de novo assembly of raw sequence data is key in metagenomic analysis. It allows recovering draft genomes from a pool of mixed raw reads, yielding longer sequences that offer contextual information and provide a more complete picture of the microbial community. FINDINGS: To better compare de novo assemblers for metagenomic analysis, LMAS (Last Metagenomic Assembler Standing) was developed as a flexible platform allowing users to evaluate assembler performance given known standard communities. Overall, in our test datasets, k-mer De Bruijn graph assemblers outperformed the alternative approaches but came with a greater computational cost. Furthermore, assemblers branded as metagenomic specific did not consistently outperform other genomic assemblers in metagenomic samples. Some assemblers still in use, such as ABySS, MetaHipmer2, minia, and VelvetOptimiser, perform relatively poorly and should be used with caution when assembling complex samples. Meaningful strain resolution at the single-nucleotide polymorphism level was not achieved, even by the best assemblers tested. CONCLUSIONS: The choice of a de novo assembler depends on the computational resources available, the replicon of interest, and the major goals of the analysis. No single assembler appeared an ideal choice for short-read metagenomic prokaryote replicon assembly, each showing specific strengths. The choice of metagenomic assembler should be guided by user requirements and characteristics of the sample of interest, and LMAS provides an interactive evaluation platform for this purpose. LMAS is open source, and the workflow and its documentation are available at https://github.com/B-UMMI/LMAS and https://lmas.readthedocs.io/, respectively.
Affiliation(s)
- Catarina Inês Mendes
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
- Pedro Vila-Cerqueira
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
- Yair Motro
- Faculty of Health Sciences, Ben-Gurion University of the Negev, 8410501 Beer-Sheva, Israel
- Jacob Moran-Gilad
- Faculty of Health Sciences, Ben-Gurion University of the Negev, 8410501 Beer-Sheva, Israel
- João André Carriço
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
- Mário Ramirez
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
25
Vuong P, Wise MJ, Whiteley AS, Kaur P. Ten simple rules for investigating (meta)genomic data from environmental ecosystems. PLoS Comput Biol 2022; 18:e1010675. [PMID: 36480496 PMCID: PMC9731419 DOI: 10.1371/journal.pcbi.1010675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022] Open
Affiliation(s)
- Paton Vuong
- UWA School of Agriculture & Environment, University of Western Australia, Perth, Australia
- Michael J. Wise
- School of Physics, Mathematics and Computing, University of Western Australia, Perth, Australia
- The Marshall Centre of Infectious Diseases, School of Biological Sciences, The University of Western Australia, Perth, Australia
- Andrew S. Whiteley
- Centre for Environment & Life Sciences, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Floreat, Australia
- Parwinder Kaur
- UWA School of Agriculture & Environment, University of Western Australia, Perth, Australia
26
Critical Assessment of Short-Read Assemblers for the Metagenomic Identification of Foodborne and Waterborne Pathogens Using Simulated Bacterial Communities. Microorganisms 2022; 10:2416. [PMID: 36557669 PMCID: PMC9784204 DOI: 10.3390/microorganisms10122416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 11/30/2022] [Accepted: 12/03/2022] [Indexed: 12/12/2022] Open
Abstract
Metagenomics offers the highest level of strain discrimination of bacterial pathogens from complex food and water microbiota. With the rapid evolution of assembly algorithms, defining an optimal assembler based on its performance in the metagenomic identification of foodborne and waterborne pathogens is warranted. We aimed to benchmark short-read assemblers for the metagenomic identification of foodborne and waterborne pathogens using simulated bacterial communities. Bacterial communities on fresh spinach and in surface water were simulated by generating paired-end short reads of Illumina HiSeq, MiSeq, and NovaSeq at different sequencing depths. Multidrug-resistant Salmonella Indiana SI43 and Pseudomonas aeruginosa PAO1 were included in the simulated communities on fresh spinach and in surface water, respectively. ABySS, IDBA-UD, MaSuRCA, MEGAHIT, metaSPAdes, and Ray Meta were benchmarked in terms of assembly quality and the identification of plasmids, virulence genes, Salmonella pathogenicity islands, antimicrobial resistance genes, chromosomal point mutations, serotyping, multilocus sequence typing, and whole-genome phylogeny. Overall, MEGAHIT, metaSPAdes, and Ray Meta were more effective for metagenomic identification. We did not obtain an optimal assembler when using the extracted reads classified as Salmonella or P. aeruginosa for downstream genomic analyses, but the extracted reads showed consistent phylogenetic topology with the reference genome when they were aligned with Salmonella or P. aeruginosa strains. In most cases, HiSeq, MiSeq, and NovaSeq were comparable at the same sequencing depth, while higher sequencing depths generally led to more accurate results. As assembly algorithms advance and mature, the evaluation of assemblers should be a continuous process.
27
Lai S, Pan S, Sun C, Coelho LP, Chen WH, Zhao XM. metaMIC: reference-free misassembly identification and correction of de novo metagenomic assemblies. Genome Biol 2022; 23:242. [PMID: 36376928 PMCID: PMC9661791 DOI: 10.1186/s13059-022-02810-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/10/2021] [Accepted: 11/01/2022] [Indexed: 11/16/2022] Open
Abstract
Evaluating the quality of metagenomic assemblies is important for constructing reliable metagenome-assembled genomes and downstream analyses. Here, we present metaMIC (https://github.com/ZhaoXM-Lab/metaMIC), a machine learning-based tool for identifying and correcting misassemblies in metagenomic assemblies. Benchmarking results on both simulated and real datasets demonstrate that metaMIC outperforms existing tools when identifying misassembled contigs. Furthermore, metaMIC is able to localize the misassembly breakpoints, and the correction of misassemblies by splitting at misassembly breakpoints can improve downstream scaffolding and binning results.
Affiliation(s)
- Senying Lai
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Shaojun Pan
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Chuqing Sun
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- College of Life Science, Henan Normal University, Xinxiang, Henan, China
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- International Human Phenome Institutes (Shanghai), Shanghai, China
- Zhangjiang Fudan International Innovation Center, Shanghai, China
28
Slizovskiy IB, Oliva M, Settle JK, Zyskina LV, Prosperi M, Boucher C, Noyes NR. Target-enriched long-read sequencing (TELSeq) contextualizes antimicrobial resistance genes in metagenomes. Microbiome 2022; 10:185. [PMID: 36324140 PMCID: PMC9628182 DOI: 10.1186/s40168-022-01368-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Accepted: 09/02/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND: Metagenomic data can be used to profile high-importance genes within microbiomes. However, current metagenomic workflows produce data that suffer from low sensitivity and an inability to accurately reconstruct partial or full genomes, particularly those in low abundance. These limitations preclude colocalization analysis, i.e., characterizing the genomic context of genes and functions within a metagenomic sample. Genomic context is especially crucial for functions associated with horizontal gene transfer (HGT) via mobile genetic elements (MGEs), for example antimicrobial resistance (AMR). To overcome this current limitation of metagenomics, we present a method for comprehensive and accurate reconstruction of antimicrobial resistance genes (ARGs) and MGEs from metagenomic DNA, termed target-enriched long-read sequencing (TELSeq). RESULTS: Using technical replicates of diverse sample types, we compared TELSeq performance to that of non-enriched PacBio and short-read Illumina sequencing. TELSeq achieved much higher ARG recovery (>1,000-fold) and sensitivity than the other methods across diverse metagenomes, revealing an extensive resistome profile comprising many low-abundance ARGs, including some with public health importance. Using the long reads generated by TELSeq, we identified numerous MGEs and cargo genes flanking the low-abundance ARGs, indicating that these ARGs could be transferred across bacterial taxa via HGT. CONCLUSIONS: TELSeq can provide a nuanced view of the genomic context of microbial resistomes and thus has wide-ranging applications in public, animal, and human health, as well as environmental surveillance and monitoring of AMR. Thus, this technique represents a fundamental advancement for microbiome research and application.
Affiliation(s)
- Ilya B Slizovskiy
- Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
- Marco Oliva
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Jonathen K Settle
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Lidiya V Zyskina
- Program in Human-Computer Interaction, College of Information Studies, University of Maryland, College Park, MD, USA
- Mattia Prosperi
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
- Christina Boucher
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
- Noelle R Noyes
- Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA.
29
Okazaki Y, Nakano SI, Toyoda A, Tamaki H. Long-Read-Resolved, Ecosystem-Wide Exploration of Nucleotide and Structural Microdiversity of Lake Bacterioplankton Genomes. mSystems 2022; 7:e0043322. [PMID: 35938717 PMCID: PMC9426551 DOI: 10.1128/msystems.00433-22] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/10/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022] Open
Abstract
Reconstruction of metagenome-assembled genomes (MAGs) has become a fundamental approach in microbial ecology. However, a MAG is rarely complete and overlooks genomic microdiversity because metagenomic assembly fails to resolve microvariants among closely related genotypes. Aiming at understanding the universal factors that drive or constrain prokaryotic genome diversification, we performed an ecosystem-wide high-resolution metagenomic exploration of microdiversity by combining spatiotemporal (2 depths × 12 months) sampling from a pelagic freshwater system, high-quality MAG reconstruction using long- and short-read metagenomic sequences, and profiling of single nucleotide variants (SNVs) and structural variants (SVs) through mapping of short and long reads to the MAGs, respectively. We reconstructed 575 MAGs, including 29 circular assemblies, providing high-quality reference genomes of freshwater bacterioplankton. Read mapping against these MAGs identified 100 to 101,781 SNVs/Mb and 0 to 305 insertions, 0 to 467 deletions, 0 to 41 duplications, and 0 to 6 inversions for each MAG. Nonsynonymous SNVs accumulated in genes potentially involved in cell surface structural modification to evade phage recognition. Most (80.2%) deletions overlapped with a gene coding region, and genes of prokaryotic defense systems were most frequently (>8% of the genes) overlapped by a deletion. Some such deletions exhibited a monthly shift in their allele frequency, suggesting a rapid turnover of genotypes in response to phage predation. MAGs with extremely low microdiversity were either rare or opportunistic bloomers, suggesting that population persistency is key to their genomic diversification. The results indicate that prokaryotic genomic diversification is driven primarily by viral load and constrained by a population bottleneck.
IMPORTANCE: Identifying intraspecies genomic diversity (microdiversity) is crucial to understanding microbial ecology and evolution. However, microdiversity among environmental assemblages is not well investigated, because most microbes are difficult to culture. In this study, we performed cultivation-independent exploration of bacterial genomic microdiversity in a lake ecosystem using a combination of short- and long-read metagenomic analyses. The results revealed the broad spectrum of genomic microdiversity among the diverse bacterial species in the ecosystem, which has been overlooked by conventional approaches. Our ecosystem-wide exploration further allowed comparative analysis among the genomes and genes and revealed factors behind microbial genomic diversification, namely, that diversification is driven primarily by resistance against viral infection and constrained by the population size.
Affiliation(s)
- Yusuke Okazaki
- Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba, Ibaraki, Japan
- Shin-ichi Nakano
- Center for Ecological Research, Kyoto University, Otsu, Shiga, Japan
- Atsushi Toyoda
- Advanced Genomics Center, National Institute of Genetics, Mishima City, Shizuoka, Japan
- Hideyuki Tamaki
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba, Ibaraki, Japan
30
Escudeiro P, Henry CS, Dias RP. Functional characterization of prokaryotic dark matter: the road so far and what lies ahead. Curr Res Microb Sci 2022; 3:100159. [PMID: 36561390 PMCID: PMC9764257 DOI: 10.1016/j.crmicr.2022.100159] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/11/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 12/25/2022] Open
Abstract
Eight hundred thousand to one trillion prokaryotic species may inhabit our planet. Yet, fewer than two hundred thousand prokaryotic species have been described. This uncharted fraction of microbial diversity, and its undisclosed coding potential, is known as the "microbial dark matter" (MDM). Next-generation sequencing has enabled the collection of a massive amount of genome sequence data, leading to unprecedented advances in the field of genomics. Still, harnessing new functional information from the genomes of uncultured prokaryotes is often limited by standard classification methods. These methods often rely on sequence similarity searches against reference genomes from cultured species. This hinders the discovery of unique genetic elements that are missing from the cultivated realm. It also contributes to the accumulation of prokaryotic gene products of unknown function in public sequence data repositories, highlighting the need for new approaches to sequencing data analysis and classification. Increasing evidence indicates that these proteins of unknown function might be a treasure trove of biotechnological potential. Here, we outline the challenges, opportunities, and the potential hidden within the functional dark matter (FDM) of prokaryotes. We also discuss the pitfalls surrounding molecular and computational approaches currently used to probe these uncharted waters, and discuss future opportunities for research and applications.
Affiliation(s)
- Pedro Escudeiro
  - BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Christopher S. Henry
  - Argonne National Laboratory, Lemont, Illinois, USA
  - University of Chicago, Chicago, Illinois, USA
- Ricardo P.M. Dias
  - BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
  - iXLab - Innovation for National Biological Resilience, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
|
31
|
Metagenomic methylation patterns resolve bacterial genomes of unusual size and structural complexity. THE ISME JOURNAL 2022; 16:1921-1931. [PMID: 35459792 PMCID: PMC9296519 DOI: 10.1038/s41396-022-01242-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/18/2021] [Revised: 04/05/2022] [Accepted: 04/08/2022] [Indexed: 01/01/2023]
Abstract
The plasticity of bacterial and archaeal genomes makes examining their ecological and evolutionary dynamics both exciting and challenging. The same mechanisms that enable rapid genomic change and adaptation confound current approaches for recovering complete genomes from metagenomes. Here, we use strain-specific patterns of DNA methylation to resolve complex bacterial genomes from long-read metagenomic data of a marine microbial consortium, the “pink berries” of the Sippewissett Marsh (USA). Unique combinations of restriction-modification (RM) systems encoded by the bacteria produced distinctive methylation profiles that were used to accurately bin and classify metagenomic sequences. Using this approach, we finished the largest and most complex circularized bacterial genome ever recovered from a metagenome (7.9 Mb with >600 transposons), that of Thiohalocapsa sp. PB-PSB1, the dominant bacterium in the consortium. From genomes binned by methylation patterns, we identified instances of horizontal gene transfer between sulfur-cycling symbionts (Thiohalocapsa sp. PB-PSB1 and Desulfofustis sp. PB-SRB1), phage infection, and strain-level structural variation. We also linked the methylation patterns of each metagenome-assembled genome with encoded DNA methyltransferases and discovered new RM defense systems, including novel associations of RM systems with RNase toxins.
|
32
|
Giorgashvili E, Reichel K, Caswara C, Kerimov V, Borsch T, Gruenstaeudl M. Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly-A Case Study in the Narrow Endemic Calligonum bakuense. FRONTIERS IN PLANT SCIENCE 2022; 13:779830. [PMID: 35874012 PMCID: PMC9296850 DOI: 10.3389/fpls.2022.779830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared to congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When comparing the most reliable plastid genome assemblies of C. bakuense, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice.
Affiliation(s)
- Eka Giorgashvili
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Katja Reichel
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Calvinna Caswara
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Vuqar Kerimov
  - Institute of Botany, Azerbaijan National Academy of Sciences (ANAS), Baku, Azerbaijan
- Thomas Borsch
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
  - Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Michael Gruenstaeudl
  - Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
|
33
|
Trigodet F, Lolans K, Fogarty E, Shaiber A, Morrison HG, Barreiro L, Jabri B, Eren AM. High molecular weight DNA extraction strategies for long-read sequencing of complex metagenomes. Mol Ecol Resour 2022; 22:1786-1802. [PMID: 35068060 PMCID: PMC9177515 DOI: 10.1111/1755-0998.13588] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 05/10/2021] [Revised: 12/10/2021] [Accepted: 01/14/2022] [Indexed: 11/28/2022]
Abstract
By offering extremely long contiguous characterization of individual DNA molecules, rapidly emerging long-read sequencing strategies offer comprehensive insights into the organization of genetic information in genomes and metagenomes. However, successful long-read sequencing experiments demand high concentrations of highly purified DNA of high molecular weight (HMW), which limits the utility of established DNA extraction kits designed for short-read sequencing. The challenges associated with input DNA quality intensify further when working with complex environmental samples of low microbial biomass, which requires new protocols that are tailored to study metagenomes with long-read sequencing. Here, we use human tongue scrapings to benchmark six HMW DNA extraction strategies that are based on commercially available kits, phenol-chloroform (PC) extraction and agarose encasement followed by agarase digestion. A typical end goal of HMW DNA extractions is to obtain the longest possible reads during sequencing, which is often achieved by PC extractions, as demonstrated in sequencing of cultured cells. Yet our analyses that consider overall read-size distribution, assembly performance and the number of circularized elements found in sequencing results suggest that column-based kits with enzyme supplementation, rather than PC methods, may be more appropriate for long-read sequencing of metagenomes.
Affiliation(s)
- Florian Trigodet
  - Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Karen Lolans
  - Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Emily Fogarty
  - Committee on Microbiology, University of Chicago, Chicago, Illinois, USA
- Alon Shaiber
  - BioPhysical Sciences Program, The University of Chicago, Chicago, Illinois, USA
- Hilary G. Morrison
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts, USA
- Luis Barreiro
  - Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- Bana Jabri
  - Department of Medicine, The University of Chicago, Chicago, Illinois, USA
- A. Murat Eren
  - Department of Medicine, The University of Chicago, Chicago, Illinois, USA
  - Committee on Microbiology, University of Chicago, Chicago, Illinois, USA
  - BioPhysical Sciences Program, The University of Chicago, Chicago, Illinois, USA
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts, USA
|
34
|
Sim M, Lee J, Wy S, Park N, Lee D, Kwon D, Kim J. Generation and application of pseudo-long reads for metagenome assembly. Gigascience 2022; 11:giac044. [PMID: 35579554 PMCID: PMC9112764 DOI: 10.1093/gigascience/giac044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/27/2021] [Revised: 03/10/2022] [Accepted: 04/03/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Metagenomic assembly using high-throughput sequencing data is a powerful method to construct microbial genomes in environmental samples without cultivation. However, metagenomic assembly, especially when only short reads are available, is a complex and challenging task because mixed genomes of multiple microorganisms constitute the metagenome. Although long-read sequencing technologies have been developed and have begun to be used for metagenomic assembly, many metagenomic studies have been performed based on short reads because generating long reads costs more than generating short reads. RESULTS: In this study, we present a new method called PLR-GEN. It creates pseudo-long reads from metagenomic short reads based on given reference genome sequences by considering small sequence variations existing in individual genomes of the same or different species. When applied to a mock community data set in the Human Microbiome Project, PLR-GEN dramatically extended short reads of 101 bp to pseudo-long reads with an N50 of 33 Kbp and a 0.4% error rate. The use of these pseudo-long reads generated by PLR-GEN resulted in a clear improvement of metagenomic assembly in terms of the number of sequences, assembly contiguity, and prediction of species and genes. CONCLUSIONS: PLR-GEN can be used to generate artificial long-read sequences without incurring extra sequencing cost, thus aiding various studies using metagenomes.
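The N50 figure quoted in this abstract is a standard contiguity statistic: the length L such that sequences of length L or longer together contain at least half of all bases. The following is a minimal sketch of that general definition, not code from PLR-GEN; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    account for at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical read lengths in base pairs, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
```

Because the statistic is weighted by bases rather than by sequence count, a handful of long reads can dominate it, so it is usually read alongside the error rate and the number of sequences.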
Affiliation(s)
- Mikang Sim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Jongin Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Suyeon Wy
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Nayoung Park
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Daehwan Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
|
35
|
Rehner J, Schmartz GP, Groeger L, Dastbaz J, Ludwig N, Hannig M, Rupf S, Seitz B, Flockerzi E, Berger T, Reichert MC, Krawczyk M, Meese E, Herr C, Bals R, Becker SL, Keller A, Müller R. Systematic Cross-biospecimen Evaluation of DNA Extraction Kits for Long- and Short-read Multi-metagenomic Sequencing Studies. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:405-417. [PMID: 35680095 PMCID: PMC9684153 DOI: 10.1016/j.gpb.2022.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 05/13/2022] [Accepted: 05/19/2022] [Indexed: 01/05/2023]
Abstract
High-quality DNA extraction is a crucial step in metagenomic studies. Bias introduced by different isolation kits impairs comparison across datasets. A trending topic, however, is the analysis of multiple metagenomes from the same patients to draw a holistic picture of the microbiota associated with diseases. We thus collected bile, stool, saliva, plaque, sputum, and conjunctival swab samples and performed DNA extraction with three commercial kits. For each combination of specimen type and DNA extraction kit, 20-gigabase (Gb) metagenomic data were generated using short-read sequencing. While profiles of the specimen types showed close proximity to each other, we observed notable differences in the alpha diversity and composition of the microbiota depending on the DNA extraction kit. No single kit outperformed the others on every specimen. We reached consistently good results using the Qiagen QIAamp DNA Microbiome Kit. Depending on the specimen, our data indicate that over 10 Gb of sequencing data are required to achieve sufficient resolution, but DNA-based identification is superior to identification by mass spectrometry. Finally, long-read nanopore sequencing confirmed the results (correlation coefficient > 0.98). Our results thus suggest using a strategy with only one kit for studies aiming for a direct comparison of multiple microbiotas from the same patients.
Affiliation(s)
- Jacqueline Rehner
  - Institute of Medical Microbiology and Hygiene, Saarland University, D-66421 Homburg, Germany
- Laura Groeger
  - Department of Human Genetics, Saarland University, D-66421 Homburg, Germany
- Jan Dastbaz
  - Helmholtz Institute for Pharmaceutical Research Saarland, D-66123 Saarbrücken, Germany
- Nicole Ludwig
  - Department of Human Genetics, Saarland University, D-66421 Homburg, Germany
- Matthias Hannig
  - Clinic of Operative Dentistry, Periodontology and Preventive Dentistry, Saarland University, D-66421 Homburg, Germany
- Stefan Rupf
  - Clinic of Operative Dentistry, Periodontology and Preventive Dentistry, Saarland University, D-66421 Homburg, Germany
- Berthold Seitz
  - Department of Ophthalmology, Saarland University Medical Center, D-66421 Homburg, Germany
- Elias Flockerzi
  - Department of Ophthalmology, Saarland University Medical Center, D-66421 Homburg, Germany
- Tim Berger
  - Department of Ophthalmology, Saarland University Medical Center, D-66421 Homburg, Germany
- Marcin Krawczyk
  - Department of Medicine II, Saarland University Medical Center, D-66421 Homburg, Germany
- Eckart Meese
  - Department of Human Genetics, Saarland University, D-66421 Homburg, Germany
- Christian Herr
  - Department of Internal Medicine V - Pulmonology, Allergology, Intensive Care Medicine, Saarland University, D-66421 Homburg, Germany
- Robert Bals
  - Department of Internal Medicine V - Pulmonology, Allergology, Intensive Care Medicine, Saarland University, D-66421 Homburg, Germany
- Sören L Becker
  - Institute of Medical Microbiology and Hygiene, Saarland University, D-66421 Homburg, Germany
- Andreas Keller
  - Clinical Bioinformatics, Saarland University, D-66123 Saarbrücken, Germany
- Rolf Müller
  - Helmholtz Institute for Pharmaceutical Research Saarland, D-66123 Saarbrücken, Germany
|
36
|
Zhou Y, Liu M, Yang J. Recovering metagenome-assembled genomes from shotgun metagenomic sequencing data: methods, applications, challenges, and opportunities. Microbiol Res 2022; 260:127023. [DOI: 10.1016/j.micres.2022.127023] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2021] [Revised: 03/07/2022] [Accepted: 04/05/2022] [Indexed: 12/12/2022]
|
37
|
Zhang L, Chen F, Zeng Z, Xu M, Sun F, Yang L, Bi X, Lin Y, Gao Y, Hao H, Yi W, Li M, Xie Y. Advances in Metagenomics and Its Application in Environmental Microorganisms. Front Microbiol 2022; 12:766364. [PMID: 34975791 PMCID: PMC8719654 DOI: 10.3389/fmicb.2021.766364] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 08/29/2021] [Accepted: 11/18/2021] [Indexed: 01/04/2023]
Abstract
Metagenomics is a new approach to studying microorganisms obtained from a specific environment by functional gene screening or sequencing analysis. Metagenomics studies focus on microbial diversity, community composition, genetic and evolutionary relationships, functional activities, and interactions and relationships with the environment. Sequencing technologies have evolved from shotgun sequencing to high-throughput, next-generation sequencing (NGS), and third-generation sequencing (TGS). NGS and TGS have shown the advantage of rapid detection of pathogenic microorganisms. With the help of new algorithms, we can better perform taxonomic profiling and gene prediction of microbial species. Functional metagenomics is helpful for screening new bioactive substances and new functional genes from microorganisms and microbial metabolites. In this article, the basic steps, classification, and applications of metagenomics are reviewed.
Affiliation(s)
- Lu Zhang
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- FengXin Chen
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Zhan Zeng
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
- Mengjiao Xu
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Fangfang Sun
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Liu Yang
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Xiaoyue Bi
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Yanjie Lin
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
- YuanJiao Gao
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- HongXiao Hao
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Wei Yi
  - Department of Gynecology and Obstetrics, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Minghui Li
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
- Yao Xie
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
|
38
|
Balaji A, Sapoval N, Seto C, Leo Elworth R, Fu Y, Nute MG, Savidge T, Segarra S, Treangen TJ. KOMB: K-core based de novo characterization of copy number variation in microbiomes. Comput Struct Biotechnol J 2022; 20:3208-3222. [PMID: 35832621 PMCID: PMC9249589 DOI: 10.1016/j.csbj.2022.06.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/31/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 11/29/2022]
Abstract
Characterizing metagenomes via kmer-based, database-dependent taxonomic classification has yielded key insights into underlying microbiome dynamics. However, novel approaches are needed to track community dynamics and genomic flux within metagenomes, particularly in response to perturbations. We describe KOMB, a novel method for tracking genome level dynamics within microbiomes. KOMB utilizes K-core decomposition to identify Structural variations (SVs), specifically, population-level Copy Number Variation (CNV) within microbiomes. K-core decomposition partitions the graph into shells containing nodes of induced degree at least K, yielding reduced computational complexity compared to prior approaches. Through validation on a synthetic community, we show that KOMB recovers and profiles repetitive genomic regions in the sample. KOMB is shown to identify functionally-important regions in Human Microbiome Project datasets, and was used to analyze longitudinal data and identify keystone taxa in Fecal Microbiota Transplantation (FMT) samples. In summary, KOMB represents a novel graph-based, taxonomy-oblivious, and reference-free approach for tracking CNV within microbiomes. KOMB is open source and available for download at https://gitlab.com/treangenlab/komb.
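The K-core decomposition named in this abstract can be illustrated with the textbook peeling algorithm: repeatedly remove the node of lowest remaining degree, and record for each node the highest minimum degree seen up to its removal. The sketch below is a generic illustration under that assumption, not KOMB's implementation; the function names `core_numbers` and `shells` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining (induced) degree.
        node = min(remaining, key=degree.__getitem__)
        # A node's core number is the largest minimum degree seen so far.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def shells(core):
    """Group nodes into shells keyed by core number."""
    grouped = defaultdict(list)
    for node, k in core.items():
        grouped[k].append(node)
    return {k: sorted(nodes) for k, nodes in grouped.items()}


if __name__ == "__main__":
    # Toy graph (hypothetical): a triangle a-b-c with a pendant node d.
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(shells(core_numbers(toy_edges)))
```

The linear scan for the minimum keeps the sketch short at quadratic cost; bucket-based variants of the same peeling idea run in time linear in the number of edges, which is what makes the decomposition attractive for large graphs.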
Affiliation(s)
- Advait Balaji
  - Department of Computer Science, Rice University, Houston, TX, USA
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX, USA
- Charlie Seto
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
- R.A. Leo Elworth
  - Department of Computer Science, Rice University, Houston, TX, USA
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Michael G. Nute
  - Department of Computer Science, Rice University, Houston, TX, USA
- Tor Savidge
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
- Santiago Segarra
  - Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
  - Corresponding author.
- Todd J. Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Corresponding author.
|
39
|
Abstract
Microbial communities are key components of all ecosystems, but characterization of their complete genomic structure remains challenging. Typical analyses tend to overlook the complexity of these mixtures in terms of species, strains, and extrachromosomal DNA molecules. Recently, approaches have been developed that bin DNA contigs into individual genomes and episomes according to their 3D contact frequencies. Those contacts are quantified by chromosome conformation capture experiments (3C, Hi-C), also known as proximity-ligation approaches, applied to metagenomic samples. Here, we present a simple computational pipeline that allows the recovery of high-quality metagenome-assembled genomes (MAGs) starting from metagenomic 3C or Hi-C datasets and a metagenome assembly.
|
40
|
Yang C, Chowdhury D, Zhang Z, Cheung WK, Lu A, Bian Z, Zhang L. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J 2021; 19:6301-6314. [PMID: 34900140 PMCID: PMC8640167 DOI: 10.1016/j.csbj.2021.11.028] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 08/30/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 12/16/2022]
Abstract
Metagenomic sequencing provides a culture-independent avenue to investigate complex microbial communities by constructing metagenome-assembled genomes (MAGs). A MAG represents a microbial genome as a group of sequences from a genome assembly with similar characteristics. It enables us to identify novel species and understand their potential functions in a dynamic ecosystem. Many computational tools have been developed to construct and annotate MAGs from metagenomic sequencing; however, a comprehensive introduction to their background and practical performance is still lacking. In this paper, we have thoroughly investigated the computational tools designed for both upstream and downstream analyses, including metagenome assembly, metagenome binning, gene prediction, functional annotation, taxonomic classification, and profiling. We have categorized the commonly used tools into distinct groups based on their functional background and introduced the underlying core algorithms and associated information to provide a comparative outlook. Furthermore, we have emphasized the computational requirements and offered guidance to help users select the most efficient tools. Finally, we have indicated current limitations, potential solutions, and future perspectives for further improving the tools for MAG construction and annotation. We believe that our work provides a consolidated resource for the current stage of MAG studies and sheds light on the future development of more effective MAG analysis tools for metagenomic sequencing.
Key Words
- CNN, convolutional neural network
- DBG, De Bruijn graph
- GTDB, Genome Taxonomy Database
- Gene functional annotation
- Gene prediction
- Genome assembly
- HMM, Hidden Markov Model
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- LCA, lowest common ancestor
- LPA, label propagation algorithm
- MAGs, metagenome-assembled genomes
- Metagenome binning
- Metagenome-assembled genomes
- Metagenomic sequencing
- Microbial abundance profiling
- OLC, overlap-layout consensus
- ONT, Oxford Nanopore Technologies
- ORFs, open reading frames
- PacBio, Pacific Biosciences
- QC, quality control
- SLR, synthetic long reads
- TNFs, tetranucleotide frequencies
- Taxonomic classification
Affiliation(s)
- Chao Yang
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Debajyoti Chowdhury
  - Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
  - Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Zhenmiao Zhang
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- William K. Cheung
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Aiping Lu
  - Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
  - Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Zhaoxiang Bian
  - Institute of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
  - Chinese Medicine Clinical Study Center, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
  - Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
|
41
|
Abstract
Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would otherwise have collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs.
IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.
|
42
|
Kiguchi Y, Nishijima S, Kumar N, Hattori M, Suda W. Long-read metagenomics of multiple displacement amplified DNA of low-biomass human gut phageomes by SACRA pre-processing chimeric reads. DNA Res 2021; 28:6377780. [PMID: 34586399 DOI: 10.1093/dnares/dsab019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/16/2020] [Indexed: 01/21/2023]
Abstract
The human gut bacteriophage community (phageome) plays an important role in the host's health and disease; however, its entire structure is poorly understood, partly owing to the generation of many incomplete genomes in conventional short-read metagenomics. Here, we demonstrate long-read metagenomics of low-biomass phageome DNA amplified by multiple displacement amplification (MDA), involving the development of a novel bioinformatics tool, split amplified chimeric read algorithm (SACRA), that efficiently pre-processes the numerous chimeric reads generated through MDA. Using five samples, SACRA markedly reduced the average chimera ratio from 72% to 1.5% in PacBio reads with an average length of 1.8 kb. De novo assembly of chimera-less PacBio long reads yielded contigs of ≥5 kb at an average proportion of 27%, compared with 1% for contigs from MiSeq short reads, thereby dramatically improving contig length and genome completeness. Comparison of PacBio and MiSeq contigs found that MiSeq contig fragmentation frequently occurred near local repeats and hypervariable regions in the phage genomes, and was also caused by multiple homologous phage genomes coexisting in the community. We also developed a reference-independent method to assess the completeness of the linear phage genomes. Overall, we established a SACRA-coupled long-read metagenomics approach that is robust to highly diverse gut phageomes and identifies high-quality circular and linear phage genomes with adequate sequence quantity.
Affiliation(s)
- Yuya Kiguchi
  - Cooperative Major in Advanced Health Science, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo 169-8555, Japan
  - Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Suguru Nishijima
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo 169-8555, Japan
  - Integrated Institute for Regulatory Science, Waseda University, Tokyo 169-8555, Japan
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Naveen Kumar
  - Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Masahira Hattori
  - Cooperative Major in Advanced Health Science, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
  - Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Wataru Suda
  - Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
43
DeWeese KJ, Osborne MG. Understanding the metabolome and metagenome as extended phenotypes: The next frontier in macroalgae domestication and improvement. Journal of the World Aquaculture Society 2021; 52:1009-1030. [PMID: 34732977 PMCID: PMC8562568 DOI: 10.1111/jwas.12782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/06/2020] [Accepted: 02/25/2021] [Indexed: 06/01/2023]
Abstract
"Omics" techniques (including genomics, transcriptomics, metabolomics, proteomics, and metagenomics) have been employed with huge success in the improvement of agricultural crops. As marine aquaculture of macroalgae expands globally, biologists are working to domesticate species of macroalgae by applying these techniques tested in agriculture to wild macroalgae species. Metabolomics has revealed metabolites and pathways that influence agriculturally relevant traits in crops, allowing for informed crop crossing schemes and genomic improvement strategies that would be pivotal to inform selection on macroalgae for domestication. Advances in metagenomics have improved understanding of host-symbiont interactions and the potential for microbial organisms to improve crop outcomes. There is much room in the field of macroalgal biology for further research toward improvement of macroalgae cultivars in aquaculture using metabolomic and metagenomic analyses. To this end, this review discusses the application and necessary expansion of the omics tool kit for macroalgae domestication as we move to enhance seaweed farming worldwide.
Affiliation(s)
- Kelly J DeWeese
  - Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California
- Melisa G Osborne
  - Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California
44
Kayani MUR, Huang W, Feng R, Chen L. Genome-resolved metagenomics using environmental and clinical samples. Brief Bioinform 2021; 22:bbab030. [PMID: 33758906 PMCID: PMC8425419 DOI: 10.1093/bib/bbab030] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/21/2020] [Revised: 11/29/2020] [Accepted: 01/20/2021] [Indexed: 12/25/2022] Open
Abstract
Recent advances in high-throughput sequencing technologies and computational methods have added a new dimension to metagenomic data analysis, i.e., genome-resolved metagenomics. In general terms, it refers to the recovery of draft or high-quality microbial genomes and their taxonomic classification and functional annotation. In recent years, several studies have used the genome-resolved metagenome analysis approach and identified previously unknown microbial species from human and environmental metagenomes. In this review, we describe genome-resolved metagenome analysis as a series of four necessary steps: (i) preprocessing of the sequencing reads, (ii) de novo metagenome assembly, (iii) genome binning and (iv) taxonomic and functional analysis of the recovered genomes. For each of these four steps, we discuss the most commonly used tools and the currently available pipelines to guide the scientific community in the recovery and subsequent analyses of genomes from any metagenome sample. Furthermore, we discuss the tools required for validating assembly quality as well as for improving the quality of the recovered genomes. We also highlight the currently available pipelines that can automate the whole analysis without requiring advanced bioinformatics knowledge. Finally, we highlight the most widely adopted and actively maintained tools and pipelines, which can help the scientific community make decisions before commencing the analysis.
Affiliation(s)
- Masood ur Rehman Kayani
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200025, China
- Wanqiu Huang
  - Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200000, China
- Ru Feng
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200025, China
- Lei Chen
  - Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200025, China
45
Ayling M, Clark MD, Leggett RM. New approaches for metagenome assembly with short reads. Brief Bioinform 2021; 21:584-594. [PMID: 30815668 PMCID: PMC7299287 DOI: 10.1093/bib/bbz020] [Citation(s) in RCA: 100] [Impact Index Per Article: 33.3] [Received: 11/12/2018] [Revised: 01/31/2019] [Accepted: 02/01/2019] [Indexed: 02/07/2023] Open
Abstract
In recent years, the use of longer range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic data sets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.
Affiliation(s)
- Martin Ayling
  - Earlham Institute, Norwich Research Park, Norwich, UK
46
Nearing JT, Comeau AM, Langille MGI. Identifying biases and their potential solutions in human microbiome studies. Microbiome 2021; 9:113. [PMID: 34006335 PMCID: PMC8132403 DOI: 10.1186/s40168-021-01059-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Received: 07/11/2020] [Accepted: 03/24/2021] [Indexed: 05/13/2023]
Abstract
Advances in DNA sequencing technology have vastly improved the ability of researchers to explore the microbial inhabitants of the human body. Unfortunately, while these studies have uncovered the importance of these microbial communities to our health, they often do not produce consistent findings. One possible reason for the disagreement is the multitude of systemic biases introduced during sequence-based microbiome studies. These biases begin with sample collection and continue to be introduced throughout the entire experiment, leading to an observed community that is significantly altered from the true underlying microbial composition. In this review, we highlight the steps in typical sequence-based human microbiome studies where significant bias can be introduced, and we review current efforts within the field to reduce the impact of these biases.
Affiliation(s)
- Jacob T Nearing
  - Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
- André M Comeau
  - Integrated Microbiome Resource, Dalhousie University, Halifax, Nova Scotia, Canada
- Morgan G I Langille
  - Integrated Microbiome Resource, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Pharmacology, Dalhousie University, Halifax, Nova Scotia, Canada
47
Liu B, Thippabhotla S, Zhang J, Zhong C. DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data. Front Genet 2021; 12:669495. [PMID: 34025724 PMCID: PMC8131839 DOI: 10.3389/fgene.2021.669495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 03/23/2021] [Indexed: 12/21/2022] Open
Abstract
Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, such as regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central tasks toward understanding the function of the microbial community. However, the majority of current metagenomic sequencing technologies generate short reads, which may contain only a partial secondary structure, complicating ncRNA homology detection. Meanwhile, de novo assembly of metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. Then, it classifies paths in the hybrid graph and their constituent reads into different ncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoM can improve performance and robustness over traditional approaches in the classification and quantification of a wide class of ncRNA families.
Affiliation(s)
- Ben Liu
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Sirisha Thippabhotla
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Jun Zhang
  - Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States
  - Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, United States
- Cuncong Zhong
  - Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
  - Bioengineering Program, The University of Kansas, Lawrence, KS, United States
  - Center for Computational Biology, The University of Kansas, Lawrence, KS, United States
48
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not directly yield haplotype information spanning whole chromosomes. Haplotype reconstruction requires computational assembly of shorter haplotype fragments, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.
Affiliation(s)
- Shilpa Garg
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
49
Singleton CM, Petriglieri F, Kristensen JM, Kirkegaard RH, Michaelsen TY, Andersen MH, Kondrotaite Z, Karst SM, Dueholm MS, Nielsen PH, Albertsen M. Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing. Nat Commun 2021; 12:2009. [PMID: 33790294 PMCID: PMC8012365 DOI: 10.1038/s41467-021-22203-2] [Citation(s) in RCA: 153] [Impact Index Per Article: 51.0] [Received: 08/20/2020] [Accepted: 02/12/2021] [Indexed: 12/17/2022] Open
Abstract
Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.
Affiliation(s)
- Caitlin M Singleton
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Francesca Petriglieri
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Jannie M Kristensen
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Rasmus H Kirkegaard
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Thomas Y Michaelsen
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Martin H Andersen
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Zivile Kondrotaite
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Søren M Karst
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Morten S Dueholm
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Per H Nielsen
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Mads Albertsen
  - Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
50
Lapidus AL, Korobeynikov AI. Metagenomic Data Assembly - The Way of Decoding Unknown Microorganisms. Front Microbiol 2021; 12:613791. [PMID: 33833738 PMCID: PMC8021871 DOI: 10.3389/fmicb.2021.613791] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 10/03/2020] [Accepted: 03/03/2021] [Indexed: 01/08/2023] Open
Abstract
Metagenomics is a segment of conventional microbial genomics dedicated to the sequencing and analysis of the combined genomic DNA of entire environmental samples. The most critical step of metagenomic data analysis is the reconstruction of individual genes and genomes of the microorganisms in the communities using metagenomic assemblers: computational programs that put together the small fragments of sequenced DNA generated by sequencing instruments. Here, we describe the challenges of metagenomic assembly and a wide spectrum of applications in which metagenomic assemblies were used to better understand the ecology and evolution of microbial ecosystems, and we present one of the most efficient microbial assemblers, SPAdes, which was upgraded to become applicable to metagenomics.
Affiliation(s)
- Alla L. Lapidus
  - Center for Algorithmic Biotechnology, St. Petersburg State University, Saint Petersburg, Russia