1
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] [Open Access]
Abstract
The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
2
Ejaz MR, Badr K, Hassan ZU, Al-Thani R, Jaoua S. Metagenomic approaches and opportunities in arid soil research. The Science of the Total Environment 2024; 953:176173. [PMID: 39260494 DOI: 10.1016/j.scitotenv.2024.176173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 09/04/2024] [Accepted: 09/07/2024] [Indexed: 09/13/2024]
Abstract
Arid soils present unique challenges and opportunities for studying microbial diversity and bioactive potential due to the extreme environmental conditions they bear. This review article investigates soil metagenomics as an emerging tool to explore complex microbial dynamics and unexplored bioactive potential in harsh environments. Advanced metagenomic techniques make it possible to study diverse microbial populations that grow under extreme conditions such as high temperatures, salinity, high pH levels, and exposure to metals and radiation. The use of extremophiles to discover novel natural products and biocatalysts emphasizes the role of functional metagenomics in identifying enzymes and secondary metabolites for industrial and pharmaceutical purposes. Metagenomic sequencing uncovers a complex network of microbial diversity, offering significant potential for discovering new bioactive compounds. Functional metagenomics, connecting taxonomic diversity to genetic capabilities, provides a pathway to identify the mechanisms by which microbes synthesize valuable secondary metabolites and other bioactive substances. Contrary to the common perception of desert soil as barren land, metagenomic analysis reveals a rich diversity of life forms adept at extreme survival and provides valuable insights into their resilience and potential applications in biotechnology. The review also covers the challenges associated with metagenomics in arid soils, such as low microbial biomass, high DNA degradation rates, and DNA extraction inhibitors, along with strategies to overcome them, and outlines the latest advancements in extraction methods, high-throughput sequencing, and bioinformatics. The importance of metagenomics for investigating diverse environments opens the way for future research to develop sustainable solutions in agriculture, industry, and medicine. Extensive studies are necessary to utilize the full potential of these powerful microbial communities. This research will significantly improve our understanding of microbial ecology and biotechnology in arid environments.
Affiliation(s)
- Muhammad Riaz Ejaz
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Kareem Badr
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Zahoor Ul Hassan
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Roda Al-Thani
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
- Samir Jaoua
  - Environmental Science Program, Department of Biological and Environmental Sciences, College of Arts and Science, Qatar University, P.O. Box 2713, Doha, Qatar
3
Wang Z, Lloyd D, Zhao S, Motsinger-Reif A. Taxanorm: a novel taxa-specific normalization approach for microbiome data. BMC Bioinformatics 2024; 25:304. [PMID: 39285319 PMCID: PMC11406911 DOI: 10.1186/s12859-024-05918-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 08/28/2024] [Indexed: 09/19/2024] [Open Access]
Abstract
BACKGROUND In high-throughput sequencing studies, sequencing depth, which quantifies the total number of reads, varies across samples. Unequal sequencing depth can obscure true biological signals of interest and prevent direct comparisons between samples. To remove variability due to differential sequencing depth, taxa counts are usually normalized before downstream analysis. However, most existing normalization methods scale counts using size factors that are sample specific but not taxa specific, which can result in over- or under-correction for some taxa. RESULTS We developed TaxaNorm, a novel normalization method based on a zero-inflated negative binomial model. This method assumes the effects of sequencing depth on mean and dispersion vary across taxa. Incorporating the zero-inflation part can better capture the nature of microbiome data. We also propose two corresponding diagnosis tests on the varying sequencing depth effect for validation. We find that TaxaNorm achieves comparable performance to existing methods in most simulation scenarios in downstream analysis and reaches a higher power for some cases. Specifically, it balances power and false discovery control well. When applying the method in a real dataset, TaxaNorm has improved performance when correcting technical bias. CONCLUSION TaxaNorm addresses both sample- and taxon-specific bias by introducing an appropriate regression framework in the microbiome data, which aids in data interpretation and visualization. The 'TaxaNorm' R package is freely available through the CRAN repository https://CRAN.R-project.org/package=TaxaNorm and the source code can be downloaded at https://github.com/wangziyue57/TaxaNorm.
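As a point of reference for the sample-specific scaling that the abstract contrasts with TaxaNorm, the following is a minimal Python sketch of total-sum scaling, in which every taxon in a sample is divided by the same size factor (that sample's sequencing depth). It is not the TaxaNorm method; the function name `total_sum_scale` and the toy counts are illustrative assumptions.

```python
def total_sum_scale(counts):
    """Scale each sample (one inner list of taxa counts) by its own depth.

    The size factor is sample specific but identical for all taxa in the
    sample, which is the kind of normalization the abstract describes as
    prone to over- or under-correcting individual taxa.
    """
    scaled = []
    for sample in counts:
        depth = sum(sample)  # sequencing depth = total reads in the sample
        if depth == 0:
            raise ValueError("each sample needs at least one read")
        scaled.append([count / depth for count in sample])
    return scaled


# Hypothetical toy data: two samples with the same composition,
# sequenced at a tenfold difference in depth.
toy_counts = [[10, 30, 60], [100, 300, 600]]
relative = total_sum_scale(toy_counts)
```

After scaling, both toy samples have the same relative abundances, because a single per-sample factor removes the depth difference uniformly across taxa.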
Affiliation(s)
- Ziyue Wang
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, 27709, USA
  - Department of Population Health, NYU Grossman School of Medicine, New York, NY, 10016, USA
- Dillon Lloyd
  - Department of Biological Sciences and Statistics, North Carolina State University, Raleigh, NC, 27695, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
- Shanshan Zhao
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, 27709, USA
- Alison Motsinger-Reif
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, 27709, USA
4
Goussarov G, Mysara M, Cleenwerck I, Claesen J, Leys N, Vandamme P, Van Houdt R. Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities. Microbiology (Reading, England) 2024; 170:001469. [PMID: 38916949 PMCID: PMC11261854 DOI: 10.1099/mic.0.001469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/23/2024] [Indexed: 06/26/2024]
Abstract
Metagenome community analysis, driven by the continued development of sequencing technology, is rapidly providing insights into many aspects of microbiology and becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with their own advantages and drawbacks. Illumina provides accurate reads at a low cost, but their length is too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Bioinformatics group, Information Technology & Computer Science, Nile University, Giza, Egypt
- Ilse Cleenwerck
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Jürgen Claesen
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Natalie Leys
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
5
Lou EG, Fu Y, Wang Q, Treangen TJ, Stadler LB. Sensitivity and consistency of long- and short-read metagenomics and epicPCR for the detection of antibiotic resistance genes and their bacterial hosts in wastewater. Journal of Hazardous Materials 2024; 469:133939. [PMID: 38490149 DOI: 10.1016/j.jhazmat.2024.133939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 02/12/2024] [Accepted: 02/29/2024] [Indexed: 03/17/2024]
Abstract
Wastewater surveillance is a powerful tool to assess the risks associated with antibiotic resistance in communities. One challenge is selecting which analytical tool to deploy to measure risk indicators, such as antibiotic resistance genes (ARGs) and their respective bacterial hosts. Although metagenomics is frequently used for analyzing ARGs, few studies have compared the performance of long-read and short-read metagenomics in identifying which bacteria harbor ARGs in wastewater. Furthermore, for ARG host detection, untargeted metagenomics has not been compared to targeted methods such as epicPCR. Here, we 1) evaluated long-read and short-read metagenomics as well as epicPCR for detecting ARG hosts in wastewater, and 2) investigated the host range of ARGs across the wastewater treatment plant (WWTP) to evaluate host proliferation. Results showed that long-read metagenomics revealed a wider range of ARG hosts than short-read metagenomics. Nonetheless, the ARG host range detected by long-read metagenomics only represented a subset of the hosts detected by epicPCR. The ARG-host linkages across the influent and effluent of the WWTP were characterized. Results showed that the ARG-host phylum linkages were relatively consistent across the WWTP, whereas new ARG-host species linkages appeared in the WWTP effluent. The ARG-host linkages of several clinically relevant species found in the effluent were identified.
Affiliation(s)
- Esther G Lou
  - Department of Civil and Environmental Engineering, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Yilei Fu
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Qi Wang
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Lauren B Stadler
  - Department of Civil and Environmental Engineering, Rice University, 6100 Main Street, Houston, TX 77005, USA
6
Eisenhofer R, Nesme J, Santos-Bay L, Koziol A, Sørensen SJ, Alberdi A, Aizpurua O. A comparison of short-read, HiFi long-read, and hybrid strategies for genome-resolved metagenomics. Microbiol Spectr 2024; 12:e0359023. [PMID: 38451230 PMCID: PMC10986573 DOI: 10.1128/spectrum.03590-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 02/11/2024] [Indexed: 03/08/2024] [Open Access]
Abstract
Shotgun metagenomics enables the reconstruction of complex microbial communities at a high level of detail. Such an approach can be conducted using both short-read and long-read sequencing data, as well as a combination of both. To assess the pros and cons of these different approaches, we used 22 fecal DNA extracts collected weekly for 11 weeks from two respective lab mice to study seven performance metrics over four combinations of sequencing depth and technology: (i) 20 Gbp of Illumina short-read data, (ii) 40 Gbp of short-read data, (iii) 20 Gbp of PacBio HiFi long-read data, and (iv) 40 Gbp of hybrid (20 Gbp of short-read + 20 Gbp of long-read) data. No strategy was best for all metrics; instead, each one excelled across different metrics. The long-read approach yielded the best assembly statistics, with the highest N50 and lowest number of contigs. The 40 Gbp short-read approach yielded the highest number of refined bins. Finally, the hybrid approach yielded the longest assemblies and the highest mapping rate to the bacterial genomes. Our results suggest that while long-read sequencing significantly improves the quality of reconstructed bacterial genomes, it is more expensive and requires deeper sequencing than short-read approaches to recover a comparable number of reconstructed genomes. The optimal strategy is study-specific and depends on how researchers assess the trade-off between the quantity and quality of recovered genomes. IMPORTANCE Mice are an important model organism for understanding the gut microbiome. When studying these gut microbiomes using DNA techniques, researchers can choose from technologies that use short or long DNA reads. In this study, we perform an extensive benchmark between short- and long-read DNA sequencing for studying mice gut microbiomes. We find that no one approach was best for all metrics and provide information that can help guide researchers in planning their experiments.
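The N50 statistic mentioned in the abstract is commonly defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal Python sketch under that definition follows; the function name `n50` and the example lengths are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths: total is 32, so the threshold is 16,
# which is reached after the two contigs of length 8.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

A higher N50 together with fewer contigs, as reported for the long-read approach, indicates a more contiguous assembly, although N50 alone says nothing about correctness.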
Affiliation(s)
- Raphael Eisenhofer
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Joseph Nesme
  - Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Luisa Santos-Bay
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Adam Koziol
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Søren Johannes Sørensen
  - Section of Microbiology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Antton Alberdi
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Ostaizka Aizpurua
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
7
Wu S, Feng T, Tang W, Qi C, Gao J, He X, Wang J, Zhou H, Fang Z. metaProbiotics: a tool for mining probiotic from metagenomic binning data based on a language model. Brief Bioinform 2024; 25:bbae085. [PMID: 38487846 PMCID: PMC10940841 DOI: 10.1093/bib/bbae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/26/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] [Open Access]
Abstract
Beneficial bacteria remain largely unexplored. Lacking systematic methods, understanding probiotic community traits becomes challenging, leading to various conclusions about their probiotic effects among different publications. We developed language model-based metaProbiotics to rapidly detect probiotic bins from metagenomes, demonstrating superior performance in simulated benchmark datasets. Testing on gut metagenomes from probiotic-treated individuals, it revealed the probioticity of intervention strains-derived bins and other probiotic-associated bins beyond the training data, such as a plasmid-like bin. Analyses of these bins revealed various probiotic mechanisms and bai operon as probiotic Ruminococcaceae's potential marker. In different health-disease cohorts, these bins were more common in healthy individuals, signifying their probiotic role, but relevant health predictions based on the abundance profiles of these bins faced cross-disease challenges. To better understand the heterogeneous nature of probiotics, we used metaProbiotics to construct a comprehensive probiotic genome set from global gut metagenomic data. Module analysis of this set shows that diseased individuals often lack certain probiotic gene modules, with significant variation of the missing modules across different diseases. Additionally, different gene modules on the same probiotic have heterogeneous effects on various diseases. We thus believe that gene function integrity of the probiotic community is more crucial in maintaining gut homeostasis than merely increasing specific gene abundance, and adding probiotics indiscriminately might not boost health. We expect that the innovative language model-based metaProbiotics tool will promote novel probiotic discovery using large-scale metagenomic data and facilitate systematic research on bacterial probiotic effects. The metaProbiotics program can be freely downloaded at https://github.com/zhenchengfang/metaProbiotics.
Affiliation(s)
- Shufang Wu
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tao Feng
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Waijiao Tang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Cancan Qi
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jie Gao
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - Department of Gastroenterology, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xiaolong He
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jiaxuan Wang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
8
Valencia EM, Maki KA, Dootz JN, Barb JJ. Mock community taxonomic classification performance of publicly available shotgun metagenomics pipelines. Sci Data 2024; 11:81. [PMID: 38233447 PMCID: PMC10794705 DOI: 10.1038/s41597-023-02877-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 12/22/2023] [Indexed: 01/19/2024] [Open Access]
Abstract
Shotgun metagenomic sequencing comprehensively samples the DNA of a microbial sample. Choosing the best bioinformatics processing package can be daunting due to the wide variety of tools available. Here, we assessed publicly available shotgun metagenomics processing packages/pipelines including bioBakery, Just a Microbiology System (JAMS), Whole metaGenome Sequence Assembly V2 (WGSA2), and Woltka using 19 publicly available mock community samples and a set of five constructed pathogenic gut microbiome samples. Also included is a workflow for labelling bacterial scientific names with NCBI taxonomy identifiers for better resolution in assessing results. The Aitchison distance, a sensitivity metric, and total False Positive Relative Abundance were used for accuracy assessments for all pipelines and mock samples. Overall, bioBakery4 performed the best with most of the accuracy metrics, while JAMS and WGSA2 had the highest sensitivities. Furthermore, bioBakery is commonly used and only requires a basic knowledge of command line usage. This work provides an unbiased assessment of shotgun metagenomics packages and presents results assessing the performance of the packages using mock community sequence data.
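The Aitchison distance named in the abstract is commonly computed as the Euclidean distance between centered log-ratio (CLR) transformed compositions. The sketch below assumes that definition. The function names and the `pseudocount` used to handle zero abundances are assumptions of this example; the abstract does not say how the authors treated zeros.

```python
import math


def clr(composition, pseudocount=1e-6):
    """Centered log-ratio transform of one abundance profile.

    The pseudocount (an assumption of this sketch) keeps log() defined
    when a taxon has zero abundance.
    """
    logs = [math.log(value + pseudocount) for value in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(profile_a, profile_b, pseudocount=1e-6):
    """Euclidean distance between the CLR transforms of two profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same taxa in the same order")
    clr_a = clr(profile_a, pseudocount)
    clr_b = clr(profile_b, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr_a, clr_b)))


# Hypothetical expected vs. observed relative abundances for three taxa.
expected = [0.5, 0.3, 0.2]
observed = [0.6, 0.3, 0.1]
distance = aitchison_distance(expected, observed)
```

With no pseudocount and strictly positive abundances, the distance is unchanged when either profile is multiplied by a constant, which is why it suits compositional data such as relative abundances.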
Affiliation(s)
- E Michael Valencia
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
- Katherine A Maki
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
- Jennifer N Dootz
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Jennifer J Barb
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
9
Sanchez FB, Sato Guima SE, Setubal JC. How to Obtain and Compare Metagenome-Assembled Genomes. Methods Mol Biol 2024; 2802:135-163. [PMID: 38819559 DOI: 10.1007/978-1-0716-3838-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Metagenome-assembled genomes, or MAGs, are genomes retrieved from metagenome datasets. In the vast majority of cases, MAGs are genomes from prokaryotic species that have not been isolated or cultivated in the lab. They therefore provide us with information on these species that is impossible to obtain otherwise, at least until new cultivation methods are devised. Thanks to improvements and cost reductions in DNA sequencing technologies and growing interest in microbial ecology, the rise in the number of MAGs in genome repositories has been exponential. This chapter covers the basics of MAG retrieval and processing and provides a practical step-by-step guide using a real dataset and state-of-the-art tools for MAG analysis and comparison.
Affiliation(s)
- Fabio Beltrame Sanchez
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
- Suzana Eiko Sato Guima
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
- João Carlos Setubal
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
10
Liang X, Zhang J, Kim Y, Ho J, Liu K, Keenum I, Gupta S, Davis B, Hepp SL, Zhang L, Xia K, Knowlton KF, Liao J, Vikesland PJ, Pruden A, Heath LS. ARGem: a new metagenomics pipeline for antibiotic resistance genes: metadata, analysis, and visualization. Front Genet 2023; 14:1219297. [PMID: 37811141 PMCID: PMC10558085 DOI: 10.3389/fgene.2023.1219297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 09/01/2023] [Indexed: 10/10/2023] [Open Access]
Abstract
Antibiotic resistance is of crucial interest to both human and animal medicine. It has been recognized that increased environmental monitoring of antibiotic resistance is needed. Metagenomic DNA sequencing is becoming an attractive method to profile antibiotic resistance genes (ARGs), including a special focus on pathogens. A number of computational pipelines are available and under development to support environmental ARG monitoring; the pipeline we present here is promising for general adoption for the purpose of harmonized global monitoring. Specifically, ARGem is a user-friendly pipeline that provides full-service analysis, from the initial DNA short reads to the final visualization of results. The capture of extensive metadata is also facilitated to support comparability across projects and broader monitoring goals. The ARGem pipeline offers efficient analysis of a modest number of samples along with affordable computational components, though the throughput could be increased through cloud resources, based on the user's configuration. The pipeline components were carefully assessed and selected to satisfy tradeoffs, balancing efficiency and flexibility. It was essential to provide a step to perform short read assembly in a reasonable time frame to ensure accurate annotation of identified ARGs. Comprehensive ARG and mobile genetic element databases are included in ARGem for annotation support. ARGem further includes an expandable set of analysis tools that include statistical and network analysis and supports various useful visualization techniques, including Cytoscape visualization of co-occurrence and correlation networks. The performance and flexibility of the ARGem pipeline are demonstrated with analysis of aquatic metagenomes. The pipeline is freely available at https://github.com/xlxlxlx/ARGem.
Collapse
Affiliation(s)
- Xiao Liang
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Jingyi Zhang
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Yoonjin Kim
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Josh Ho
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Kevin Liu
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Ishi Keenum
- Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Suraj Gupta
- Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Benjamin Davis
- Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Shannon L. Hepp
- Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Liqing Zhang
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Kang Xia
- School of Plant and Environmental Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Katharine F. Knowlton
- Department of Dairy Science, Virginia Polytechnic Institute and State University, Blacksburg, VaA, United States
| | - Jingqiu Liao
- Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Peter J. Vikesland
- Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Amy Pruden
- Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| | - Lenwood S. Heath
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
| |
|
11
|
Wang Q, Liu Z, Ma A, Li Z, Liu B, Ma Q. Computational methods and challenges in analyzing intratumoral microbiome data. Trends Microbiol 2023; 31:707-722. [PMID: 36841736 PMCID: PMC10272078 DOI: 10.1016/j.tim.2023.01.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/22/2022] [Revised: 01/29/2023] [Accepted: 01/30/2023] [Indexed: 02/25/2023]
Abstract
The human microbiome is intimately related to cancer biology and plays a vital role in the efficacy of cancer treatments, including immunotherapy. Extraordinary evidence has revealed that several microbes influence tumor development through interaction with the host immune system, that is, immuno-oncology-microbiome (IOM). This review focuses on the intratumoral microbiome in IOM and describes the available data and computational methods for discovering biological insights of microbial profiling from host bulk, single-cell, and spatial sequencing data. Critical challenges in data analysis and integration are discussed. Specifically, the microorganisms associated with cancer and cancer treatment in the context of IOM are collected and integrated from the literature. Lastly, we provide our perspectives for future directions in IOM research.
Affiliation(s)
- Qi Wang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Zhaoqian Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA; Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China; Shandong National Center for Applied Mathematics, Jinan, Shandong, 250100, China.
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA; Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA.
| |
|
12
|
Cristina Diaconu C, Madalina Pitica I, Chivu-Economescu M, Georgiana Necula L, Botezatu A, Virginia Iancu I, Iulia Neagu A, L. Radu E, Matei L, Maria Ruta S, Bleotu C. SARS-CoV-2 Variant Surveillance in Genomic Medicine Era. Infect Dis (Lond) 2023. [DOI: 10.5772/intechopen.107137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/26/2024] Open
Abstract
In the genomic medicine era, the emergence of SARS-CoV-2 was immediately followed by viral genome sequencing and worldwide sequence sharing. Almost in real time, based on these sequences, resources such as molecular diagnostic tests, informed public health decisions, and vaccines were developed and applied around the world. Molecular SARS-CoV-2 variant surveillance was a normal approach in this context; yet, considering that viral genome modification occurs commonly during viral replication, the challenge is to identify the modifications that significantly affect virulence or transmissibility, reduce the effectiveness of vaccines and therapeutics, or cause diagnostic tests to fail. However, assessing the importance of newly emerging mutations and linking them to epidemiological trends is still a laborious process, and faster phenotypic evaluation approaches, in conjunction with genomic data, are required in order to release timely and efficient control measures.
|
13
|
Gabrielli M, Dai Z, Delafont V, Timmers PHA, van der Wielen PWJJ, Antonelli M, Pinto AJ. Identifying Eukaryotes and Factors Influencing Their Biogeography in Drinking Water Metagenomes. Environmental Science & Technology 2023; 57:3645-3660. [PMID: 36827617 PMCID: PMC9996835 DOI: 10.1021/acs.est.2c09010] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/29/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
The biogeography of eukaryotes in drinking water systems is poorly understood relative to that of prokaryotes or viruses, limiting the understanding of their role and management. A challenge with studying complex eukaryotic communities is that metagenomic analysis workflows are currently not as mature as those that focus on prokaryotes or viruses. In this study, we benchmarked different strategies to recover eukaryotic sequences and genomes from metagenomic data and applied the best-performing workflow to explore the factors affecting the relative abundance and diversity of eukaryotic communities in drinking water distribution systems (DWDSs). We developed an ensemble approach exploiting k-mer- and reference-based strategies to improve eukaryotic sequence identification and identified MetaBAT2 as the best-performing binning approach for their clustering. Applying this workflow to the DWDS metagenomes showed that eukaryotic sequences typically constituted small proportions (i.e., <1%) of the overall metagenomic data with higher relative abundances in surface water-fed or chlorinated systems with high residuals. The α and β diversities of eukaryotes were correlated with those of prokaryotic and viral communities, highlighting the common role of environmental/management factors. Finally, a co-occurrence analysis highlighted clusters of eukaryotes whose members' presence and abundance in DWDSs were affected by disinfection strategies, climate conditions, and source water types.
Affiliation(s)
- Marco Gabrielli
- Dipartimento di Ingegneria Civile e Ambientale - Sezione Ambientale, Politecnico di Milano, Milan 20133, Italy
| | - Zihan Dai
- Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Vincent Delafont
- Laboratoire Ecologie et Biologie des Interactions (EBI), Equipe Microorganismes, Hôtes, Environnements, Université de Poitiers, Poitiers 86073, France
| | - Peer H. A. Timmers
- KWR Watercycle Research Institute, 3433 PE Nieuwegein, The Netherlands
- Department of Microbiology, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
| | - Paul W. J. J. van der Wielen
- KWR Watercycle Research Institute, 3433 PE Nieuwegein, The Netherlands
- Laboratory of Microbiology, Wageningen University, 6700 HB Wageningen, The Netherlands
| | - Manuela Antonelli
- Dipartimento di Ingegneria Civile e Ambientale - Sezione Ambientale, Politecnico di Milano, Milan 20133, Italy
| | - Ameet J. Pinto
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| |
|
14
|
Normand P, Pujic P, Abrouk D, Vemulapally S, Guerra T, Carlos-Shanley C, Hahn D. Draft Genomes of Frankia strains AiPa1 and AiPs1 Retrieved from Soil with Monocultures of Picea abies or Pinus sylvestris using Alnus incana as Capture Plant. J Genomics 2023; 11:1-8. [PMID: 36594039 PMCID: PMC9760358 DOI: 10.7150/jgen.77880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 10/12/2022] [Indexed: 11/24/2022] Open
Abstract
The genomes of two nitrogen-fixing Frankia strains, AiPa1 and AiPs1, are described as representatives of two novel candidate species. Both strains were isolated from root nodules of Alnus incana, used as capture plants in bioassays on soils from a reforested site at Karttula, Finland, that was devoid of actinorhizal plants but contained 25 year-old monocultures of spruce (Picea abies (L.) Karsten) or pine (Pinus sylvestris L.), respectively. ANI analyses indicate that each strain represents a novel Frankia species, with genome sizes of 6.98 and 7.35 Mb for AiPa1 and AiPs1, respectively. Both genomes harbored genes typical for many other symbiotic frankiae, including genes essential for nitrogen-fixation, for synthesis of hopanoid lipids and iron-sulfur clusters, as well as clusters of orthologous genes, secondary metabolite determinants and transcriptional regulators. Genomes of AiPa1 and AiPs1 had lost 475 and 112 genes, respectively, compared to those of other cultivated Alnus-infective strains with large genomes. Lost genes included one hup cluster in AiPa1 and the gvp cluster in AiPs1, suggesting that some genome erosion has started to occur in a different manner in the two strains.
Affiliation(s)
- Philippe Normand
- Université Claude-Bernard Lyon 1, Université de Lyon, UMR 5557 CNRS Ecologie Microbienne, Villeurbanne, France
| | - Petar Pujic
- Université Claude-Bernard Lyon 1, Université de Lyon, UMR 5557 CNRS Ecologie Microbienne, Villeurbanne, France
| | - Danis Abrouk
- Université Claude-Bernard Lyon 1, Université de Lyon, UMR 5557 CNRS Ecologie Microbienne, Villeurbanne, France
| | - Spandana Vemulapally
- Texas State University, Department of Biology, 601 University Drive, San Marcos, TX 78666, USA
| | - Trina Guerra
- Texas State University, Department of Biology, 601 University Drive, San Marcos, TX 78666, USA
| | - Camila Carlos-Shanley
- Texas State University, Department of Biology, 601 University Drive, San Marcos, TX 78666, USA
| | - Dittmar Hahn
- Texas State University, Department of Biology, 601 University Drive, San Marcos, TX 78666, USA
| |
|
15
|
Zafeiropoulos H, Beracochea M, Ninidakis S, Exter K, Potirakis A, De Moro G, Richardson L, Corre E, Machado J, Pafilis E, Kotoulas G, Santi I, Finn RD, Cox CJ, Pavloudi C. metaGOflow: a workflow for the analysis of marine Genomic Observatories shotgun metagenomics data. Gigascience 2022; 12:giad078. [PMID: 37850871 PMCID: PMC10583283 DOI: 10.1093/gigascience/giad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 06/30/2023] [Accepted: 09/11/2023] [Indexed: 10/19/2023] Open
Abstract
BACKGROUND Genomic Observatories (GOs) are sites of long-term scientific study that undertake regular assessments of the genomic biodiversity. The European Marine Omics Biodiversity Observation Network (EMO BON) is a network of GOs that conduct regular biological community samplings to generate environmental and metagenomic data of microbial communities from designated marine stations around Europe. The development of an effective workflow is essential for the analysis of the EMO BON metagenomic data in a timely and reproducible manner. FINDINGS Based on the established MGnify resource, we developed metaGOflow. metaGOflow supports the fast inference of taxonomic profiles from GO-derived data based on ribosomal RNA genes and their functional annotation using the raw reads. Thanks to the Research Object Crate packaging, relevant metadata about the sample under study, and the details of the bioinformatics analysis it has been subjected to, are inherited to the data product while its modular implementation allows running the workflow partially. The analysis of 2 EMO BON samples and 1 Tara Oceans sample was performed as a use case. CONCLUSIONS metaGOflow is an efficient and robust workflow that scales to the needs of projects producing big metagenomic data such as EMO BON. It highlights how containerization technologies along with modern workflow languages and metadata package approaches can support the needs of researchers when dealing with ever-increasing volumes of biological data. Despite being initially oriented to address the needs of EMO BON, metaGOflow is a flexible and easy-to-use workflow that can be broadly used for one-sample-at-a-time analysis of shotgun metagenomics data.
Affiliation(s)
- Haris Zafeiropoulos
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
- KU Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, 3000 Leuven, Belgium
| | - Martin Beracochea
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stelios Ninidakis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
| | - Katrina Exter
- Flanders Marine Institute (VLIZ), 8400 Oostende, Belgium
| | - Antonis Potirakis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
| | - Gianluca De Moro
- Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
| | - Lorna Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Erwan Corre
- CNRS, FR 2424, ABiMS Platform, Station Biologique de Roscoff (SBR), 29680 Roscoff, France
| | - João Machado
- Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
| | - Georgios Kotoulas
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
| | - Ioulia Santi
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
- European Marine Biological Resource Centre (EMBRC-ERIC), 75005 Paris, France
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cymon J Cox
- Centro de Ciências do Mar (CCMAR), Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal
| | - Christina Pavloudi
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, 71003 Heraklion, Crete, Greece
- Department of Biological Sciences, The George Washington University, 20052 Washington, DC, USA
| |
|
16
|
Sun J, Qiu Z, Egan R, Ho H, Li Y, Wang Z. Persistent memory as an effective alternative to random access memory in metagenome assembly. BMC Bioinformatics 2022; 23:513. [PMID: 36451083 PMCID: PMC9710083 DOI: 10.1186/s12859-022-05052-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2022] [Accepted: 11/11/2022] [Indexed: 12/05/2022] Open
Abstract
BACKGROUND The assembly of metagenomes decomposes members of complex microbe communities and allows the characterization of these genomes without laborious cultivation or single-cell metagenomics. Metagenome assembly is a process that is memory intensive and time consuming. Multi-terabyte sequences can become too large to be assembled on a single computer node, and there is no reliable method to predict the memory requirement due to data-specific memory consumption patterns. Currently, out-of-memory (OOM) is one of the most prevalent factors that cause metagenome assembly failures. RESULTS In this study, we explored the possibility of using Persistent Memory (PMem) as a less expensive substitute for dynamic random access memory (DRAM) to reduce OOM and increase the scalability of metagenome assemblers. We evaluated the execution time and memory usage of three popular metagenome assemblers (MetaSPAdes, MEGAHIT, and MetaHipMer2) in datasets up to one terabase. We found that PMem can enable metagenome assemblers on terabyte-sized datasets by partially or fully substituting DRAM. Depending on the configured DRAM/PMem ratio, running metagenome assemblies with PMem can achieve a similar speed as DRAM, while in the worst case it showed a roughly two-fold slowdown. In addition, different assemblers displayed distinct memory/speed trade-offs in the same hardware/software environment. CONCLUSIONS We demonstrated that PMem is capable of expanding the capacity of DRAM to allow larger metagenome assembly with a potential tradeoff in speed. Because PMem can be used directly without any application-specific code modification, these findings are likely to be generalized to other memory-intensive bioinformatics applications.
Affiliation(s)
| | | | - Rob Egan
- Department of Energy Joint Genome Institute, Berkeley, CA 94720 USA
| | - Harrison Ho
- Department of Energy Joint Genome Institute, Berkeley, CA 94720 USA; School of Natural Sciences, University of California at Merced, Merced, CA 95343 USA
| | - Yue Li
- MemVerge Inc, Milpitas, CA 95035 USA
| | - Zhong Wang
- Department of Energy Joint Genome Institute, Berkeley, CA 94720 USA; School of Natural Sciences, University of California at Merced, Merced, CA 95343 USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
| |
|
17
|
Slizovskiy IB, Oliva M, Settle JK, Zyskina LV, Prosperi M, Boucher C, Noyes NR. Target-enriched long-read sequencing (TELSeq) contextualizes antimicrobial resistance genes in metagenomes. Microbiome 2022; 10:185. [PMID: 36324140 PMCID: PMC9628182 DOI: 10.1186/s40168-022-01368-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Accepted: 09/02/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Metagenomic data can be used to profile high-importance genes within microbiomes. However, current metagenomic workflows produce data that suffer from low sensitivity and an inability to accurately reconstruct partial or full genomes, particularly those in low abundance. These limitations preclude colocalization analysis, i.e., characterizing the genomic context of genes and functions within a metagenomic sample. Genomic context is especially crucial for functions associated with horizontal gene transfer (HGT) via mobile genetic elements (MGEs), for example antimicrobial resistance (AMR). To overcome this current limitation of metagenomics, we present a method for comprehensive and accurate reconstruction of antimicrobial resistance genes (ARGs) and MGEs from metagenomic DNA, termed target-enriched long-read sequencing (TELSeq). RESULTS Using technical replicates of diverse sample types, we compared TELSeq performance to that of non-enriched PacBio and short-read Illumina sequencing. TELSeq achieved much higher ARG recovery (>1,000-fold) and sensitivity than the other methods across diverse metagenomes, revealing an extensive resistome profile comprising many low-abundance ARGs, including some with public health importance. Using the long reads generated by TELSeq, we identified numerous MGEs and cargo genes flanking the low-abundance ARGs, indicating that these ARGs could be transferred across bacterial taxa via HGT. CONCLUSIONS TELSeq can provide a nuanced view of the genomic context of microbial resistomes and thus has wide-ranging applications in public, animal, and human health, as well as environmental surveillance and monitoring of AMR. Thus, this technique represents a fundamental advancement for microbiome research and application. Video abstract.
Affiliation(s)
- Ilya B Slizovskiy
- Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA
| | - Marco Oliva
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
| | - Jonathen K Settle
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
| | - Lidiya V Zyskina
- Program in Human-Computer Interaction, College of Information Studies, University of Maryland, College Park, MD, USA
| | - Mattia Prosperi
- Data Intelligence Systems Lab, Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, USA
| | - Christina Boucher
- Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, FL, USA
| | - Noelle R Noyes
- Food-Centric Corridor, Infectious Disease Laboratory, Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, St. Paul, MN, USA.
| |
|
18
|
Waite DW, Liefting L, Delmiglio C, Chernyavtseva A, Ha HJ, Thompson JR. Development and Validation of a Bioinformatic Workflow for the Rapid Detection of Viruses in Biosecurity. Viruses 2022; 14:v14102163. [PMID: 36298719 PMCID: PMC9610911 DOI: 10.3390/v14102163] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/15/2022] [Accepted: 09/25/2022] [Indexed: 11/05/2022] Open
Abstract
The field of biosecurity has greatly benefited from the widespread adoption of high-throughput sequencing technologies, owing to their ability to deeply query plant and animal samples for pathogens for which no tests exist. However, the bioinformatics analysis tools designed for rapid analysis of these sequencing datasets are not developed with this application in mind, limiting the ability of diagnosticians to standardise their workflows using published tool kits. We sought to assess previously published bioinformatic tools for their ability to identify plant- and animal-infecting viruses while distinguishing them from the host genetic material. We discovered that many of the current generation of virus-detection pipelines are not adequate for this task, being outperformed by more generic classification tools. We created synthetic MinION and HiSeq libraries simulating plant and animal infections of economically important viruses and assessed a series of tools for their suitability for rapid and accurate detection of infection, and further tested the top performing tools against the VIROMOCK Challenge dataset to ensure that our findings were reproducible when compared with international standards. Our work demonstrated that several methods provide sensitive and specific detection of agriculturally important viruses in a timely manner and provides a key piece of ground truthing for method development in this space.
Affiliation(s)
- David W. Waite
- Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
| | - Lia Liefting
- Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
| | - Catia Delmiglio
- Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
| | | | - Hye Jeong Ha
- Animal Health Laboratory, Ministry for Primary Industries, Upper Hutt 5018, New Zealand
| | - Jeremy R. Thompson
- Plant Health and Environment Laboratory, Ministry for Primary Industries, P.O. Box 2095, Auckland 1140, New Zealand
| |
|
19
|
Churcheward B, Millet M, Bihouée A, Fertin G, Chaffron S. MAGNETO: An Automated Workflow for Genome-Resolved Metagenomics. mSystems 2022; 7:e0043222. [PMID: 35703559 PMCID: PMC9426564 DOI: 10.1128/msystems.00432-22] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/06/2022] [Accepted: 05/06/2022] [Indexed: 12/24/2022] Open
Abstract
Metagenome-assembled genomes (MAGs) represent individual genomes recovered from metagenomic data. MAGs are extremely useful to analyze uncultured microbial genomic diversity, as well as to characterize associated functional and metabolic potential in natural environments. Recent computational developments have considerably improved MAG reconstruction but also emphasized several limitations, such as the nonbinning of sequence regions with repetitions or distinct nucleotidic composition. Different assembly and binning strategies are often used; however, it still remains unclear which assembly strategy, in combination with which binning approach, offers the best performance for MAG recovery. Several workflows have been proposed in order to reconstruct MAGs, but users are usually limited to single-metagenome assembly or need to manually define sets of metagenomes to coassemble prior to genome binning. Here, we present MAGNETO, an automated workflow dedicated to MAG reconstruction, which includes a fully-automated coassembly step informed by optimal clustering of metagenomic distances, and implements complementary genome binning strategies, for improving MAG recovery. MAGNETO is implemented as a Snakemake workflow and is available at: https://gitlab.univ-nantes.fr/bird_pipeline_registry/magneto. IMPORTANCE Genome-resolved metagenomics has led to the discovery of previously untapped biodiversity within the microbial world. As the development of computational methods for the recovery of genomes from metagenomes continues, existing strategies need to be evaluated and compared to eventually lead to standardized computational workflows. In this study, we compared commonly used assembly and binning strategies and assessed their performance using both simulated and real metagenomic data sets. We propose a novel approach to automate coassembly, avoiding the requirement for a priori knowledge to combine metagenomic information. 
The comparison against a previous coassembly approach demonstrates a strong impact of this step on genome binning results, but also the benefits of informing coassembly for improving the quality of recovered genomes. MAGNETO integrates complementary assembly-binning strategies to optimize genome reconstruction and provides a complete reads-to-genomes workflow for the growing microbiome research community.
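The idea of letting sample-to-sample distances decide which metagenomes are coassembled can be sketched in Python as follows. This is a deliberately simplified stand-in (single-linkage grouping under a fixed distance cutoff), not MAGNETO's optimal clustering; the sample names, distance values, and cutoff are hypothetical.

```python
def coassembly_groups(samples, distances, cutoff):
    """Group samples connected by pairwise distances <= cutoff (single linkage).

    `distances` maps frozenset({a, b}) to a dissimilarity between samples a and b.
    Returns a sorted list of sorted sample groups; each group would be coassembled.
    """
    parent = {s: s for s in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, dist in distances.items():
        if dist <= cutoff:
            a, b = tuple(pair)
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical samples and pairwise dissimilarities, for illustration only.
samples = ["A", "B", "C", "D"]
distances = {
    frozenset({"A", "B"}): 0.10,
    frozenset({"A", "C"}): 0.80,
    frozenset({"A", "D"}): 0.90,
    frozenset({"B", "C"}): 0.75,
    frozenset({"B", "D"}): 0.85,
    frozenset({"C", "D"}): 0.20,
}

groups = coassembly_groups(samples, distances, cutoff=0.3)
print(groups)
```

With a very strict cutoff every sample ends up in its own group, which corresponds to single-metagenome assembly; a very loose cutoff collapses everything into one coassembly.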
Affiliation(s)
| | - Maxime Millet
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
| | - Audrey Bihouée
- Nantes Université, CNRS, INSERM, l’institut du thorax, F-44000 Nantes, France
- Nantes Université, CHU Nantes, SFR Bonamy, F-44000 Nantes, France
| | - Guillaume Fertin
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
| | - Samuel Chaffron
- Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France
- Research Federation for the study of Global Ocean Systems Ecology and Evolution, FR2022/Tara Oceans, Paris, France
| |
|
20
|
Hempel CA, Wright N, Harvie J, Hleap JS, Adamowicz S, Steinke D. Metagenomics versus total RNA sequencing: most accurate data-processing tools, microbial identification accuracy and perspectives for ecological assessments. Nucleic Acids Res 2022; 50:9279-9293. [PMID: 35979944 PMCID: PMC9458450 DOI: 10.1093/nar/gkac689] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/03/2022] [Revised: 07/05/2022] [Accepted: 07/29/2022] [Indexed: 12/24/2022] Open
Abstract
Metagenomics and total RNA sequencing (total RNA-Seq) have the potential to improve the taxonomic identification of diverse microbial communities, which could allow for the incorporation of microbes into routine ecological assessments. However, these target-PCR-free techniques require more testing and optimization. In this study, we processed metagenomics and total RNA-Seq data from a commercially available microbial mock community using 672 data-processing workflows, identified the most accurate data-processing tools, and compared their microbial identification accuracy at equal and increasing sequencing depths. The accuracy of data-processing tools substantially varied among replicates. Total RNA-Seq was more accurate than metagenomics at equal sequencing depths and even at sequencing depths almost one order of magnitude lower than those of metagenomics. We show that while data-processing tools require further exploration, total RNA-Seq might be a favorable alternative to metagenomics for target-PCR-free taxonomic identifications of microbial communities and might enable a substantial reduction in sequencing costs while maintaining accuracy. This could be particularly an advantage for routine ecological assessments, which require cost-effective yet accurate methods, and might allow for the incorporation of microbes into ecological assessments.
Affiliation(s)
- Christopher A Hempel
- To whom correspondence should be addressed. Tel: +1 519 824 4120; Fax: +1 519 824 5703;
| | - Natalie Wright
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
| | - Julia Harvie
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
| | - Jose S Hleap
- SHARCNET, University of Guelph, Guelph, ON N1G 2W1, Canada
| | - Sarah J Adamowicz
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
| | - Dirk Steinke
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada,Centre for Biodiversity Genomics, University of Guelph, Guelph, ON N1G 2W1, Canada
| |
|
21
|
Vollmers J, Wiegand S, Lenk F, Kaster AK. How clear is our current view on microbial dark matter? (Re-)assessing public MAG & SAG datasets with MDMcleaner. Nucleic Acids Res 2022; 50:e76. [PMID: 35536293 PMCID: PMC9303271 DOI: 10.1093/nar/gkac294] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/31/2022] [Revised: 04/11/2022] [Accepted: 04/13/2022] [Indexed: 11/12/2022] Open
Abstract
As of today, the majority of environmental microorganisms remains uncultured and is therefore referred to as 'microbial dark matter' (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell genomics and metagenomics. However, without access to cultured representatives for verifying correct taxon assignments, MDM genomes may lead to potentially misleading conclusions based on misclassified or contaminant contigs, thereby obfuscating our view on the uncultured microbial majority. Moreover, gradual database contamination by past genome submissions can cause error propagation, which affects present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity and the de facto gold standard genome assessment tool, checkM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and corresponding open-source Python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contaminations overlooked by current screening approaches and sensitively detects misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view on 'microbial dark matter'.
Affiliation(s)
- John Vollmers
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| | - Sandra Wiegand
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| | - Florian Lenk
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| | - Anne-Kristin Kaster
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| |
|
22
|
Lobanov V, Gobet A, Joyce A. Ecosystem-specific microbiota and microbiome databases in the era of big data. ENVIRONMENTAL MICROBIOME 2022; 17:37. [PMID: 35842686 PMCID: PMC9287977 DOI: 10.1186/s40793-022-00433-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/10/2022] [Accepted: 06/29/2022] [Indexed: 05/05/2023]
Abstract
The rapid development of sequencing methods over the past decades has accelerated both the potential scope and depth of microbiota and microbiome studies. Recent developments in the field have been marked by an expansion away from purely categorical studies towards a greater investigation of community functionality. As in-depth genomic and environmental coverage is often distributed unequally across major taxa and ecosystems, it can be difficult to identify or substantiate relationships within microbial communities. Generic databases containing datasets from diverse ecosystems have opened a new era of data accessibility despite costs in terms of data quality and heterogeneity. This challenge is readily embodied in the integration of meta-omics data alongside habitat-specific standards which help contextualise datasets both in terms of sample processing and background within the ecosystem. A special case of large genomic repositories, ecosystem-specific databases (ES-DB's), have emerged to consolidate and better standardise sample processing and analysis protocols around individual ecosystems under study, allowing independent studies to produce comparable datasets. Here, we provide a comprehensive review of this emerging tool for microbial community analysis in relation to current trends in the field. We focus on the factors leading to the formation of ES-DB's, their comparison to traditional microbial databases, the potential for ES-DB integration with meta-omics platforms, as well as inherent limitations in the applicability of ES-DB's.
Affiliation(s)
- Victor Lobanov
- Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
| | | | - Alyssa Joyce
- Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden.
| |
|
23
|
Haryono MAS, Law YY, Arumugam K, Liew LCW, Nguyen TQN, Drautz-Moses DI, Schuster SC, Wuertz S, Williams RBH. Recovery of High Quality Metagenome-Assembled Genomes From Full-Scale Activated Sludge Microbial Communities in a Tropical Climate Using Longitudinal Metagenome Sampling. Front Microbiol 2022; 13:869135. [PMID: 35756038 PMCID: PMC9230771 DOI: 10.3389/fmicb.2022.869135] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/03/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023] Open
Abstract
The analysis of metagenome data based on the recovery of draft genomes (so-called metagenome-assembled genomes, or MAGs) has assumed an increasingly central role in microbiome research in recent years. Microbial communities underpinning the operation of wastewater treatment plants are particularly challenging targets for MAG analysis due to their high ecological complexity, and remain important, albeit understudied, microbial communities that play a key role in mediating interactions between human and natural ecosystems. Here we consider strategies for recovery of MAG sequence from time series metagenome surveys of full-scale activated sludge microbial communities. We generate MAG catalogs from this set of data using several different strategies, including the use of multiple individual sample assemblies, two variations on multi-sample co-assembly and a recently published MAG recovery workflow using deep learning. We obtain a total of just under 9,100 draft genomes, which collapse to around 3,100 non-redundant genomic clusters. We examine the strengths and weaknesses of these approaches in relation to MAG yield and quality, showing that co-assembly may offer advantages over single-sample assembly in the case of metagenome data obtained from closely sampled longitudinal study designs. Around 1,000 MAGs were candidates for being considered high quality, based on single-copy marker gene occurrence statistics; however, only 58 MAGs formally meet the MIMAG criteria for being high quality draft genomes. These findings carry broader implications for performing genome-resolved metagenomics on highly complex communities, the design and implementation of genome recoverability strategies, MAG decontamination and the search for better binning methodology.
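The gap between roughly 1,000 candidates and 58 formally high-quality MAGs is easier to see when the MIMAG tiers are written out: a high-quality draft needs more than 90% completeness and less than 5% contamination, plus the 5S, 16S and 23S rRNA genes and at least 18 tRNAs, whereas marker-gene statistics cover only the first two conditions. The following is a minimal sketch under those published thresholds; the `MagStats` record and its field names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    """Hypothetical per-MAG summary; field names are illustrative only."""
    completeness: float   # percent, estimated from single-copy marker genes
    contamination: float  # percent, estimated from single-copy marker genes
    has_5s: bool          # 5S rRNA gene detected
    has_16s: bool         # 16S rRNA gene detected
    has_23s: bool         # 23S rRNA gene detected
    trna_count: int       # number of distinct tRNAs detected


def mimag_tier(mag: MagStats) -> str:
    """Assign a MIMAG quality tier from the published thresholds."""
    markers_ok = mag.completeness > 90 and mag.contamination < 5
    rrna_ok = mag.has_5s and mag.has_16s and mag.has_23s
    if markers_ok and rrna_ok and mag.trna_count >= 18:
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    if mag.contamination < 10:
        return "low"
    return "unclassified"  # contamination of 10% or more fits no tier


# A MAG that passes the marker-gene thresholds but lacks a 16S rRNA gene
# is a "candidate" only, and drops to the medium tier.
candidate = MagStats(95.0, 1.2, True, False, True, 20)
print(mimag_tier(candidate))
```

Under this reading, a bin can look excellent on completeness and contamination alone and still miss the high-quality tier because rRNA operons often fail to assemble or bin from short-read data.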
Affiliation(s)
- Mindia A S Haryono
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| | - Ying Yu Law
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Krithika Arumugam
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Larry C-W Liew
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Thi Quynh Ngoc Nguyen
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Daniela I Drautz-Moses
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Stephan C Schuster
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore.,School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Stefan Wuertz
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore.,School of Civil and Environmental Engineering, Nanyang Technological University, Singapore, Singapore
| | - Rohan B H Williams
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| |
|
24
|
Deshpande AS, Fahrenfeld NL. Abundance, diversity, and host assignment of total, intracellular, and extracellular antibiotic resistance genes in riverbed sediments. WATER RESEARCH 2022; 217:118363. [PMID: 35390554 DOI: 10.1016/j.watres.2022.118363] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/05/2022] [Revised: 03/23/2022] [Accepted: 03/24/2022] [Indexed: 06/14/2023]
Abstract
Human health risk assessment for environmental antibiotic resistant microbes requires not only quantifying the abundance of antibiotic resistance genes (ARGs) in environmental matrices, but also understanding their hosts and genetic context. Further, differentiating ARGs in intracellular and extracellular DNA (iDNA and eDNA) fractions may help refine our understanding of ARG transferability. The objectives of this study were to understand the (O1) abundance and diversity of extracellular, intracellular, and total ARGs along a land use gradient and (O2) impact of bioinformatics pipeline on the assignment of putative hosts for the ARGs observed in the different DNA fractions. Sediment samples were collected along a land use gradient in the Raritan River, New Jersey, USA. DNA was extracted to separate eDNA and iDNA and qPCR was performed for select ARGs and the 16S rRNA gene. Shotgun metagenomic sequencing was performed on DNA extracts for the different DNA fractions. ARG hosts were assigned via two different bioinformatic pipelines: network analysis of raw reads versus assembly. Results of the two pipelines were compared to evaluate their performance in terms of number and diversity of linkages and accuracy of in silico matrix spike host assignments. No differences were observed in the 16S rRNA gene normalized sul1 concentrations between the DNA fractions. The overall microbial community structure was more similar for iDNA and total DNA compared to eDNA and generally clustered by sampling site. ARGs associated with mobile genetic elements increased in iDNA for the downstream sites. Regarding host assignment, the raw reads pipeline via network analysis identified 247 ARG hosts as compared to 53 hosts identified by assembly pipeline. Other comparisons between the pipelines were made including ARG assignment to taxa containing waterborne pathogens and practical considerations regarding processing time.
Affiliation(s)
- A S Deshpande
- Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
| | - N L Fahrenfeld
- Civil and Environmental Engineering, Rutgers University, 500 Bartholomew Rd., Piscataway, NJ 08854, USA.
| |
|
25
|
Goussarov G, Mysara M, Vandamme P, Van Houdt R. Introduction to the principles and methods underlying the recovery of metagenome-assembled genomes from metagenomic data. Microbiologyopen 2022; 11:e1298. [PMID: 35765182 PMCID: PMC9179125 DOI: 10.1002/mbo3.1298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/18/2022] Open
Abstract
The rise of metagenomics offers a leap forward for understanding the genetic diversity of microorganisms in many different complex environments by providing a platform that can identify potentially unlimited numbers of known and novel microorganisms. As such, it is impossible to imagine new major initiatives without metagenomics. Nevertheless, it represents a relatively new discipline with various levels of complexity and demands on bioinformatics. The underlying principles and methods used in metagenomics are often seen as common knowledge and often not detailed or fragmented. Therefore, we reviewed these to guide microbiologists in taking the first steps into metagenomics. We specifically focus on a workflow aimed at reconstructing individual genomes, that is, metagenome-assembled genomes, integrating DNA sequencing, assembly, binning, identification and annotation.
Affiliation(s)
- Gleb Goussarov
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Mohamed Mysara
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| | - Peter Vandamme
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Rob Van Houdt
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| |
|
26
|
Van der Jeugt F, Dawyndt P, Mesuere B. FragGeneScanRs: faster gene prediction for short reads. BMC Bioinformatics 2022; 23:198. [PMID: 35643462 PMCID: PMC9148508 DOI: 10.1186/s12859-022-04736-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/09/2021] [Accepted: 05/17/2022] [Indexed: 11/16/2022] Open
Abstract
Background: FragGeneScan is currently the most accurate and popular tool for gene prediction in short and error-prone reads, but its execution speed is insufficient for use on larger data sets. The parallelization which should have addressed this is inefficient. Its alternative implementation FragGeneScan+ is faster, but introduced a number of bugs related to memory management, race conditions and even output accuracy. Results: This paper introduces FragGeneScanRs, a faster Rust implementation of the FragGeneScan gene prediction model. Its command line interface is backward compatible and adds extra features for more flexible usage. Its output is equivalent to the original FragGeneScan implementation. Conclusions: Compared to the current C implementation, shotgun metagenomic reads are processed up to 22 times faster using a single thread, with better scaling for multithreaded execution. The Rust code of FragGeneScanRs is freely available from GitHub under the GPL-3.0 license with instructions for installation, usage and other documentation (https://github.com/unipept/FragGeneScanRs).
|
27
|
Castañeda S, Paniz-Mondolfi A, Ramírez JD. Detangling the Crosstalk Between Ascaris, Trichuris and Gut Microbiota: What's Next? Front Cell Infect Microbiol 2022; 12:852900. [PMID: 35694539 PMCID: PMC9174645 DOI: 10.3389/fcimb.2022.852900] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/11/2022] [Accepted: 04/21/2022] [Indexed: 11/25/2022] Open
Abstract
Helminth infections remain a global public health issue, particularly in low- and middle-income countries, where roundworms from the Trichuris and Ascaris genera are most prevalent. These geohelminths not only impact human health but also affect animal well-being, in particular in the swine industry. Host-helminth parasite interactions are complex and at the same time essential to understanding the biology, dynamics and pathophysiology of these infections. Within these interactions, the immunomodulatory capacity of these helminths in the host has been extensively studied. Moreover, in recent years interest has grown in how helminths interact with the intestinal microbiota of the host, highlighting how this relationship plays an essential role in the establishment of initial infection, survival and persistence of the parasite, as well as in the development of chronic infections. Identifying the changes generated by these helminths on the composition and structure of the host intestinal microbiota constitutes a field of great scientific interest, since this can provide essential and actionable information for designing effective control and therapeutic strategies. Helminths like Trichuris and Ascaris are a focus of special importance due to their high prevalence, higher reinfection rates, resistance to anthelmintic therapy and unavailability of vaccines. Therefore, characterizing interactions between these helminths and the host intestinal microbiota represents an important approach to better understand the nature of this dynamic interface and explore novel therapeutic alternatives based on management of host microbiota. Given the extraordinary impact this may have from a biological, clinical, and epidemiological public health standpoint, this review aims to provide a comprehensive overview of current knowledge and future perspectives examining the parasite-microbiota interplay and its impact on host immunity.
Affiliation(s)
- Sergio Castañeda
- Centro de Investigaciones en Microbiología y Biotecnología-UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
| | - Alberto Paniz-Mondolfi
- Molecular Microbiology Laboratory, Department of Pathology, Molecular and Cell-Based Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Juan David Ramírez
- Centro de Investigaciones en Microbiología y Biotecnología-UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Molecular Microbiology Laboratory, Department of Pathology, Molecular and Cell-Based Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- *Correspondence: Juan David Ramírez
| |
|
28
|
Yu KHO, Fang X, Yao H, Ng B, Leung TK, Wang LL, Lin CH, Chan ASW, Leung WK, Leung SY, Ho JWK. Evaluation of Experimental Protocols for Shotgun Whole-Genome Metagenomic Discovery of Antibiotic Resistance Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1313-1321. [PMID: 32750872 DOI: 10.1109/tcbb.2020.3004063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Shotgun metagenomics has enabled the discovery of antibiotic resistance genes (ARGs). Although there have been numerous studies benchmarking the bioinformatics methods for shotgun metagenomic data analysis, there has not yet been a study that systematically evaluates the performance of different experimental protocols on metagenomic species profiling and ARG detection. In this study, we generated 35 whole genome shotgun metagenomic sequencing data sets for five samples (three human stool and two microbial standard) using seven experimental protocols (KAPA or Flex kits at 50ng, 10ng, or 5ng input amounts; XT kit at 1ng input amount). Using this comprehensive resource, we evaluated the seven protocols in terms of robust detection of ARGs and microbial abundance estimation at various sequencing depths. We found that the data generated by the seven protocols are largely similar. The inter-protocol variability is significantly smaller than the variability between samples or sequencing depths. We found that a sequencing depth of more than 30M is suitable for human stool samples. A higher input amount (50ng) is generally favorable for the KAPA and Flex kits. This systematic benchmarking study sheds light on the impact of sequencing depth, experimental protocol, and DNA input amount on ARG detection in human stool samples.
|
29
|
Altermann E, Tegetmeyer HE, Chanyi RM. The evolution of bacterial genome assemblies - where do we need to go next? MICROBIOME RESEARCH REPORTS 2022; 1:15. [PMID: 38046358 PMCID: PMC10688829 DOI: 10.20517/mrr.2022.02] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/05/2022] [Revised: 03/08/2022] [Accepted: 03/24/2022] [Indexed: 12/05/2023]
Abstract
Genome sequencing has fundamentally changed our ability to decipher and understand the genetic blueprint of life and how it changes over time in response to environmental and evolutionary pressures. The pace of sequencing is still increasing in response to advances in technologies, paving the way from sequenced genes to genomes to metagenomes to metagenome-assembled genomes (MAGs). Our ability to interrogate increasingly complex microbial communities through metagenomes and MAGs is opening up a tantalizing future where we may be able to delve deeper into the mechanisms and genetic responses emerging over time. In the near future, we will be able to detect MAG assembly variations within strains originating from diverging sub-populations, and one of the emerging challenges will be to capture these variations in a biologically relevant way. Here, we present a brief overview of sequencing technologies and the current state of metagenome assemblies to suggest the need to develop new data formats that can capture the genetic variations within strains and communities, which previously remained invisible due to sequencing technology limitations.
Affiliation(s)
- Eric Altermann
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
- Massey University, School of Veterinary Science, Palmerston North 4100, New Zealand
| | - Halina E. Tegetmeyer
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Center for Biotechnology, Bielefeld University, Universitaetsstrasse 27, Bielefeld 33615, Germany
| | - Ryan M. Chanyi
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
| |
|
30
|
Vuong P, Moreira-Grez B, Wise MJ, Whiteley AS, Kumaresan D, Kaur P. From Rags to Enriched: Metagenomic Insights into Ammonia-oxidizing Archaea Following Ammonia Enrichment of a Denuded Oligotrophic Soil Ecosystem. Environ Microbiol 2022; 24:3097-3110. [PMID: 35384236 PMCID: PMC9545067 DOI: 10.1111/1462-2920.15994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 03/28/2022] [Indexed: 11/29/2022]
Abstract
Stored topsoil acts as a microbial inoculant for ecological restoration of land after disturbance, but the altered circumstances frequently create unfavorable conditions for microbial survival. Nitrogen cycling is a critical indicator for ecological success and this study aimed to investigate the cornerstone taxa driving the process. Previous in-silico studies investigating stored topsoil discovered persistent archaeal taxa with the potential for re-establishing ecological activity. Ammonia oxidization is the limiting step in nitrification and as such, ammonia oxidizing archaea (AOA) can be considered one of the gatekeepers for the re-establishment of the nitrogen cycle in disturbed soils. Semi-arid soil samples were enriched with ammonium sulfate to promote the selective enrichment of ammonia oxidizers for targeted genomic recovery, and to investigate the microbial response of the microcosm to nitrogen input. Ammonia addition produced an increase in AOA population, particularly within the genus Candidatus Nitrosotalea, from which metagenome-assembled genomes (MAGs) were successfully recovered. The ability of the Ca. Nitrosotalea archaeon candidates to survive in extreme conditions and rapidly respond to ammonia input makes them a potential bioprospecting target for application in ecological restoration of semi-arid soils, and the recovered MAGs provide a metabolic blueprint for developing potential strategies towards isolation of these acclimated candidates.
Affiliation(s)
- Paton Vuong
- UWA School of Agriculture & Environment, University of Western Australia, Perth, Australia
| | - Benjamin Moreira-Grez
- UWA School of Agriculture & Environment, University of Western Australia, Perth, Australia
| | - Michael J Wise
- School of Physics, Mathematics and Computing, University of Western Australia, Perth, Australia.,The Marshall Centre of Infectious Diseases, School of Biological Sciences, The University of Western Australia, Perth, Australia
| | - Andrew S Whiteley
- Centre for Environment & Life Sciences, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Floreat, Australia
| | - Deepak Kumaresan
- School of Biological Sciences, Queen's University of Belfast, Belfast, UK
| | - Parwinder Kaur
- UWA School of Agriculture & Environment, University of Western Australia, Perth, Australia
| |
|
31
|
Grealey J, Lannelongue L, Saw WY, Marten J, Méric G, Ruiz-Carmona S, Inouye M. The Carbon Footprint of Bioinformatics. Mol Biol Evol 2022; 39:6526403. [PMID: 35143670 PMCID: PMC8892942 DOI: 10.1093/molbev/msac034] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 11/14/2022] Open
Abstract
Bioinformatic research relies on large-scale computational infrastructures which have a nonzero carbon footprint but so far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and simple software upgrades could make it greener, for example, upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm’s greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
Affiliation(s)
- Jason Grealey
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia
| | - Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.,British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.,Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
| | - Woei-Yuh Saw
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Jonathan Marten
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Guillaume Méric
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Australia
| | - Sergio Ruiz-Carmona
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.,British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.,Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.,British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK.,The Alan Turing Institute, London, UK
| |
|
32
|
Dai D, Brown C, Bürgmann H, Larsson DGJ, Nambi I, Zhang T, Flach CF, Pruden A, Vikesland PJ. Long-read metagenomic sequencing reveals shifts in associations of antibiotic resistance genes with mobile genetic elements from sewage to activated sludge. MICROBIOME 2022; 10:20. [PMID: 35093160 PMCID: PMC8801152 DOI: 10.1186/s40168-021-01216-5] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 12/08/2021] [Accepted: 12/13/2021] [Indexed: 05/10/2023]
Abstract
BACKGROUND: There is concern that the microbially rich activated sludge environment of wastewater treatment plants (WWTPs) may contribute to the dissemination of antibiotic resistance genes (ARGs). We applied long-read (nanopore) sequencing to profile ARGs and their neighboring genes to illuminate their fate in the activated sludge treatment by comparing their abundance, genetic locations, mobility potential, and bacterial hosts within activated sludge relative to those in influent sewage across five WWTPs from three continents. RESULTS: The abundances (gene copies per Gb of reads, aka gc/Gb) of all ARGs and those carried by putative pathogens decreased 75-90% from influent sewage (192-605 gc/Gb) to activated sludge (31-62 gc/Gb) at all five WWTPs. Long reads enabled quantification of the percent abundance of ARGs with mobility potential (i.e., located on plasmids or co-located with other mobile genetic elements (MGEs)). The abundance of plasmid-associated ARGs decreased at four of five WWTPs (from 40-73 to 31-68%), and ARGs co-located with transposable, integrative, and conjugative element hallmark genes showed similar trends. Most ARG-associated elements decreased 0.35-13.52% while integrative and transposable elements displayed slight increases at two WWTPs (1.4-2.4%). While resistome and taxonomic compositions both shifted significantly, host phyla for chromosomal ARG classes remained relatively consistent, indicating vertical gene transfer via active biomass growth in activated sludge as the key pathway of chromosomal ARG dissemination. CONCLUSIONS: Overall, our results suggest that the activated sludge process acted as a barrier against the proliferation of most ARGs, while those that persisted or increased warrant further attention.
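The gc/Gb unit used in this abstract is a simple normalization of ARG counts by sequencing yield, and the reported reduction is a relative change between two such values. The sketch below illustrates the arithmetic only; the function names and the read counts are hypothetical, and pairing 192 gc/Gb with 31 gc/Gb is an assumption for illustration, since the abstract reports ranges across plants rather than matched pairs.

```python
def gene_copies_per_gb(gene_copies: float, total_bases: float) -> float:
    """Normalize an ARG count to gene copies per gigabase of reads (gc/Gb)."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return gene_copies / (total_bases / 1e9)


def percent_decrease(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Hypothetical counts chosen to reproduce the lower ends of the reported ranges:
# 960 ARG copies in 5 Gb of influent reads, 155 copies in 5 Gb of sludge reads.
influent = gene_copies_per_gb(960, 5e9)
sludge = gene_copies_per_gb(155, 5e9)
print(influent, sludge, round(percent_decrease(influent, sludge), 1))
```

Normalizing by gigabases rather than by read count matters for long-read data, where read lengths vary widely between runs and samples.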
Affiliation(s)
- Dongjuan Dai
- Department of Civil and Environmental Engineering, Virginia Polytechnic and State University, Blacksburg, VA, USA
| | - Connor Brown
- Department of Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic and State University, Blacksburg, VA, USA
| | - Helmut Bürgmann
- Eawag: Swiss Federal Institute of Aquatic Science and Technology, Kastanienbaum, Switzerland
| | - D G Joakim Larsson
- Institute of Biomedicine, Department of Infectious Diseases, University of Gothenburg, Gothenburg, Sweden
- Centre for Antibiotic Resistance Research (CARe), University of Gothenburg, Gothenburg, Sweden
| | - Indumathi Nambi
- Department of Civil Engineering, Indian Institute of Technology, Madras, India
| | - Tong Zhang
- Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
| | - Carl-Fredrik Flach
- Institute of Biomedicine, Department of Infectious Diseases, University of Gothenburg, Gothenburg, Sweden
- Centre for Antibiotic Resistance Research (CARe), University of Gothenburg, Gothenburg, Sweden
| | - Amy Pruden
- Department of Civil and Environmental Engineering, Virginia Polytechnic and State University, Blacksburg, VA, USA.
| | - Peter J Vikesland
- Department of Civil and Environmental Engineering, Virginia Polytechnic and State University, Blacksburg, VA, USA.
| |
|
33
|
Fabiańska I, Borutzki S, Richter B, Tran HQ, Neubert A, Mayer D. LABRADOR-A Computational Workflow for Virus Detection in High-Throughput Sequencing Data. Viruses 2021; 13:v13122541. [PMID: 34960810 PMCID: PMC8704571 DOI: 10.3390/v13122541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 12/13/2021] [Accepted: 12/16/2021] [Indexed: 11/16/2022] Open
Abstract
High-throughput sequencing (HTS) allows detection of known and unknown viruses in samples of broad origin. This makes HTS a perfect technology to determine whether biological products, such as vaccines, are free from adventitious agents, which could support or replace extensive testing using various in vitro and in vivo assays. Due to bioinformatics complexities, there is a need for standardized and reliable methods to manage HTS-generated data in this field. Thus, we developed LABRADOR, an analysis pipeline for adventitious virus detection. The pipeline consists of several third-party programs and is divided into two major parts: (i) direct read classification based on the comparison of characteristic profiles between reads and sequences deposited in the database, supported by alignment to the best-matching reference sequence, and (ii) de novo assembly of contigs and their classification at the nucleotide and amino acid levels. To meet the requirements published in guidelines for the safety of biologicals, we generated a custom nucleotide database of viral sequences. We tested our pipeline on publicly available HTS datasets and showed that LABRADOR can reliably detect viruses in mixtures of model viruses, vaccines and clinical samples.
|
34
|
Behera BK, Dehury B, Rout AK, Patra B, Mantri N, Chakraborty HJ, Sarkar DJ, Kaushik NK, Bansal V, Singh I, Das BK, Rao AR, Rai A. Metagenomics study in aquatic resource management: Recent trends, applied methodologies and future needs. GENE REPORTS 2021. [DOI: 10.1016/j.genrep.2021.101372] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
|
35
|
Fuentes-Trillo A, Monzó C, Manzano I, Santiso-Bellón C, Andrade JDSRD, Gozalbo-Rovira R, García-García AB, Rodríguez-Díaz J, Chaves FJ. Benchmarking different approaches for Norovirus genome assembly in metagenome samples. BMC Genomics 2021; 22:849. [PMID: 34819031 PMCID: PMC8611953 DOI: 10.1186/s12864-021-08067-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2021] [Accepted: 10/10/2021] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Genome assembly of viruses with high mutation rates, such as Norovirus and other RNA viruses, or from metagenome samples, poses a challenge for the scientific community due to the coexistence of several viral quasispecies and strains. Furthermore, there is no standard method for obtaining whole-genome sequences in non-related patients. After polyA RNA isolation and sequencing in eight patients with acute gastroenteritis, we evaluated two de Bruijn graph assemblers (SPAdes and MEGAHIT), combined with four different and common pre-assembly strategies, and compared those yielding whole genome Norovirus contigs. RESULTS Reference-genome guided strategies with both host and target virus did not present any advantages compared to the assembly of non-filtered data in the case of SPAdes, and in the case of MEGAHIT, only host genome filtering presented improvements. MEGAHIT performed better than SPAdes in most samples, reaching complete genome sequences in most of them for all the strategies employed. Read binning with CD-HIT improved assembly when paired with different analysis strategies, and more notably in the case of SPAdes. CONCLUSIONS Not all metagenome assemblies are equal and the choice in the workflow depends on the species studied and the prior steps to analysis. We may need different approaches even for samples treated equally due to the presence of high intra host variability. We tested and compared different workflows for the accurate assembly of Norovirus genomes and established their assembly capacities for this purpose.
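To make the assembly problem described in this abstract concrete, the sketch below builds a toy de Bruijn graph, the data structure underlying both SPAdes and MEGAHIT. It is a simplified illustration, not how either assembler is implemented: real tools add error correction, coverage tracking and graph simplification, and the reads and the value of `k` here are invented.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two toy reads that differ at one base, as two co-existing variants might.
reads = ["ACGTAC", "ACGAAC"]
graph = de_bruijn_graph(reads, k=4)

# Nodes with more than one successor mark points where the variants diverge.
branching = {node: sorted(nxt) for node, nxt in graph.items() if len(nxt) > 1}
```

Here the node `ACG` has two successors, which is the kind of branch that quasispecies and strain mixtures introduce and that an assembler has to resolve or break into separate contigs.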
Affiliation(s)
- Azahara Fuentes-Trillo
- Unit of Genomics and Diabetes. Research Foundation of Valencia University Clinical Hospital- INCLIVA, Valencia, Spain
| | - Carolina Monzó
- Unit of Genomics and Diabetes. Research Foundation of Valencia University Clinical Hospital- INCLIVA, Valencia, Spain
| | - Iris Manzano
- Unit of Genomics and Diabetes. Research Foundation of Valencia University Clinical Hospital- INCLIVA, Valencia, Spain
| | | | | | | | - Ana-Bárbara García-García
- Unit of Genomics and Diabetes. Research Foundation of Valencia University Clinical Hospital- INCLIVA, Valencia, Spain.
- Spanish Biomedical Research Network in Diabetes and Associated Metabolic Disorders (CIBERDEM), Madrid, Spain.
| | - Jesús Rodríguez-Díaz
- Department of Microbiology, School of Medicine, University of Valencia, Valencia, Spain
| | - Felipe Javier Chaves
- Unit of Genomics and Diabetes. Research Foundation of Valencia University Clinical Hospital- INCLIVA, Valencia, Spain
- Spanish Biomedical Research Network in Diabetes and Associated Metabolic Disorders (CIBERDEM), Madrid, Spain
- Sequencing Multiplex S.L., Valencia, Spain
|
36
|
Yang C, Chowdhury D, Zhang Z, Cheung WK, Lu A, Bian Z, Zhang L. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J 2021; 19:6301-6314. [PMID: 34900140 PMCID: PMC8640167 DOI: 10.1016/j.csbj.2021.11.028] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 08/30/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 12/16/2022] Open
Abstract
Metagenomic sequencing provides a culture-independent avenue to investigate complex microbial communities by constructing metagenome-assembled genomes (MAGs). A MAG represents a microbial genome as a group of sequences from a genome assembly that share similar characteristics. It enables us to identify novel species and understand their potential functions in a dynamic ecosystem. Many computational tools have been developed to construct and annotate MAGs from metagenomic sequencing data; however, a comprehensive introduction to their background and practical performance is still lacking. In this paper, we have thoroughly investigated the computational tools designed for both upstream and downstream analyses, including metagenome assembly, metagenome binning, gene prediction, functional annotation, taxonomic classification, and profiling. We have categorized the commonly used tools into distinct groups based on their functional background and introduced the underlying core algorithms and associated information to provide a comparative outlook. Furthermore, we have emphasized the computational requirements and offered guidance to help users select the most efficient tools. Finally, we have indicated current limitations, potential solutions, and future perspectives for further improving the tools for MAG construction and annotation. We believe that our work provides a consolidated resource for the current stage of MAG studies and sheds light on the future development of more effective MAG analysis tools for metagenomic sequencing.
Key Words
- CNN, convolutional neural network
- DBG, De Bruijn graph
- GTDB, Genome Taxonomy Database
- Gene functional annotation
- Gene prediction
- Genome assembly
- HMM, Hidden Markov Model
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- LCA, lowest common ancestor
- LPA, label propagation algorithm
- MAGs, metagenome-assembled genomes
- Metagenome binning
- Metagenome-assembled genomes
- Metagenomic sequencing
- Microbial abundance profiling
- OLC, overlap-layout consensus
- ONT, Oxford Nanopore Technologies
- ORFs, open reading frames
- PacBio, Pacific Biosciences
- QC, quality control
- SLR, synthetic long reads
- TNFs, tetranucleotide frequencies
- Taxonomic classification
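Of the abbreviations listed above, tetranucleotide frequencies (TNFs) are a sequence-composition signal commonly used in metagenome binning. The following is a minimal sketch of how such a profile can be computed for one contig; the function name and the example sequence are illustrative, and the sketch counts each strand-specific 4-mer separately, whereas binning tools often merge reverse complements.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq: str) -> dict:
    """Return the relative frequency of each of the 256 tetranucleotides."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")  # skip windows containing N etc.
    )
    total = sum(counts.values())
    kmers = ("".join(p) for p in product("ACGT", repeat=4))
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


# Toy contig with one ambiguous base; windows overlapping the N are ignored.
tnf = tetranucleotide_frequencies("ACGTACGTNACGT")
```

Contigs from the same genome tend to have similar profiles, so the resulting 256-dimensional vectors can be compared or clustered, usually together with coverage information.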
Affiliation(s)
- Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
| | - Debajyoti Chowdhury
- Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
| | - Zhenmiao Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
| | - William K. Cheung
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
| | - Aiping Lu
- Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Institute of Integrated Bioinformedicine and Translational Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
| | - Zhaoxiang Bian
- Institute of Brain and Gut Research, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Chinese Medicine Clinical Study Center, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong Special Administrative Region
| | - Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong Special Administrative Region
- Computational Medicine Lab, Hong Kong Baptist University, Hong Kong Special Administrative Region
|
37
|
Abstract
Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs. IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. 
Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.
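Statements such as "larger and less fragmented assemblies" are usually backed by contig-length summaries. As a rough sketch, the code below computes N50 and the share of assembled bases in contigs longer than 1 kbp. Note the assumption: this is a length-based proxy computed from contig lengths alone, not the read-mapping percentage the abstract reports, and the contig lengths are invented.

```python
def assembly_summary(contig_lengths, min_len=1000):
    """Return (N50, fraction of assembled bases in contigs longer than min_len)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first contig at which half the bases are covered
            n50 = length
            break
    kept = sum(length for length in lengths if length > min_len)
    return n50, (kept / total if total else 0.0)


# Hypothetical contig lengths in base pairs.
n50, frac_long = assembly_summary([5000, 3000, 1500, 800, 700])
```

Comparing these two numbers across assembly strategies gives a quick, if coarse, view of fragmentation before read recruitment is evaluated.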
|
38
|
Mäklin T, Kallonen T, Alanko J, Samuelsen Ø, Hegstad K, Mäkinen V, Corander J, Heinz E, Honkela A. Bacterial genomic epidemiology with mixed samples. Microb Genom 2021; 7:000691. [PMID: 34779765 PMCID: PMC8743562 DOI: 10.1099/mgen.0.000691] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/10/2021] [Accepted: 09/13/2021] [Indexed: 11/18/2022] Open
Abstract
Genomic epidemiology is a tool for tracing transmission of pathogens based on whole-genome sequencing. We introduce the mGEMS pipeline for genomic epidemiology with plate sweeps representing mixed samples of a target pathogen, opening the possibility to sequence all colonies on selective plates with a single DNA extraction and sequencing step. The pipeline includes the novel mGEMS read binner for probabilistic assignments of sequencing reads, and the scalable pseudoaligner Themisto. We demonstrate the effectiveness of our approach using closely related samples in a nosocomial setting, obtaining results that are comparable to those based on single-colony picks. Our results lend firm support to more widespread consideration of genomic epidemiology with mixed infection samples.
Affiliation(s)
- Tommi Mäklin
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Teemu Kallonen
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Jarno Alanko
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Ørjan Samuelsen
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Department of Pharmacy, UiT The Arctic University of Norway, Tromsø, Norway
| | - Kristin Hegstad
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Research group for Host-Microbe Interactions, Department of Medical Biology, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
| | - Veli Mäkinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Jukka Corander
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Eva Heinz
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Liverpool School of Tropical Medicine, Liverpool, UK
| | - Antti Honkela
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
|
39
|
DeWeese KJ, Osborne MG. Understanding the metabolome and metagenome as extended phenotypes: The next frontier in macroalgae domestication and improvement. JOURNAL OF THE WORLD AQUACULTURE SOCIETY 2021; 52:1009-1030. [PMID: 34732977 PMCID: PMC8562568 DOI: 10.1111/jwas.12782] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/06/2020] [Accepted: 02/25/2021] [Indexed: 06/01/2023]
Abstract
"Omics" techniques (including genomics, transcriptomics, metabolomics, proteomics, and metagenomics) have been employed with huge success in the improvement of agricultural crops. As marine aquaculture of macroalgae expands globally, biologists are working to domesticate species of macroalgae by applying these techniques tested in agriculture to wild macroalgae species. Metabolomics has revealed metabolites and pathways that influence agriculturally relevant traits in crops, allowing for informed crop crossing schemes and genomic improvement strategies that would be pivotal to inform selection on macroalgae for domestication. Advances in metagenomics have improved understanding of host-symbiont interactions and the potential for microbial organisms to improve crop outcomes. There is much room in the field of macroalgal biology for further research toward improvement of macroalgae cultivars in aquaculture using metabolomic and metagenomic analyses. To this end, this review discusses the application and necessary expansion of the omics tool kit for macroalgae domestication as we move to enhance seaweed farming worldwide.
Affiliation(s)
- Kelly J DeWeese
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California
| | - Melisa G Osborne
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California
|
40
|
Casar CP, Momper LM, Kruger BR, Osburn MR. Iron-Fueled Life in the Continental Subsurface: Deep Mine Microbial Observatory, South Dakota, USA. Appl Environ Microbiol 2021; 87:e0083221. [PMID: 34378953 PMCID: PMC8478452 DOI: 10.1128/aem.00832-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/07/2021] [Accepted: 07/29/2021] [Indexed: 11/20/2022] Open
Abstract
Iron-bearing minerals are key components of the Earth's crust and potentially critical energy sources for subsurface microbial life. The Deep Mine Microbial Observatory (DeMMO) is situated in a range of iron-rich lithologies, and fracture fluids here reach concentrations as high as 8.84 mg/liter. Iron cycling is likely an important process, given the high concentrations of iron in fracture fluids and detection of putative iron-cycling taxa via marker gene surveys. However, a previous metagenomic survey detected no iron cycling potential at two DeMMO localities. Here, we revisited the potential for iron cycling at DeMMO using a new metagenomic data set including all DeMMO sites and FeGenie, a new annotation pipeline that is optimized for the detection of iron cycling genes. We annotated functional genes from whole metagenomic assemblies and metagenome-assembled genomes and characterized putative iron cycling pathways and taxa in the context of local geochemical conditions and available metabolic energy estimated from thermodynamic models. We reannotated previous metagenomic data, revealing iron cycling potential that was previously missed. Across both metagenomic data sets, we found that not only is there genetic potential for iron cycling at DeMMO, but also, iron is likely an important source of energy across the system. In response to the dramatic differences we observed between annotation approaches, we recommend the use of optimized pipelines where the detection of iron cycling genes is a major goal. IMPORTANCE We investigated iron cycling potential among microbial communities inhabiting iron-rich fracture fluids to a depth of 1.5 km in the continental crust. A previous study found no iron cycling potential in the communities despite the iron-rich nature of the system. A new tool for detecting iron cycling genes was recently published, which we used on a new data set. 
We combined this with a number of other approaches to get a holistic view of metabolic strategies across the communities, revealing iron cycling to be an important process here. In addition, we used the tool on the data from the previous study, revealing previously missed iron cycling potential. Iron is common in continental crust; thus, our findings are likely not unique to our study site. Our new view of important metabolic strategies underscores the importance of choosing optimized tools for detecting the potential for metabolisms like iron cycling that may otherwise be missed.
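The abstract refers to available metabolic energy estimated from thermodynamic models without giving details. One standard relation such estimates build on is the Gibbs energy of reaction, ΔG = ΔG° + RT ln Q. The sketch below evaluates it for placeholder inputs; the standard-state energy, temperature and reaction quotient are hypothetical and are not values from the study, whose actual model may differ.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def gibbs_energy(delta_g0_kj: float, temperature_k: float, q: float) -> float:
    """Gibbs energy of reaction (kJ/mol) at reaction quotient q."""
    return delta_g0_kj + R_KJ * temperature_k * math.log(q)


# Hypothetical reaction: standard Gibbs energy of -50 kJ/mol at 25 degrees C.
at_standard_state = gibbs_energy(-50.0, 298.15, q=1.0)
product_enriched = gibbs_energy(-50.0, 298.15, q=10.0)
```

A more negative result means more energy is available to organisms catalyzing the reaction under the given fluid chemistry; a tenfold rise in Q makes the reaction less favorable by about 5.7 kJ/mol at this temperature.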
Affiliation(s)
- C. P. Casar
- Department of Earth and Planetary Sciences, Northwestern University, Evanston, Illinois, USA
| | - L. M. Momper
- Earth and Environmental Sciences Practice, Exponent, Inc., Pasadena, California, USA
| | - B. R. Kruger
- Division of Hydrologic Sciences, Desert Research Institute, Las Vegas, Nevada, USA
| | - M. R. Osburn
- Department of Earth and Planetary Sciences, Northwestern University, Evanston, Illinois, USA
|
41
|
Kayani MUR, Huang W, Feng R, Chen L. Genome-resolved metagenomics using environmental and clinical samples. Brief Bioinform 2021; 22:bbab030. [PMID: 33758906 PMCID: PMC8425419 DOI: 10.1093/bib/bbab030] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/21/2020] [Revised: 11/29/2020] [Accepted: 01/20/2021] [Indexed: 12/25/2022] Open
Abstract
Recent advances in high-throughput sequencing technologies and computational methods have added a new dimension to metagenomic data analysis, i.e., genome-resolved metagenomics. In general terms, it refers to the recovery of draft or high-quality microbial genomes and their taxonomic classification and functional annotation. In recent years, several studies have utilized the genome-resolved metagenome analysis approach and identified previously unknown microbial species from human and environmental metagenomes. In this review, we describe genome-resolved metagenome analysis as a series of four necessary steps: (i) preprocessing of the sequencing reads, (ii) de novo metagenome assembly, (iii) genome binning and (iv) taxonomic and functional analysis of the recovered genomes. For each of these four steps, we discuss the most commonly used tools and the currently available pipelines to guide the scientific community in the recovery and subsequent analyses of genomes from any metagenome sample. Furthermore, we discuss the tools required for validating assembly quality as well as for improving the quality of the recovered genomes. We also highlight the currently available pipelines that can be used to automate the whole analysis without advanced bioinformatics knowledge. Finally, we highlight the most widely adopted and actively maintained tools and pipelines that can help the scientific community make decisions before commencing the analysis.
Affiliation(s)
- Masood ur Rehman Kayani
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
| | - Wanqiu Huang
- Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200,000, China
| | - Ru Feng
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
| | - Lei Chen
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2,000,025, China
|
42
|
Abstract
Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years, including the development of graph-based models that better resolve biological phenomena and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models, algorithms associated with graphical genomes, and comparisons between similar tools. In addition, we briefly discuss what these developments may mean for the future of genomics.
|
43
|
Nethery MA, Korvink M, Makarova KS, Wolf YI, Koonin EV, Barrangou R. CRISPRclassify: Repeat-Based Classification of CRISPR Loci. CRISPR J 2021; 4:558-574. [PMID: 34406047 DOI: 10.1089/crispr.2021.0021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022] Open
Abstract
Detection and classification of CRISPR-Cas systems in metagenomic data have become increasingly prevalent in recent years due to their potential for diverse applications in genome editing. Traditionally, CRISPR-Cas systems are classified through reference-based identification of proximate cas genes. Here, we present a machine learning approach for the detection and classification of CRISPR loci using repeat sequences in a cas-independent context, enabling identification of unclassified loci missed by traditional cas-based approaches. Using biological attributes of the CRISPR repeat, the core element in CRISPR arrays, and leveraging methods from natural language processing, we developed a machine learning model capable of accurate classification of CRISPR loci in an extensive set of metagenomes, resulting in an F1 measure of 0.82 across all predictions and an F1 measure of 0.97 when limiting to classifications with probabilities >0.85. Furthermore, assessing performance on novel repeats yielded an F1 measure of 0.96. Although the performance of cas-based identification will exceed that of a repeat-based approach in many cases, CRISPRclassify provides an efficient approach to classification of CRISPR loci for cases in which cas gene information is unavailable, such as metagenomes and fragmented genome assemblies.
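For readers unfamiliar with the F1 measure quoted above, it is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the counts are hypothetical and chosen only so that the result lands on 0.82, and the abstract does not say how scores were averaged across CRISPR subtypes.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from confusion counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 82 correct calls, 18 false positives, 18 missed loci.
score = f1_score(true_pos=82, false_pos=18, false_neg=18)
```

Restricting evaluation to high-probability predictions, as the abstract describes, typically removes many false positives and raises the score at the cost of leaving some loci unclassified.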
Affiliation(s)
- Matthew A Nethery
- Genomic Sciences Graduate Program, North Carolina State University, Raleigh, North Carolina, USA; National Library of Medicine, Bethesda, Maryland, USA
| | - Michael Korvink
- ITS Data Science, Premier Inc., Charlotte, North Carolina, USA; and National Library of Medicine, Bethesda, Maryland, USA
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
| | - Rodolphe Barrangou
- Genomic Sciences Graduate Program, North Carolina State University, Raleigh, North Carolina, USA; National Library of Medicine, Bethesda, Maryland, USA
|
44
|
Stewart AG, Satlin MJ, Schlebusch S, Isler B, Forde BM, Paterson DL, Harris PNA. Completing the Picture-Capturing the Resistome in Antibiotic Clinical Trials. Clin Infect Dis 2021; 72:e1122-e1129. [PMID: 33354717 DOI: 10.1093/cid/ciaa1877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2020] [Indexed: 11/12/2022] Open
Abstract
Despite the accepted dogma that antibiotic use is the largest contributor to antimicrobial resistance (AMR) and human microbiome disruption, our knowledge of specific antibiotic-microbiome effects remains basic. Detection of associations between new or old antimicrobials and specific AMR burden is patchy and heterogeneous. Various microbiome analysis tools are available to determine antibiotic effects on microbial communities in vivo. Microbiome analysis of treatment groups in antibiotic clinical trials that are powered to measure clinically meaningful endpoints would greatly assist the antibiotic development pipeline and clinicians' antibiotic decision making.
Affiliation(s)
- Adam G Stewart
- Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Royal Brisbane and Women's Hospital Campus, Brisbane, Australia.,Department of Infectious Diseases, Royal Brisbane and Women's Hospital, Brisbane, Australia
| | - Michael J Satlin
- Department of Medicine, Division of Infectious Diseases, Weill Cornell Medicine, New York, New York, USA
| | - Sanmarié Schlebusch
- Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Royal Brisbane and Women's Hospital Campus, Brisbane, Australia.,Department of Microbiology, Pathology Queensland, Royal Brisbane and Women's Hospital, Brisbane, Australia.,Forensic and Scientific Services, Health Support Queensland, Queensland Health, Brisbane, Australia
| | - Burcu Isler
- Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Royal Brisbane and Women's Hospital Campus, Brisbane, Australia
| | - Brian M Forde
- School of Chemistry and Molecular Biosciences, The University of Queensland, Queensland, Australia.,Australian Infectious Diseases Research Centre, The University of Queensland, Queensland, Australia
| | - David L Paterson
- Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Royal Brisbane and Women's Hospital Campus, Brisbane, Australia.,Department of Infectious Diseases, Royal Brisbane and Women's Hospital, Brisbane, Australia.,Australian Infectious Diseases Research Centre, The University of Queensland, Queensland, Australia
| | - Patrick N A Harris
- Centre for Clinical Research, Faculty of Medicine, The University of Queensland, Royal Brisbane and Women's Hospital Campus, Brisbane, Australia.,Department of Microbiology, Pathology Queensland, Royal Brisbane and Women's Hospital, Brisbane, Australia.,Australian Infectious Diseases Research Centre, The University of Queensland, Queensland, Australia
|
45
|
Alam I, Kamau AA, Ngugi DK, Gojobori T, Duarte CM, Bajic VB. KAUST Metagenomic Analysis Platform (KMAP), enabling access to massive analytics of re-annotated metagenomic data. Sci Rep 2021; 11:11511. [PMID: 34075103 PMCID: PMC8169707 DOI: 10.1038/s41598-021-90799-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2020] [Accepted: 05/18/2021] [Indexed: 11/09/2022] Open
Abstract
Exponential rise of metagenomics sequencing is delivering massive functional environmental genomics data. However, this also generates a procedural bottleneck for on-going re-analysis as reference databases grow and methods improve, and analyses need be updated for consistency, which require acceess to increasingly demanding bioinformatic and computational resources. Here, we present the KAUST Metagenomic Analysis Platform (KMAP), a new integrated open web-based tool for the comprehensive exploration of shotgun metagenomic data. We illustrate the capacities KMAP provides through the re-assembly of ~ 27,000 public metagenomic samples captured in ~ 450 studies sampled across ~ 77 diverse habitats. A small subset of these metagenomic assemblies is used in this pilot study grouped into 36 new habitat-specific gene catalogs, all based on full-length (complete) genes. Extensive taxonomic and gene annotations are stored in Gene Information Tables (GITs), a simple tractable data integration format useful for analysis through command line or for database management. KMAP pilot study provides the exploration and comparison of microbial GITs across different habitats with over 275 million genes. KMAP access to data and analyses is available at https://www.cbrc.kaust.edu.sa/aamg/kmap.start .
Affiliation(s)
- Intikhab Alam
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
| | - Allan Anthony Kamau
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - David Kamanda Ngugi
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124, Brunswick, Germany
| | - Takashi Gojobori
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Carlos M Duarte
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia; Red Sea Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| | - Vladimir B Bajic
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
| |
|
46
|
Lomsadze A, Bonny C, Strozzi F, Borodovsky M. GeneMark-HM: improving gene prediction in DNA sequences of human microbiome. NAR Genom Bioinform 2021; 3:lqab047. [PMID: 34056597 PMCID: PMC8153819 DOI: 10.1093/nargab/lqab047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2020] [Revised: 04/27/2021] [Accepted: 05/24/2021] [Indexed: 11/14/2022] Open
Abstract
Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes along with genomes of sequenced microbial isolates can be used to support more accurate gene prediction in novel metagenomic sequences. We have proposed an approach that used three types of gene prediction algorithms and found for all contigs in a metagenome nearly optimal models of protein-coding regions either in libraries of pre-computed models or constructed de novo. The model selection process and gene annotation were done by the new GeneMark-HM pipeline. We have created a database of the species level pan-genomes for the human microbiome. To create a library of models representing each pan-genome we used a self-training algorithm GeneMarkS-2. Genes initially predicted in each contig served as queries for a fast similarity search through the pan-genome database. The best matches led to selection of the model for gene prediction. Contigs not assigned to pan-genomes were analyzed by crude, but still accurate models designed for sequences with particular GC compositions. Tests of GeneMark-HM on simulated metagenomes demonstrated improvement in gene annotation of human metagenomic sequences in comparison with the current state-of-the-art gene prediction tools.
Affiliation(s)
| | | | | | - Mark Borodovsky
- Gene Probe, Inc., 1106 Wrights Mill Ct, Atlanta, GA 30324, USA
| |
|
47
|
Nearing JT, Comeau AM, Langille MGI. Identifying biases and their potential solutions in human microbiome studies. Microbiome 2021; 9:113. [PMID: 34006335 PMCID: PMC8132403 DOI: 10.1186/s40168-021-01059-0] [Citation(s) in RCA: 69] [Impact Index Per Article: 23.0] [Received: 07/11/2020] [Accepted: 03/24/2021] [Indexed: 05/13/2023]
Abstract
Advances in DNA sequencing technology have vastly improved the ability of researchers to explore the microbial inhabitants of the human body. Unfortunately, while these studies have uncovered the importance of these microbial communities to our health, they often do not result in similar findings. One possible reason for the disagreement in these results is the multitude of systemic biases that are introduced during sequence-based microbiome studies. These biases begin with sample collection and continue to be introduced throughout the entire experiment, leading to an observed community that is significantly altered from the true underlying microbial composition. In this review, we will highlight the various steps in typical sequence-based human microbiome studies where significant bias can be introduced, and we will review the current efforts within the field that aim to reduce the impact of these biases.
Affiliation(s)
- Jacob T Nearing
- Department of Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - André M Comeau
- Integrated Microbiome Resource, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Morgan G I Langille
- Integrated Microbiome Resource, Dalhousie University, Halifax, Nova Scotia, Canada.
- Department of Pharmacology, Dalhousie University, Halifax, Nova Scotia, Canada.
| |
|
48
|
Lui LM, Nielsen TN, Arkin AP. A method for achieving complete microbial genomes and improving bins from metagenomics data. PLoS Comput Biol 2021; 17:e1008972. [PMID: 33961626 PMCID: PMC8172020 DOI: 10.1371/journal.pcbi.1008972] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/18/2020] [Revised: 06/02/2021] [Accepted: 04/16/2021] [Indexed: 11/19/2022] Open
Abstract
Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete genomes from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Some studies have attempted to extract complete bacterial, archaeal, and viral genomes and often focus on species with circular genomes so they can help confirm completeness with circularity. However, less than 100 circularized bacterial and archaeal genomes have been assembled and published from metagenomics data despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a semi-automated method called Jorg to help circularize small bacterial, archaeal, and viral genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer based assemblies. We chose species of the Candidate Phyla Radiation (CPR) to focus our initial efforts because they have small genomes and are only known to have one ribosomal RNA operon. In addition to 34 circular CPR genomes, we present one circular Margulisbacteria genome, one circular Chloroflexi genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA. 
Code and a tutorial for this method are available at https://github.com/lmlui/Jorg, and the method is available on the DOE Systems Biology KnowledgeBase as a beta app.
Affiliation(s)
- Lauren M. Lui
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
| | - Torben N. Nielsen
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
| | - Adam P. Arkin
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Innovative Genomics Institute, Berkeley, CA, United States of America
| |
|
49
|
Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat Protoc 2021; 16:2520-2541. [PMID: 33864056 DOI: 10.1038/s41596-021-00508-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/24/2020] [Accepted: 01/12/2021] [Indexed: 02/02/2023]
Abstract
Recovering genomes from shotgun metagenomic sequence data allows detailed taxonomic and functional characterization of individual species or strains in a microbial community. Retrieving these metagenome-assembled genomes (MAGs) involves seven stages. First, low-quality bases, along with adapter and host sequences, are removed. Second, overlapping sequences are assembled to create longer contiguous fragments. Third, these fragments are clustered based on sequence composition and abundance. Fourth, these sequence clusters, or bins, undergo rounds of quality assessment and refinement to yield MAGs. The optional fifth stage is dereplication of MAGs to select representatives. Next, each MAG is taxonomically classified. The optional seventh stage is assessing the fraction of diversity that has been recovered. The output of this protocol is draft genomes, which can provide invaluable clues about uncultured organisms. This protocol takes ~1 week to run, depending on computational resources available, and requires prior experience with high-performance computing, shell script programming and Python.
|
50
|
Werbin ZR, Hackos B, Lopez-Nava J, Dietze MC, Bhatnagar JM. The National Ecological Observatory Network's soil metagenomes: assembly and basic analysis. F1000Res 2021; 10:299. [PMID: 35707452 PMCID: PMC9178279 DOI: 10.12688/f1000research.51494.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/08/2022] [Indexed: 11/20/2022] Open
Abstract
The largest dataset of soil metagenomes has recently been released by the National Ecological Observatory Network (NEON), which performs annual shotgun sequencing of soils at 47 sites across the United States. NEON serves as a valuable educational resource, thanks to its open data and programming tutorials, but there is currently no introductory tutorial for accessing and analyzing the soil shotgun metagenomic dataset. Here, we describe methods for processing raw soil metagenome sequencing reads using a bioinformatics pipeline tailored to the high complexity and diversity of the soil microbiome. We describe the rationale, necessary resources, and implementation of steps such as cleaning raw reads, taxonomic classification, assembly into contigs or genomes, annotation of predicted genes using custom protein databases, and exporting data for downstream analysis. The workflow presented here aims to increase the accessibility of NEON's shotgun metagenome data, which can provide important clues about soil microbial communities and their ecological roles.
Affiliation(s)
- Zoey R. Werbin
- Department of Biology, Boston University, Boston, MA, 02215, USA
| | - Briana Hackos
- Department of Mathematics, University of Colorado, Boulder, Boulder, CO, 80309, USA
| | - Jorge Lopez-Nava
- Department of Mathematics, Swarthmore College, Swarthmore, PA 19081, USA
| | - Michael C. Dietze
- Department of Earth & Environment, Boston University, Boston, MA, 02215, USA
| | | |
|