1
|
Lumian J, Sumner DY, Grettenberger CL, Jungblut AD, Irber L, Pierce-Ward NT, Brown CT. Biogeographic distribution of five Antarctic cyanobacteria using large-scale k-mer searching with sourmash branchwater. Front Microbiol 2024; 15:1328083. [PMID: 38440141 PMCID: PMC10909832 DOI: 10.3389/fmicb.2024.1328083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Accepted: 02/06/2024] [Indexed: 03/06/2024] Open
Abstract
Cyanobacteria form diverse communities and are important primary producers in Antarctic freshwater environments, but their geographic distribution patterns in Antarctica and globally are still unresolved. There are however few genomes of cultured cyanobacteria from Antarctica available and therefore metagenome-assembled genomes (MAGs) from Antarctic cyanobacteria microbial mats provide an opportunity to explore distribution of uncultured taxa. These MAGs also allow comparison with metagenomes of cyanobacteria enriched communities from a range of habitats, geographic locations, and climates. However, most MAGs do not contain 16S rRNA gene sequences, making a 16S rRNA gene-based biogeography comparison difficult. An alternative technique is to use large-scale k-mer searching to find genomes of interest in public metagenomes. This paper presents the results of k-mer based searches for 5 Antarctic cyanobacteria MAGs from Lake Fryxell and Lake Vanda, assigned the names Phormidium pseudopriestleyi FRX01, Microcoleus sp. MP8IB2.171, Leptolyngbya sp. BulkMat.35, Pseudanabaenaceae cyanobacterium MP8IB2.15, and Leptolyngbyaceae cyanobacterium MP9P1.79 in 498,942 unassembled metagenomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The Microcoleus sp. MP8IB2.171 MAG was found in a wide variety of environments, the P. pseudopriestleyi MAG was found in environments with challenging conditions, the Leptolyngbyaceae cyanobacterium MP9P1.79 MAG was only found in Antarctica, and the Leptolyngbya sp. BulkMat.35 and Pseudanabaenaceae cyanobacterium MP8IB2.15 MAGs were found in Antarctic and other cold environments. The findings based on metagenome matches and global comparisons suggest that these Antarctic cyanobacteria have distinct distribution patterns ranging from locally restricted to global distribution across the cold biosphere and other climatic zones.
Collapse
Affiliation(s)
- Jessica Lumian
- Department of Earth and Planetary Sciences, Microbiology Graduate Group, University of California Davis, Davis, CA, United States
| | - Dawn Y. Sumner
- Department of Earth and Planetary Sciences, University of California Davis, Davis, CA, United States
| | - Christen L. Grettenberger
- Department of Earth and Planetary Sciences, University of California Davis, Davis, CA, United States
- Department of Environmental Toxicology, University of California Davis, Davis, CA, United States
| | - Anne D. Jungblut
- Department of Science, The Natural History Museum, London, United Kingdom
| | - Luiz Irber
- Population Health and Reproduction, University of California Davis, Davis, CA, United States
| | - N. Tessa Pierce-Ward
- Population Health and Reproduction, University of California Davis, Davis, CA, United States
| | - C. Titus Brown
- Population Health and Reproduction, University of California Davis, Davis, CA, United States
| |
Collapse
|
2
|
Puente-Sánchez F, Hoetzinger M, Buck M, Bertilsson S. Exploring environmental intra-species diversity through non-redundant pangenome assemblies. Mol Ecol Resour 2023; 23:1724-1736. [PMID: 37382302 DOI: 10.1111/1755-0998.13826] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Revised: 05/24/2023] [Accepted: 06/15/2023] [Indexed: 06/30/2023]
Abstract
At the genome level, microorganisms are highly adaptable both in terms of allele and gene composition. Such heritable traits emerge in response to different environmental niches and can have a profound influence on microbial community dynamics. As a consequence, any individual genome or population will contain merely a fraction of the total genetic diversity of any operationally defined "species", whose ecological potential can thus be only fully understood by studying all of their genomes and the genes therein. This concept, known as the pangenome, is valuable for studying microbial ecology and evolution, as it partitions genomes into core (present in all the genomes from a species, and responsible for housekeeping and species-level niche adaptation among others) and accessory regions (present only in some, and responsible for intra-species differentiation). Here we present SuperPang, an algorithm producing pangenome assemblies from a set of input genomes of varying quality, including metagenome-assembled genomes (MAGs). SuperPang runs in linear time and its results are complete, non-redundant, preserve gene ordering and contain both coding and non-coding regions. Our approach provides a modular view of the pangenome, identifying operons and genomic islands, and allowing to track their prevalence in different populations. We illustrate this by analysing intra-species diversity in Polynucleobacter, a bacterial genus ubiquitous in freshwater ecosystems, characterized by their streamlined genomes and their ecological versatility. We show how SuperPang facilitates the simultaneous analysis of allelic and gene content variation under different environmental pressures, allowing us to study the drivers of microbial diversification at unprecedented resolution.
Collapse
Affiliation(s)
- Fernando Puente-Sánchez
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Matthias Hoetzinger
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Moritz Buck
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Stefan Bertilsson
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
| |
Collapse
|
3
|
van Dijk B, Buffard P, Farr AD, Giersdorf F, Meijer J, Dutilh BE, Rainey PB. Identifying and tracking mobile elements in evolving compost communities yields insights into the nanobiome. ISME COMMUNICATIONS 2023; 3:90. [PMID: 37640834 PMCID: PMC10462680 DOI: 10.1038/s43705-023-00294-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 08/02/2023] [Accepted: 08/08/2023] [Indexed: 08/31/2023]
Abstract
Microbial evolution is driven by rapid changes in gene content mediated by horizontal gene transfer (HGT). While mobile genetic elements (MGEs) are important drivers of gene flux, the nanobiome-the zoo of Darwinian replicators that depend on microbial hosts-remains poorly characterised. New approaches are necessary to increase our understanding beyond MGEs shaping individual populations, towards their impacts on complex microbial communities. A bioinformatic pipeline (xenoseq) was developed to cross-compare metagenomic samples from microbial consortia evolving in parallel, aimed at identifying MGE dissemination, which was applied to compost communities which underwent periodic mixing of MGEs. We show that xenoseq can distinguish movement of MGEs from demographic changes in community composition that otherwise confounds identification, and furthermore demonstrate the discovery of various unexpected entities. Of particular interest was a nanobacterium of the candidate phylum radiation (CPR) which is closely related to a species identified in groundwater ecosystems (Candidatus Saccharibacterium), and appears to have a parasitic lifestyle. We also highlight another prolific mobile element, a 313 kb plasmid hosted by a Cellvibrio lineage. The host was predicted to be capable of nitrogen fixation, and acquisition of the plasmid coincides with increased ammonia production. Taken together, our data show that new experimental strategies combined with bioinformatic analyses of metagenomic data stand to provide insight into the nanobiome as a driver of microbial community evolution.
Collapse
Affiliation(s)
- Bram van Dijk
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany.
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Utrecht, the Netherlands.
| | - Pauline Buffard
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Andrew D Farr
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Franz Giersdorf
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Jeroen Meijer
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Utrecht, the Netherlands
| | - Bas E Dutilh
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Utrecht, the Netherlands
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Jena, Germany
| | - Paul B Rainey
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany.
- Laboratory of Biophysics and Evolution, CBI, ESPCI Paris, Université PSL CNRS, Paris, France.
| |
Collapse
|
4
|
Martin S, Ayling M, Patrono L, Caccamo M, Murcia P, Leggett RM. Capturing variation in metagenomic assembly graphs with MetaCortex. Bioinformatics 2023; 39:6986127. [PMID: 36722204 PMCID: PMC9889960 DOI: 10.1093/bioinformatics/btad020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Revised: 11/10/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023] Open
Abstract
MOTIVATION The assembly of contiguous sequence from metagenomic samples presents a particular challenge, due to the presence of multiple species, often closely related, at varying levels of abundance. Capturing diversity within species, for example, viral haplotypes, or bacterial strain-level diversity, is even more challenging. RESULTS We present MetaCortex, a metagenome assembler that captures intra-species diversity by searching for signatures of local variation along assembled sequences in the underlying assembly graph and outputting these sequences in sequence graph format. We show that MetaCortex produces accurate assemblies with higher genome coverage and contiguity than other popular metagenomic assemblers on mock viral communities with high levels of strain-level diversity and on simulated communities containing simulated strains. AVAILABILITY AND IMPLEMENTATION Source code is freely available to download from https://github.com/SR-Martin/metacortex, is implemented in C and supported on MacOS and Linux. The version used for the results presented in this article is available at doi.org/10.5281/zenodo.7273627. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | | | | | | | - Pablo Murcia
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
| | | |
Collapse
|
5
|
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. FRONTIERS IN BIOINFORMATICS 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Collapse
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- *Correspondence: Fotis A. Baltoumas, ; Nikos C. Kyrpides, ; Georgios A. Pavlopoulos,
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
| | - Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- *Correspondence: Fotis A. Baltoumas, ; Nikos C. Kyrpides, ; Georgios A. Pavlopoulos,
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
- *Correspondence: Fotis A. Baltoumas, ; Nikos C. Kyrpides, ; Georgios A. Pavlopoulos,
| |
Collapse
|
6
|
Roach MJ, Pierce-Ward NT, Suchecki R, Mallawaarachchi V, Papudeshi B, Handley SA, Brown CT, Watson-Haigh NS, Edwards RA. Ten simple rules and a template for creating workflows-as-applications. PLoS Comput Biol 2022; 18:e1010705. [PMID: 36520686 PMCID: PMC9754251 DOI: 10.1371/journal.pcbi.1010705] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Affiliation(s)
- Michael J. Roach
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
- * E-mail:
| | - N. Tessa Pierce-Ward
- Department of Population Health and Reproduction, University of California, Davis, California, United States of America
| | | | - Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
| | - Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
| | - Scott A. Handley
- Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - C. Titus Brown
- Department of Population Health and Reproduction, University of California, Davis, California, United States of America
| | | | - Robert A. Edwards
- Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
| |
Collapse
|
7
|
Khan J, Kokot M, Deorowicz S, Patro R. Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2. Genome Biol 2022; 23:190. [PMID: 36076275 PMCID: PMC9454175 DOI: 10.1186/s13059-022-02743-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Accepted: 08/01/2022] [Indexed: 11/13/2022] Open
Abstract
The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this often forms a computational bottleneck. We present Cuttlefish 2, significantly advancing the state-of-the-art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, of size 2.58Tbp, from 4.5 days to 17-23 h; and it constructs the graph for 1.52Tbp white spruce reads in approximately 10 h, while the closest competitor requires 54-58 h, using considerably more memory.
Collapse
Affiliation(s)
- Jamshed Khan
- Department of Computer Science, University of Maryland, College Park, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
| | - Marek Kokot
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| | - Sebastian Deorowicz
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
| | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
| |
Collapse
|
8
|
Haryono MAS, Law YY, Arumugam K, Liew LCW, Nguyen TQN, Drautz-Moses DI, Schuster SC, Wuertz S, Williams RBH. Recovery of High Quality Metagenome-Assembled Genomes From Full-Scale Activated Sludge Microbial Communities in a Tropical Climate Using Longitudinal Metagenome Sampling. Front Microbiol 2022; 13:869135. [PMID: 35756038 PMCID: PMC9230771 DOI: 10.3389/fmicb.2022.869135] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023] Open
Abstract
The analysis of metagenome data based on the recovery of draft genomes (so called metagenome-assembled genomes, or MAG) has assumed an increasingly central role in microbiome research in recent years. Microbial communities underpinning the operation of wastewater treatment plants are particularly challenging targets for MAG analysis due to their high ecological complexity, and remain important, albeit understudied, microbial communities that play ssa key role in mediating interactions between human and natural ecosystems. Here we consider strategies for recovery of MAG sequence from time series metagenome surveys of full-scale activated sludge microbial communities. We generate MAG catalogs from this set of data using several different strategies, including the use of multiple individual sample assemblies, two variations on multi-sample co-assembly and a recently published MAG recovery workflow using deep learning. We obtain a total of just under 9,100 draft genomes, which collapse to around 3,100 non-redundant genomic clusters. We examine the strengths and weaknesses of these approaches in relation to MAG yield and quality, showing that co-assembly may offer advantages over single-sample assembly in the case of metagenome data obtained from closely sampled longitudinal study designs. Around 1,000 MAGs were candidates for being considered high quality, based on single-copy marker gene occurrence statistics, however only 58 MAG formally meet the MIMAG criteria for being high quality draft genomes. These findings carry broader broader implications for performing genome-resolved metagenomics on highly complex communities, the design and implementation of genome recoverability strategies, MAG decontamination and the search for better binning methodology.
Collapse
Affiliation(s)
- Mindia A S Haryono
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| | - Ying Yu Law
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Krithika Arumugam
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Larry C-W Liew
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Thi Quynh Ngoc Nguyen
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Daniela I Drautz-Moses
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Stephan C Schuster
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore.,School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Stefan Wuertz
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore.,School of Civil and Environmental Engineering, Nanyang Technological University, Singapore, Singapore
| | - Rohan B H Williams
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| |
Collapse
|
9
|
Pan S, Zhu C, Zhao XM, Coelho LP. A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments. Nat Commun 2022; 13:2326. [PMID: 35484115 PMCID: PMC9051138 DOI: 10.1038/s41467-022-29843-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Accepted: 03/31/2022] [Indexed: 12/14/2022] Open
Abstract
Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) when sequences predicted to originate from the same genome are automatically grouped together. The most widely-used methods for binning are reference-independent, operating de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open source tool that uses deep siamese neural networks to implement a semi-supervised approach, i.e. SemiBin exploits the information in reference genomes, while retaining the capability of reconstructing high-quality bins that are outside the reference dataset. Using simulated and real microbiome datasets from several different habitats from GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, compared to other methods, SemiBin returns more high-quality bins with larger taxonomic diversity, including more distinct genera and species.
Collapse
Affiliation(s)
- Shaojun Pan
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China
| | - Chengkai Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China
- School of Life Sciences, Fudan University, Shanghai, China
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China.
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai, China.
- Zhangjiang Fudan International Innovation Center, Shanghai, China.
| | - Luis Pedro Coelho
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China.
| |
Collapse
|
10
|
Vanni C, Schechter MS, Acinas SG, Barberán A, Buttigieg PL, Casamayor EO, Delmont TO, Duarte CM, Eren AM, Finn RD, Kottmann R, Mitchell A, Sánchez P, Siren K, Steinegger M, Gloeckner FO, Fernàndez-Guerra A. Unifying the known and unknown microbial coding sequence space. eLife 2022; 11:e67667. [PMID: 35356891 PMCID: PMC9132574 DOI: 10.7554/elife.67667] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Accepted: 03/30/2022] [Indexed: 12/02/2022] Open
Abstract
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
Collapse
Affiliation(s)
- Chiara Vanni
- Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine MicrobiologyBremenGermany
- Jacobs University BremenBremenGermany
| | - Matthew S Schechter
- Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine MicrobiologyBremenGermany
- Department of Medicine, University of ChicagoChicagoUnited States
| | - Silvia G Acinas
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC)BarcelonaSpain
| | - Albert Barberán
- Department of Environmental Science, University of ArizonaTucsonUnited States
| | - Pier Luigi Buttigieg
- Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Alfred Wegener InstituteBremerhavenGermany
| | - Emilio O Casamayor
- Center for Advanced Studies of Blanes CEAB-CSIC, Spanish Council for ResearchBlanesSpain
| | - Tom O Delmont
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-SaclayEvryFrance
| | - Carlos M Duarte
- Red Sea Research Centre and Computational Bioscience Research Center, King Abdullah University of Science and TechnologyThuwalSaudi Arabia
| | - A Murat Eren
- Department of Medicine, University of ChicagoChicagoUnited States
- Josephine Bay Paul Center, Marine Biological LaboratoryWoods HoleUnited States
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome CampusHinxtonUnited Kingdom
| | - Renzo Kottmann
- Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine MicrobiologyBremenGermany
| | - Alex Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome CampusHinxtonUnited Kingdom
| | - Pablo Sánchez
- Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC)BarcelonaSpain
| | - Kimmo Siren
- Section for Evolutionary Genomics, The GLOBE Institute, University of CopenhagenCopenhagenDenmark
| | - Martin Steinegger
- School of Biological Sciences, Seoul National UniversitySeoulRepublic of Korea
- Institute of Molecular Biology and Genetics, Seoul National UniversitySeoulRepublic of Korea
| | - Frank Oliver Gloeckner
- Jacobs University BremenBremenGermany
- University of Bremen and Life Sciences and ChemistryBremenGermany
- Computing Center, Helmholtz Center for Polar and Marine ResearchBremerhavenGermany
| | - Antonio Fernàndez-Guerra
- Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine MicrobiologyBremenGermany
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of CopenhagenCopenhagenDenmark
| |
Collapse
|
11
|
Balaji A, Sapoval N, Seto C, Leo Elworth R, Fu Y, Nute MG, Savidge T, Segarra S, Treangen TJ. KOMB: K-core based de novo characterization of copy number variation in microbiomes. Comput Struct Biotechnol J 2022; 20:3208-3222. [PMID: 35832621 PMCID: PMC9249589 DOI: 10.1016/j.csbj.2022.06.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 11/29/2022] Open
Abstract
Characterizing metagenomes via kmer-based, database-dependent taxonomic classification has yielded key insights into underlying microbiome dynamics. However, novel approaches are needed to track community dynamics and genomic flux within metagenomes, particularly in response to perturbations. We describe KOMB, a novel method for tracking genome level dynamics within microbiomes. KOMB utilizes K-core decomposition to identify Structural variations (SVs), specifically, population-level Copy Number Variation (CNV) within microbiomes. K-core decomposition partitions the graph into shells containing nodes of induced degree at least K, yielding reduced computational complexity compared to prior approaches. Through validation on a synthetic community, we show that KOMB recovers and profiles repetitive genomic regions in the sample. KOMB is shown to identify functionally-important regions in Human Microbiome Project datasets, and was used to analyze longitudinal data and identify keystone taxa in Fecal Microbiota Transplantation (FMT) samples. In summary, KOMB represents a novel graph-based, taxonomy-oblivious, and reference-free approach for tracking CNV within microbiomes. KOMB is open source and available for download at https://gitlab.com/treangenlab/komb.
Collapse
Affiliation(s)
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Charlie Seto
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
| | - R.A. Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Michael G. Nute
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Tor Savidge
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, USA
| | - Santiago Segarra
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Corresponding author.
| | - Todd J. Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Corresponding author.
| |
Collapse
|
12
|
Tierney BT, Szymanski E, Henriksen JR, Kostic AD, Patel CJ. Using Cartesian Doubt To Build a Sequencing-Based View of Microbiology. mSystems 2021; 6:e0057421. [PMID: 34636670 PMCID: PMC8510522 DOI: 10.1128/msystems.00574-21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2021] [Accepted: 09/23/2021] [Indexed: 12/13/2022] Open
Abstract
The technological leap of DNA sequencing generated a tension between modern metagenomics and historical microbiology. We are forcibly harmonizing the output of a modern tool with centuries of experimental knowledge derived from culture-based microbiology. As a thought experiment, we borrow the notion of Cartesian doubt from philosopher Rene Descartes, who used doubt to build a philosophical framework from his incorrigible statement that "I think therefore I am." We aim to cast away preconceived notions and conceptualize microorganisms through the lens of metagenomic sequencing alone. Specifically, we propose funding and building analysis and engineering methods that neither search for nor rely on the assumption of independent genomes bound by lipid barriers containing discrete functional roles and taxonomies. We propose that a view of microbial communities based in sequencing will engender novel insights into metagenomic structure and may capture functional biology not reflected within the current paradigm.
Collapse
Affiliation(s)
- Braden T. Tierney
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, Massachusetts, USA
- Section on Islet Cell and Regenerative Biology, Joslin Diabetes Center, Boston, Massachusetts, USA
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA
| | - Erika Szymanski
- Department of English, Colorado State University, Fort Collins, Colorado, USA
| | | | - Aleksandar D. Kostic
- Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, Massachusetts, USA
- Section on Islet Cell and Regenerative Biology, Joslin Diabetes Center, Boston, Massachusetts, USA
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA
| | - Chirag J. Patel
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| |
Collapse
|
13
|
Quince C, Nurk S, Raguideau S, James R, Soyer OS, Summers JK, Limasset A, Eren AM, Chikhi R, Darling AE. STRONG: metagenomics strain resolution on assembly graphs. Genome Biol 2021; 22:214. [PMID: 34311761 PMCID: PMC8311964 DOI: 10.1186/s13059-021-02419-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2020] [Accepted: 06/29/2021] [Indexed: 12/30/2022] Open
Abstract
We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, from multiple metagenome samples. STRONG performs coassembly, and binning into metagenome assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their unitig per-sample coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and abundances. STRONG is validated using synthetic communities and for a real anaerobic digestor time series generates haplotypes that match those observed from long Nanopore reads.
Collapse
Affiliation(s)
- Christopher Quince
- Organisms and Ecosystems, Earlham Institute, Norwich, NR4 7UZ, UK.
- Gut Microbes and Health, Quadram Institute, Norwich, NR4 7UQ, UK.
- Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK.
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, 20892, MD, USA.
| | - Sebastien Raguideau
- Organisms and Ecosystems, Earlham Institute, Norwich, NR4 7UZ, UK
- Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK
| | - Robert James
- Gut Microbes and Health, Quadram Institute, Norwich, NR4 7UQ, UK
| | - Orkun S Soyer
- School of Life Sciences, University of Warwick, Coventry, CV4 7AL, UK
| | | | | | - A Murat Eren
- Department of Medicine, University of Chicago, Chicago, Illinois, USA
- Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, Massachusetts, USA
| | - Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, C3BI USR 3756 IP CNRS, Paris, France
| | - Aaron E Darling
- The iThree institute, University of Technology Sydney, 15 Broadway, Ultimo, 2007, NSW, Australia
| |
Collapse
|
14
|
Lumian JE, Jungblut AD, Dillion ML, Hawes I, Doran PT, Mackey TJ, Dick GJ, Grettenberger CL, Sumner DY. Metabolic Capacity of the Antarctic Cyanobacterium Phormidium pseudopriestleyi That Sustains Oxygenic Photosynthesis in the Presence of Hydrogen Sulfide. Genes (Basel) 2021; 12:genes12030426. [PMID: 33809699 PMCID: PMC8002359 DOI: 10.3390/genes12030426] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2021] [Revised: 03/05/2021] [Accepted: 03/12/2021] [Indexed: 01/11/2023] Open
Abstract
Sulfide inhibits oxygenic photosynthesis by blocking electron transfer between H2O and the oxygen-evolving complex in the D1 protein of Photosystem II. The ability of cyanobacteria to counter this effect has implications for understanding the productivity of benthic microbial mats in sulfidic environments throughout Earth history. In Lake Fryxell, Antarctica, the benthic, filamentous cyanobacterium Phormidium pseudopriestleyi creates a 1–2 mm thick layer of 50 µmol L−1 O2 in otherwise sulfidic water, demonstrating that it sustains oxygenic photosynthesis in the presence of sulfide. A metagenome-assembled genome of P. pseudopriestleyi indicates a genetic capacity for oxygenic photosynthesis, including multiple copies of psbA (encoding the D1 protein of Photosystem II), and anoxygenic photosynthesis with a copy of sqr (encoding the sulfide quinone reductase protein that oxidizes sulfide). The genomic content of P. pseudopriestleyi is consistent with sulfide tolerance mechanisms including increasing psbA expression or directly oxidizing sulfide with sulfide quinone reductase. However, the ability of the organism to reduce Photosystem I via sulfide quinone reductase while Photosystem II is sulfide-inhibited, thereby performing anoxygenic photosynthesis in the presence of sulfide, has yet to be demonstrated.
Collapse
Affiliation(s)
- Jessica E. Lumian
- Microbiology Graduate Group, University of California, Davis, CA 95616, USA;
| | - Anne D. Jungblut
- Life Sciences Department, The Natural History Museum, London SW7 5BD, UK;
| | - Megan L. Dillion
- Genomics and Bioinformatics, Novozymes, Inc., Davis, CA 95616, USA;
| | - Ian Hawes
- Coastal Marine Field Station, University of Waikato, Tauranga 3110, New Zealand;
| | - Peter T. Doran
- Geology and Geophysics, Louisiana State University, Baton Rouge, LA 70803, USA;
| | - Tyler J. Mackey
- Department of Earth and Planetary Sciences, University of New Mexico, Albuquerque, NM 87131, USA;
| | - Gregory J. Dick
- Department of Earth and Environmental Sciences, University of Michigan, Ann Arbor, MI 48109, USA;
| | | | - Dawn Y. Sumner
- Department of Earth and Planetary Sciences, University of California, Davis, CA 95616, USA;
- Correspondence: ; Tel.: +1-530-752-5353
| |
Collapse
|
15
|
Overview of bioinformatic methods for analysis of antibiotic resistome from genome and metagenome data. J Microbiol 2021; 59:270-280. [DOI: 10.1007/s12275-021-0652-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 12/13/2022]
|
16
|
Paoli L, Sunagawa S. Space, time and microdiversity: towards a resolution revolution in microbiomics. ENVIRONMENTAL MICROBIOLOGY REPORTS 2021; 13:31-35. [PMID: 33063432 PMCID: PMC7894491 DOI: 10.1111/1758-2229.12897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2020] [Accepted: 10/12/2020] [Indexed: 06/11/2023]
Affiliation(s)
- Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH ZürichZürichSwitzerland
| | - Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH ZürichZürichSwitzerland
| |
Collapse
|
17
|
Reiter T, Brooks† PT, Irber† L, Joslin† SEK, Reid† CM, Scott† C, Brown CT, Pierce-Ward NT. Streamlining data-intensive biology with workflow systems. Gigascience 2021; 10:giaa140. [PMID: 33438730 PMCID: PMC8631065 DOI: 10.1093/gigascience/giaa140] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2020] [Revised: 11/06/2020] [Accepted: 11/13/2020] [Indexed: 11/14/2022] Open
Abstract
As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
Collapse
Affiliation(s)
- Taylor Reiter
- Department of Population Health and Reproduction, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| | - Phillip T Brooks†
- Department of Population Health and Reproduction, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| | - Luiz Irber†
- Department of Population Health and Reproduction, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| | - Shannon E K Joslin†
- Department of Animal Science, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| | - Charles M Reid†
- Department of Population Health and Reproduction, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| | - Camille Scott†
- Department of Population Health and Reproduction, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| | - C Titus Brown
- Department of Population Health and Reproduction, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| | - N Tessa Pierce-Ward
- Department of Population Health and Reproduction, University of California, Davis, 1 Shields Avenue, Davis, CA 95616, USA
| |
Collapse
|