1
Aizpurua O, Dunn RR, Hansen LH, Gilbert MTP, Alberdi A. Field and laboratory guidelines for reliable bioinformatic and statistical analysis of bacterial shotgun metagenomic data. Crit Rev Biotechnol 2024; 44:1164-1182. [PMID: 37731336 DOI: 10.1080/07388551.2023.2254933] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/24/2023] [Revised: 05/22/2023] [Accepted: 06/27/2023] [Indexed: 09/22/2023]
Abstract
Shotgun metagenomics is an increasingly cost-effective approach for profiling environmental and host-associated microbial communities. However, due to the complexity of both microbiomes and the molecular techniques required to analyze them, the reliability and representativeness of the results are contingent upon the field, laboratory, and bioinformatic procedures employed. Here, we consider 15 field and laboratory issues that critically impact downstream bioinformatic and statistical data processing, as well as result interpretation, in bacterial shotgun metagenomic studies. The issues we consider encompass intrinsic properties of samples, study design, and laboratory-processing strategies. We identify the links of field and laboratory steps with downstream analytical procedures, explain the means for detecting potential pitfalls, and propose mitigation measures to overcome or minimize their impact in metagenomic studies. We anticipate that our guidelines will assist data scientists in appropriately processing and interpreting their data, while aiding field and laboratory researchers to implement strategies for improving the quality of the generated results.
Affiliation(s)
- Ostaizka Aizpurua
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Robert R Dunn
- Department of Applied Ecology, North Carolina State University, Raleigh, NC, USA
- Lars H Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- M T P Gilbert
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- University Museum, NTNU, Trondheim, Norway
- Antton Alberdi
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen, Copenhagen, Denmark

2
Clark CM, Kwan JC. Creating and leveraging bespoke large-scale knowledge graphs for comparative genomics and multi-omics drug discovery with SocialGene. bioRxiv 2024:2024.08.16.608329. [PMID: 39229008 PMCID: PMC11370487 DOI: 10.1101/2024.08.16.608329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
The rapid expansion of multi-omics data has transformed biological research, offering unprecedented opportunities to explore complex genomic relationships across diverse organisms. However, the vast volume and heterogeneity of these datasets present significant challenges for analyses. Here we introduce SocialGene, a comprehensive software suite designed to collect, analyze, and organize multi-omics data into structured knowledge graphs, with the ability to handle small projects to repository-scale analyses. Originally developed to enhance genome mining for natural product drug discovery, SocialGene has been effective across various applications, including functional genomics, evolutionary studies, and systems biology. SocialGene's concerted Python and Nextflow libraries streamline data ingestion, manipulation, aggregation, and analysis, culminating in a custom Neo4j database. The software not only facilitates the exploration of genomic synteny but also provides a foundational knowledge graph supporting the integration of additional diverse datasets and the development of advanced search engines and analyses. This manuscript introduces some of SocialGene's capabilities through brief case studies including targeted genome mining for drug discovery, accelerated searches for similar and distantly related biosynthetic gene clusters in biobank-available organisms, integration of chemical and analytical data, and more. SocialGene is free, open-source, MIT-licensed, designed for adaptability and extension, and available from github.com/socialgene.
Affiliation(s)
- Chase M. Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Jason C. Kwan
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA

3
Ramoneda J, Fan K, Lucas JM, Chu H, Bissett A, Strickland MS, Fierer N. Ecological relevance of flagellar motility in soil bacterial communities. ISME J 2024; 18:wrae067. [PMID: 38648266 PMCID: PMC11095265 DOI: 10.1093/ismejo/wrae067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 03/27/2024] [Accepted: 04/18/2024] [Indexed: 04/25/2024]
Abstract
Flagellar motility is a key bacterial trait as it allows bacteria to navigate their immediate surroundings. Not all bacteria are capable of flagellar motility, and the distribution of this trait, its ecological associations, and the life history strategies of flagellated taxa remain poorly characterized. We developed and validated a genome-based approach to infer the potential for flagellar motility across 12 bacterial phyla (26 192 unique genomes). The capacity for flagellar motility was associated with a higher prevalence of genes for carbohydrate metabolism and higher maximum potential growth rates, suggesting that flagellar motility is more prevalent in environments with higher carbon availability. To test this hypothesis, we applied a method to infer the prevalence of flagellar motility in whole bacterial communities from metagenomic data and quantified the prevalence of flagellar motility across four independent field studies that each captured putative gradients in soil carbon availability (148 metagenomes). We observed a positive relationship between the prevalence of bacterial flagellar motility and soil carbon availability in all datasets. Since soil carbon availability is often correlated with other factors that could influence the prevalence of flagellar motility, we validated these observations using metagenomic data from a soil incubation experiment where carbon availability was directly manipulated with glucose amendments. This confirmed that the prevalence of bacterial flagellar motility is consistently associated with soil carbon availability over other potential confounding factors. This work highlights the value of combining predictive genomic and metagenomic approaches to expand our understanding of microbial phenotypic traits and reveal their general environmental associations.
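The abstract does not spell out how genome-level predictions are scaled to whole communities. One generic way to do this is an abundance-weighted mean: the share of a sample's abundance contributed by taxa predicted to carry the trait. The sketch below is that generic technique, not necessarily the authors' method; the function name, the per-taxon abundance table, and the binary motility predictions are hypothetical inputs.

```python
from typing import Mapping


def weighted_trait_prevalence(
    abundances: Mapping[str, float],
    has_trait: Mapping[str, bool],
) -> float:
    """Abundance-weighted fraction of a community predicted to carry a trait.

    abundances: taxon -> abundance in one metagenome (hypothetical input).
    has_trait: taxon -> genome-based prediction (True = trait predicted).
    Taxa with no prediction are excluded from numerator and denominator.
    """
    annotated = {t: a for t, a in abundances.items() if t in has_trait}
    total = sum(annotated.values())
    if total == 0:
        raise ValueError("no annotated taxa with non-zero abundance")
    carrying = sum(a for t, a in annotated.items() if has_trait[t])
    return carrying / total


# Hypothetical single-sample example; taxon_D has no prediction and is ignored.
sample = {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2, "taxon_D": 0.1}
motile = {"taxon_A": True, "taxon_B": False, "taxon_C": True}
print(round(weighted_trait_prevalence(sample, motile), 3))
```

Excluding unannotated taxa keeps the estimate from being biased toward zero when many genomes lack a prediction, at the cost of ignoring part of the community; reporting the annotated fraction alongside the estimate is a reasonable safeguard.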
Affiliation(s)
- Josep Ramoneda
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, 80309 Boulder, CO, United States
- Spanish Research Council (CSIC), Center for Advanced Studies of Blanes (CEAB), 17300 Blanes, Spain
- Kunkun Fan
- Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, 210008 Nanjing, China
- Jane M Lucas
- Cary Institute of Ecosystem Studies, 12545 Millbrook, NY, United States
- Haiyan Chu
- Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, 210008 Nanjing, China
- University of Chinese Academy of Sciences, 101408 Beijing, China
- Michael S Strickland
- Department of Soil and Water Systems, University of Idaho, 83843 Moscow, ID, United States
- Noah Fierer
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, 80309 Boulder, CO, United States
- Department of Ecology and Evolutionary Biology, University of Colorado, 80309 Boulder, CO, United States

4
Ramoneda J, Hoffert M, Stallard-Olivera E, Casamayor EO, Fierer N. Leveraging genomic information to predict environmental preferences of bacteria. ISME J 2024; 18:wrae195. [PMID: 39361898 PMCID: PMC11488383 DOI: 10.1093/ismejo/wrae195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 09/24/2024] [Accepted: 10/02/2024] [Indexed: 10/05/2024]
Abstract
Genomic information is now available for a broad diversity of bacteria, including uncultivated taxa. However, we have corresponding knowledge on environmental preferences (i.e. bacterial growth responses across gradients in oxygen, pH, temperature, salinity, and other environmental conditions) for a relatively narrow swath of bacterial diversity. These limits to our understanding of bacterial ecologies constrain our ability to predict how assemblages will shift in response to global change factors, design effective probiotics, or guide cultivation efforts. We need innovative approaches that take advantage of expanding genome databases to accurately infer the environmental preferences of bacteria and validate the accuracy of these inferences. By doing so, we can broaden our quantitative understanding of the environmental preferences of the majority of bacterial taxa that remain uncharacterized. With this perspective, we highlight why it is important to infer environmental preferences from genomic information and discuss the range of potential strategies for doing so. In particular, we highlight concrete examples of how both cultivation-independent and cultivation-dependent approaches can be integrated with genomic data to develop predictive models. We also emphasize the limitations and pitfalls of these approaches and the specific knowledge gaps that need to be addressed to successfully expand our understanding of the environmental preferences of bacteria.
Affiliation(s)
- Josep Ramoneda
- Department of Ecology and Complexity, Center of Advanced Studies of Blanes (CEAB), Spanish Research Council (CSIC), Blanes, Spain
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, Colorado, United States
- Michael Hoffert
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, United States
- Elias Stallard-Olivera
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, United States
- Emilio O Casamayor
- Department of Ecology and Complexity, Center of Advanced Studies of Blanes (CEAB), Spanish Research Council (CSIC), Blanes, Spain
- Noah Fierer
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, Colorado, United States
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, United States

5
Dragone NB, Hoffert M, Strickland MS, Fierer N. Taxonomic and genomic attributes of oligotrophic soil bacteria. ISME Commun 2024; 4:ycae081. [PMID: 38988701 PMCID: PMC11234899 DOI: 10.1093/ismeco/ycae081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 03/15/2024] [Accepted: 06/11/2024] [Indexed: 07/12/2024]
Abstract
Not all bacteria are fast growers. In soil as in other environments, bacteria exist along a continuum, from copiotrophs that can grow rapidly under resource-rich conditions to oligotrophs that are adapted to life in the "slow lane." However, the field of microbiology is built almost exclusively on the study of copiotrophs due, in part, to the ease of studying them in vitro. To begin understanding the attributes of soil oligotrophs, we analyzed three independent datasets that represent contrasts in organic carbon availability. These datasets included 185 samples collected from soil profiles across the USA, 950 paired bulk soil and rhizosphere samples collected across Europe, and soils from a microcosm experiment where carbon availability was manipulated directly. Using a combination of marker gene sequencing and targeted genomic analyses, we identified specific oligotrophic taxa that were consistently more abundant in carbon-limited environments (subsurface, bulk, unamended soils) compared to the corresponding carbon-rich environment (surface, rhizosphere, glucose-amended soils), including members of the Dormibacterota and Chloroflexi phyla. In general, putative soil oligotrophs had smaller genomes, slower maximum potential growth rates, and were under-represented in culture collections. The genomes of oligotrophs were more likely to be enriched in pathways that allow them to metabolize a range of energy sources and store carbon, while genes associated with energy-intensive functions like chemotaxis and motility were under-represented. However, few genomic attributes were shared, highlighting that oligotrophs likely use a range of different metabolic strategies and regulatory pathways to thrive in resource-limited soils.
Affiliation(s)
- Nicholas B Dragone
- Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO 80309, United States
- Michael Hoffert
- Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO 80309, United States
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO 80309, United States
- Michael S Strickland
- Department of Soil and Water Systems, University of Idaho, Moscow, ID 83844, United States
- Noah Fierer
- Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO 80309, United States
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO 80309, United States

6
Meredith LK, Ledford SM, Riemer K, Geffre P, Graves K, Honeker LK, LeBauer D, Tfaily MM, Krechmer J. Automating methods for estimating metabolite volatility. Front Microbiol 2023; 14:1267234. [PMID: 38163064 PMCID: PMC10755872 DOI: 10.3389/fmicb.2023.1267234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 11/13/2023] [Indexed: 01/03/2024]
Abstract
The volatility of metabolites can influence their biological roles and inform optimal methods for their detection. Yet, volatility information is not readily available for the large number of described metabolites, limiting the exploration of volatility as a fundamental trait of metabolites. Here, we adapted methods to estimate vapor pressure from the functional group composition of individual molecules (SIMPOL.1) to predict the gas-phase partitioning of compounds in different environments. We implemented these methods in a new open pipeline called volcalc that uses chemoinformatic tools to automate these volatility estimates for all metabolites in an extensive and continuously updated pathway database: the Kyoto Encyclopedia of Genes and Genomes (KEGG) that connects metabolites, organisms, and reactions. We first benchmark the automated pipeline against a manually curated data set and show that the same category of volatility (e.g., nonvolatile, low, moderate, high) is predicted for 93% of compounds. We then demonstrate how volcalc might be used to generate and test hypotheses about the role of volatility in biological systems and organisms. Specifically, we estimate that 3.4 and 26.6% of compounds in KEGG have high volatility depending on the environment (soil vs. clean atmosphere, respectively) and that a core set of volatiles is shared among all domains of life (30%) with the largest proportion of kingdom-specific volatiles identified in bacteria. With volcalc, we lay a foundation for uncovering the role of the volatilome using an approach that is easily integrated with other bioinformatic pipelines and can be continually refined to consider additional dimensions to volatility. The volcalc package is an accessible tool to help design and test hypotheses on volatile metabolites and their unique roles in biological systems.
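For reference, SIMPOL.1 (Pankow and Asher, 2008) is a group-contribution model: the base-10 logarithm of a compound's pure-liquid vapor pressure at temperature $T$ is a sum, over structural groups $k$, of the number of times that group occurs in the molecule multiplied by a temperature-dependent group coefficient. The notation below follows the published SIMPOL.1 form, not necessarily volcalc's internal code, and the fitted coefficients $B_{1,k},\dots,B_{4,k}$ are tabulated in the SIMPOL.1 paper rather than reproduced here.

```latex
\log_{10} p^{\circ}_{L,i}(T) = \sum_{k} \nu_{k,i}\, b_k(T),
\qquad
b_k(T) = \frac{B_{1,k}}{T} + B_{2,k} + B_{3,k}\,T + B_{4,k}\,\ln T
```

Here $\nu_{k,i}$ is the count of group $k$ in compound $i$ (with a constant term, $k = 0$, for which $\nu_{0,i} = 1$) and $T$ is in kelvin. Automating the method therefore reduces mainly to counting functional groups from each compound's structure, which is the step the abstract attributes to chemoinformatic tools.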
Affiliation(s)
- Laura K. Meredith
- School of Natural Resources and the Environment, University of Arizona, Tucson, AZ, United States
- BIO5 Institute, University of Arizona, Tucson, AZ, United States
- S. Marshall Ledford
- Genetics Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, United States
- Kristina Riemer
- Arizona Experiment Station, University of Arizona, Tucson, AZ, United States
- Parker Geffre
- School of Natural Resources and the Environment, University of Arizona, Tucson, AZ, United States
- Kelsey Graves
- Department of Environmental Science, University of Arizona, Tucson, AZ, United States
- Linnea K. Honeker
- School of Natural Resources and the Environment, University of Arizona, Tucson, AZ, United States
- BIO5 Institute, University of Arizona, Tucson, AZ, United States
- David LeBauer
- Arizona Experiment Station, University of Arizona, Tucson, AZ, United States
- Malak M. Tfaily
- BIO5 Institute, University of Arizona, Tucson, AZ, United States
- Department of Environmental Science, University of Arizona, Tucson, AZ, United States

7
Ramoneda J, Jensen TBN, Price MN, Casamayor EO, Fierer N. Taxonomic and environmental distribution of bacterial amino acid auxotrophies. Nat Commun 2023; 14:7608. [PMID: 37993466 PMCID: PMC10665431 DOI: 10.1038/s41467-023-43435-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/07/2023] [Accepted: 11/08/2023] [Indexed: 11/24/2023]
Abstract
Many microorganisms are auxotrophic: unable to synthesize the compounds they require for growth. With this work, we quantify the prevalence of amino acid auxotrophies across a broad diversity of bacteria and habitats. We predicted the amino acid biosynthetic capabilities of 26,277 unique bacterial genomes spanning 12 phyla using a metabolic pathway model validated with empirical data. Amino acid auxotrophy is widespread across bacterial phyla, but we conservatively estimate that the majority of taxa (78.4%) are able to synthesize all amino acids. Our estimates indicate that amino acid auxotrophies are more prevalent among obligate intracellular parasites and in free-living taxa with genomic attributes characteristic of 'streamlined' life history strategies. We predicted the amino acid biosynthetic capabilities of bacterial communities found in 12 unique habitats to investigate environmental associations with auxotrophy, using data compiled from 3813 samples spanning major aquatic, terrestrial, and engineered environments. Auxotrophic taxa were more abundant in host-associated environments (including the human oral cavity and gut) and in fermented food products, with auxotrophic taxa being relatively rare in soil and aquatic systems. Overall, this work contributes to a more complete understanding of amino acid auxotrophy across the bacterial tree of life and the ecological contexts in which auxotrophy can be a successful strategy.
Affiliation(s)
- Josep Ramoneda
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, CO, USA
- Thomas B N Jensen
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, CO, USA
- Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
- Morgan N Price
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Emilio O Casamayor
- Spanish Research Council (CSIC), Center for Advanced Studies of Blanes (CEAB), Blanes, Spain
- Noah Fierer
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, CO, USA
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA

8
Akaçin İ, Ersoy Ş, Doluca O, Güngörmüşler M. Using custom-built primers and nanopore sequencing to evaluate CO-utilizer bacterial and archaeal populations linked to bioH2 production. Sci Rep 2023; 13:17025. [PMID: 37813931 PMCID: PMC10562470 DOI: 10.1038/s41598-023-44357-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 10/06/2023] [Indexed: 10/11/2023]
Abstract
The microbial community composition of five distinct thermophilic hot springs was described in this work using broad-coverage nanopore sequencing (ONT MinION sequencer). Analysis of environmental samples from the same source, but from locations with different temperatures, revealed dramatic changes in microbial diversity and archaeal abundance. More specifically, no archaea were detected with universal bacterial primers, whereas the custom-built primers revealed a significant archaeal presence and a wider variety of bacterial species. These results highlight the importance of primer choice for profiling microbiomes in extreme environments. Bioinformatic analysis was performed by aligning the reads to 16S microbial databases for identification using three different alignment methods, Epi2Me (Fastq 16S workflow), Kraken, and an in-house BLAST tool, including comparison at the genus and species levels. The choice of analysis method had a significant impact on the genera identified; we therefore recommend using multiple analysis tools to support 16S-based taxonomic identification until more precise bioinformatics tools become available. This study presents the first ONT-based inventory of the hydrogen producers in the designated hot springs in Türkiye.
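The abstract recommends cross-checking 16S taxonomic calls across several tools. A minimal sketch of one way to combine them is a per-read majority vote, assuming each tool's output has already been parsed into a hypothetical read-ID to genus mapping (parsing Epi2Me, Kraken, or BLAST output formats is not shown, and this is not the authors' procedure). A genus is kept only when at least `min_agree` tools agree and no other genus ties for first place.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_genus(
    calls_by_tool: Dict[str, Dict[str, str]],
    min_agree: int = 2,
) -> Dict[str, Optional[str]]:
    """Per-read majority-vote genus assignment across classifiers.

    calls_by_tool: tool name -> {read_id: genus}; reads a tool left
    unclassified are simply absent from that tool's mapping.
    Returns read_id -> genus, or None when support is too weak or tied.
    """
    read_ids = set()
    for calls in calls_by_tool.values():
        read_ids.update(calls)

    result: Dict[str, Optional[str]] = {}
    for read_id in sorted(read_ids):
        votes = Counter(
            calls[read_id] for calls in calls_by_tool.values() if read_id in calls
        )
        ranked = votes.most_common(2)
        genus, count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == count
        result[read_id] = genus if count >= min_agree and not tied else None
    return result


# Hypothetical calls; the genus labels are placeholders, not study results.
calls = {
    "epi2me": {"r1": "genus_A", "r2": "genus_B"},
    "kraken": {"r1": "genus_A", "r2": "genus_C"},
    "blast": {"r1": "genus_D", "r3": "genus_A"},
}
print(consensus_genus(calls))
```

Reads left as `None` flag exactly the disagreements the abstract warns about, so their proportion is itself a useful per-sample indicator of classifier sensitivity.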
Affiliation(s)
- İlayda Akaçin
- Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Şeymanur Ersoy
- Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Osman Doluca
- Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Department of Biomedical Engineering, Faculty of Engineering, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Mine Güngörmüşler
- Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Department of Genetics and Bioengineering, Faculty of Engineering, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye