1
Cerk K, Ugalde‐Salas P, Nedjad CG, Lecomte M, Muller C, Sherman DJ, Hildebrand F, Labarthe S, Frioux C. Community-scale models of microbiomes: Articulating metabolic modelling and metagenome sequencing. Microb Biotechnol 2024; 17:e14396. [PMID: 38243750 PMCID: PMC10832553 DOI: 10.1111/1751-7915.14396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 11/27/2023] [Accepted: 12/20/2023] [Indexed: 01/21/2024] Open
Abstract
Building models is essential for understanding the functions and dynamics of microbial communities. Metabolic models built on genome-scale metabolic network reconstructions (GENREs) are especially relevant as a means to decipher the complex interactions occurring among species. Model reconstruction increasingly relies on metagenomics, which permits direct characterisation of naturally occurring communities that may contain organisms that cannot be isolated or cultured. In this review, we provide an overview of the field of metabolic modelling and its increasing reliance on and synergy with metagenomics and bioinformatics. We survey the means of assigning functions and reconstructing metabolic networks from (meta-)genomes, and present the variety and mathematical fundamentals of metabolic models that foster the understanding of microbial dynamics. We emphasise the characterisation of interactions and the scaling of model construction to large communities, two important bottlenecks in the applicability of these models. We give an overview of the current state of the art in metagenome sequencing and bioinformatics analysis, focusing on the reconstruction of genomes in microbial communities. Metagenomics benefits tremendously from third-generation sequencing, and we discuss the opportunities of long-read sequencing, strain-level characterisation and eukaryotic metagenomics. We aim at providing algorithmic and mathematical support, together with tool and application resources, that permit bridging the gap between metagenomics and metabolic modelling.
Affiliation(s)
- Klara Cerk
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
- Chabname Ghassemi Nedjad
- Inria, University of Bordeaux, INRAE, Talence, France
- University of Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
- Maxime Lecomte
- Inria, University of Bordeaux, INRAE, Talence, France
- INRAE STLO, University of Rennes, Rennes, France
- Falk Hildebrand
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
- Simon Labarthe
- Inria, University of Bordeaux, INRAE, Talence, France
- INRAE, University of Bordeaux, BIOGECO, UMR 1202, Cestas, France
2
Belcour A, Got J, Aite M, Delage L, Collén J, Frioux C, Leblanc C, Dittami SM, Blanquart S, Markov GV, Siegel A. Inferring and comparing metabolism across heterogeneous sets of annotated genomes using AuCoMe. Genome Res 2023; 33:972-987. [PMID: 37468308 PMCID: PMC10629481 DOI: 10.1101/gr.277056.122] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/22/2022] [Accepted: 05/23/2023] [Indexed: 07/21/2023]
Abstract
Comparative analysis of genome-scale metabolic networks (GSMNs) may yield important information on the biology, evolution, and adaptation of species. However, it is impeded by the high heterogeneity of the quality and completeness of structural and functional genome annotations, which may bias the results of such comparisons. To address this issue, we developed AuCoMe, a pipeline to automatically reconstruct homogeneous GSMNs from a heterogeneous set of annotated genomes without discarding available manual annotations. We tested AuCoMe with three data sets, one bacterial, one fungal, and one algal, and showed that it successfully reduces technical biases while capturing the metabolic specificities of each organism. Our results also point out shared and divergent metabolic traits among evolutionarily distant algae, underlining the potential of AuCoMe to accelerate the broad exploration of metabolic evolution across the tree of life.
Affiliation(s)
- Arnaud Belcour
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
- Jeanne Got
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
- Méziane Aite
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
- Ludovic Delage
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Jonas Collén
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Catherine Leblanc
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Simon M Dittami
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Gabriel V Markov
- Sorbonne Université, CNRS, Integrative Biology of Marine Models (LBI2M), Station Biologique de Roscoff (SBR), 29680 Roscoff, France
- Anne Siegel
- Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
3
Borer B, Magnúsdóttir S. The media composition as a crucial element in high-throughput metabolic network reconstruction. Interface Focus 2023; 13:20220070. [PMID: 36789238 PMCID: PMC9912011 DOI: 10.1098/rsfs.2022.0070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 01/11/2023] [Indexed: 02/12/2023] Open
Abstract
In recent years, metagenome-assembled genomes (MAGs) have provided glimpses into the intra- and interspecies genetic diversity and interactions that form the bases of complex microbial communities. High-throughput reconstruction of genome-scale metabolic networks (GEMs) from MAGs is a promising avenue to disentangle the myriad trophic interactions stabilizing these communities. However, high-throughput reconstruction of GEMs relies on accurate gap filling of metabolic pathways using automated algorithms. Here, we systematically explore how the composition of the media (specification of the available nutrients and metabolites) during gap filling influences the resulting GEMs concerning predicted auxotrophies for fully sequenced model organisms and environmental isolates. We expand this analysis by using 106 MAGs from the same species with differing quality. We find that although the completeness of MAGs influences the fraction of gap-filled reactions, the composition of the media plays the dominant role in the accurate prediction of auxotrophies that form the basis of myriad community interactions. We propose that constraining the media composition for gap filling through both experimental and computational approaches will increase the reliability of high-throughput reconstruction of genome-scale metabolic models from MAGs and pave the way for culture-independent prediction of trophic interactions in complex microbial communities.
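As a rough illustration of why the assumed medium matters, the sketch below runs a simple network-expansion (scope) computation on a made-up three-reaction network: a target counts as producible only if it can be reached from the medium. This is a topological simplification and not the optimization-based gap filling evaluated in the paper; the function names, reactions and metabolites are all hypothetical.

```python
def scope(reactions, medium):
    """Return every metabolite reachable from the medium.

    reactions: dict mapping a reaction name to a (substrates, products)
    pair of sets. A reaction fires once all its substrates are available.
    """
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def predicted_auxotrophies(reactions, medium, targets):
    """Targets that cannot be produced or obtained from the medium."""
    return targets - scope(reactions, medium)


# Hypothetical toy network (names are illustrative only).
toy_reactions = {
    "R1": ({"glucose"}, {"pyruvate"}),
    "R2": ({"pyruvate", "ammonium"}, {"alanine"}),
    "R3": ({"alanine"}, {"biomass"}),
}

minimal_medium = {"glucose", "ammonium"}
poor_medium = {"glucose"}
targets = {"alanine", "biomass"}

print(predicted_auxotrophies(toy_reactions, minimal_medium, targets))
print(predicted_auxotrophies(toy_reactions, poor_medium, targets))
```

With `minimal_medium` both targets are reachable, whereas with `poor_medium` neither is, so a gap filler working under the second assumption would have to add reactions to the same draft network.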
Affiliation(s)
- Benedict Borer
- Earth, Atmospheric and Planetary Sciences Department, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Stefanía Magnúsdóttir
- Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ, Leipzig 04318, Germany
4
Metabolic Modeling with MetaFlux. Methods Mol Biol 2021. [PMID: 34718999 DOI: 10.1007/978-1-0716-1585-0_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/26/2023]
Abstract
The MetaFlux software supports creating, executing, and solving quantitative metabolic flux models using flux balance analysis (FBA). MetaFlux offers four modes of operation: (1) solving mode executes an FBA model for an individual organism or for an organism community, (2) gene knockout mode executes an FBA model with one or many gene knockouts, (3) development mode assists the user in creating and improving FBA models, and (4) flux variability analysis mode generates a report of the robustness of an FBA model. MetaFlux also solves dynamic FBA (dFBA) for both individual organisms and communities of organisms. MetaFlux can be used in two different environments: on your local computer, which requires the installation of the Pathway Tools software, or through the web, which does not require installation of Pathway Tools. On your local computer, MetaFlux offers all four modes of operation, whereas the web environment provides only the solving mode. Several visualization tools are available to analyze model solutions. The Cellular Overview tool graphically shows the reaction fluxes on an organism's metabolic map once a model is solved. The Omics Dashboard provides a hierarchical approach to visualizing reaction fluxes, organized by metabolic subsystems. For a community of organisms, plotting of accumulated biomasses and metabolites can be performed using the Gnuplot tool. In this chapter, we present eight methods using MetaFlux. Five solving mode methods illustrate execution of models for individual organisms and for organism communities. One method illustrates the gene knockout mode. Two methods for the development mode illustrate steps for developing new metabolic models.
5
Bernstein DB, Sulheim S, Almaas E, Segrè D. Addressing uncertainty in genome-scale metabolic model reconstruction and analysis. Genome Biol 2021; 22:64. [PMID: 33602294 PMCID: PMC7890832 DOI: 10.1186/s13059-021-02289-z] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 07/22/2020] [Accepted: 02/04/2021] [Indexed: 02/07/2023] Open
Abstract
The reconstruction and analysis of genome-scale metabolic models constitutes a powerful systems biology approach, with applications ranging from basic understanding of genotype-phenotype mapping to solving biomedical and environmental problems. However, the biological insight obtained from these models is limited by multiple heterogeneous sources of uncertainty, which are often difficult to quantify. Here we review the major sources of uncertainty and survey existing approaches developed for representing and addressing them. A unified formal characterization of these uncertainties through probabilistic approaches and ensemble modeling will facilitate convergence towards consistent reconstruction pipelines, improved data integration algorithms, and more accurate assessment of predictive capacity.
Affiliation(s)
- David B Bernstein
- Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, MA, USA
- Snorre Sulheim
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biotechnology and Food Science, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Department of Biotechnology and Nanomedicine, SINTEF Industry, Trondheim, Norway
- Eivind Almaas
- Department of Biotechnology and Food Science, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- K.G. Jebsen Center for Genetic Epidemiology, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- Daniel Segrè
- Department of Biomedical Engineering and Biological Design Center, Boston University, Boston, MA, USA
- Bioinformatics Program, Boston University, Boston, MA, USA
- Department of Biology and Department of Physics, Boston University, Boston, MA, USA
6
Frioux C, Singh D, Korcsmaros T, Hildebrand F. From bag-of-genes to bag-of-genomes: metabolic modelling of communities in the era of metagenome-assembled genomes. Comput Struct Biotechnol J 2020; 18:1722-1734. [PMID: 32670511 PMCID: PMC7347713 DOI: 10.1016/j.csbj.2020.06.028] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 03/19/2020] [Revised: 06/16/2020] [Accepted: 06/17/2020] [Indexed: 12/12/2022] Open
Abstract
Metagenomic sequencing of complete microbial communities has greatly enhanced our understanding of the taxonomic composition of microbiotas. This has led to breakthrough developments in bioinformatic disciplines such as assembly, gene clustering, metagenomic binning of species genomes and the discovery of an incredible, so far undiscovered, taxonomic diversity. However, functional annotations and estimating metabolic processes from single species - or communities - is still challenging. Earlier approaches relied mostly on inferring the presence of key enzymes for metabolic pathways in the whole metagenome, ignoring the genomic context of such enzymes, resulting in the 'bag-of-genes' approach to estimate functional capacities of microbiotas. Here, we review recent developments in metagenomic bioinformatics, with a special focus on emerging technologies to simulate and estimate metabolic information, that can be derived from metagenomic assembled genomes. Genome-scale metabolic models can be used to model the emergent properties of microbial consortia and whole communities, and the progress in this area is reviewed. While this subfield of metagenomics is still in its infancy, it is becoming evident that there is a dire need for further bioinformatic tools to address the complex combinatorial problems in modelling the metabolism of large communities as a 'bag-of-genomes'.
Affiliation(s)
- Clémence Frioux
- Inria, CNRS, INRAE Bordeaux, France
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, Norfolk, UK
- Dipali Singh
- Microbes in the Food Chain, Quadram Institute Bioscience, Norwich, Norfolk, UK
- Tamas Korcsmaros
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, Norfolk, UK
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
- Falk Hildebrand
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, Norfolk, UK
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
7
Ong WK, Midford PE, Karp PD. Taxonomic weighting improves the accuracy of a gap-filling algorithm for metabolic models. Bioinformatics 2020; 36:1823-1830. [PMID: 31688932 PMCID: PMC7523652 DOI: 10.1093/bioinformatics/btz813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/29/2019] [Revised: 08/29/2019] [Accepted: 10/31/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The increasing availability of annotated genome sequences enables construction of genome-scale metabolic networks, which are useful tools for studying organisms of interest. However, due to incomplete genome annotations, draft metabolic models contain gaps that must be filled in a time-consuming process before they are usable. Optimization-based algorithms that fill these gaps have been developed; however, gap-filling algorithms show significant error rates and often introduce incorrect reactions. RESULTS: Here, we present a new gap-filling method that computes the costs of candidate gap-filling reactions from a universal reaction database (MetaCyc) based on taxonomic information. When gap-filling a metabolic model for an organism M (such as Escherichia coli), the cost for reaction R is based on the frequency with which R occurs in other organisms within the phylum of M (in this case, Proteobacteria). The assumption behind this method is that different taxonomic groups are biased toward using different metabolic reactions. Evaluation of the new gap-filler on randomly degraded variants of the EcoCyc metabolic model for E. coli showed an increase in the average F1-score to 99.0 (when using the variable weights by frequency method at the phylum level), compared to 91.0 using the previous MetaFlux gap-filler and 80.3 using a basic gap-filler. Evaluation on two other microbial metabolic models showed similar improvements. AVAILABILITY AND IMPLEMENTATION: The Pathway Tools software (including MetaFlux) is free for academic use and is available at http://pathwaytools.com. Additional code for reproducing the results presented here is available at www.ai.sri.com/pkarp/pubs/taxgap/supplementary.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
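The weighting idea can be sketched as follows: a candidate reaction becomes cheaper to add the more often it occurs in reference organisms of the same phylum. The cost formula used here (`base_cost * (1 - frequency)` with a small floor), the function names, and the reference data are assumptions made for illustration; the abstract does not state the formula the authors actually use.

```python
def phylum_frequency(reaction, organisms_in_phylum):
    """Fraction of reference organisms in the phylum containing the reaction."""
    if not organisms_in_phylum:
        return 0.0
    hits = sum(1 for rxns in organisms_in_phylum.values() if reaction in rxns)
    return hits / len(organisms_in_phylum)


def gapfill_cost(reaction, organisms_in_phylum, base_cost=10.0, floor=1.0):
    """Hypothetical cost: frequent reactions are cheap, rare ones expensive."""
    freq = phylum_frequency(reaction, organisms_in_phylum)
    return max(floor, base_cost * (1.0 - freq))


# Hypothetical reference organisms from one phylum and their reaction sets.
reference = {
    "org_a": {"RXN-1", "RXN-2"},
    "org_b": {"RXN-1"},
    "org_c": {"RXN-1", "RXN-3"},
    "org_d": {"RXN-1", "RXN-2"},
}

costs = {r: gapfill_cost(r, reference) for r in ("RXN-1", "RXN-2", "RXN-3", "RXN-9")}
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(name, cost)
```

A gap filler that minimizes total cost would then prefer `RXN-1`, which is present in every reference organism, over `RXN-9`, which is present in none.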
Affiliation(s)
- Wai Kit Ong
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
- Peter E Midford
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
- Peter D Karp
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
8
Schroeder WL, Saha R. OptFill: A Tool for Infeasible Cycle-Free Gapfilling of Stoichiometric Metabolic Models. iScience 2019; 23:100783. [PMID: 31954977 PMCID: PMC6970165 DOI: 10.1016/j.isci.2019.100783] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/07/2019] [Revised: 12/03/2019] [Accepted: 12/12/2019] [Indexed: 02/06/2023] Open
Abstract
Stoichiometric metabolic modeling, particularly genome-scale models (GSMs), is now an indispensable tool for systems biology. The model reconstruction process typically involves collecting information from public databases; however, incomplete systems knowledge leaves gaps in any reconstruction. Current tools for addressing gaps use databases of biochemical functionalities to address gaps on a per-metabolite basis and can provide multiple solutions but cannot avoid thermodynamically infeasible cycles (TICs), invariably requiring lengthy manual curation. To address these limitations, this work introduces an optimization-based multi-step method named OptFill, which performs TIC-avoiding whole-model gapfilling. We applied OptFill to three fictional prokaryotic models of increasing sizes and to a published GSM of Escherichia coli, iJR904. This application resulted in holistic and infeasible cycle-free gapfilling solutions. In addition, OptFill can be adapted to automate inherent TICs identification in any GSM. Overall, OptFill can address critical issues in automated development of high-quality GSMs. This work presents an alternative to state-of-the-art methods for gapfilling. Unlike current methods, this method is holistic and infeasible cycle free. This method is applied to three tests and one published model. This method might also be used to address infeasible cycling.
Affiliation(s)
- Wheaton L Schroeder
- Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Rajib Saha
- Department of Chemical and Biomolecular Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
9
Karp PD, Midford PE, Billington R, Kothari A, Krummenacker M, Latendresse M, Ong WK, Subhraveti P, Caspi R, Fulcher C, Keseler IM, Paley SM. Pathway Tools version 23.0 update: software for pathway/genome informatics and systems biology. Brief Bioinform 2019; 22:109-126. [PMID: 31813964 DOI: 10.1093/bib/bbz104] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 05/20/2019] [Revised: 07/23/2019] [Accepted: 07/24/2019] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION: Biological systems function through dynamic interactions among genes and their products, regulatory circuits and metabolic networks. Our development of the Pathway Tools software was motivated by the need to construct biological knowledge resources that combine these many types of data, and that enable users to find and comprehend data of interest as quickly as possible through query and visualization tools. Further, we sought to support the development of metabolic flux models from pathway databases, and to use pathway information to leverage the interpretation of high-throughput data sets. RESULTS: In the past 4 years we have enhanced the already extensive Pathway Tools software in several respects. It can now support metabolic-model execution through the Web; it provides a more accurate gap filler for metabolic models; it supports development of models for organism communities distributed across a spatial grid; and model results may be visualized graphically. Pathway Tools supports several new omics-data analysis tools including the Omics Dashboard, multi-pathway diagrams called pathway collages, a pathway-covering algorithm for metabolomics data analysis and an algorithm for generating mechanistic explanations of multi-omics data. We have also improved the core pathway/genome database management capabilities of the software, providing new multi-organism search tools for organism communities, improved graphics rendering, faster performance and re-designed gene and metabolite pages. AVAILABILITY: The software is free for academic use; a fee is required for commercial use. See http://pathwaytools.com. CONTACT: pkarp@ai.sri.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Peter E Midford
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Richard Billington
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Anamika Kothari
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Mario Latendresse
- Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
- Wai Kit Ong
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Pallavi Subhraveti
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Ron Caspi
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Carol Fulcher
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Ingrid M Keseler
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Suzanne M Paley
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
10
Karp PD, Weaver D, Latendresse M. How accurate is automated gap filling of metabolic models? BMC Syst Biol 2018; 12:73. [PMID: 29914471 PMCID: PMC6006690 DOI: 10.1186/s12918-018-0593-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 02/27/2018] [Accepted: 05/31/2018] [Indexed: 12/20/2022]
Abstract
Background: Reaction gap filling is a computational technique for proposing the addition of reactions to genome-scale metabolic models to permit those models to run correctly. Gap filling completes what are otherwise incomplete models that lack fully connected metabolic networks. The models are incomplete because they are derived from annotated genomes in which not all enzymes have been identified. Here we compare the results of applying an automated likelihood-based gap filler within the Pathway Tools software with the results of manually gap filling the same metabolic model. Both gap-filling exercises were applied to the same genome-derived qualitative metabolic reconstruction for Bifidobacterium longum subsp. longum JCM 1217, and to the same modeling conditions: anaerobic growth under four nutrients producing 53 biomass metabolites. Results: The solution computed by the gap-filling program GenDev contained 12 reactions, but closer examination showed that the solution was not minimal; two of the twelve reactions can be removed to yield a set of ten reactions that enable model growth. The manually curated solution contained 13 reactions, eight of which were shared with the 12-reaction computed solution. Thus, GenDev achieved recall of 61.5% and precision of 66.6%. These results suggest that although computational gap fillers are populating metabolic models with significant numbers of correct reactions, automatically gap-filled metabolic models also contain significant numbers of incorrect reactions. Conclusions: Our conclusion is that manual curation of gap-filler results is needed to obtain high-accuracy models. Many of the differences between the manual and automatic solutions resulted from using expert biological knowledge to direct the choice of reactions within the curated solution, such as reactions specific to the anaerobic lifestyle of B. longum.
Electronic supplementary material The online version of this article (10.1186/s12918-018-0593-7) contains supplementary material, which is available to authorized users.
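The recall and precision quoted in the abstract follow directly from the three counts it reports (8 shared reactions, 12 computed, 13 manually curated). The short check below reproduces that arithmetic; the function name is illustrative, and the F1 value is derived here and is not reported in the abstract.

```python
def precision_recall(shared, predicted_total, reference_total):
    """Precision and recall of a predicted set against a reference set."""
    precision = shared / predicted_total
    recall = shared / reference_total
    return precision, recall


# Counts taken from the abstract: 8 shared, 12 computed, 13 manually curated.
precision, recall = precision_recall(shared=8, predicted_total=12, reference_total=13)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Recall is 8/13, about 61.5%, and precision is 8/12, about 66.7% when rounded; the 66.6% given in the abstract corresponds to the same ratio truncated instead of rounded.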
Affiliation(s)
- Peter D Karp
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, 94025, USA
- Daniel Weaver
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, 94025, USA
- Mario Latendresse
- Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, 94025, USA