1
Backman TWH, Schenk C, Radivojevic T, Ando D, Singh J, Czajka JJ, Costello Z, Keasling JD, Tang Y, Akhmatskaya E, Garcia Martin H. BayFlux: A Bayesian method to quantify metabolic Fluxes and their uncertainty at the genome scale. PLoS Comput Biol 2023; 19:e1011111. [PMID: 37948450 PMCID: PMC10664898 DOI: 10.1371/journal.pcbi.1011111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 11/22/2023] [Accepted: 09/27/2023] [Indexed: 11/12/2023] Open
Abstract
Metabolic fluxes, the number of metabolites traversing each biochemical reaction in a cell per unit time, are crucial for assessing and understanding cell function. 13C Metabolic Flux Analysis (13C MFA) is considered to be the gold standard for measuring metabolic fluxes. 13C MFA typically works by leveraging extracellular exchange fluxes as well as data from 13C labeling experiments to calculate the flux profile that best fits the data for a small central carbon metabolic model. However, the nonlinear nature of the 13C MFA fitting procedure means that several flux profiles fit the experimental data within the experimental error, and traditional optimization methods offer only a partial or skewed picture, especially in "non-Gaussian" situations where multiple very distinct flux regions fit the data equally well. Here, we present a method for flux space sampling through Bayesian inference (BayFlux) that identifies the full distribution of fluxes compatible with experimental data for a comprehensive genome-scale model. This Bayesian approach allows us to accurately quantify uncertainty in calculated fluxes. We also find that, surprisingly, the genome-scale model of metabolism produces narrower flux distributions (reduced uncertainty) than the small core metabolic models traditionally used in 13C MFA. The different results for some reactions when using genome-scale models vs core metabolic models advise caution in assuming strong inferences from 13C MFA, since the results may depend significantly on the completeness of the model used. Based on BayFlux, we developed and evaluated novel methods (P-13C MOMA and P-13C ROOM) to predict the biological results of a gene knockout that improve on the traditional MOMA and ROOM methods by quantifying prediction uncertainty.
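The "non-Gaussian" situation described in this abstract can be illustrated with a minimal sketch. This is not the BayFlux implementation: it is a toy Metropolis sampler for a single hypothetical flux `v`, with an assumed nonlinear measurement model `y = (v - 5)^2`, an assumed observation `y_obs = 1.0` with error `sigma = 0.5`, and a uniform prior on `[0, 10]`. Under these assumptions two distinct flux values (near 4 and near 6) fit the data equally well, so a single best-fit point would report only one of them while sampling recovers both.

```python
import math
import random


def log_likelihood(v, y_obs=1.0, sigma=0.5):
    """Gaussian log-likelihood for the toy measurement model y = (v - 5)^2."""
    predicted = (v - 5.0) ** 2
    return -0.5 * ((predicted - y_obs) / sigma) ** 2


def metropolis(n_samples=20000, step=0.5, lower=0.0, upper=10.0, seed=0):
    """Sample the posterior of v under a uniform prior on [lower, upper]."""
    rng = random.Random(seed)
    v = 4.0  # arbitrary feasible starting point (assumption)
    current = log_likelihood(v)
    samples = []
    for _ in range(n_samples):
        proposal = v + rng.gauss(0.0, step)
        # Proposals outside the bounds have zero prior mass and are rejected.
        if lower <= proposal <= upper:
            candidate = log_likelihood(proposal)
            # Standard Metropolis acceptance rule for a symmetric proposal.
            if rng.random() < math.exp(min(0.0, candidate - current)):
                v, current = proposal, candidate
        samples.append(v)
    return samples


samples = metropolis()
low_mode = [v for v in samples if v < 5.0]
high_mode = [v for v in samples if v >= 5.0]
print(f"fraction of samples below v=5: {len(low_mode) / len(samples):.2f}")
print(f"fraction of samples above v=5: {len(high_mode) / len(samples):.2f}")
```

Both fractions should come out roughly equal, which is the kind of multi-region picture that a point estimate with a symmetric confidence interval cannot express.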
Affiliation(s)
- Tyler W. H. Backman
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - Biofuels and Bioproducts Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Christina Schenk
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - BCAM, Basque Center for Applied Mathematics, Bilbao, Spain
  - DOE Agile BioFoundry, Emeryville, California, United States of America
- Tijana Radivojevic
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - Biofuels and Bioproducts Division, Joint BioEnergy Institute, Emeryville, California, United States of America
  - DOE Agile BioFoundry, Emeryville, California, United States of America
- David Ando
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - Biofuels and Bioproducts Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Jahnavi Singh
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, United States of America
- Jeffrey J. Czajka
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Zak Costello
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - Biofuels and Bioproducts Division, Joint BioEnergy Institute, Emeryville, California, United States of America
  - DOE Agile BioFoundry, Emeryville, California, United States of America
- Jay D. Keasling
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - Biofuels and Bioproducts Division, Joint BioEnergy Institute, Emeryville, California, United States of America
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California, United States of America
  - Department of Bioengineering, University of California, Berkeley, California, United States of America
  - QB3 Institute, University of California, Berkeley, California, United States of America
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
  - Center for Synthetic Biochemistry, Institute for Synthetic Biology, Shenzhen Institutes for Advanced Technologies, Shenzhen, China
- Yinjie Tang
  - Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Elena Akhmatskaya
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - BCAM, Basque Center for Applied Mathematics, Bilbao, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Hector Garcia Martin
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
  - Biofuels and Bioproducts Division, Joint BioEnergy Institute, Emeryville, California, United States of America
  - BCAM, Basque Center for Applied Mathematics, Bilbao, Spain
  - DOE Agile BioFoundry, Emeryville, California, United States of America
2
Eng CH, Backman TWH, Bailey CB, Magnan C, García Martín H, Katz L, Baldi P, Keasling JD. ClusterCAD: a computational platform for type I modular polyketide synthase design. Nucleic Acids Res 2019; 46:D509-D515. [PMID: 29040649 PMCID: PMC5753242 DOI: 10.1093/nar/gkx893] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 08/15/2017] [Accepted: 09/24/2017] [Indexed: 01/10/2023] Open
Abstract
ClusterCAD is a web-based toolkit designed to leverage the collinear structure and deterministic logic of type I modular polyketide synthases (PKSs) for synthetic biology applications. The unique organization of these megasynthases, combined with the diversity of their catalytic domain building blocks, has fueled an interest in harnessing the biosynthetic potential of PKSs for the microbial production of both novel natural product analogs and industrially relevant small molecules. However, a limited theoretical understanding of the determinants of PKS fold and function poses a substantial barrier to the design of active variants, and identifying strategies to reliably construct functional PKS chimeras remains an active area of research. In this work, we formalize a paradigm for the design of PKS chimeras and introduce ClusterCAD as a computational platform to streamline and simplify the process of designing experiments to test strategies for engineering PKS variants. ClusterCAD provides chemical structures with stereochemistry for the intermediates generated by each PKS module, as well as sequence- and structure-based search tools that allow users to identify modules based either on amino acid sequence or on the chemical structure of the cognate polyketide intermediate. ClusterCAD can be accessed at https://clustercad.jbei.org and at http://clustercad.igb.uci.edu.
Affiliation(s)
- Clara H Eng
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
- Tyler W H Backman
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Department of Energy Agile BioFoundry, Emeryville, CA 94608, USA
- Constance B Bailey
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Christophe Magnan
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
  - Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Héctor García Martín
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Department of Energy Agile BioFoundry, Emeryville, CA 94608, USA
- Leonard Katz
  - QB3 Institute, University of California, Berkeley, CA 94720, USA
- Pierre Baldi
  - Department of Computer Science, University of California, Irvine, CA 92697, USA
  - Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
- Jay D Keasling
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Department of Energy Agile BioFoundry, Emeryville, CA 94608, USA
  - QB3 Institute, University of California, Berkeley, CA 94720, USA
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK2970 Horsholm, Denmark
3
Backman TWH, Ando D, Singh J, Keasling JD, García Martín H. Constraining Genome-Scale Models to Represent the Bow Tie Structure of Metabolism for 13C Metabolic Flux Analysis. Metabolites 2018; 8:metabo8010003. [PMID: 29300340 PMCID: PMC5875993 DOI: 10.3390/metabo8010003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/10/2017] [Revised: 12/23/2017] [Accepted: 01/02/2018] [Indexed: 12/19/2022] Open
Abstract
Determination of internal metabolic fluxes is crucial for fundamental and applied biology because they map how carbon and electrons flow through metabolism to enable cell function. 13C Metabolic Flux Analysis (13C MFA) and Two-Scale 13C Metabolic Flux Analysis (2S-13C MFA) are two techniques used to determine such fluxes. Both operate on the simplifying approximation that metabolic flux from peripheral metabolism into central “core” carbon metabolism is minimal, and can be omitted when modeling isotopic labeling in core metabolism. The validity of this “two-scale” or “bow tie” approximation is supported both by the ability to accurately model experimental isotopic labeling data, and by experimentally verified metabolic engineering predictions using these methods. However, the boundaries of core metabolism that satisfy this approximation can vary across species, and across cell culture conditions. Here, we present a set of algorithms that (1) systematically calculate flux bounds for any specified “core” of a genome-scale model so as to satisfy the bow tie approximation and (2) automatically identify an updated set of core reactions that can satisfy this approximation more efficiently. First, we leverage linear programming to simultaneously identify the lowest fluxes from peripheral metabolism into core metabolism compatible with the observed growth rate and extracellular metabolite exchange fluxes. Second, we use Simulated Annealing to identify an updated set of core reactions that allow for a minimum of fluxes into core metabolism to satisfy these experimental constraints. Together, these methods accelerate and automate the identification of a biologically reasonable set of core reactions for use with 13C MFA or 2S-13C MFA, as well as provide for a substantially lower set of flux bounds for fluxes into the core as compared with previous methods. We provide an open source Python implementation of these algorithms at https://github.com/JBEI/limitfluxtocore.
Affiliation(s)
- Tyler W H Backman
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Agile BioFoundry, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  - QB3 Institute, University of California, Berkeley, CA 94720, USA.
- David Ando
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Agile BioFoundry, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
- Jahnavi Singh
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
  - Department of Computer Science, University of California, Berkeley, CA 94720, USA.
- Jay D Keasling
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
  - QB3 Institute, University of California, Berkeley, CA 94720, USA.
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA.
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2970 Horsholm, Denmark.
- Héctor García Martín
  - Joint BioEnergy Institute, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Agile BioFoundry, 5885 Hollis Street, Emeryville, CA 94608, USA.
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
4
Morrell WC, Birkel GW, Forrer M, Lopez T, Backman TWH, Dussault M, Petzold CJ, Baidoo EEK, Costello Z, Ando D, Alonso-Gutierrez J, George KW, Mukhopadhyay A, Vaino I, Keasling JD, Adams PD, Hillson NJ, Garcia Martin H. The Experiment Data Depot: A Web-Based Software Tool for Biological Experimental Data Storage, Sharing, and Visualization. ACS Synth Biol 2017; 6:2248-2259. [PMID: 28826210 DOI: 10.1021/acssynbio.7b00204] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 02/05/2023]
Abstract
Although recent advances in synthetic biology allow us to produce biological designs more efficiently than ever, our ability to predict the end result of these designs is still nascent. Predictive models require large amounts of high-quality data to be parametrized and tested, which are not generally available. Here, we present the Experiment Data Depot (EDD), an online tool designed as a repository of experimental data and metadata. EDD provides a convenient way to upload a variety of data types, visualize these data, and export them in a standardized fashion for use with predictive algorithms. In this paper, we describe EDD and showcase its utility for three different use cases: storage of characterized synthetic biology parts, leveraging proteomics data to improve biofuel yield, and the use of extracellular metabolite concentrations to predict intracellular metabolic fluxes.
Affiliation(s)
- William C. Morrell
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biotechnology and Bioengineering and Biomass Science and Conversion Department, Sandia National Laboratories, Livermore, California 94550, United States
- Garrett W. Birkel
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Mark Forrer
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biotechnology and Bioengineering and Biomass Science and Conversion Department, Sandia National Laboratories, Livermore, California 94550, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
- Teresa Lopez
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biotechnology and Bioengineering and Biomass Science and Conversion Department, Sandia National Laboratories, Livermore, California 94550, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
- Tyler W. H. Backman
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Michael Dussault
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
- Christopher J. Petzold
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Edward E. K. Baidoo
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Zak Costello
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- David Ando
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Jorge Alonso-Gutierrez
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Kevin W. George
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Aindrila Mukhopadhyay
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Ian Vaino
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
- Jay D. Keasling
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
  - Department of Bioengineering, University of California, Berkeley, California 94720, United States
- Paul D. Adams
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Nathan J. Hillson
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - DNA Synthesis Science Program, DOE Joint Genome Institute, Walnut Creek, California 94598, United States
- Hector Garcia Martin
  - DOE Joint BioEnergy Institute, Emeryville, California 94608, United States
  - DOE Agile BioFoundry, Emeryville, California 94608, United States
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - BCAM, Basque Center for Applied Mathematics, 48009 Bilbao, Spain
5
Birkel GW, Ghosh A, Kumar VS, Weaver D, Ando D, Backman TWH, Arkin AP, Keasling JD, Martín HG. Erratum to: The JBEI quantitative metabolic modeling library (jQMM): a python library for modeling microbial metabolism. BMC Bioinformatics 2017; 18:219. [PMID: 28420344 PMCID: PMC5395749 DOI: 10.1186/s12859-017-1631-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2017] [Accepted: 04/11/2017] [Indexed: 11/10/2022] Open
Affiliation(s)
- Garrett W Birkel
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - DOE Agile BioFoundry, Emeryville, CA, USA
- Amit Ghosh
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - School of Energy Science and Engineering, Indian Institute of Technology (IIT), Kharagpur, India
- Vinay S Kumar
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
- Daniel Weaver
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
- David Ando
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
- Tyler W H Backman
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - DOE Agile BioFoundry, Emeryville, CA, USA
- Adam P Arkin
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Department of Bioengineering, University of California, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jay D Keasling
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA, USA
  - Department of Bioengineering, University of California, Berkeley, CA, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK2970, Hørsholm, Denmark
- Héctor García Martín
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - DOE Agile BioFoundry, Emeryville, CA, USA
  - BCAM, Basque Center for Applied Mathematics, Bilbao, Spain
6
Birkel GW, Ghosh A, Kumar VS, Weaver D, Ando D, Backman TWH, Arkin AP, Keasling JD, Martín HG. The JBEI quantitative metabolic modeling library (jQMM): a python library for modeling microbial metabolism. BMC Bioinformatics 2017; 18:205. [PMID: 28381205 PMCID: PMC5382524 DOI: 10.1186/s12859-017-1615-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 11/03/2016] [Accepted: 03/25/2017] [Indexed: 01/25/2023] Open
Abstract
Background: Modeling of microbial metabolism is a topic of growing importance in biotechnology. Mathematical modeling helps provide a mechanistic understanding for the studied process, separating the main drivers from the circumstantial ones, bounding the outcomes of experiments and guiding engineering approaches. Among different modeling schemes, the quantification of intracellular metabolic fluxes (i.e. the rate of each reaction in cellular metabolism) is of particular interest for metabolic engineering because it describes how carbon and energy flow throughout the cell. In addition to flux analysis, new methods for the effective use of the ever more readily available and abundant -omics data (i.e. transcriptomics, proteomics and metabolomics) are urgently needed. Results: The jQMM library presented here provides an open-source, Python-based framework for modeling internal metabolic fluxes and leveraging other -omics data for the scientific study of cellular metabolism and bioengineering purposes. Firstly, it presents a complete toolbox for simultaneously performing two different types of flux analysis that are typically disjoint: Flux Balance Analysis and 13C Metabolic Flux Analysis. Moreover, it introduces the capability to use 13C labeling experimental data to constrain comprehensive genome-scale models through a technique called two-scale 13C Metabolic Flux Analysis (2S-13C MFA). In addition, the library includes a demonstration of a method that uses proteomics data to produce actionable insights to increase biofuel production. Finally, the use of the jQMM library is illustrated through the addition of several Jupyter notebook demonstration files that enhance reproducibility and provide the capability to be adapted to the user’s specific needs. Conclusions: jQMM will facilitate the design and metabolic engineering of organisms for biofuels and other chemicals, as well as investigations of cellular metabolism and leveraging -omics data. As an open source software project, we hope it will attract additions from the community and grow with the rapidly changing field of metabolic engineering. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1615-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Garrett W Birkel
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - DOE Agile BioFoundry, Emeryville, CA, USA
- Amit Ghosh
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - School of Energy Science and Engineering, Indian Institute of Technology (IIT), Kharagpur, India
- Vinay S Kumar
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
- Daniel Weaver
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
- David Ando
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
- Tyler W H Backman
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - DOE Agile BioFoundry, Emeryville, CA, USA
- Adam P Arkin
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Department of Bioengineering, University of California, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jay D Keasling
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA, USA
  - Department of Bioengineering, University of California, Berkeley, CA, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Hørsholm, DK2970, Denmark
- Héctor García Martín
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - Joint BioEnergy Institute, Emeryville, CA, USA
  - DOE Agile BioFoundry, Emeryville, CA, USA
  - BCAM, Basque Center for Applied Mathematics, Bilbao, Spain
7
Abstract
Background: Next-generation sequencing (NGS) has revolutionized how research is carried out in many areas of biology and medicine. However, the analysis of NGS data remains a major obstacle to the efficient utilization of the technology, as it requires complex multi-step processing of big data demanding considerable computational expertise from users. While substantial effort has been invested on the development of software dedicated to the individual analysis steps of NGS experiments, insufficient resources are currently available for integrating the individual software components within the widely used R/Bioconductor environment into automated workflows capable of running the analysis of most types of NGS applications from start-to-finish in a time-efficient and reproducible manner. Results: To address this need, we have developed the R/Bioconductor package systemPipeR. It is an extensible environment for both building and running end-to-end analysis workflows with automated report generation for a wide range of NGS applications. Its unique features include a uniform workflow interface across different NGS applications, automated report generation, and support for running both R and command-line software on local computers and computer clusters. A flexible sample annotation infrastructure efficiently handles complex sample sets and experimental designs. To simplify the analysis of widely used NGS applications, the package provides pre-configured workflows and reporting templates for RNA-Seq, ChIP-Seq, VAR-Seq and Ribo-Seq. Additional workflow templates will be provided in the future. Conclusions: systemPipeR accelerates the extraction of reproducible analysis results from NGS experiments. By combining the capabilities of many R/Bioconductor and command-line tools, it makes efficient use of existing software resources without limiting the user to a set of predefined methods or environments. systemPipeR is freely available for all common operating systems from Bioconductor (http://bioconductor.org/packages/devel/systemPipeR). Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1241-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tyler W H Backman
- Institute for Integrative Genome Biology, University of California, Riverside, 1207F Genomics Building, 3401 Watkins Drive, Riverside, 92521, CA, USA
| | - Thomas Girke
- Institute for Integrative Genome Biology, University of California, Riverside, 1207F Genomics Building, 3401 Watkins Drive, Riverside, 92521, CA, USA.
|
8
|
Abstract
This article gives an overview of basic computational methods that are commonly used for analyzing small molecule screening data in the chemical genomics field. First, we introduce cheminformatic concepts for analyzing drug-like small molecule structures and their properties. Second, we introduce compound selection approaches for assembling screening libraries using compound property and diversity analyses. Finally, we discuss methods for interpreting screening hits by analyzing compound structures and induced phenotypes using similarity search and clustering approaches. These are critical steps for optimizing screening hits, and relating structure to bioactivity and phenotype.
Affiliation(s)
- Tyler W H Backman
- Department of Bioengineering, University of California Riverside, Riverside, CA, USA
|
9
|
Abstract
MOTIVATION The ability to accurately measure structural similarities among small molecules is important for many analysis routines in drug discovery and chemical genomics. Algorithms used for this purpose include fragment-based fingerprint and graph-based maximum common substructure (MCS) methods. MCS approaches provide one of the most accurate similarity measures. However, their rigid matching policies limit them to the identification of perfect MCSs. To eliminate this restriction, we introduce a new mismatch tolerant search method for identifying flexible MCSs (FMCSs) containing a user-definable number of atom and/or bond mismatches. RESULTS The fmcsR package provides an R interface, with the time-consuming steps of the FMCS algorithm implemented in C++. It includes utilities for pairwise compound comparisons, structure similarity searching, clustering and visualization of MCSs. In comparison with an existing MCS tool, fmcsR shows better time performance over a wide range of compound sizes. When mismatching of atoms or bonds is turned on, the compute times increase as expected, and the resulting FMCSs are often substantially larger than their strict MCS counterparts. Based on extensive virtual screening (VS) tests, the flexible matching feature enhances the enrichment of active structures at the top of MCS-based similarity search results. With respect to overall and early enrichment performance, FMCS outperforms most of the seven other VS methods considered in these tests. AVAILABILITY fmcsR is freely available for all common operating systems from the Bioconductor site (http://www.bioconductor.org/packages/devel/bioc/html/fmcsR.html). CONTACT thomas.girke@ucr.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yan Wang
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
|
10
|
Abstract
ChemMine Tools is an online service for small molecule data analysis. It provides a web interface to a set of cheminformatics and data mining tools that are useful for various analysis routines performed in chemical genomics and drug discovery. The service also offers programmable access options via the R library ChemmineR. The primary functionalities of ChemMine Tools fall into five major application areas: data visualization, structure comparisons, similarity searching, compound clustering and prediction of chemical properties. First, users can upload compound data sets to the online Compound Workbench. Numerous utilities are provided for compound viewing, structure drawing and format interconversion. Second, pairwise structural similarities among compounds can be quantified. Third, interfaces to ultra-fast structure similarity search algorithms are available to efficiently mine the chemical space in the public domain. These include fingerprint and embedding/indexing algorithms. Fourth, the service includes a Clustering Toolbox that integrates cheminformatic algorithms with data mining utilities to enable systematic structure and activity based analyses of custom compound sets. Fifth, physicochemical property descriptors of custom compound sets can be calculated. These descriptors are important for assessing the bioactivity profile of compounds in silico and for quantitative structure-activity relationship (QSAR) analyses. ChemMine Tools is available at: http://chemmine.ucr.edu.
Affiliation(s)
- Tyler W H Backman
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
|
11
|
Fahlgren N, Sullivan CM, Kasschau KD, Chapman EJ, Cumbie JS, Montgomery TA, Gilbert SD, Dasenko M, Backman TWH, Givan SA, Carrington JC. Computational and analytical framework for small RNA profiling by high-throughput sequencing. RNA 2009; 15:992-1002. [PMID: 19307293 PMCID: PMC2673065 DOI: 10.1261/rna.1473809] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Indexed: 05/03/2023]
Abstract
The advent of high-throughput sequencing (HTS) methods has enabled direct approaches to quantitatively profile small RNA populations. However, these methods have been limited by several factors, including representational artifacts and lack of established statistical methods of analysis. Furthermore, massive HTS data sets present new problems related to data processing and mapping to a reference genome. Here, we show that cluster-based sequencing-by-synthesis technology is highly reproducible as a quantitative profiling tool for several classes of small RNA from Arabidopsis thaliana. We introduce the use of synthetic RNA oligoribonucleotide standards to facilitate objective normalization between HTS data sets, and adapt microarray-type methods for statistical analysis of multiple samples. These methods were tested successfully using mutants with small RNA biogenesis (miRNA-defective dcl1 mutant and siRNA-defective dcl2 dcl3 dcl4 triple mutant) or effector protein (ago1 mutant) deficiencies. Computational methods were also developed to rapidly and accurately parse, quantify, and map small RNA data.
Affiliation(s)
- Noah Fahlgren
- Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon 97331, USA
|
12
|
Backman TWH, Sullivan CM, Cumbie JS, Miller ZA, Chapman EJ, Fahlgren N, Givan SA, Carrington JC, Kasschau KD. Update of ASRP: the Arabidopsis Small RNA Project database. Nucleic Acids Res 2007; 36:D982-5. [PMID: 17999994 PMCID: PMC2238918 DOI: 10.1093/nar/gkm997] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Indexed: 11/29/2022] Open
Abstract
Development of the Arabidopsis Small RNA Project (ASRP) Database, which provides information and tools for the analysis of microRNA, endogenous siRNA and other small RNA-related features, has been driven by the introduction of high-throughput sequencing technology. To accommodate the demands of increased data, numerous improvements and updates have been made to ASRP, including new ways to access data, more efficient algorithms for handling data, and increased integration with community-wide resources. New search and visualization tools have also been developed to improve access to small RNA classes and their targets. ASRP is publicly available through a web interface at http://asrp.cgrb.oregonstate.edu/db/
Affiliation(s)
- Tyler W H Backman
- Center for Genome Research and Biocomputing, Department of Botany and Plant Pathology and Molecular and Cellular Biology Program, Oregon State University, Corvallis, OR 97331, USA
|