51. Khaleghi MK, Savizi ISP, Lewis NE, Shojaosadati SA. Synergisms of machine learning and constraint-based modeling of metabolism for analysis and optimization of fermentation parameters. Biotechnol J 2021; 16:e2100212. [PMID: 34390201 DOI: 10.1002/biot.202100212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/23/2021] [Revised: 08/10/2021] [Accepted: 08/11/2021] [Indexed: 11/06/2022]
Abstract
Recent noteworthy advances in the development of high-performing microbial and mammalian strains have enabled the sustainable production of bio-economically valuable substances such as bio-compounds, biofuels, and biopharmaceuticals. However, obtaining an industrially viable mass-production scheme requires much time and effort. The robust and rational design of fermentation processes requires analysis and optimization of different extracellular conditions and medium components, which have a massive effect on growth and productivity. In this regard, knowledge- and data-driven modeling methods have received much attention. Constraint-based modeling (CBM) is a knowledge-driven mathematical approach that has been widely used in fermentation analysis and optimization owing to its ability to predict the cellular phenotype from genotype through high-throughput means. On the other hand, machine learning (ML) is a data-driven statistical method that identifies data patterns within sophisticated biological systems and processes where there is inadequate knowledge to represent the underlying mechanisms. Furthermore, ML models are becoming a viable complement to constraint-based models in a reciprocal manner when one is used as a pre-step of the other. As a result, a more predictable model is produced. This review highlights the applications of CBM and ML independently and the combination of these two approaches for analyzing and optimizing fermentation parameters.
Affiliation(s)
- Mohammad Karim Khaleghi
  - Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
- Iman Shahidi Pour Savizi
  - Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
- Nathan E Lewis
  - Department of Bioengineering, University of California, San Diego, USA; Department of Pediatrics, University of California, San Diego, USA
- Seyed Abbas Shojaosadati
  - Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
52. Sahu A, Blätke MA, Szymański JJ, Töpfer N. Advances in flux balance analysis by integrating machine learning and mechanism-based models. Comput Struct Biotechnol J 2021; 19:4626-4640. [PMID: 34471504 PMCID: PMC8382995 DOI: 10.1016/j.csbj.2021.08.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/15/2021] [Revised: 08/03/2021] [Accepted: 08/03/2021] [Indexed: 02/08/2023] Open
Abstract
The availability of multi-omics data sets and genome-scale metabolic models for various organisms provides a platform for modeling and analyzing genotype-to-phenotype relationships. Flux balance analysis is the main tool for predicting flux distributions in genome-scale metabolic models, and various data-integrative approaches enable modeling of context-specific network behavior. Due to its linear nature, this optimization framework is readily scalable to multi-tissue or -organ and even multi-organism models. However, both data and model size can hamper a straightforward biological interpretation of the estimated fluxes. Moreover, flux balance analysis simulates metabolism at steady state and thus, in its most basic form, does not consider kinetics or regulatory events. The integration of flux balance analysis with complementary data analysis and modeling techniques offers the potential to overcome these challenges. In particular, machine learning approaches have emerged as the tool of choice for data reduction and selection of the most important variables in big data sets. Kinetic models and formal languages can be used to simulate dynamic behavior. This review article provides an overview of integrative studies that combine flux balance analysis with machine learning approaches, kinetic models, such as physiology-based pharmacokinetic models, and formal graphical modeling languages, such as Petri nets. We discuss the mathematical aspects and biological applications of these integrated approaches and outline challenges and future perspectives.
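For orientation, the "linear nature" and steady-state assumption mentioned in the abstract above can be illustrated with the textbook form of flux balance analysis as a linear program. The notation is an assumption introduced here for illustration and is not taken from the abstract: $S$ is the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ a vector of objective weights (for example, selecting a biomass reaction), and $v^{\mathrm{lb}}, v^{\mathrm{ub}}$ the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}.
\end{aligned}
```

The equality constraint $S v = 0$ encodes the steady state (no net accumulation of internal metabolites). Because both the objective and the constraints are linear in $v$, the problem stays a linear program as the network grows, which is consistent with the scalability the abstract describes.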
Affiliation(s)
- Ankur Sahu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Mary-Ann Blätke
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Jędrzej Jakub Szymański
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Nadine Töpfer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
54. Magazzù G, Zampieri G, Angione C. Multimodal regularised linear models with flux balance analysis for mechanistic integration of omics data. Bioinformatics 2021; 37:3546-3552. [PMID: 33974036 DOI: 10.1093/bioinformatics/btab324] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/16/2020] [Revised: 01/06/2021] [Accepted: 04/27/2021] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: High-throughput biological data, thanks to technological advances, have become cheaper to collect, leading to the availability of vast amounts of omic data of different types. In parallel, the in silico reconstruction and modelling of metabolic systems is now acknowledged as a key tool to complement experimental data on a large scale. The integration of this model- and data-driven information is therefore emerging as a new challenge in systems biology, with no clear guidance on how best to take advantage of the inherent multi-source and multi-omic nature of these data types while preserving mechanistic interpretation.
RESULTS: Here we investigate different regularisation techniques for high-dimensional data derived from the integration of gene expression profiles with metabolic flux data, extracted from strain-specific metabolic models, to improve cellular growth rate predictions. To this end, we propose ad-hoc extensions of previous regularisation frameworks including group, view-specific and principal component regularisation, and experimentally compare them using data from 1,143 Saccharomyces cerevisiae strains. We observe a divergence between methods in terms of regression accuracy and integration effectiveness based on the type of regularisation employed. In multi-omic regression tasks, when learning from experimental and model-generated omic data, our results demonstrate the competitiveness and ease of interpretation of multimodal regularised linear models compared to data-hungry methods based on neural networks.
AVAILABILITY: All data, models, and code produced in this work are available on GitHub at https://github.com/Angione-Lab/HybridGroupIPFLasso_pc2Lasso.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Giuseppe Magazzù
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
- Guido Zampieri
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK; Department of Biology, University of Padova, Padova, Italy
- Claudio Angione
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK; Healthcare Innovation Centre, Teesside University, Middlesbrough, UK; Centre for Digital Innovation, Teesside University, Middlesbrough, UK
55. Lawson CE, Martí JM, Radivojevic T, Jonnalagadda SVR, Gentz R, Hillson NJ, Peisert S, Kim J, Simmons BA, Petzold CJ, Singer SW, Mukhopadhyay A, Tanjore D, Dunn JG, Garcia Martin H. Machine learning for metabolic engineering: A review. Metab Eng 2020; 63:34-60. [PMID: 33221420 DOI: 10.1016/j.ymben.2020.10.005] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 08/10/2020] [Revised: 10/22/2020] [Accepted: 10/31/2020] [Indexed: 12/14/2022]
Abstract
Machine learning provides researchers a unique opportunity to make metabolic engineering more predictable. In this review, we offer an introduction to this discipline in terms that are relatable to metabolic engineers, as well as providing in-depth illustrative examples leveraging omics data and improving production. We also include practical advice for the practitioner in terms of data management, algorithm libraries, computational resources, and important non-technical issues. A variety of applications ranging from pathway construction and optimization, to genetic editing optimization, cell factory testing, and production scale-up are discussed. Moreover, the promising relationship between machine learning and mechanistic models is thoroughly reviewed. Finally, the future perspectives and most promising directions for this combination of disciplines are examined.
Affiliation(s)
- Christopher E Lawson
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Jose Manuel Martí
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Tijana Radivojevic
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Sai Vamshi R Jonnalagadda
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Reinhard Gentz
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Nathan J Hillson
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Sean Peisert
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; University of California Davis, Davis, CA, 95616, USA
- Joonhoon Kim
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA; Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Blake A Simmons
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Christopher J Petzold
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Steven W Singer
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Aindrila Mukhopadhyay
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, USA
- Deepti Tanjore
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Advanced Biofuels and Bioproducts Process Development Unit, Emeryville, CA, 94608, USA
- Hector Garcia Martin
  - Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA; Basque Center for Applied Mathematics, 48009, Bilbao, Spain; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, USA
56. Vijayakumar S, Rahman PKSM, Angione C. A Hybrid Flux Balance Analysis and Machine Learning Pipeline Elucidates Metabolic Adaptation in Cyanobacteria. iScience 2020; 23:101818. [PMID: 33354660 PMCID: PMC7744713 DOI: 10.1016/j.isci.2020.101818] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/26/2020] [Revised: 10/23/2020] [Accepted: 11/13/2020] [Indexed: 01/20/2023] Open
Abstract
Machine learning has recently emerged as a promising tool for inferring multi-omic relationships in biological systems. At the same time, genome-scale metabolic models (GSMMs) can be integrated with such multi-omic data to refine phenotypic predictions. In this work, we use a multi-omic machine learning pipeline to analyze a GSMM of Synechococcus sp. PCC 7002, a cyanobacterium with large potential to produce renewable biofuels. We use regularized flux balance analysis to observe flux response between conditions across photosynthesis and energy metabolism. We then incorporate principal-component analysis, k-means clustering, and LASSO regularization to reduce dimensionality and extract key cross-omic features. Our results suggest that combining metabolic modeling with machine learning elucidates mechanisms used by cyanobacteria to cope with fluctuations in light intensity and salinity that cannot be detected using transcriptomics alone. Furthermore, GSMMs introduce critical mechanistic details that improve the performance of omic-based machine learning methods.
- A pipeline for metabolic modeling in Synechococcus sp. PCC 7002 is presented
- Metabolic fluxes display clear differences in pathway activity across conditions
- Omic-informed GSMMs provide critical mechanistic details within machine learning
- Combining GSMM and machine learning improves methods based on transcriptomics alone
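As a rough illustration of how the LASSO regularization named in the abstract above can extract a small set of features, the following is a minimal sketch of LASSO regression solved by cyclic coordinate descent. It is not the authors' pipeline: the function names, the toy data, and the penalty value `alpha` are all hypothetical, and the objective assumed here is the common form (1/(2n))·||y − Xw||² + alpha·||w||₁ without an intercept.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept).

    X is a list of rows (n samples, p features); y is a list of n targets.
    Hypothetical helper written for illustration only.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form update for one coordinate of the LASSO objective.
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Toy data (made up): the first feature drives y, the second is small noise.
X_demo = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.1], [4.0, -0.1]]
y_demo = [2.0, 4.0, 6.0, 8.0]
weights = lasso_coordinate_descent(X_demo, y_demo, alpha=0.1)
print(weights)
```

The L1 penalty is what drives the weight of an uninformative feature to exactly zero, which is why LASSO is often used for feature selection in high-dimensional omic data. In practice a tested library implementation would normally be preferred over a hand-written loop like this one.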
Affiliation(s)
- Supreeta Vijayakumar
  - Department of Computer Science and Information Systems, Teesside University, Middlesbrough, North Yorkshire TS1 3BX, UK
- Pattanathu K S M Rahman
  - Centre for Enzyme Innovation, Institute of Biological and Biomedical Sciences, School of Biological Sciences, University of Portsmouth, Portsmouth, Hampshire PO1 2UP, UK; Tara Biologics, Woking, Surrey GU21 6BP, UK
- Claudio Angione
  - Department of Computer Science and Information Systems, Teesside University, Middlesbrough, North Yorkshire TS1 3BX, UK; Centre for Digital Innovation, Teesside University, Middlesbrough TS1 3BX, UK; Healthcare Innovation Centre, Teesside University, Middlesbrough TS1 3BX, UK