1. A General Hybrid Modeling Framework for Systems Biology Applications: Combining Mechanistic Knowledge with Deep Neural Networks under the SBML Standard. AI 2023. [DOI: 10.3390/ai4010014]
Abstract
In this paper, a computational framework is proposed that merges mechanistic modeling with deep neural networks in compliance with the Systems Biology Markup Language (SBML) standard. Over the last 20 years, the systems biology community has developed a large number of mechanistic models, which are currently stored in SBML in public databases. With the proposed framework, existing SBML models may be redesigned into hybrid systems by incorporating deep neural networks into the model core, using a freely available Python tool. The resulting hybrid mechanistic/neural network models are trained with a deep learning algorithm based on the adaptive moment estimation method (ADAM), stochastic regularization and semidirect sensitivity equations. The trained hybrid models are encoded in SBML and uploaded to model databases, where they may be further analyzed as regular SBML models. This approach is illustrated with three well-known case studies: the Escherichia coli threonine synthesis model, the P58IPK signal transduction model, and the yeast glycolytic oscillations model. The proposed framework is expected to greatly facilitate the widespread use of hybrid modeling techniques in systems biology applications.
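The training idea summarized above (a network embedded in a mechanistic ODE, gradients from sensitivity equations, Adam updates) can be illustrated with a deliberately small sketch. It assumes a one-state model dx/dt = -K·x + a·tanh(b·x + c), where the first-order decay is the known mechanistic part and a single tanh neuron stands in for the embedded network. The state and its forward sensitivities dx/d(a, b, c) are integrated together with explicit Euler, and Adam updates the parameters against synthetic, noise-free data. All constants, the single-neuron network, and the use of plain forward sensitivities (instead of the paper's semidirect scheme, with no stochastic regularization and no SBML encoding) are illustrative assumptions, not the published tool.

```python
import numpy as np

# Illustrative constants (assumptions, not values from the paper).
K = 0.3            # known first-order decay rate: the mechanistic part
H, N = 0.05, 200   # explicit Euler step size and number of steps
X0 = 2.0           # initial state

def simulate(theta):
    """Euler-integrate dx/dt = -K*x + a*tanh(b*x + c) together with the
    forward sensitivities s = dx/d(a, b, c)."""
    a, b, c = theta
    x, s = X0, np.zeros(3)
    xs, ss = [x], [s.copy()]
    for _ in range(N):
        u = np.tanh(b * x + c)
        du = 1.0 - u ** 2                            # derivative of tanh
        f = -K * x + a * u                           # hybrid right-hand side
        f_x = -K + a * b * du                        # df/dx
        f_theta = np.array([u, a * du * x, a * du])  # df/d(a, b, c)
        s = s + H * (f_x * s + f_theta)              # sensitivity step (uses old x and s)
        x = x + H * f                                # state step
        xs.append(x)
        ss.append(s.copy())
    return np.array(xs), np.array(ss)

def loss_and_grad(theta, y):
    """Mean squared error and its exact gradient for the discretized model."""
    xs, ss = simulate(theta)
    r = xs - y
    return np.mean(r ** 2), 2.0 * (r @ ss) / len(r)

def adam(theta, y, lr=0.02, steps=300, b1=0.9, b2=0.999, eps=1e-8):
    """Plain Adam on the sensitivity-based gradient (no stochastic regularization)."""
    m, v = np.zeros_like(theta), np.zeros_like(theta)
    for t in range(1, steps + 1):
        _, g = loss_and_grad(theta, y)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Synthetic, noise-free "measurements" from hypothetical true parameters.
y_obs, _ = simulate(np.array([0.5, 1.0, 0.0]))
theta_init = np.array([0.1, 0.1, 0.0])
theta_fit = adam(theta_init, y_obs)
```

Propagating sensitivities alongside the state gives the exact gradient of the discretized trajectory in a single forward pass; a realistic setup would use noisy measurements, a multi-layer network and the regularization described in the abstract.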
2. Cuperlovic-Culf M, Nguyen-Tran T, Bennett SAL. Machine Learning and Hybrid Methods for Metabolic Pathway Modeling. Methods Mol Biol 2023; 2553:417-439. [PMID: 36227553 DOI: 10.1007/978-1-0716-2617-7_18]
Abstract
Computational cell metabolism models seek to provide metabolic explanations of cell behavior under different conditions or following genetic alterations, help in the optimization of in vitro cell growth environments, or predict cellular behavior in vivo and in vitro. At the extremes, mechanistic models can include highly detailed descriptions of a small number of metabolic reactions or an approximate representation of an entire metabolic network. To date, all mechanistic models have required details of individual metabolic reactions, either kinetic parameters or metabolic flux, as well as information about extracellular and intracellular metabolite concentrations. Despite extensive efforts and the increasing availability of high-quality data, the required in vivo data are not available for the majority of known metabolic reactions; thus, mechanistic models are based primarily on ex vivo kinetic measurements and limited flux information. Machine learning approaches provide an alternative for deriving functional dependencies from existing data. The increasing availability of metabolomic and lipidomic data, with growing feature coverage as well as sample set size, is expected to provide the new data options needed to derive machine learning models of cell metabolic processes. Moreover, machine learning analysis of longitudinal data can lead to predictive models of cell behaviors over time. Conversely, machine learning models trained on steady-state data can provide descriptive models for the comparison of metabolic states in different environments or disease conditions. Additionally, including metabolic network knowledge in these analyses can further help in developing models when data are limited. This chapter will explore the application of machine learning to the modeling of cell metabolism.
We first provide a theoretical explanation of several machine learning and hybrid mechanistic machine learning methods currently being explored to model metabolism. Next, we introduce several avenues for improving these models with machine learning. Finally, we provide protocols for specific examples of the utilization of machine learning in the development of predictive cell metabolism models using metabolomic data. We describe data preprocessing, approaches for training of machine learning models for both descriptive and predictive models, and the utilization of these models in synthetic and systems biology. Detailed protocols provide a list of software tools and libraries used for these applications, step-by-step modeling protocols, troubleshooting, as well as an overview of existing limitations to these approaches.
Affiliation(s)
- Miroslava Cuperlovic-Culf
- Digital Technologies Research Centre, National Research Council of Canada, Ottawa, ON, Canada.
- Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, Ottawa, ON, Canada.
- Thao Nguyen-Tran
- Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, Ottawa, ON, Canada
- Neural Regeneration Laboratory, Ottawa Institute of Systems Biology, Brain and Mind Research Institute, University of Ottawa, Ottawa, ON, Canada
- Department of Chemistry and Biomolecular Sciences, Centre for Catalysis Research and Innovation, University of Ottawa, Ottawa, ON, Canada
- Steffany A L Bennett
- Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, Ottawa, ON, Canada
- Neural Regeneration Laboratory, Ottawa Institute of Systems Biology, Brain and Mind Research Institute, University of Ottawa, Ottawa, ON, Canada
- Department of Chemistry and Biomolecular Sciences, Centre for Catalysis Research and Innovation, University of Ottawa, Ottawa, ON, Canada
3. Ramos JRC, Bissinger T, Genzel Y, Reichl U. Impact of Influenza A Virus Infection on Growth and Metabolism of Suspension MDCK Cells Using a Dynamic Model. Metabolites 2022; 12:metabo12030239. [PMID: 35323683 PMCID: PMC8950586 DOI: 10.3390/metabo12030239] [Received: 02/16/2022] [Revised: 03/04/2022] [Accepted: 03/09/2022]
Abstract
Cell culture-based influenza virus production is a viable option for vaccine manufacturing. Achieving a high concentration of viable cells requires not only optimal process conditions but also an active metabolism capable of the intracellular synthesis of viral components. Experimental metabolic data collected in such processes are complex and difficult to interpret, which makes mathematical models an appropriate way to simulate and analyze the complex and dynamic interaction between the virus and its host cell. A dynamic model with 35 states was developed in this study to describe growth, metabolism, and influenza A virus production in shake flask cultivations of suspension Madin-Darby Canine Kidney (MDCK) cells. It considers cell growth (concentration of viable cells, mean cell diameter, volume of viable cells), concentrations of key metabolites at both the intracellular and extracellular level, and virus titers. Using one set of parameters, the model accurately simulates the dynamics of mock-infected cells and correctly predicts the overall dynamics of virus-infected cells for up to 60 h post infection (hpi). The model clearly suggests that most changes observed after infection are related to the cessation of cell growth and the subsequent transition to apoptosis and cell death. However, the predictions do not cover late phases of infection, particularly for the extracellular concentrations of glutamate and ammonium after about 12 hpi. Additional in silico studies indicated that amino acid degradation by extracellular enzymes resulting from cell lysis during late infection stages may contribute to this discrepancy.
Affiliation(s)
- João Rodrigues Correia Ramos
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Thomas Bissinger
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Yvonne Genzel
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Udo Reichl
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Institute of Process Engineering, Faculty of Process & Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
4. Dai W, Mohammadi S, Cremaschi S. A hybrid modeling framework using dimensional analysis for erosion predictions. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2021.107577]
5. Lee D, Jayaraman A, Kwon JS. Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling. PLoS Comput Biol 2020; 16:e1008472. [PMID: 33315899 PMCID: PMC7769624 DOI: 10.1371/journal.pcbi.1008472] [Received: 07/23/2020] [Revised: 12/28/2020] [Accepted: 10/26/2020]
Abstract
Developing an accurate first-principle model is an important step in employing systems biology approaches to analyze an intracellular signaling pathway. However, an accurate first-principle model is difficult to develop because it requires an in-depth mechanistic understanding of the signaling pathway. Since underlying mechanisms such as the reaction network structure are not fully understood, significant discrepancies exist between predicted and actual signaling dynamics. Motivated by these considerations, this work proposes a hybrid modeling approach that combines a first-principle model and an artificial neural network (ANN) model so that predictions of the hybrid model surpass those of the original model. First, the proposed approach determines an optimal subset of model states whose dynamics should be corrected by the ANN by examining the correlation between each state and the outputs through relative order. Second, an L2-regularized least-squares problem is solved to infer values of the correction terms that are necessary to minimize the discrepancy between the model predictions and available measurements. Third, an ANN is developed to generalize relationships between the values of the correction terms and the system dynamics. Lastly, the original first-principle model is coupled with the developed ANN to finalize the hybrid model so that the model possesses generalized prediction capabilities while retaining its interpretability. We have successfully validated the proposed methodology with two case studies, simplified apoptosis and lipopolysaccharide-induced NFκB signaling pathways, developing hybrid models with in silico and in vitro measurements, respectively.
An intracellular signaling pathway is often represented by a set of nonlinear ordinary differential equations, which translate our current knowledge about the signaling pathway into a testable mathematical model. However, predictions from such models are often subject to high uncertainty since many signaling pathways are only partially known beforehand. In this study, we propose a systematic approach to develop a hybrid model that improves model accuracy by combining machine learning with first-principle modeling. Specifically, model correction terms are learned from the discrepancy between model predictions and measurements, and these terms are added to the first-principle model to enhance prediction accuracy. Once these correction terms are learned from the data, an artificial neural network (ANN) model is developed to find an empirical relation between the model and the correction terms, so that the ANN provides improved predictive capabilities even under new operating conditions (i.e., generalizability). The final hybrid model is then constructed by coupling the first-principle model with the developed ANN.
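The second and third steps above can be sketched for a single state under stated assumptions. The first-principle model is taken to be dx/dt = -K·x, discretized with explicit Euler as x[k+1] = A·x[k] + H·δ[k] with A = 1 - H·K, so the trajectory is linear in the per-step correction terms δ and the L2-regularized least-squares problem has the closed-form ridge solution δ = (MᵀM + λI)⁻¹Mᵀr, where r is the model-measurement discrepancy. The "missing mechanism" used to generate synthetic data, all constants, and the quadratic fit that stands in for the ANN are hypothetical; the state-selection step based on relative order is omitted.

```python
import numpy as np

# Illustrative constants (assumed): Euler step, number of steps, known rate, initial state.
H, N, K, X0 = 0.1, 50, 0.5, 1.0
A = 1.0 - H * K  # one-step Euler factor of the mechanistic part dx/dt = -K*x

def simulate(correction=None):
    """Euler-integrate dx/dt = -K*x + correction(x); None gives the first-principle model."""
    x = [X0]
    for _ in range(N):
        d = 0.0 if correction is None else correction(x[-1])
        x.append(x[-1] + H * (-K * x[-1] + d))
    return np.array(x)

def correction_matrix():
    """M[k, j] = d x_k / d delta_j for x_{k+1} = A*x_k + H*delta_k."""
    M = np.zeros((N + 1, N))
    for k in range(1, N + 1):
        for j in range(k):
            M[k, j] = H * A ** (k - 1 - j)
    return M

def infer_corrections(y, lam=1e-6):
    """Step 2: L2-regularized least squares for the per-step correction terms."""
    M = correction_matrix()
    r = y - simulate()  # discrepancy between measurements and the uncorrected model
    delta = np.linalg.solve(M.T @ M + lam * np.eye(N), M.T @ r)
    return delta, simulate() + M @ delta  # corrections and corrected trajectory

# Synthetic "measurements" from a hypothetical missing mechanism 0.4*x/(1+x).
y_obs = simulate(lambda x: 0.4 * x / (1.0 + x))
delta, x_corr = infer_corrections(y_obs)

# Step 3 stand-in: a quadratic fit of delta against the state replaces the ANN.
coef = np.polyfit(x_corr[:-1], delta, deg=2)
hybrid = simulate(lambda x: np.polyval(coef, x))

mse_mech = np.mean((simulate() - y_obs) ** 2)
mse_hybrid = np.mean((hybrid - y_obs) ** 2)
```

Separating the two steps means the regression model (here a polynomial, in the paper an ANN) is trained on state/correction pairs rather than through the ODE solver, which keeps the fitting problem simple.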
Affiliation(s)
- Dongheon Lee
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas, USA
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas, USA
- Arul Jayaraman
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas, USA
- Department of Biomedical Engineering, Texas A&M University, College Station, Texas, USA
- Joseph S. Kwon
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas, USA
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas, USA
6. Richelle A, David B, Demaegd D, Dewerchin M, Kinet R, Morreale A, Portela R, Zune Q, von Stosch M. Towards a widespread adoption of metabolic modeling tools in biopharmaceutical industry: a process systems biology engineering perspective. NPJ Syst Biol Appl 2020; 6:6. [PMID: 32170148 PMCID: PMC7070029 DOI: 10.1038/s41540-020-0127-y] [Received: 10/10/2019] [Accepted: 02/12/2020]
Abstract
In biotechnology, the emergence of high-throughput technologies has made the interpretation of large datasets challenging. One way to identify meaningful outcomes that affect process and product attributes in large datasets is to use systems biology tools such as metabolic models. However, these tools are still not fully exploited for this purpose in an industrial context because of knowledge gaps and technical limitations. In this paper, key aspects restraining the routine implementation of these tools are highlighted in three research fields: monitoring, network science and hybrid modeling. Advances in these fields could expand the current state of systems biology applications in the biopharmaceutical industry and help address existing challenges in bioprocess development and improvement.
8. Pinto J, de Azevedo CR, Oliveira R, von Stosch M. A bootstrap-aggregated hybrid semi-parametric modeling framework for bioprocess development. Bioprocess Biosyst Eng 2019; 42:1853-1865. [DOI: 10.1007/s00449-019-02181-y] [Received: 04/26/2019] [Accepted: 07/23/2019]
9. Portela RMC, von Stosch M, Oliveira R. Hybrid semiparametric systems for quantitative sequence-activity modeling of synthetic biological parts. Synth Biol (Oxf) 2018; 3:ysy010. [PMID: 32995518 PMCID: PMC7513808 DOI: 10.1093/synbio/ysy010] [Received: 03/07/2018] [Revised: 05/21/2018] [Accepted: 06/11/2018]
Abstract
Predicting the activity of modified biological parts is difficult because the typically large size of nucleotide sequences results in combinatorial designs that suffer from the "curse of dimensionality". Mechanistic design methods are often limited by knowledge availability. Empirical methods typically require large data sets, which are difficult and/or costly to obtain. In this study, we explore for the first time the combination of both approaches within a formal hybrid semiparametric framework in an attempt to overcome the limitations of current approaches. Protein translation as a function of the 5' untranslated region sequence in Escherichia coli is taken as a case study. Thermodynamic modeling, partial least squares (PLS) and hybrid parallel combinations thereof are compared for different data sets and data partitioning scenarios. The results suggest a significant and systematic reduction of both calibration and prediction errors by the hybrid approach in comparison to standalone thermodynamic or PLS modeling. Improvements, although of different magnitudes, are observed irrespective of sample size and partitioning method. Overall, the results suggest that the hybrid method increases predictive power, potentially leading to a more efficient design of biological parts.
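A parallel hybrid of the kind compared in that study can be sketched as follows, assuming the mechanistic (for example thermodynamic) prediction is already available as a vector and a PLS model is fitted only to the residual it leaves unexplained, so the hybrid prediction is y_mech + PLS(X). The NIPALS-style PLS1 routine, the descriptor matrix and all data are illustrative assumptions, not the authors' implementation or data set.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS with deflation; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # weight vector
        t = Xk @ w                    # score vector
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        q = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q * t               # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # regression coefficients in X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

def fit_parallel_hybrid(X, y, y_mech, n_components=2):
    """Parallel hybrid: PLS learns what the mechanistic model leaves unexplained."""
    return pls1_fit(X, y - y_mech, n_components)

def predict_parallel_hybrid(model, X, y_mech):
    return y_mech + pls1_predict(model, X)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))       # hypothetical sequence descriptors
y_mech = rng.normal(size=60)       # hypothetical mechanistic predictions
y = y_mech + X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=60)

model = fit_parallel_hybrid(X, y, y_mech)
mse_mech = np.mean((y - y_mech) ** 2)
mse_hybrid = np.mean((y - predict_parallel_hybrid(model, X, y_mech)) ** 2)
```

In the parallel arrangement the data-driven part only has to explain the mechanistic model's error, which is one plausible reason such hybrids can need fewer samples than a standalone empirical model.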
Affiliation(s)
- Rui M C Portela
- REQUIMTE/LAQV, Departamento de Química, Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa, Caparica, Portugal
- Moritz von Stosch
- CEAM Faculty of Science, Agriculture and Engineering, Newcastle University, Newcastle upon Tyne, UK
- Rui Oliveira
- REQUIMTE/LAQV, Departamento de Química, Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa, Caparica, Portugal
10. Bardini R, Politano G, Benso A, Di Carlo S. Multi-level and hybrid modelling approaches for systems biology. Comput Struct Biotechnol J 2017; 15:396-402. [PMID: 28855977 PMCID: PMC5565741 DOI: 10.1016/j.csbj.2017.07.005] [Received: 03/15/2017] [Revised: 06/28/2017] [Accepted: 07/31/2017]
Abstract
During the last decades, high-throughput techniques have allowed the extraction of a huge amount of data from biological systems, unveiling more of their underlying complexity. Biological systems encompass a wide range of space and time scales and function according to flexible hierarchies of mechanisms that form an intertwined and dynamic interplay of regulations. This becomes particularly evident in processes such as ontogenesis, where regulative assets change according to process context and timing, making structural phenotype and architectural complexities emerge from a single cell through local interactions. The information collected from biological systems is naturally organized according to the functional levels composing the system itself. In systems biology, biological information often comes from overlapping but different scientific domains, each having its own way of representing the phenomena under study. That is, the different parts of the system to be modelled may be described with different formalisms. For a model to have improved accuracy and to serve as a good knowledge base, it should comprise different system levels and suitably handle the corresponding formalisms. Models that are both multi-level and hybrid satisfy both requirements, making them very useful tools in computational systems biology. This paper reviews some of the main contributions in this field.
Affiliation(s)
- S. Di Carlo
- Politecnico di Torino, Department of Control and Computer Engineering, 10129 Torino, Italy
11. Windhager L, Zierer J, Küffner R. Refining ensembles of predicted gene regulatory networks based on characteristic interaction sets. PLoS One 2014; 9:e84596. [PMID: 24498260 PMCID: PMC3911903 DOI: 10.1371/journal.pone.0084596] [Received: 08/16/2013] [Accepted: 11/14/2013]
Abstract
Different ensemble voting approaches have been successfully applied to the reverse engineering of gene regulatory networks. They are based on the assumption that a good approximation of the true network structure can be derived by considering the frequencies of individual interactions in a large number of predicted networks. Such approximations are typically superior in terms of prediction quality and robustness compared with considering only a single best-scoring network. Nevertheless, ensemble approaches only work well if the predicted gene regulatory networks are sufficiently similar to each other. If the topologies of predicted networks are considerably different, an ensemble of all networks obscures interesting individual characteristics. Instead, networks should be grouped according to local topological similarities, and ensemble voting should be performed for each group separately. We argue that the presence of sets of co-occurring interactions is a suitable indicator for grouping predicted networks. A stepwise bottom-up procedure is proposed, in which mutual dependencies between pairs of interactions are first derived from predicted networks. Pairs of co-occurring interactions are subsequently extended to derive characteristic interaction sets that distinguish groups of networks. Finally, ensemble voting is applied separately to the resulting topologically similar groups of networks to create distinct group-ensembles. Ensembles of topologically similar networks constitute distinct hypotheses about the reference network structure. Such group-ensembles are easier to interpret because their characteristic topology becomes clear and dependencies between interactions are known. The availability of distinct hypotheses facilitates the design of further experiments to distinguish between plausible network structures.
The proposed procedure is a reasonable refinement step for non-deterministic reverse-engineering applications that produce a large number of candidate predictions for a gene regulatory network, e.g. due to probabilistic optimization or a cross-validation procedure.
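The grouping idea can be illustrated with a toy sketch in which each predicted network is a set of directed interactions written as (regulator, target) tuples. As an assumption, the pairwise dependency is scored by lift, P(a, b) / (P(a)·P(b)), and only the single strongest pair is used to split the networks into two groups before voting; the published procedure extends pairs to larger characteristic interaction sets and may use a different dependency measure. The gene names and networks are hypothetical.

```python
from collections import Counter
from itertools import combinations

def edge_frequencies(networks):
    """Fraction of predicted networks that contain each interaction."""
    counts = Counter(e for net in networks for e in net)
    return {e: c / len(networks) for e, c in counts.items()}

def ensemble(networks, threshold=0.5):
    """Ensemble voting: keep interactions found in at least `threshold` of the networks."""
    return {e for e, f in edge_frequencies(networks).items() if f >= threshold}

def top_cooccurring_pair(networks):
    """Pair of interactions with the highest lift P(a, b) / (P(a) * P(b)) (assumed score)."""
    n = len(networks)
    freqs = edge_frequencies(networks)
    pair_counts = Counter(p for net in networks for p in combinations(sorted(net), 2))

    def lift(pair):
        a, b = pair
        return (pair_counts[pair] / n) / (freqs[a] * freqs[b])

    return max(pair_counts, key=lift)

def group_ensembles(networks, threshold=0.5):
    """Split networks by presence of the characteristic pair, then vote within each group."""
    pair = top_cooccurring_pair(networks)
    with_pair = [net for net in networks if set(pair) <= net]
    without_pair = [net for net in networks if not set(pair) <= net]
    return pair, ensemble(with_pair, threshold), ensemble(without_pair, threshold)

# Four hypothetical predicted networks encoding two competing hypotheses.
nets = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("A", "C"), ("C", "E")},
    {("A", "B"), ("A", "C"), ("C", "E")},
]
pair, group1, group2 = group_ensembles(nets)
```

Voting over all four networks keeps all five interactions and merges both hypotheses, whereas the two group-ensembles each recover one consistent three-interaction structure.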
Affiliation(s)
- Lukas Windhager
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- Jonas Zierer
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- Robert Küffner
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
12. von Stosch M, Oliveira R, Peres J, Feyo de Azevedo S. Hybrid semi-parametric modeling in process systems engineering: Past, present and future. Comput Chem Eng 2014. [DOI: 10.1016/j.compchemeng.2013.08.008]