1
van Sluijs B, Zhou T, Helwig B, Baltussen MG, Nelissen FHT, Heus HA, Huck WTS. Iterative design of training data to control intricate enzymatic reaction networks. Nat Commun 2024; 15:1602. [PMID: 38383500 PMCID: PMC10881569 DOI: 10.1038/s41467-024-45886-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 02/06/2024] [Indexed: 02/23/2024]
Abstract
Kinetic modeling of in vitro enzymatic reaction networks is vital to understand and control the complex behaviors that emerge from the nonlinear interactions within them. However, modeling is severely hampered by the lack of training data. Here, we introduce a methodology that combines an active learning-like approach and flow chemistry to efficiently create optimized datasets for a highly interconnected enzymatic reaction network with multiple sub-pathways. The optimal experimental design (OED) algorithm designs a sequence of out-of-equilibrium perturbations to maximize the information about the reaction kinetics, yielding a descriptive model that allows control of the network output towards any cost function. We experimentally validate the model by forcing the network to produce different product ratios while maintaining a minimum level of overall conversion efficiency. Our workflow scales with the complexity of the system and enables the optimization of previously unobtainable network outputs.
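The OED step chooses perturbations that maximize the information about kinetic parameters. As a rough illustration only, the sketch below applies a greedy D-optimality criterion (maximizing the determinant of the Fisher information matrix) to a single Michaelis-Menten rate law. The rate law, the candidate concentration grid, the Gaussian measurement-noise assumption and all function names are hypothetical; the authors' workflow designs time-varying flow perturbations for a full enzymatic network and is not reproduced here.

```python
# Toy D-optimal design (hypothetical example): choose substrate
# concentrations for v = vmax * s / (km + s) so that the measurements
# carry as much information as possible about (vmax, km).


def sensitivities(s, vmax, km):
    """Partial derivatives of v with respect to (vmax, km) at substrate s."""
    return s / (km + s), -vmax * s / (km + s) ** 2


def fisher_det(design, vmax, km, sigma=1.0, ridge=1e-9):
    """Determinant of the 2x2 Fisher information matrix sum(J^T J) / sigma^2.

    A tiny ridge on the diagonal keeps the criterion informative while the
    design still has fewer points than parameters (singular matrix)."""
    f11 = f12 = f22 = 0.0
    for s in design:
        a, b = sensitivities(s, vmax, km)
        f11 += a * a
        f12 += a * b
        f22 += b * b
    w = 1.0 / sigma ** 2
    return (w * f11 + ridge) * (w * f22 + ridge) - (w * f12) ** 2


def greedy_design(candidates, n_points, vmax, km):
    """Add candidates one at a time, each time picking the one that
    maximizes det(F) under the current parameter guess (vmax, km)."""
    design = []
    for _ in range(n_points):
        best = max(candidates, key=lambda s: fisher_det(design + [s], vmax, km))
        design.append(best)
    return design


candidates = [0.1, 0.25, 0.5, 0.75, 1.0, 2.0, 5.0, 10.0]
print(greedy_design(candidates, 2, vmax=1.0, km=1.0))  # [10.0, 0.75]
```

The first point lands at the highest concentration, where the data mainly constrain vmax. The second lands at an intermediate concentration near km, where the rate is most sensitive to km. In an iterative loop, the parameter guess would be refitted after each new experiment and the design recomputed.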
Affiliation(s)
- Bob van Sluijs
  - Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Tao Zhou
  - Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Britta Helwig
  - Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Mathieu G Baltussen
  - Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Frank H T Nelissen
  - Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Hans A Heus
  - Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
- Wilhelm T S Huck
  - Institute for Molecules and Materials, Radboud University, Nijmegen, AJ, The Netherlands
2
Calzone L, Noël V, Barillot E, Kroemer G, Stoll G. Modeling signaling pathways in biology with MaBoSS: From one single cell to a dynamic population of heterogeneous interacting cells. Comput Struct Biotechnol J 2022; 20:5661-5671. [PMID: 36284705 PMCID: PMC9582792 DOI: 10.1016/j.csbj.2022.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 09/30/2022] [Accepted: 10/02/2022] [Indexed: 11/24/2022]
Abstract
As a result of the development of experimental technologies and the accumulation of data, biological and molecular processes can be described as complex networks of signaling pathways. These networks are often directed and signed: nodes represent entities (genes/proteins) and arrows represent interactions. They are translated into mathematical models by adding a dynamic layer onto them. Such mathematical models help to understand and interpret non-intuitive experimental observations and to anticipate the response to external interventions, such as drug effects on phenotypes. Several frameworks for modeling signaling pathways exist, and the choice of the appropriate framework is often driven by the experimental context. In this review, we present MaBoSS, a tool based on Boolean modeling using a continuous time approach, which predicts time-dependent probabilities of entities in different biological contexts. MaBoSS was initially built to model the intracellular signaling in non-interacting homogeneous cell populations. MaBoSS was then adapted to model heterogeneous cell populations (EnsembleMaBoSS) by considering families of models rather than a unique model. To address more complex questions, MaBoSS was extended to simulate dynamic interacting populations (UPMaBoSS) and populations with a precise spatial distribution (PhysiBoSS). To illustrate all these levels of description, we show how each of these tools can be used with a running example of a simple model of cell fate decisions. Finally, we present practical applications to cancer biology and studies of the immune response.
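A continuous-time Boolean model treats the network as a Markov process. Each node switches on or off at a given rate, and the time-dependent probability of each state is estimated from many stochastic trajectories. The Python sketch below illustrates that idea with a Gillespie-style simulation of a hypothetical three-node cascade: A maintains itself, A activates B, B activates C, and all rates are 1. It is not MaBoSS's input format or API. The network, the rates and the function names are assumptions made for illustration.

```python
import math
import random

# Hypothetical three-node cascade: each node's Boolean rule gives the value
# it "wants" to take; the rates say how fast it switches on or off.
RULES = {
    "A": lambda s: s["A"],  # A maintains its own state
    "B": lambda s: s["A"],  # A activates B
    "C": lambda s: s["B"],  # B activates C
}
RATE_UP = {"A": 1.0, "B": 1.0, "C": 1.0}
RATE_DOWN = {"A": 1.0, "B": 1.0, "C": 1.0}


def simulate(init, t_max, rng):
    """One continuous-time trajectory (Gillespie algorithm); returns the state at t_max."""
    state = dict(init)
    t = 0.0
    while True:
        # A node may flip only when its rule disagrees with its current value.
        rates = {}
        for node, rule in RULES.items():
            target = rule(state)
            if target and not state[node]:
                rates[node] = RATE_UP[node]
            elif not target and state[node]:
                rates[node] = RATE_DOWN[node]
        total = sum(rates.values())
        if total == 0.0:
            return state  # fixed point: no transition is possible
        t += rng.expovariate(total)  # waiting time to the next flip
        if t > t_max:
            return state
        # Pick the flipping node with probability proportional to its rate.
        r = rng.random() * total
        for node, rate in rates.items():
            r -= rate
            if r < 0.0:
                break
        state[node] = not state[node]


def probability_on(node, init, t_max, n=20000, seed=0):
    """Monte Carlo estimate of P(node is on at time t_max)."""
    rng = random.Random(seed)
    hits = sum(simulate(init, t_max, rng)[node] for _ in range(n))
    return hits / n


init = {"A": True, "B": False, "C": False}
estimate = probability_on("C", init, t_max=2.0)
exact = 1.0 - math.exp(-2.0) * (1.0 + 2.0)
print(f"estimate {estimate:.3f}, exact {exact:.3f}")
```

In this cascade, C switches on only after two independent exponential waiting times. The exact probability is therefore 1 - e^(-t)(1 + t), and the Monte Carlo estimate should approach it as the number of trajectories grows.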
Affiliation(s)
- Laurence Calzone
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Vincent Noël
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Emmanuel Barillot
  - Institut Curie, PSL Research University, F-75005 Paris, France
  - INSERM, U900, F-75005 Paris, France
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Guido Kroemer
  - Centre de Recherche des Cordeliers, Equipe labellisée par la Ligue contre le cancer, Université de Paris Cité, Sorbonne Université, Inserm U1138, Institut Universitaire de France, Paris, France
  - Metabolomics and Cell Biology Platforms, Institut Gustave Roussy, Villejuif, France
  - Institut du Cancer Paris CARPEM, Department of Biology, Hôpital Européen Georges Pompidou, AP-HP, Paris, France
- Gautier Stoll
  - Centre de Recherche des Cordeliers, Equipe labellisée par la Ligue contre le cancer, Université de Paris Cité, Sorbonne Université, Inserm U1138, Institut Universitaire de France, Paris, France
  - Metabolomics and Cell Biology Platforms, Institut Gustave Roussy, Villejuif, France
3
Erdem C, Mutsuddy A, Bensman EM, Dodd WB, Saint-Antoine MM, Bouhaddou M, Blake RC, Gross SM, Heiser LM, Feltus FA, Birtwistle MR. A scalable, open-source implementation of a large-scale mechanistic model for single cell proliferation and death signaling. Nat Commun 2022; 13:3555. [PMID: 35729113 PMCID: PMC9213456 DOI: 10.1038/s41467-022-31138-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/16/2021] [Accepted: 06/07/2022] [Indexed: 02/01/2023]
Abstract
Mechanistic models of how single cells respond to different perturbations can help integrate disparate big data sets or predict response to varied drug combinations. However, the construction and simulation of such models have proved challenging. Here, we developed a Python-based model creation and simulation pipeline that converts a few structured text files into an SBML-standard model and is ready for high-performance and cloud computing. We applied this pipeline to our large-scale, mechanistic pan-cancer signaling model (named SPARCED) and demonstrate it by adding an IFNγ pathway submodel. We then investigated whether a putative crosstalk mechanism could be consistent with the experimental observation from the LINCS MCF10A Data Cube that IFNγ acts as an anti-proliferative factor. The analyses suggested that this observation can be explained by IFNγ-induced SOCS1 sequestering activated EGF receptors. This work forms a foundational recipe for increased mechanistic model-based data integration on a single-cell level, an important building block for clinically predictive mechanistic models.
Affiliation(s)
- Cemal Erdem
  - Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Arnab Mutsuddy
  - Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Ethan M Bensman
  - Computer Science, School of Computing, Clemson University, Clemson, SC, USA
- William B Dodd
  - Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Michael M Saint-Antoine
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Mehdi Bouhaddou
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
- Robert C Blake
  - Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Sean M Gross
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Laura M Heiser
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- F Alex Feltus
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
  - Biomedical Data Science and Informatics Program, Clemson University, Clemson, SC, USA
  - Center for Human Genetics, Clemson University, Clemson, SC, USA
- Marc R Birtwistle
  - Department of Chemical & Biomolecular Engineering, Clemson University, Clemson, SC, USA
  - Department of Bioengineering, Clemson University, Clemson, SC, USA
4
Sager S, Bernhardt F, Kehrle F, Merkert M, Potschka A, Meder B, Katus H, Scholz E. Expert-enhanced machine learning for cardiac arrhythmia classification. PLoS One 2021; 16:e0261571. [PMID: 34941897 PMCID: PMC8699667 DOI: 10.1371/journal.pone.0261571] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/20/2021] [Accepted: 12/05/2021] [Indexed: 12/12/2022]
Abstract
We propose a new method for the classification task of distinguishing atrial fibrillation (AFib) from regular atrial tachycardias, including atrial flutter (AFlu), based on a surface electrocardiogram (ECG). Many approaches for the automatic classification of cardiac arrhythmia have recently been proposed, but to our knowledge none of them can distinguish between these two. We discuss reasons why deep learning may not yield satisfactory results for this task. We generate new and clinically interpretable features using mathematical optimization for subsequent use within a machine learning (ML) model. These features are generated from the same input data by solving an additional regression problem with complicated combinatorial substructures. The result can be seen as a novel machine learning model that incorporates expert knowledge on the pathophysiology of atrial flutter. Our approach achieves an unprecedented accuracy of 82.84% and an area under the receiver operating characteristic (ROC) curve of 0.9, which is classed as "excellent" according to the classification indicator of diagnostic tests. An additional advantage of our approach is the inherent interpretability of the classification results. Our features give insight into a possibly occurring multilevel atrioventricular blocking mechanism, which may improve treatment decisions beyond the classification itself. Our research ideally complements existing textbook cardiac arrhythmia classification methods, which cannot provide a classification for the important case of AFib↔AFlu. The main contribution is the successful use of a novel mathematical model for multilevel atrioventricular block and optimization-driven inverse simulation to enhance machine learning for classification of the arguably most difficult cases in cardiac arrhythmia. A tailored Branch-and-Bound algorithm was implemented for the domain knowledge part, while standard algorithms such as Adam could be used for training.
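The reported area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counting one half. The minimal sketch below performs that rank-based (Mann-Whitney) computation. The scores and labels are made up and have no connection to the study's ECG data.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counting 1/2.

    labels are 1 for the positive class and 0 for the negative class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is O(n_pos · n_neg), which keeps the definition transparent. For large test sets, a sort-based rank computation gives the same value more efficiently.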
Affiliation(s)
- Sebastian Sager
  - Department of Mathematics, Otto-von-Guericke University, Magdeburg, Germany
  - Informatics for Life, Heidelberg, Germany
- Felix Bernhardt
  - Department of Mathematics, Otto-von-Guericke University, Magdeburg, Germany
- Florian Kehrle
  - Informatics for Life, Heidelberg, Germany
  - Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Maximilian Merkert
  - Institute of Optimization, Technical University Braunschweig, Braunschweig, Germany
- Andreas Potschka
  - Institute of Mathematics, Clausthal University of Technology, Clausthal-Zellerfeld, Germany
- Benjamin Meder
  - Informatics for Life, Heidelberg, Germany
  - Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Hugo Katus
  - Informatics for Life, Heidelberg, Germany
  - Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
  - German Centre for Cardiovascular Research, Heidelberg, Germany
- Eberhard Scholz
  - Informatics for Life, Heidelberg, Germany
  - GRN Gesundheitszentren Rhein-Neckar gGmbH, Schwetzingen, Germany
5
Hemmerich J, Tenhaef N, Wiechert W, Noack S. pyFOOMB: Python framework for object oriented modeling of bioprocesses. Eng Life Sci 2021; 21:242-257. [PMID: 33716622 PMCID: PMC7923582 DOI: 10.1002/elsc.202000088] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/20/2020] [Revised: 12/09/2020] [Accepted: 12/09/2020] [Indexed: 12/21/2022]
Abstract
Quantitative characterization of biotechnological production processes requires the determination of different key performance indicators (KPIs) such as titer, rate and yield. Classically, these KPIs can be derived by combining black-box bioprocess modeling with non-linear regression for model parameter estimation. The presented pyFOOMB package enables a guided and flexible implementation of bioprocess models in the form of systems of ordinary differential equations (ODEs). Because pyFOOMB builds on Python, a powerful multi-purpose programming language, ODEs can be formulated in an object-oriented manner, which facilitates their modular design, reusability, and extensibility. Once the model is implemented, seamless integration and analysis of the experimental data is supported by various existing Python packages. In particular, for the iterative workflow of experimental data generation and subsequent model parameter estimation, we employed the concept of replicate model instances, which are linked by common sets of parameters with global or local properties. For the description of multi-stage processes, discontinuities in the right-hand sides of the differential equations are supported via event handling using the freely available assimulo package. Optimization problems can be solved using a parallelized version of the generalized island approach provided by the pygmo package. Furthermore, pyFOOMB in combination with Jupyter notebooks also supports education in bioprocess engineering and the applied learning of Python as a scientific programming language. Finally, the applicability and strengths of pyFOOMB are demonstrated by a comprehensive collection of notebook examples.
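The idea of replicate model instances linked by global and local parameters can be illustrated without pyFOOMB itself. In the hypothetical sketch below, a one-state growth model dX/dt = mu * X is written as a Python class. The growth rate mu is shared across replicates (global), and the initial biomass x0 may be overridden per replicate (local). The class, function and parameter names are assumptions and do not mirror pyFOOMB's actual API. Integration uses a hand-written fourth-order Runge-Kutta scheme rather than assimulo.

```python
import math
from dataclasses import dataclass


@dataclass
class GrowthModel:
    """Hypothetical one-state bioprocess model: dX/dt = mu * X."""
    mu: float  # specific growth rate (intended as a global parameter)
    x0: float  # initial biomass (may be set locally per replicate)

    def rhs(self, t, x):
        """Right-hand side of the ODE; subclasses could override this."""
        return self.mu * x

    def simulate(self, t_end, n_steps=1000):
        """Integrate from t = 0 to t_end with classical fourth-order Runge-Kutta."""
        h = t_end / n_steps
        t, x = 0.0, self.x0
        for _ in range(n_steps):
            k1 = self.rhs(t, x)
            k2 = self.rhs(t + h / 2, x + h * k1 / 2)
            k3 = self.rhs(t + h / 2, x + h * k2 / 2)
            k4 = self.rhs(t + h, x + h * k3)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            t += h
        return x


def make_replicates(global_params, local_params):
    """One model instance per replicate: global values are shared, and
    each replicate's local values override them where given."""
    return {rid: GrowthModel(**{**global_params, **local})
            for rid, local in local_params.items()}


reps = make_replicates({"mu": 0.5, "x0": 1.0}, {"R1": {}, "R2": {"x0": 2.0}})
print(reps["R2"].simulate(2.0), 2 * math.exp(1.0))  # both about 5.4366
```

During parameter estimation, a shared mu would be fitted once against the data from all replicates, while each local x0 would be fitted only against its own replicate's data.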
Affiliation(s)
- Johannes Hemmerich
  - Institute of Bio- and Geosciences - IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
- Niklas Tenhaef
  - Institute of Bio- and Geosciences - IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
- Wolfgang Wiechert
  - Institute of Bio- and Geosciences - IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
  - Computational Systems Biotechnology (AVT.CSB), RWTH Aachen University, Aachen, Germany
  - Bioeconomy Science Center (BioSC), Forschungszentrum Jülich, Jülich, Germany
- Stephan Noack
  - Institute of Bio- and Geosciences - IBG-1: Biotechnology, Forschungszentrum Jülich GmbH, Jülich, Germany
  - Bioeconomy Science Center (BioSC), Forschungszentrum Jülich, Jülich, Germany
6
Raimúndez E, Dudkin E, Vanhoefer J, Alamoudi E, Merkt S, Fuhrmann L, Bai F, Hasenauer J. COVID-19 outbreak in Wuhan demonstrates the limitations of publicly available case numbers for epidemiological modeling. Epidemics 2021; 34:100439. [PMID: 33556763 PMCID: PMC7845523 DOI: 10.1016/j.epidem.2021.100439] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/08/2020] [Revised: 01/19/2021] [Accepted: 01/21/2021] [Indexed: 01/12/2023]
Abstract
Epidemiological models are widely used to analyze the spread of diseases such as the global COVID-19 pandemic caused by SARS-CoV-2. However, all models are based on simplifying assumptions and often on sparse data. This limits the reliability of parameter estimates and predictions. In this manuscript, we demonstrate the relevance of these limitations and the pitfalls associated with the use of overly simplistic models. We consider the data for the early phase of the COVID-19 outbreak in Wuhan, China, as an example, and perform parameter estimation, uncertainty analysis and model selection for a range of established epidemiological models. Among other methods, we employ Markov chain Monte Carlo sampling and parameter and prediction profile calculation algorithms. Our results show that parameter estimates and predictions obtained for several established models on the basis of reported case numbers can be subject to substantial uncertainty. More importantly, estimates were often unrealistic, and the confidence/credibility intervals did not cover plausible values of critical parameters obtained using different approaches. These findings suggest, among other things, that standard compartmental models can be overly simplistic and that the reported case numbers often provide insufficient information for obtaining reliable and realistic parameter values and for forecasting the evolution of epidemics.
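Markov chain Monte Carlo sampling, one of the methods listed above, can be sketched with a random-walk Metropolis algorithm. The example makes several deliberately simple assumptions: an early-phase model log C(t) = log C0 + r * t, Gaussian noise of known standard deviation on log case counts, flat priors, and synthetic data. None of this corresponds to the study's models or to the Wuhan data.

```python
import math
import random


def log_posterior(theta, t, log_y, sigma):
    """Flat prior; Gaussian noise (known sigma) on log case counts for the
    hypothetical early-phase model log C(t) = log C0 + r * t."""
    log_c0, r = theta
    sse = sum((ly - (log_c0 + r * ti)) ** 2 for ti, ly in zip(t, log_y))
    return -0.5 * sse / sigma ** 2


def metropolis(t, y, sigma=0.1, steps=(0.05, 0.01), n_iter=20000,
               burn_in=2000, seed=1):
    """Random-walk Metropolis sampler over theta = (log C0, r)."""
    rng = random.Random(seed)
    log_y = [math.log(v) for v in y]
    theta = (log_y[0], 0.0)  # crude starting point
    lp = log_posterior(theta, t, log_y, sigma)
    samples = []
    for i in range(n_iter):
        proposal = tuple(th + rng.gauss(0.0, s) for th, s in zip(theta, steps))
        lp_new = log_posterior(proposal, t, log_y, sigma)
        # Symmetric proposal: accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        if i >= burn_in:
            samples.append(theta)
    return samples


# Synthetic data (hypothetical): 10 days of growth at r = 0.2 per day.
days = list(range(10))
noise = [0.05, -0.08, 0.02, 0.11, -0.03, -0.06, 0.09, -0.01, -0.12, 0.04]
cases = [10.0 * math.exp(0.2 * d + e) for d, e in zip(days, noise)]

draws = metropolis(days, cases)
r_draws = sorted(r for _, r in draws)
r_mean = sum(r_draws) / len(r_draws)
lo = r_draws[int(0.025 * len(r_draws))]
hi = r_draws[int(0.975 * len(r_draws))]
print(f"r = {r_mean:.3f}, 95% credible interval [{lo:.3f}, {hi:.3f}]")
```

With only ten noisy points, the credible interval for r is already several percentage points wide. Richer compartmental models with more unknown parameters fitted to case counts alone can widen such intervals considerably.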
Affiliation(s)
- Elba Raimúndez
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
  - Technische Universität München, Center for Mathematics, Garching, Germany
- Erika Dudkin
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Jakob Vanhoefer
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Emad Alamoudi
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Simon Merkt
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Lara Fuhrmann
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Fan Bai
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
- Jan Hasenauer
  - Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
  - Technische Universität München, Center for Mathematics, Garching, Germany
  - Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
7
A dual-parameter identification approach for data-based predictive modeling of hybrid gene regulatory network-growth kinetics in Pseudomonas putida mt-2. Bioprocess Biosyst Eng 2020; 43:1671-1688. [PMID: 32377941 DOI: 10.1007/s00449-020-02360-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2019] [Accepted: 04/21/2020] [Indexed: 10/24/2022]
Abstract
Integrating data into model-based descriptions of biological systems that incorporate gene dynamics improves the performance of microbial systems. Bioprocess performance, typically predicted using empirical Monod-type models, is essential for a sustainable bioeconomy. To replace empirical models, we updated a hybrid gene regulatory network-growth kinetic model that predicts aromatic pollutant degradation and biomass growth in Pseudomonas putida mt-2. We modeled a complex biological system, including extensive information, to understand the role of the regulatory elements in toluene biodegradation and biomass growth. The updated model exhibited additional complications such as oscillations and discontinuities. As parameter estimation of complex biological models remains a key challenge, we used the updated model to present a dual-parameter identification approach (the 'dual approach') combining two independent methodologies. Approach I handled the complexity by incorporating demonstrated biological knowledge into the model-development process and by combining global sensitivity analysis and optimisation. Approach II complemented Approach I by handling multimodality, ill-conditioning and overfitting through regularised estimation, global optimisation, and identifiability analysis. To systematically quantify the biological system, we used a large amount of high-quality time-course data. The dual approach resulted in an accurately calibrated kinetic model (NRMSE: 0.17055) that efficiently handled the additional model complexity. We validated the model using three independent experimental data sets, achieving greater predictive power (NRMSE: 0.18776) than the individual approaches (NRMSE I: 0.25322, II: 0.25227) and increased model robustness. These results demonstrate the potential of data-driven predictive modeling for model-based bioprocess control and optimisation.
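The abstract reports NRMSE values without stating the normalization, and several conventions are in common use. The sketch below computes the root-mean-square error and divides it by either the range or the mean of the observations. Which normalization the study used is an open assumption here, so its reported numbers should not be compared directly with this function's output.

```python
import math


def nrmse(observed, predicted, norm="range"):
    """Root-mean-square error divided by a scale of the observed data.

    norm="range" divides by max(observed) - min(observed);
    norm="mean" divides by the mean of the observed values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    rmse = math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )
    if norm == "range":
        scale = max(observed) - min(observed)
    elif norm == "mean":
        scale = sum(observed) / len(observed)
    else:
        raise ValueError("norm must be 'range' or 'mean'")
    return rmse / scale


obs = [0.0, 1.0, 2.0, 3.0]
pred = [0.0, 1.0, 2.0, 5.0]  # hypothetical predictions: RMSE = 1
print(nrmse(obs, pred))          # 0.333... (range = 3)
print(nrmse(obs, pred, "mean"))  # 0.666... (mean = 1.5)
```

The same predictions give NRMSE values that differ by a factor of two depending on the normalization. This is why the convention matters when comparing reported numbers.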
8
Kreutz C. Guidelines for benchmarking of optimization-based approaches for fitting mathematical models. Genome Biol 2019; 20:281. [PMID: 31842943 PMCID: PMC6915982 DOI: 10.1186/s13059-019-1887-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/01/2019] [Accepted: 11/13/2019] [Indexed: 11/10/2022]
Abstract
Insufficient performance of optimization-based approaches for the fitting of mathematical models is still a major bottleneck in systems biology. In this article, the reasons and methodological challenges are summarized as well as their impact in benchmark studies. Important aspects for achieving an increased level of evidence for benchmark results are discussed. Based on general guidelines for benchmarking in computational biology, a collection of tailored guidelines is presented for performing informative and unbiased benchmarking of optimization-based fitting approaches. Comprehensive benchmark studies based on these recommendations are urgently required for the establishment of a robust and reliable methodology for the systems biology community.
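One quantity often reported when benchmarking optimization-based fitting is the fraction of multi-start local optimization runs that reach the best objective value found. The sketch below computes this fraction for a hypothetical one-dimensional multimodal objective using plain gradient descent. The objective, step size, tolerance and number of starts are illustrative assumptions, not recommendations taken from the guidelines.

```python
import math
import random


def objective(x):
    """Hypothetical multimodal 1-D test objective."""
    return math.sin(3 * x) + 0.1 * x * x


def gradient(x):
    """Analytical derivative of the objective."""
    return 3 * math.cos(3 * x) + 0.2 * x


def local_descent(x, step=0.05, n_iter=2000):
    """Fixed-step gradient descent from x; returns (x_final, f_final)."""
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x, objective(x)


def multistart(n_starts=50, bounds=(-5.0, 5.0), tol=1e-6, seed=0):
    """Run local descent from random starts; report the best value found and
    the fraction of runs that reached it (within tol)."""
    rng = random.Random(seed)
    results = [local_descent(rng.uniform(*bounds)) for _ in range(n_starts)]
    best = min(f for _, f in results)
    successes = sum(1 for _, f in results if f - best <= tol)
    return best, successes / n_starts


best, rate = multistart()
print(f"best objective {best:.4f}, reached by {rate:.0%} of starts")
```

A low success rate is not a failure in itself, because it is the expected cost per success that matters. Comparisons between optimizers should therefore consider computation time together with this fraction.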
Affiliation(s)
- Clemens Kreutz
  - Faculty of Medicine and Medical Center, Institute of Medical Biometry and Statistics, University of Freiburg, Stefan-Meier-Str. 26, Freiburg, 79104, Germany
  - CIBSS-Centre for Integrative Biological Signalling Studies, University of Freiburg, Freiburg, Germany
9
Mitra ED, Hlavacek WS. Parameter Estimation and Uncertainty Quantification for Systems Biology Models. Curr Opin Syst Biol 2019; 18:9-18. [PMID: 32719822 PMCID: PMC7384601 DOI: 10.1016/j.coisb.2019.10.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 02/07/2023]
Abstract
Mathematical models can provide quantitative insights into immunoreceptor signaling, and other biological processes, but require parameterization and uncertainty quantification before reliable predictions become possible. We review currently available methods and software tools to address these problems. We consider gradient-based and gradient-free methods for point estimation of parameter values, and methods of profile likelihood, bootstrapping, and Bayesian inference for uncertainty quantification. We consider recent and potential future applications of these methods to systems-level modeling of immune-related phenomena.
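Profile likelihood, one of the uncertainty-quantification methods mentioned above, works in three steps. It fixes the parameter of interest on a grid, re-optimizes the remaining parameters for each grid value, and keeps the values whose -2 log-likelihood lies within a chi-square threshold of the optimum (3.841 for a 95% interval with one degree of freedom). The sketch below applies this to the slope of a straight-line model with a known noise level. The model, data and names are hypothetical and were chosen so that the nuisance intercept has a closed-form optimum.

```python
def profile_slope_ci(x, y, sigma, grid, threshold=3.841):
    """Profile-likelihood confidence interval for the slope b of
    y = a + b * x with Gaussian noise of known standard deviation sigma."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n

    def neg2_loglik(b):
        # Profile step: for a fixed slope b the best intercept is my - b * mx.
        a = my - b * mx
        return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / sigma ** 2

    profile = [(b, neg2_loglik(b)) for b in grid]
    best = min(v for _, v in profile)
    inside = [b for b, v in profile if v - best <= threshold]
    return min(inside), max(inside)


x = list(range(10))
noise = [0.3, -0.4, 0.1, 0.5, -0.2, -0.3, 0.4, 0.0, -0.6, 0.2]  # hypothetical
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
grid = [1.5 + i * 1e-4 for i in range(10001)]
lo, hi = profile_slope_ci(x, y, sigma=0.5, grid=grid)
print(f"95% profile-likelihood CI for the slope: [{lo:.3f}, {hi:.3f}]")
```

In this linear-Gaussian case the profile interval coincides with the familiar b_hat ± 1.96 * sigma / sqrt(Sxx). For nonlinear models the two generally differ, and the profile interval is usually the more trustworthy of the two.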
Affiliation(s)
- Eshan D. Mitra
  - Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- William S. Hlavacek
  - Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
10
Fischer DS, Fiedler AK, Kernfeld EM, Genga RMJ, Bastidas-Ponce A, Bakhti M, Lickert H, Hasenauer J, Maehr R, Theis FJ. Inferring population dynamics from single-cell RNA-sequencing time series data. Nat Biotechnol 2019; 37:461-468. [PMID: 30936567 PMCID: PMC7397487 DOI: 10.1038/s41587-019-0088-0] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 11/10/2017] [Accepted: 02/28/2019] [Indexed: 11/09/2022]
Abstract
Recent single-cell RNA-sequencing studies have suggested that cells follow continuous transcriptomic trajectories in an asynchronous fashion during development. However, observations of cell flux along trajectories are confounded with population size effects in snapshot experiments and are therefore hard to interpret. In particular, changes in proliferation and death rates can be mistaken for cell flux. Here we present pseudodynamics, a mathematical framework that reconciles population dynamics with the concepts underlying developmental trajectories inferred from time-series single-cell data. Pseudodynamics models population distribution shifts across trajectories to quantify selection pressure, population expansion, and developmental potentials. Applying this model to time-resolved single-cell RNA-sequencing of T-cell and pancreatic beta cell maturation, we characterize proliferation and apoptosis rates and identify key developmental checkpoints, data inaccessible to existing approaches.
Affiliation(s)
- David S Fischer
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Anna K Fiedler
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Mathematics, Technical University of Munich, Garching bei München, Germany
- Eric M Kernfeld
  - Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Medical School, Worcester, MA, USA
- Ryan M J Genga
  - Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Medical School, Worcester, MA, USA
- Aimée Bastidas-Ponce
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute of Stem Cell Research, Helmholtz Zentrum München, Neuherberg, Germany
  - Medical Faculty, Technical University of Munich, Munich, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Mostafa Bakhti
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute of Stem Cell Research, Helmholtz Zentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Heiko Lickert
  - Institute of Diabetes and Regeneration Research, Helmholtz Zentrum München, Neuherberg, Germany
  - Institute of Stem Cell Research, Helmholtz Zentrum München, Neuherberg, Germany
  - Medical Faculty, Technical University of Munich, Munich, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Jan Hasenauer
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Department of Mathematics, Technical University of Munich, Garching bei München, Germany
- Rene Maehr
  - Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Medical School, Worcester, MA, USA
- Fabian J Theis
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Department of Mathematics, Technical University of Munich, Garching bei München, Germany
11
Pitt JA, Banga JR. Parameter estimation in models of biological oscillators: an automated regularised estimation approach. BMC Bioinformatics 2019; 20:82. [PMID: 30770736 PMCID: PMC6377730 DOI: 10.1186/s12859-019-2630-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/18/2018] [Accepted: 01/14/2019] [Indexed: 12/24/2022]
Abstract
BACKGROUND Dynamic modelling is a core element in the systems biology approach to understanding complex biosystems. Here, we consider the problem of parameter estimation in models of biological oscillators described by deterministic nonlinear differential equations. These problems can be extremely challenging due to several common pitfalls: (i) a lack of prior knowledge about parameters (i.e. massive search spaces), (ii) convergence to local optima (due to multimodality of the cost function), (iii) overfitting (fitting the noise instead of the signal) and (iv) a lack of identifiability. As a consequence, the use of standard estimation methods (such as gradient-based local ones) will often result in wrong solutions. Overfitting can be particularly problematic, since it produces very good calibrations, giving the impression of an excellent result. However, overfitted models exhibit poor predictive power. Here, we present a novel automated approach to overcome these pitfalls. Its workflow makes use of two sequential optimisation steps incorporating three key algorithms: (1) sampling strategies to systematically tighten the parameter bounds, reducing the search space, (2) efficient global optimisation to avoid convergence to local solutions, and (3) an advanced regularisation technique to fight overfitting. In addition, this workflow incorporates tests for structural and practical identifiability. RESULTS We successfully evaluate this novel approach considering four difficult case studies regarding the calibration of well-known biological oscillators (Goodwin, FitzHugh-Nagumo, Repressilator and a metabolic oscillator). In contrast, we show how local gradient-based approaches, even if used in multi-start fashion, are unable to avoid the above-mentioned pitfalls. CONCLUSIONS Our approach results in more efficient estimations (thanks to the bounding strategy) which are able to escape convergence to local optima (thanks to the global optimisation approach). Further, the use of regularisation allows us to avoid overfitting, resulting in more generalisable calibrated models (i.e. models with greater predictive power).
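The abstract does not specify which regularization technique is used. The sketch below therefore substitutes plain Tikhonov (ridge) regularization as a simpler stand-in. It adds a penalty alpha * ||theta||^2 to the least-squares cost of a hypothetical two-parameter linear model. With nearly collinear inputs, the unregularized fit produces large, mutually offsetting parameter values, and increasing alpha shrinks them. This is not the authors' procedure for nonlinear oscillator models, and the data and names are assumptions.

```python
import math


def ridge_fit(X, y, alpha):
    """Minimize ||y - X theta||^2 + alpha * ||theta||^2 for a two-parameter
    linear model by solving (X^T X + alpha I) theta = X^T y (Cramer's rule)."""
    a11 = sum(r[0] * r[0] for r in X) + alpha
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X) + alpha
    b1 = sum(r[0] * yi for r, yi in zip(X, y))
    b2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


print(ridge_fit([[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], 1.0))  # (1.0, 2.0)

# Hypothetical, nearly collinear design: the two columns are almost identical.
X = [[1.0, 1.0], [1.0, 1.001], [1.0, 0.999]]
y = [2.0, 2.3, 1.6]
for alpha in (0.0, 0.01, 1.0):
    theta = ridge_fit(X, y, alpha)
    print(alpha, theta, math.hypot(*theta))
```

In practice, alpha is chosen by a trade-off criterion such as cross-validation. Too small a value leaves the ill-conditioning in place, and too large a value biases the estimates toward zero.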
Affiliation(s)
- Jake Alan Pitt
  - (Bio)Process Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208 Spain
  - RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Aachen, Germany
- Julio R. Banga
  - (Bio)Process Engineering Group, IIM-CSIC, Eduardo Cabello 6, Vigo, 36208 Spain
12
Fröhlich F, Reiser A, Fink L, Woschée D, Ligon T, Theis FJ, Rädler JO, Hasenauer J. Multi-experiment nonlinear mixed effect modeling of single-cell translation kinetics after transfection. NPJ Syst Biol Appl 2018; 5:1. [PMID: 30564456 PMCID: PMC6288153 DOI: 10.1038/s41540-018-0079-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/18/2018] [Accepted: 11/09/2018] [Indexed: 11/10/2022]
Abstract
Single-cell time-lapse studies have advanced the quantitative understanding of cellular pathways and their inherent cell-to-cell variability. However, parameters retrieved from individual experiments are model dependent, and their estimation is limited if based solely on one kind of experiment. Hence, methods to integrate data collected under different conditions are expected to improve model validation and information content. Here we present a multi-experiment nonlinear mixed effect modeling approach for mechanistic pathway models, which allows the integration of multiple single-cell perturbation experiments. We apply this approach to the translation of green fluorescent protein after transfection using a massively parallel read-out of micropatterned single-cell arrays. We demonstrate that the integration of data from perturbation experiments allows the robust reconstruction of cell-to-cell variability, i.e., parameter densities, while each individual experiment provides insufficient information. Indeed, we show that the integration of the datasets on the population level also improves the estimates for individual cells by breaking symmetries, although each cell is only measured in one experiment. Moreover, we confirm that the suggested approach is robust with respect to batch effects across experimental replicates and can provide mechanistic insights into the nature of batch effects. We anticipate that the proposed multi-experiment nonlinear mixed effect modeling approach will serve as a basis for the analysis of cellular heterogeneity in single-cell dynamics.
Affiliation(s)
- Fabian Fröhlich
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 85764 Germany
- Center for Mathematics, Technische Universität München, Garching, 85748 Germany
- Anita Reiser
- Faculty of Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, 80539 Germany
- Laura Fink
- Faculty of Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, 80539 Germany
- Daniel Woschée
- Faculty of Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, 80539 Germany
- Thomas Ligon
- Faculty of Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, 80539 Germany
- Fabian Joachim Theis
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 85764 Germany
- Center for Mathematics, Technische Universität München, Garching, 85748 Germany
- Joachim Oskar Rädler
- Faculty of Physics and Center for NanoScience, Ludwig-Maximilians-Universität, München, 80539 Germany
- Jan Hasenauer
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 85764 Germany
- Center for Mathematics, Technische Universität München, Garching, 85748 Germany
- Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, 53115 Germany
13
Kim OD, Rocha M, Maia P. A Review of Dynamic Modeling Approaches and Their Application in Computational Strain Optimization for Metabolic Engineering. Front Microbiol 2018; 9:1690. [PMID: 30108559 PMCID: PMC6079213 DOI: 10.3389/fmicb.2018.01690] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 03/27/2018] [Accepted: 07/06/2018] [Indexed: 12/03/2022]
Abstract
Mathematical modeling is a key process to describe the behavior of biological networks. One of the most difficult challenges is to build models that allow quantitative predictions of the cells' states over time. Recently, this issue has begun to be tackled through novel in silico approaches, such as the reconstruction of dynamic models, the use of phenotype prediction methods, and pathway design via efficient strain optimization algorithms. The use of dynamic models, which include detailed kinetic information on the biological systems, potentially increases the scope of the applications and the accuracy of the phenotype predictions. New efforts in metabolic engineering aim at bridging the gap between this approach and other paradigms of mathematical modeling, such as constraint-based approaches. These strategies take advantage of the best features of each method and deal with their most notable limitation (the lack of available experimental information), which affects the accuracy and feasibility of solutions. Parameter estimation helps to solve this problem, but adds computational cost to the overall process. Moreover, the existing approaches have limitations in scalability, flexibility, and simulation convergence time, among others. The aim is to establish a trade-off between the size of the model and the accuracy of the solutions. In this work, we review the state of the art of dynamic modeling and related methods used for metabolic engineering applications, including approaches based on hybrid modeling. We describe approaches that tackle issues regarding the mathematical formulation and the underlying optimization algorithms, and that address phenotype prediction by incorporating available kinetic rate laws of metabolic processes. Then, we discuss how these have been used and combined as the basis for computational strain optimization methods for metabolic engineering, how they lead to bi-level schemes that can be used in industry, and what their limitations are.
Affiliation(s)
- Osvaldo D Kim
- SilicoLife Lda, Braga, Portugal
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, Universidade do Minho, Braga, Portugal
14
Ballnus B, Schaper S, Theis FJ, Hasenauer J. Bayesian parameter estimation for biochemical reaction networks using region-based adaptive parallel tempering. Bioinformatics 2018; 34:i494-i501. [PMID: 29949983 PMCID: PMC6022572 DOI: 10.1093/bioinformatics/bty229] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/09/2023]
Abstract
Motivation: Mathematical models have become standard tools for the investigation of cellular processes and the unraveling of signal processing mechanisms. The parameters of these models are usually derived from the available data using optimization and sampling methods. However, the efficiency of these methods is limited by the properties of the mathematical model, e.g. non-identifiabilities, and of the resulting posterior distribution. In particular, multi-modal distributions with long valleys or pronounced tails are difficult to optimize and sample. Thus, the development or improvement of optimization and sampling methods is the subject of ongoing research.
Results: We suggest a region-based adaptive parallel tempering algorithm which adapts to the problem-specific posterior distributions, i.e. modes and valleys. The algorithm combines several established algorithms to overcome their individual shortcomings and to improve sampling efficiency. We assessed its properties for established benchmark problems and two ordinary differential equation models of biochemical reaction networks. The proposed algorithm outperformed state-of-the-art methods in terms of calculation efficiency and mixing. Since the algorithm does not rely on a specific problem structure but adapts to the posterior distribution, it is suitable for a variety of model classes.
Availability and implementation: The MATLAB code is available both as Supplementary Material and in a Git repository.
Supplementary information: Supplementary data are available at Bioinformatics online.
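For readers unfamiliar with parallel tempering, the following is a minimal sketch of the plain, non-adaptive version of the technique on a hypothetical one-dimensional bimodal target. It is not the region-based adaptive algorithm proposed in the paper, which additionally adapts to regions of the posterior; the target density, temperature ladder (`betas`), and step size are illustrative assumptions. Each chain samples the target raised to a power `beta`, and swaps between adjacent chains let the cold chain (`beta = 1`) move between modes that a single random-walk chain would rarely cross.

```python
import math
import random


def log_target(x):
    """Hypothetical unnormalized bimodal log-density: modes at -4 and +4."""
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def parallel_tempering(log_p, betas, n_iter, step, x0, rng):
    """Plain (non-adaptive) parallel tempering.

    Chain k targets p(x) ** betas[k]; betas[0] must be 1.0 (the target chain).
    Returns the target-chain samples and the number of accepted swaps.
    """
    xs = [x0] * len(betas)
    lps = [log_p(x0)] * len(betas)
    samples, n_swaps = [], 0
    for _ in range(n_iter):
        # Metropolis random-walk update within each tempered chain.
        for k, beta in enumerate(betas):
            prop = xs[k] + rng.gauss(0.0, step)
            lp_prop = log_p(prop)
            # 1.0 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - rng.random()) < beta * (lp_prop - lps[k]):
                xs[k], lps[k] = prop, lp_prop
        # Propose swapping the states of one random pair of adjacent chains.
        k = rng.randrange(len(betas) - 1)
        log_accept = (betas[k] - betas[k + 1]) * (lps[k + 1] - lps[k])
        if math.log(1.0 - rng.random()) < log_accept:
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
            lps[k], lps[k + 1] = lps[k + 1], lps[k]
            n_swaps += 1
        samples.append(xs[0])
    return samples, n_swaps


rng = random.Random(1)
betas = [1.0, 0.3, 0.1, 0.03]  # hypothetical temperature ladder
samples, n_swaps = parallel_tempering(log_target, betas, 20000, 1.0, -4.0, rng)
kept = samples[2000:]  # discard burn-in
frac_right = sum(x > 0 for x in kept) / len(kept)
print(f"accepted swaps: {n_swaps}, fraction in right mode: {frac_right:.2f}")
```

The swap acceptance probability, exp((beta_k - beta_{k+1}) * (log p(x_{k+1}) - log p(x_k))), preserves the joint stationary distribution of all chains, so only the `beta = 1` chain is used as draws from the target.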
Affiliation(s)
- Benjamin Ballnus
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Steffen Schaper
- Bayer AG, Engineering and Technologies, Applied Mathematics, Leverkusen, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Jan Hasenauer
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany