1
Friedrich S, Friede T. On the role of benchmarking data sets and simulations in method comparison studies. Biom J 2024; 66:e2200212. [PMID: 36810737 DOI: 10.1002/bimj.202200212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 01/26/2023] [Accepted: 02/01/2023] [Indexed: 02/24/2023]
Abstract
Method comparisons are essential to provide recommendations and guidance for applied researchers, who often have to choose from a plethora of available approaches. While many comparisons exist in the literature, these are often not neutral but favor a novel method. Apart from the choice of design and a proper reporting of the findings, there are different approaches concerning the underlying data for such method comparison studies. Most manuscripts on statistical methodology rely on simulation studies and provide a single real-world data set as an example to motivate and illustrate the methodology investigated. In the context of supervised learning, in contrast, methods are often evaluated using so-called benchmarking data sets, that is, real-world data that serve as gold standard in the community. Simulation studies, on the other hand, are much less common in this context. The aim of this paper is to investigate differences and similarities between these approaches, to discuss their advantages and disadvantages, and ultimately to develop new approaches to the evaluation of methods picking the best of both worlds. To this aim, we borrow ideas from different contexts such as mixed methods research and Clinical Scenario Evaluation.
Affiliation(s)
- Sarah Friedrich
- Institute of Mathematics, University of Augsburg, Augsburg, Germany
- Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, Augsburg, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee, Göttingen, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Göttingen, Göttingen, Germany
2
Fröhlich F. A Practical Guide for the Efficient Formulation and Calibration of Large, Energy- and Rule-Based Models of Cellular Signal Transduction. Methods Mol Biol 2023; 2634:59-86. [PMID: 37074574 DOI: 10.1007/978-1-0716-3008-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2023]
Abstract
Aberrant signal transduction leads to complex diseases such as cancer. To rationally design treatment strategies with small molecule inhibitors, computational models have to be employed. Energy- and rule-based models allow the construction of mechanistic ordinary differential equation models based on structural insights. The detailed, energy-based description often generates large models, which are difficult to calibrate on experimental data. In this chapter, we provide a detailed, interactive protocol for the programmatic formulation and calibration of such large, energy- and rule-based models of cellular signal transduction based on an example model describing the action of RAF inhibitors on MAPK signaling. An interactive version of this chapter is available as Jupyter Notebook at github.com/FFroehlich/energy_modeling_chapter .
Affiliation(s)
- Fabian Fröhlich
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
3
Fröhlich F, Sorger PK. Fides: Reliable trust-region optimization for parameter estimation of ordinary differential equation models. PLoS Comput Biol 2022; 18:e1010322. [PMID: 35830470 PMCID: PMC9312381 DOI: 10.1371/journal.pcbi.1010322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/24/2022] [Revised: 07/25/2022] [Accepted: 06/21/2022] [Indexed: 11/18/2022] Open
Abstract
Ordinary differential equation (ODE) models are widely used to study biochemical reactions in cellular networks since they effectively describe the temporal evolution of these networks using mass action kinetics. The parameters of these models are rarely known a priori and must instead be estimated by calibration using experimental data. Optimization-based calibration of ODE models is often challenging, even for low-dimensional problems. Multiple hypotheses have been advanced to explain why biochemical model calibration is challenging, including non-identifiability of model parameters, but there are few comprehensive studies that test these hypotheses, likely because tools for performing such studies are also lacking. Nonetheless, reliable model calibration is essential for uncertainty analysis, model comparison, and biological interpretation.
We implemented an established trust-region method as a modular Python framework (fides) to enable systematic comparison of different approaches to ODE model calibration involving a variety of Hessian approximation schemes. We evaluated fides on a recently developed corpus of biologically realistic benchmark problems for which real experimental data are available. Unexpectedly, we observed high variability in optimizer performance among different implementations of the same mathematical instructions (algorithms). Analysis of possible sources of poor optimizer performance identified limitations in the widely used Gauss-Newton, BFGS and SR1 Hessian approximation schemes. We addressed these drawbacks with a novel hybrid Hessian approximation scheme that enhances optimizer performance and outperforms existing hybrid approaches. When applied to the corpus of test models, we found that fides was on average more reliable and efficient than existing methods using a variety of criteria. We expect fides to be broadly useful for ODE constrained optimization problems in biochemical models and to be a foundation for future methods development.
Affiliation(s)
- Fabian Fröhlich
- Laboratory of Systems Pharmacology and Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
- Peter K. Sorger
- Laboratory of Systems Pharmacology and Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
4
Dray KE, Muldoon JJ, Mangan NM, Bagheri N, Leonard JN. GAMES: A Dynamic Model Development Workflow for Rigorous Characterization of Synthetic Genetic Systems. ACS Synth Biol 2022; 11:1009-1029. [PMID: 35023730 PMCID: PMC9097825 DOI: 10.1021/acssynbio.1c00528] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Mathematical modeling is invaluable for advancing understanding and design of synthetic biological systems. However, the model development process is complicated and often unintuitive, requiring iteration on various computational tasks and comparisons with experimental data. Ad hoc model development can pose a barrier to reproduction and critical analysis of the development process itself, reducing the potential impact and inhibiting further model development and collaboration. To help practitioners manage these challenges, we introduce the Generation and Analysis of Models for Exploring Synthetic Systems (GAMES) workflow, which includes both automated and human-in-the-loop processes. We systematically consider the process of developing dynamic models, including model formulation, parameter estimation, parameter identifiability, experimental design, model reduction, model refinement, and model selection. We demonstrate the workflow with a case study on a chemically responsive transcription factor. The generalizable workflow presented in this tutorial can enable biologists to more readily build and analyze models for various applications.
Affiliation(s)
- Kate E. Dray
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
- Joseph J. Muldoon
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL 60208, USA
- Niall M. Mangan
- Engineering Sciences and Applied Mathematics Program, Northwestern University, Evanston, IL 60208, USA
- Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA
- Neda Bagheri
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL 60208, USA
- Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA
- Departments of Biology and Chemical Engineering, University of Washington, Seattle, WA 98195, USA
- Co-corresponding authors: Joshua N. Leonard, Neda Bagheri
- Joshua N. Leonard
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
- Interdisciplinary Biological Sciences Program, Northwestern University, Evanston, IL 60208, USA
- Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA
- Chemistry of Life Processes Institute, and Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Evanston, IL 60208, USA
- Co-corresponding authors: Joshua N. Leonard, Neda Bagheri
5
Stapor P, Schmiester L, Wierling C, Merkt S, Pathirana D, Lange BMH, Weindl D, Hasenauer J. Mini-batch optimization enables training of ODE models on large-scale datasets. Nat Commun 2022; 13:34. [PMID: 35013141 PMCID: PMC8748893 DOI: 10.1038/s41467-021-27374-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/30/2019] [Accepted: 11/11/2021] [Indexed: 11/09/2022] Open
Abstract
Quantitative dynamic models are widely used to study cellular signal processing. A critical step in modelling is the estimation of unknown model parameters from experimental data. As model sizes and datasets are steadily growing, established parameter optimization approaches for mechanistic models become computationally extremely challenging. Mini-batch optimization methods, as employed in deep learning, have better scaling properties. In this work, we adapt, apply, and benchmark mini-batch optimization for ordinary differential equation (ODE) models, thereby establishing a direct link between dynamic modelling and machine learning. On our main application example, a large-scale model of cancer signaling, we benchmark mini-batch optimization against established methods, achieving better optimization results and reducing computation by more than an order of magnitude. We expect that our work will serve as a first step towards mini-batch optimization tailored to ODE models and enable modelling of even larger and more complex systems than what is currently possible.
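The mini-batch idea mentioned in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the one-parameter decay model dy/dt = -k*y, the synthetic noise-free data, the batching over individual data points, and all names (`simulate`, `batch_gradient`, `fit_minibatch`) are assumptions chosen for illustration only.

```python
import math
import random

def simulate(k, t):
    """Analytic solution of the toy ODE dy/dt = -k*y with y(0) = 1."""
    return math.exp(-k * t)

def batch_gradient(k, batch):
    """Gradient w.r.t. k of the mean of 0.5 * residual**2 over one mini-batch."""
    grad = 0.0
    for t, y_obs in batch:
        y = simulate(k, t)
        residual = y - y_obs
        grad += residual * (-t * y)  # chain rule: dy/dk = -t * exp(-k*t)
    return grad / len(batch)

def fit_minibatch(data, k_init, learning_rate=0.5, batch_size=5,
                  n_epochs=200, seed=0):
    """Plain stochastic gradient descent over shuffled mini-batches."""
    rng = random.Random(seed)
    data = list(data)
    k = k_init
    for _ in range(n_epochs):
        rng.shuffle(data)  # new random partition into batches each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            k -= learning_rate * batch_gradient(k, batch)
    return k

# Hypothetical synthetic data generated from a known rate constant.
K_TRUE = 0.5
data = [(0.5 * i, simulate(K_TRUE, 0.5 * i)) for i in range(1, 21)]
k_hat = fit_minibatch(data, k_init=1.5)
```

Each update touches only `batch_size` data points instead of the full data set, which is the property that makes such schemes attractive when every model evaluation requires an expensive ODE solve.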
Affiliation(s)
- Paul Stapor
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, 85764, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, 85748, Garching, Germany
- Leonard Schmiester
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, 85764, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, 85748, Garching, Germany
- Simon Merkt
- Universität Bonn, Faculty of Mathematics and Natural Sciences, 53115, Bonn, Germany
- Dilan Pathirana
- Universität Bonn, Faculty of Mathematics and Natural Sciences, 53115, Bonn, Germany
- Daniel Weindl
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, 85764, Neuherberg, Germany
- Jan Hasenauer
- Helmholtz Zentrum München - German Research Center for Environmental Health, Institute of Computational Biology, 85764, Neuherberg, Germany.
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, 85748, Garching, Germany.
- Universität Bonn, Faculty of Mathematics and Natural Sciences, 53115, Bonn, Germany.
6
Städter P, Schälte Y, Schmiester L, Hasenauer J, Stapor PL. Benchmarking of numerical integration methods for ODE models of biological systems. Sci Rep 2021; 11:2696. [PMID: 33514831 PMCID: PMC7846608 DOI: 10.1038/s41598-021-82196-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 09/04/2020] [Accepted: 01/08/2021] [Indexed: 11/09/2022] Open
Abstract
Ordinary differential equation (ODE) models are a key tool to understand complex mechanisms in systems biology. These models are studied using various approaches, including stability and bifurcation analysis, but most frequently by numerical simulations. The number of required simulations is often large, e.g., when unknown parameters need to be inferred. This renders efficient and reliable numerical integration methods essential. However, these methods depend on various hyperparameters, which strongly impact the ODE solution. Despite this, and although hundreds of published ODE models are freely available in public databases, a thorough study that quantifies the impact of hyperparameters on the ODE solver in terms of accuracy and computation time is still missing. In this manuscript, we investigate which choices of algorithms and hyperparameters are generally favorable when dealing with ODE models arising from biological processes. To ensure a representative evaluation, we considered 142 published models. Our study provides evidence that most ODEs in computational biology are stiff, and we give guidelines for the choice of algorithms and hyperparameters. We anticipate that our results will help researchers in systems biology to choose appropriate numerical methods when dealing with ODE models.
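Why stiffness and solver settings matter can be seen in a minimal sketch. It uses the standard linear test equation dy/dt = -lam*y with fixed-step explicit and implicit Euler, not the solvers or models benchmarked in the paper; the values of `lam`, the step sizes, and the function names are assumptions for illustration.

```python
import math

def explicit_euler(lam, y0, h, n_steps):
    """Forward Euler for dy/dt = -lam*y: y_next = y + h * (-lam * y)."""
    y = y0
    for _ in range(n_steps):
        y = y + h * (-lam * y)
    return y

def implicit_euler(lam, y0, h, n_steps):
    """Backward Euler for dy/dt = -lam*y: solve y_next = y + h * (-lam * y_next)."""
    y = y0
    for _ in range(n_steps):
        y = y / (1.0 + h * lam)
    return y

lam = 1000.0   # fast decay rate, making the problem stiff for h = 0.01
y0 = 1.0
t_end = 0.1
exact = y0 * math.exp(-lam * t_end)

# Same coarse step size for both methods: h * lam = 10.
y_explicit_coarse = explicit_euler(lam, y0, h=0.01, n_steps=10)
y_implicit_coarse = implicit_euler(lam, y0, h=0.01, n_steps=10)

# The explicit method becomes stable only with a much smaller step: h * lam = 0.1.
y_explicit_fine = explicit_euler(lam, y0, h=0.0001, n_steps=1000)
```

With the coarse step the explicit update multiplies the state by (1 - h*lam) = -9 at every step and diverges, while the implicit update divides by (1 + h*lam) = 11 and decays as the true solution does. The step size here plays the role of a solver hyperparameter: the same algorithm is unusable or usable depending on its value.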
Affiliation(s)
- Philipp Städter
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, 85748, Garching, Germany
- Yannik Schälte
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, 85748, Garching, Germany
- Leonard Schmiester
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, 85748, Garching, Germany
- Jan Hasenauer
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany.
- Center for Mathematics, Technische Universität München, 85748, Garching, Germany.
- Faculty of Mathematics and Natural Sciences, University of Bonn, 53113, Bonn, Germany.
- Paul L Stapor
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, 85748, Garching, Germany