1
Yu R, Wang R. Learning dynamical systems from data: An introduction to physics-guided deep learning. Proc Natl Acad Sci U S A 2024; 121:e2311808121. [PMID: 38913886 PMCID: PMC11228478 DOI: 10.1073/pnas.2311808121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024] Open
Abstract
Modeling complex physical dynamics is a fundamental task in science and engineering. Traditional physics-based models are first-principled, explainable, and sample-efficient. However, they often rely on strong modeling assumptions and expensive numerical integration, requiring significant computational resources and domain expertise. While deep learning (DL) provides efficient alternatives for modeling complex dynamics, it requires a large amount of labeled training data. Furthermore, its predictions may disobey the governing physical laws and are difficult to interpret. Physics-guided DL aims to integrate first-principled physical knowledge into data-driven methods. It has the best of both worlds and is well equipped to better solve scientific problems. Recently, this field has made great progress and has drawn considerable interest across disciplines. Here, we introduce the framework of physics-guided DL with a special emphasis on learning dynamical systems. We describe the learning pipeline and categorize state-of-the-art methods under this framework. We also offer our perspectives on the open challenges and emerging opportunities.
Affiliation(s)
- Rose Yu
  - Department of Computer Science and Engineering, University of California, San Diego, CA 92093
- Rui Wang
  - Department of Computer Science and Engineering, University of California, San Diego, CA 92093
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
2
Prokop B, Gelens L. From biological data to oscillator models using SINDy. iScience 2024; 27:109316. [PMID: 38523784 PMCID: PMC10959654 DOI: 10.1016/j.isci.2024.109316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 01/18/2024] [Accepted: 02/18/2024] [Indexed: 03/26/2024] Open
Abstract
Periodic changes in the concentration or activity of different molecules regulate vital cellular processes such as cell division and circadian rhythms. Developing mathematical models is essential to better understand the mechanisms underlying these oscillations. Recent data-driven methods like SINDy have fundamentally changed model identification, yet their application to experimental biological data remains limited. This study investigates SINDy's constraints by directly applying it to biological oscillatory data. We identify insufficient resolution, noise, dimensionality, and limited prior knowledge as primary limitations. Using various generic oscillator models of different complexity and/or dimensionality, we systematically analyze these factors. We then propose a comprehensive guide for inferring models from biological data, addressing these challenges step by step. Our approach is validated using glycolytic oscillation data from yeast.
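For readers unfamiliar with the method named in this abstract, the following is a minimal sketch of the basic SINDy idea: sparse regression of finite-difference derivatives onto a library of candidate terms. It is not the authors' pipeline. The damped linear oscillator, the polynomial library, the threshold value, and the helper name `stlsq` are all illustrative assumptions, and the data are noise-free and finely sampled, which is exactly the regime the abstract says experimental data often fail to reach.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares (hypothetical helper):
    fit, zero out small coefficients, refit on the surviving terms."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return xi

# Assumed test system: x' = -0.1 x + 2 y, y' = -2 x - 0.1 y,
# sampled from its closed-form solution (no noise).
dt = 0.01
t = np.arange(0.0, 25.0, dt)
decay = np.exp(-0.1 * t)
x = decay * np.cos(2.0 * t)
y = -decay * np.sin(2.0 * t)

# Pointwise (strong-form) derivative estimates by finite differences.
dxdt = np.column_stack([
    np.gradient(x, dt, edge_order=2),
    np.gradient(y, dt, edge_order=2),
])

# Candidate library: polynomials in (x, y) up to degree 2.
names = ["1", "x", "y", "x^2", "x*y", "y^2"]
theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

xi = stlsq(theta, dxdt)
for k, lhs in enumerate(["x'", "y'"]):
    terms = [f"{xi[j, k]:+.3f} {names[j]}" for j in range(len(names)) if xi[j, k] != 0.0]
    print(lhs, "=", " ".join(terms))
```

Because the derivatives are estimated pointwise, adding measurement noise to `x` and `y` or coarsening `dt` degrades the fit quickly, which is one way to explore the resolution and noise limitations the abstract describes.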
Affiliation(s)
- Bartosz Prokop
  - Laboratory of Dynamics in Biological Systems, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Lendert Gelens
  - Laboratory of Dynamics in Biological Systems, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
3
Konur S, Gheorghe M, Krasnogor N. Verifiable biology. J R Soc Interface 2023; 20:20230019. [PMID: 37160165 PMCID: PMC10169095 DOI: 10.1098/rsif.2023.0019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023] Open
Abstract
The formalization of biological systems using computational modelling approaches as an alternative to mathematical-based methods has recently received much interest because computational models provide a deeper mechanistic understanding of biological systems. In particular, formal verification, a complementary approach to standard computational techniques such as simulation, is used to validate system correctness and obtain critical information about system behaviour. In this study, we survey the most frequently used computational modelling approaches and formal verification techniques for computational biology. We compare a number of verification tools and software suites used to analyse biological systems and biochemical networks, and to verify a wide range of biological properties. For users who have no expertise in formal verification, we present a novel methodology that allows them to easily apply formal verification techniques to analyse their biological or biochemical system of interest.
Affiliation(s)
- Savas Konur
  - Department of Computer Science, University of Bradford, Richmond Building, Bradford BD7 1DP, UK
- Marian Gheorghe
  - Department of Computer Science, University of Bradford, Richmond Building, Bradford BD7 1DP, UK
- Natalio Krasnogor
  - School of Computing Science, Newcastle University, Science Square, Newcastle upon Tyne NE4 5TG, UK
4
Johnston ST, Faria M. Equation learning to identify nano-engineered particle-cell interactions: an interpretable machine learning approach. Nanoscale 2022; 14:16502-16515. [PMID: 36314284 DOI: 10.1039/d2nr04668g] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
Designing nano-engineered particles capable of the delivery of therapeutic and diagnostic agents to a specific target remains a significant challenge. Understanding how interactions between particles and cells are impacted by the physicochemical properties of the particle will help inform rational design choices. Mathematical and computational techniques allow for details regarding particle-cell interactions to be isolated from the interwoven set of biological, chemical, and physical phenomena involved in the particle delivery process. Here we present a machine learning framework capable of elucidating particle-cell interactions from experimental data. This framework employs a data-driven modelling approach, augmented by established biological knowledge. Crucially, the model of particle-cell interactions learned by the framework can be interpreted and analysed, in contrast to the 'black box' models inherent to other machine learning approaches. We apply the framework to association data for thirty different particle-cell pairs. This library of data contains both adherent and suspension cell lines, as well as a diverse collection of particles. We consider hyperbranched polymer and poly(methacrylic acid) particles, from 6 nm to 1032 nm in diameter, with small molecule, monoclonal antibody, and peptide surface functionalisations. Despite the diverse nature of the experiments, the learned models of particle-cell interactions for each particle-cell pair are remarkably consistent: out of 2048 potential models, only four unique models are learned. The models reveal that nonlinear saturation effects are a key feature governing particle-cell interactions. Further, the framework provides robust estimates of particle performance, which facilitates quantitative evaluation of particle design choices.
Affiliation(s)
- Stuart T Johnston
  - School of Mathematics and Statistics, The University of Melbourne, Victoria, Australia
- Matthew Faria
  - Department of Biomedical Engineering, The University of Melbourne, Victoria, Australia
5
Computationally efficient mechanism discovery for cell invasion with uncertainty quantification. PLoS Comput Biol 2022; 18:e1010599. [DOI: 10.1371/journal.pcbi.1010599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 11/30/2022] [Accepted: 09/23/2022] [Indexed: 11/17/2022] Open
Abstract
Parameter estimation for mathematical models of biological processes is often difficult and depends significantly on the quality and quantity of available data. We introduce an efficient framework using Gaussian processes to discover mechanisms underlying delay, migration, and proliferation in a cell invasion experiment. Gaussian processes are leveraged with bootstrapping to provide uncertainty quantification for the mechanisms that drive the invasion process. Our framework is efficient, parallelisable, and can be applied to other biological problems. We illustrate our methods using a canonical scratch assay experiment, demonstrating how simply we can explore different functional forms and develop and test hypotheses about underlying mechanisms, such as whether delay is present. All code and data to reproduce this work are available at https://github.com/DanielVandH/EquationLearning.jl.
6
Messenger DA, Bortz DM. Learning mean-field equations from particle data using WSINDy. Physica D 2022; 439:133406. [PMID: 37476028 PMCID: PMC10358825 DOI: 10.1016/j.physd.2022.133406] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 07/22/2023]
Abstract
We develop a weak-form sparse identification method for interacting particle systems (IPS) with the primary goals of reducing computational complexity for large particle number $N$ and offering robustness to either intrinsic or extrinsic noise. In particular, we use concepts from mean-field theory of IPS in combination with the weak-form sparse identification of nonlinear dynamics algorithm (WSINDy) to provide a fast and reliable system identification scheme for recovering the governing stochastic differential equations for an IPS when the number of particles per experiment $N$ is on the order of several thousands and the number of experiments $M$ is less than 100. This is in contrast to existing work showing that system identification for $N$ less than 100 and $M$ on the order of several thousand is feasible using strong-form methods. We prove that under some standard regularity assumptions the scheme converges with rate $O(N^{-1/2})$ in the ordinary least squares setting and we demonstrate the convergence rate numerically on several systems in one and two spatial dimensions. Our examples include a canonical problem from homogenization theory (as a first step towards learning coarse-grained models), the dynamics of an attractive-repulsive swarm, and the IPS description of the parabolic-elliptic Keller-Segel model for chemotaxis. Code is available at https://github.com/MathBioCU/WSINDy_IPS.
Affiliation(s)
- Daniel A. Messenger
  - Department of Applied Mathematics, University of Colorado Boulder, 11 Engineering Dr, Boulder, CO 80309, USA
- David M. Bortz
  - Department of Applied Mathematics, University of Colorado Boulder, 11 Engineering Dr, Boulder, CO 80309, USA
7
Jain A, Kumar A, Kumar Gupta A. A theoretical framework to analyse the flow of particles in a dynamical system with stochastic transition rates and site capacities. R Soc Open Sci 2022; 9:220698. [PMID: 36277836 PMCID: PMC9579774 DOI: 10.1098/rsos.220698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 09/30/2022] [Indexed: 06/16/2023]
Abstract
We study stochasticity in a dynamical model, the ribosome flow model with different site sizes, which describes the unidirectional movement of particles controlled by transition rates along a lattice having different site sizes. Our work models the parameters as random variables with known distributions and investigates the steady-state flow rate under this notion by using tools from random matrix theory. Some closed-form theoretical results are derived for the steady-state flow rate under some restrictive assumptions, such as the random variables being independent and identically distributed. Furthermore, for arbitrary but bounded stochastic transition rates, stochastic site capacities, or both, we establish bounds for the steady-state flow rate. Our analysis can be generalized and applied to study the flow of particles in numerous transport systems in a stochastic environment.
Affiliation(s)
- Aditi Jain
  - Department of Mathematics, Indian Institute of Technology Ropar, Rupnagar, 140001 Punjab, India
- Arun Kumar
  - Department of Mathematics, Indian Institute of Technology Ropar, Rupnagar, 140001 Punjab, India
- Arvind Kumar Gupta
  - Department of Mathematics, Indian Institute of Technology Ropar, Rupnagar, 140001 Punjab, India
8
Messenger DA, Wheeler GE, Liu X, Bortz DM. Learning anisotropic interaction rules from individual trajectories in a heterogeneous cellular population. J R Soc Interface 2022; 19:20220412. [PMCID: PMC9554727 DOI: 10.1098/rsif.2022.0412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
Abstract
Interacting particle system (IPS) models have proven to be highly successful for describing the spatial movement of organisms. However, it is challenging to infer the interaction rules directly from data. In the field of equation discovery, the weak-form sparse identification of nonlinear dynamics (WSINDy) methodology has been shown to be computationally efficient for identifying the governing equations of complex systems from noisy data. Motivated by the success of IPS models to describe the spatial movement of organisms, we develop WSINDy for the second-order IPS to learn equations for communities of cells. Our approach learns the directional interaction rules for each individual cell that in aggregate govern the dynamics of a heterogeneous population of migrating cells. To sort a cell according to the active classes present in its model, we also develop a novel ad hoc classification scheme (which accounts for the fact that some cells do not have enough evidence to accurately infer a model). Aggregated models are then constructed hierarchically to simultaneously identify different species of cells present in the population and determine best-fit models for each species. We demonstrate the efficiency and proficiency of the method on several test scenarios, motivated by common cell migration experiments.
Affiliation(s)
- Daniel A. Messenger
  - Department of Applied Mathematics, University of Colorado, Boulder, CO 80309-0526, USA
- Graycen E. Wheeler
  - Department of Biochemistry, University of Colorado, Boulder, CO 80309-0526, USA
- Xuedong Liu
  - Department of Biochemistry, University of Colorado, Boulder, CO 80309-0526, USA
- David M. Bortz
  - Department of Applied Mathematics, University of Colorado, Boulder, CO 80309-0526, USA
9
Han L, He C, Dinh H, Fricks J, Kuang Y. Learning Biological Dynamics From Spatio-Temporal Data by Gaussian Processes. Bull Math Biol 2022; 84:69. [DOI: 10.1007/s11538-022-01022-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 01/18/2022] [Indexed: 11/02/2022]
10
Chen Z, Liu Y, Sun H. Physics-informed learning of governing equations from scarce data. Nat Commun 2021; 12:6136. [PMID: 34675223 PMCID: PMC8531004 DOI: 10.1038/s41467-021-26434-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/01/2021] [Accepted: 10/04/2021] [Indexed: 11/23/2022] Open
Abstract
Harnessing data to discover the underlying governing laws or equations that describe the behavior of complex physical systems can significantly advance our modeling, simulation and understanding of such systems in various science and engineering disciplines. This work introduces a novel approach called physics-informed neural network with sparse regression to discover governing partial differential equations from scarce and noisy data for nonlinear spatiotemporal systems. In particular, this discovery approach seamlessly integrates the strengths of deep neural networks for rich representation learning, physics embedding, automatic differentiation and sparse regression to approximate the solution of system variables, compute essential derivatives, as well as identify the key derivative terms and parameters that form the structure and explicit expression of the equations. The efficacy and robustness of this method are demonstrated, both numerically and experimentally, on discovering a variety of partial differential equation systems with different levels of data scarcity and noise accounting for different initial/boundary conditions. The resulting computational framework shows the potential for closed-form model discovery in practical applications where large and accurate datasets are intractable to capture.
Affiliation(s)
- Zhao Chen
  - Department of Civil and Environmental Engineering, Northeastern University, Boston, MA 02115 USA
- Yang Liu
  - Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115 USA
- Hao Sun
  - Gaoling School of Artificial Intelligence, Renmin University of China, 100872 Beijing, China
  - Beijing Key Laboratory of Big Data Management and Analysis Methods, 100872 Beijing, China
  - Department of Civil and Environmental Engineering, MIT, Cambridge, MA 02139 USA
11
Messenger DA, Bortz DM. Weak SINDy for partial differential equations. J Comput Phys 2021; 443:110525. [PMID: 34744183 PMCID: PMC8570254 DOI: 10.1016/j.jcp.2021.110525] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 05/30/2023]
Abstract
Sparse Identification of Nonlinear Dynamics (SINDy) is a method of system discovery that has been shown to successfully recover governing dynamical systems from data [6, 39]. Recently, several groups have independently discovered that the weak formulation provides orders of magnitude better robustness to noise. Here we extend our Weak SINDy (WSINDy) framework introduced in [28] to the setting of partial differential equations (PDEs). The elimination of pointwise derivative approximations via the weak form enables effective machine-precision recovery of model coefficients from noise-free data (i.e. below the tolerance of the simulation scheme) as well as robust identification of PDEs in the large noise regime (with signal-to-noise ratio approaching one in many well-known cases). This is accomplished by discretizing a convolutional weak form of the PDE and exploiting separability of test functions for efficient model identification using the Fast Fourier Transform. The resulting WSINDy algorithm for PDEs has a worst-case computational complexity of $O(N^{D+1}\log(N))$ for datasets with $N$ points in each of $D+1$ dimensions. Furthermore, our Fourier-based implementation reveals a connection between robustness to noise and the spectra of test functions, which we utilize in an a priori selection algorithm for test functions. Finally, we introduce a learning algorithm for the threshold in sequential-thresholding least-squares (STLS) that enables model identification from large libraries, and we utilize scale invariance at the continuum level to identify PDEs from poorly-scaled datasets. We demonstrate WSINDy's robustness, speed and accuracy on several challenging PDEs. Code is publicly available on GitHub at https://github.com/MathBioCU/WSINDy_PDE.
Affiliation(s)
- Daniel A Messenger
  - Department of Applied Mathematics, University of Colorado Boulder, 11 Engineering Dr., Boulder, CO 80309, USA
- David M Bortz
  - Department of Applied Mathematics, University of Colorado Boulder, 11 Engineering Dr., Boulder, CO 80309, USA
12
Martina-Perez S, Simpson MJ, Baker RE. Bayesian uncertainty quantification for data-driven equation learning. Proc Math Phys Eng Sci 2021; 477:20210426. [PMID: 35153587 PMCID: PMC8548080 DOI: 10.1098/rspa.2021.0426] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Accepted: 09/30/2021] [Indexed: 12/27/2022] Open
Abstract
Equation learning aims to infer differential equation models from data. While a number of studies have shown that differential equation models can be successfully identified when the data are sufficiently detailed and corrupted with relatively small amounts of noise, the relationship between observation noise and uncertainty in the learned differential equation models remains unexplored. We demonstrate that for noisy datasets there exists great variation in both the structure of the learned differential equation models and their parameter values. We explore how to exploit multiple datasets to quantify uncertainty in the learned models, and at the same time draw mechanistic conclusions about the target differential equations. We showcase our results using simulation data from a relatively straightforward agent-based model (ABM) which has a well-characterized partial differential equation description that provides highly accurate predictions of averaged ABM behaviours in relevant regions of parameter space. Our approach combines equation learning methods with Bayesian inference approaches so that a quantification of uncertainty can be given by the posterior parameter distribution of the learned model.
Affiliation(s)
- Matthew J Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- Ruth E Baker
  - Mathematical Institute, University of Oxford, Oxford, UK
13
Messenger DA, Bortz DM. Weak SINDy: Galerkin-based data-driven model selection. Multiscale Model Simul 2021; 19:1474-1497. [PMID: 38239761 PMCID: PMC10795802 DOI: 10.1137/20m1343166] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 01/22/2024]
Abstract
We present a novel weak formulation and discretization for discovering governing equations from noisy measurement data. This method of learning differential equations from data fits into a new class of algorithms that replace pointwise derivative approximations with linear transformations and variance reduction techniques. Compared to the standard SINDy algorithm presented in [S. L. Brunton, J. L. Proctor, and J. N. Kutz, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. 3932-3937], our so-called weak SINDy (WSINDy) algorithm allows for reliable model identification from data with large noise (often with ratios greater than 0.1) and reduces the error in the recovered coefficients to enable accurate prediction. Moreover, the coefficient error scales linearly with the noise level, leading to high-accuracy recovery in the low-noise regime. Altogether, WSINDy combines the simplicity and efficiency of the SINDy algorithm with the natural noise reduction of integration, as demonstrated in [H. Schaeffer and S. G. McCalla, Phys. Rev. E, 96 (2017), 023302], to arrive at a robust and accurate method of sparse recovery.
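The core idea of replacing pointwise derivatives with integrals against test functions can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the authors' WSINDy algorithm: the logistic test equation, the polynomial test functions $(1 - s^2)^p$, their centers and radius, the fixed threshold, and the helper names `weak_form_system` and `stls` are all hypothetical choices. For $\dot{x} = \Theta(x)\,w$ and a test function $\phi$ vanishing at the ends of its support, integration by parts gives $\int \phi\,\Theta(x)\,dt\; w = -\int \dot{\phi}\,x\,dt$, so no derivative of the data is ever computed.

```python
import numpy as np

def weak_form_system(t, x, theta, centers, radius, p=4):
    """Build G w = b with rows  int(phi * Theta) dt  and  -int(phi' * x) dt.
    phi(t) = (1 - s^2)^p, s = (t - c) / radius, supported on |s| < 1."""
    dt = t[1] - t[0]
    G, b = [], []
    for c in centers:
        s = (t - c) / radius
        inside = np.abs(s) < 1.0
        phi = np.where(inside, (1.0 - s**2) ** p, 0.0)
        dphi = np.where(inside, -2.0 * p * s / radius * (1.0 - s**2) ** (p - 1), 0.0)
        # phi and dphi vanish at the support boundary, so a plain
        # Riemann sum coincides with the trapezoidal rule here.
        G.append(dt * (phi @ theta))
        b.append(-dt * (dphi @ x))
    return np.array(G), np.array(b)

def stls(G, b, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares on the weak-form system."""
    w = np.linalg.lstsq(G, b, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(w) < threshold
        w[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        w[keep] = np.linalg.lstsq(G[:, keep], b, rcond=None)[0]
    return w

# Assumed test problem: logistic growth x' = x - x^2 with x(0) = 0.1,
# sampled from its closed-form solution (noise-free).
t = np.linspace(0.0, 10.0, 1001)
x = 1.0 / (1.0 + 9.0 * np.exp(-t))

# Candidate library [1, x, x^2, x^3]; the true weights are [0, 1, -1, 0].
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Test functions whose supports lie inside the sampled interval.
centers = np.linspace(1.0, 9.0, 33)
G, b = weak_form_system(t, x, theta, centers, radius=1.0)
w = stls(G, b)
print(np.round(w, 4))
```

This sketch only checks that the weak-form system recovers the right terms on clean data. The noise robustness described in the abstract comes from the averaging effect of the integrals, which can be explored by perturbing `x` before building the system.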
Affiliation(s)
- Daniel A Messenger
  - Department of Applied Mathematics, University of Colorado, Boulder, CO 80309-0526 USA
- David M Bortz
  - Department of Applied Mathematics, University of Colorado, Boulder, CO 80309-0526 USA
14
Nardini JT, Baker RE, Simpson MJ, Flores KB. Learning differential equation models from stochastic agent-based model simulations. J R Soc Interface 2021; 18:20200987. [PMID: 33726540 PMCID: PMC8086865 DOI: 10.1098/rsif.2020.0987] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/04/2020] [Accepted: 02/22/2021] [Indexed: 12/15/2022] Open
Abstract
Agent-based models provide a flexible framework that is frequently used for modelling many biological systems, including cell migration, molecular dynamics, ecology and epidemiology. Analysis of the model dynamics can be challenging due to their inherent stochasticity and heavy computational requirements. Common approaches to the analysis of agent-based models include extensive Monte Carlo simulation of the model or the derivation of coarse-grained differential equation models to predict the expected or averaged output from the agent-based model. Both of these approaches have limitations, however, as extensive computation of complex agent-based models may be infeasible, and coarse-grained differential equation models can fail to accurately describe model dynamics in certain parameter regimes. We propose that methods from the equation learning field provide a promising, novel and unifying approach for agent-based model analysis. Equation learning is a recent field of research from data science that aims to infer differential equation models directly from data. We use this tutorial to review how methods from equation learning can be used to learn differential equation models from agent-based model simulations. We demonstrate that this framework is easy to use, requires few model simulations, and accurately predicts model dynamics in parameter regions where coarse-grained differential equation models fail to do so. We highlight these advantages through several case studies involving two agent-based models that are broadly applicable to biological phenomena: a birth-death-migration model commonly used to explore cell biology experiments and a susceptible-infected-recovered model of infectious disease spread.
Affiliation(s)
- John T. Nardini
  - North Carolina State University, Mathematics, Raleigh, NC, USA
- Ruth E. Baker
  - Mathematical Institute, University of Oxford, Oxford, UK
- Matthew J. Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane 4001, Australia
- Kevin B. Flores
  - North Carolina State University, Mathematics, Raleigh, NC, USA
15
Hywood JD, Rice G, Pageon SV, Read MN, Biro M. Detection and characterization of chemotaxis without cell tracking. J R Soc Interface 2021; 18:20200879. [PMID: 33715400 PMCID: PMC8086846 DOI: 10.1098/rsif.2020.0879] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/31/2020] [Accepted: 02/15/2021] [Indexed: 02/06/2023] Open
Abstract
Swarming has been observed in various biological systems from collective animal movements to immune cells. In the cellular context, swarming is driven by the secretion of chemotactic factors. Despite the critical role of chemotactic swarming, few methods to robustly identify and quantify this phenomenon exist. Here, we present a novel method for the analysis of time series of positional data generated from realizations of agent-based processes. We convert the positional data for each individual time point to a function measuring agent aggregation around a given area of interest, hence generating a functional time series. The functional time series, and a more easily visualized swarming metric of agent aggregation derived from these functions, provide useful information regarding the evolution of the underlying process over time. We extend our method to build upon the modelling of collective motility using drift-diffusion partial differential equations (PDEs). Using a functional linear model, we are able to use the functional time series to estimate the drift and diffusivity terms associated with the underlying PDE. By producing an accurate estimate for the drift coefficient, we can infer the strength and range of attraction or repulsion exerted on agents, as in chemotaxis. Our approach relies solely on using agent positional data. The spatial distribution of diffusing chemokines is not required, nor do individual agents need to be tracked over time. We demonstrate our approach using random walk simulations of chemotaxis and experiments investigating cytotoxic T cells interacting with tumouroids.
Affiliation(s)
- Jack D. Hywood
  - Sydney Medical School, The University of Sydney, Sydney, Australia
- Gregory Rice
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
- Sophie V. Pageon
  - EMBL Australia, Single Molecule Science node, School of Medical Sciences, University of New South Wales, Sydney, Australia
- Mark N. Read
  - School of Computer Science & Charles Perkins Centre, University of Sydney, Sydney, Australia
- Maté Biro
  - EMBL Australia, Single Molecule Science node, School of Medical Sciences, University of New South Wales, Sydney, Australia
16
Lee D, Jayaraman A, Kwon JS. Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling. PLoS Comput Biol 2020; 16:e1008472. [PMID: 33315899 PMCID: PMC7769624 DOI: 10.1371/journal.pcbi.1008472] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/23/2020] [Revised: 12/28/2020] [Accepted: 10/26/2020] [Indexed: 12/30/2022] Open
Abstract
Developing an accurate first-principle model is an important step in employing systems biology approaches to analyze an intracellular signaling pathway. However, such a model is difficult to develop because it requires an in-depth mechanistic understanding of the signaling pathway. Since underlying mechanisms such as the reaction network structure are not fully understood, significant discrepancy exists between predicted and actual signaling dynamics. Motivated by these considerations, this work proposes a hybrid modeling approach that combines a first-principle model and an artificial neural network (ANN) model so that predictions of the hybrid model surpass those of the original model. First, the proposed approach determines an optimal subset of model states whose dynamics should be corrected by the ANN by examining the correlation between each state and outputs through relative order. Second, an L2-regularized least-squares problem is solved to infer values of the correction terms that are necessary to minimize the discrepancy between the model predictions and available measurements. Third, an ANN is developed to generalize relationships between the values of the correction terms and the system dynamics. Lastly, the original first-principle model is coupled with the developed ANN to finalize the hybrid model, so that it possesses generalized prediction capabilities while retaining interpretability. We have successfully validated the proposed methodology with two case studies, simplified apoptosis and lipopolysaccharide-induced NFκB signaling pathways, to develop hybrid models with in silico and in vitro measurements, respectively.

An intracellular signaling pathway is often represented by a set of nonlinear ordinary differential equations, which translate our current knowledge about the signaling pathway into a testable mathematical model. However, predictions from such models are often subject to high uncertainty since many signaling pathways are only partially known beforehand. In this study, we propose a systematic approach to develop a hybrid model that improves model accuracy by combining machine learning and first-principle modeling. Specifically, model correction terms are learned from the discrepancy between model predictions and measurements, and these terms are added to the first-principle model to enhance the prediction accuracy. Once these correction terms are learned from the data, an artificial neural network (ANN) model is developed to find an empirical relation between the model and the correction terms, so that the developed ANN provides improved predictive capabilities even in new operating conditions (i.e., generalizability). The final hybrid model is then constructed by coupling the first-principle model with the developed ANN.
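The second step of the abstract, an L2-regularized least-squares problem for the correction terms, can be illustrated with a minimal sketch. It assumes the model-measurement discrepancy `r` depends linearly on the correction values `c` through a matrix `S`; the names `S`, `r`, `lam`, and `infer_correction` are hypothetical and are not taken from the paper, whose actual formulation may differ.

```python
import numpy as np

def infer_correction(S, r, lam):
    """Closed-form solution of min_c ||S c - r||^2 + lam * ||c||^2.

    S   : (m, n) hypothetical map from correction values to outputs
    r   : (m,)   discrepancy between measurements and model predictions
    lam : float  L2 (ridge) regularization weight, lam >= 0
    """
    S = np.asarray(S, dtype=float)
    r = np.asarray(r, dtype=float)
    n = S.shape[1]
    # Normal equations with the ridge term added to the diagonal
    return np.linalg.solve(S.T @ S + lam * np.eye(n), S.T @ r)

# Toy data (illustrative only): three correction values, twenty measurements
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 3))
c_true = np.array([0.5, -1.0, 2.0])
r = S @ c_true

c_unreg = infer_correction(S, r, lam=0.0)   # plain least squares
c_reg = infer_correction(S, r, lam=10.0)    # shrunk toward zero
```

Increasing `lam` trades fidelity to the measurements for smaller correction values, which is the usual reason for adding the L2 penalty when the problem is ill-conditioned.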
Affiliation(s)
- Dongheon Lee
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas, USA
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas, USA
- Arul Jayaraman
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas, USA
  - Department of Biomedical Engineering, Texas A&M University, College Station, Texas, USA
- Joseph S. Kwon
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas, USA
  - Texas A&M Energy Institute, Texas A&M University, College Station, Texas, USA

17
Lagergren JH, Nardini JT, Baker RE, Simpson MJ, Flores KB. Biologically-informed neural networks guide mechanistic modeling from sparse experimental data. PLoS Comput Biol 2020; 16:e1008462. [PMID: 33259472 PMCID: PMC7732115 DOI: 10.1371/journal.pcbi.1008462] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/26/2020] [Revised: 12/11/2020] [Accepted: 10/22/2020] [Indexed: 11/18/2022] Open
Abstract
Biologically-informed neural networks (BINNs), an extension of physics-informed neural networks [1], are introduced and used to discover the underlying dynamics of biological systems from sparse experimental data. In the present work, BINNs are trained in a supervised learning framework to approximate in vitro cell biology assay experiments while respecting a generalized form of the governing reaction-diffusion partial differential equation (PDE). By allowing the diffusion and reaction terms to be multilayer perceptrons (MLPs), the nonlinear forms of these terms can be learned while simultaneously converging to the solution of the governing PDE. Further, the trained MLPs are used to guide the selection of biologically interpretable mechanistic forms of the PDE terms, which provides new insights into the biological and physical mechanisms that govern the dynamics of the observed system. The method is evaluated on sparse real-world data from wound healing assays with varying initial cell densities [2].

In this work we extend equation learning methods to make them feasible for biological applications with nonlinear dynamics, where data are often sparse and noisy. Physics-informed neural networks have recently been shown to approximate solutions of PDEs from simulated noisy data while simultaneously optimizing the PDE parameters. However, the success of this method requires the correct specification of the governing PDE, which may not be known in practice. Here, we present an extension of the algorithm that allows neural networks to learn the nonlinear terms of the governing system without the need to specify the mechanistic form of the PDE. Our method is demonstrated on real-world biological data from scratch assay experiments and used to discover a previously unconsidered biological mechanism that describes delayed population response to the scratch.
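To make "a generalized form of the governing reaction-diffusion PDE" concrete, the following is one plausible way to write it, given as a sketch rather than the paper's exact formulation. It assumes one spatial dimension and density-dependent terms; the symbols $D_{\theta}$ and $G_{\phi}$ (the MLP-parameterized diffusion and reaction terms), the network solution $\hat{u}$, and the weight $\lambda$ are notation introduced here for illustration.

```latex
\frac{\partial u}{\partial t}
  = \frac{\partial}{\partial x}\!\left( D_{\theta}(u)\,\frac{\partial u}{\partial x} \right)
  + G_{\phi}(u)\,u ,
\qquad
\mathcal{L}
  = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{u}(x_i,t_i) - u_i\bigr)^{2}
  + \frac{\lambda}{M}\sum_{j=1}^{M}
    \Bigl(\hat{u}_t - \bigl(D_{\theta}(\hat{u})\,\hat{u}_x\bigr)_x - G_{\phi}(\hat{u})\,\hat{u}\Bigr)^{2}\Big|_{(x_j,t_j)}
```

The first sum fits the $N$ observed data points $u_i$, and the second penalizes the PDE residual at $M$ collocation points, so minimizing $\mathcal{L}$ learns the terms and the solution together, as the abstract describes.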
Affiliation(s)
- John H. Lagergren
  - Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA
  - Center for Research and Scientific Computation, North Carolina State University, Raleigh, North Carolina, USA
- John T. Nardini
  - Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA
  - Statistical and Applied Mathematical Sciences Institute, Durham, North Carolina, USA
- Ruth E. Baker
  - Mathematical Institute, University of Oxford, Oxford, UK
- Matthew J. Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Kevin B. Flores
  - Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA
  - Center for Research and Scientific Computation, North Carolina State University, Raleigh, North Carolina, USA

18
Nardini JT, Lagergren JH, Hawkins-Daarud A, Curtin L, Morris B, Rutter EM, Swanson KR, Flores KB. Learning Equations from Biological Data with Limited Time Samples. Bull Math Biol 2020; 82:119. [PMID: 32909137 PMCID: PMC8409251 DOI: 10.1007/s11538-020-00794-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/19/2020] [Accepted: 08/16/2020] [Indexed: 01/25/2023]
Abstract
Equation learning methods present a promising tool to aid scientists in the modeling process for biological data. Previous equation learning studies have demonstrated that these methods can infer models from rich datasets; however, the performance of these methods in the presence of common challenges from biological data has not been thoroughly explored. We present an equation learning methodology comprising data denoising, equation learning, model selection, and post-processing steps that infers a dynamical systems model from noisy spatiotemporal data. The performance of this methodology is thoroughly investigated in the face of several common challenges presented by biological data, namely, sparse data sampling, large noise levels, and heterogeneity between datasets. We find that this methodology can accurately infer the correct underlying equation and predict unobserved system dynamics from a small number of time samples when the data are sampled over a time interval exhibiting both linear and nonlinear dynamics. Our findings suggest that equation learning methods can be used for model discovery and selection in many areas of biology when an informative dataset is used. We focus on glioblastoma multiforme modeling as a case study in this work to highlight how these results are informative for data-driven, modeling-based tumor invasion predictions.
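The equation learning step can be illustrated with a generic sparse-regression sketch. This is not the authors' pipeline: it uses sequentially thresholded least squares on a hypothetical candidate library, applied to noise-free logistic growth with exact derivatives, so the denoising and model selection steps named in the abstract are skipped. The function `stlsq`, the library `theta`, and the threshold value are assumptions made for illustration.

```python
import numpy as np

def stlsq(theta, dudt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares.

    theta : (m, k) library of candidate terms evaluated on the data
    dudt  : (m,)   time derivative of the observed quantity
    Returns a sparse coefficient vector xi with dudt ~ theta @ xi.
    """
    xi = np.linalg.lstsq(theta, dudt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune negligible terms
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving library columns
        xi[big] = np.linalg.lstsq(theta[:, big], dudt, rcond=None)[0]
    return xi

# Toy data: logistic growth du/dt = u - u^2 with u(0) = 0.1
t = np.linspace(0.0, 8.0, 200)
u = 1.0 / (1.0 + 9.0 * np.exp(-t))
dudt = u * (1.0 - u)                         # exact derivative, no noise

# Candidate library: [1, u, u^2]
theta = np.column_stack([np.ones_like(u), u, u**2])
xi = stlsq(theta, dudt)
```

The time window here spans both the near-exponential early phase and the saturating late phase, which loosely mirrors the abstract's point that the sampled interval should exhibit both linear and nonlinear dynamics for the terms to be identifiable.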
Affiliation(s)
- John T Nardini
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
  - The Statistical and Applied Mathematical Sciences Institute, Durham, NC, USA
- John H Lagergren
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Andrea Hawkins-Daarud
  - Mathematical NeuroOncology Laboratory, Precision Neurotherapeutics Innovation Program, Mayo Clinic, Phoenix, AZ, USA
- Lee Curtin
  - Mathematical NeuroOncology Laboratory, Precision Neurotherapeutics Innovation Program, Mayo Clinic, Phoenix, AZ, USA
- Bethan Morris
  - Centre for Mathematical Medicine and Biology, University of Nottingham, Nottingham, UK
- Erica M Rutter
  - Department of Applied Mathematics, University of California, Merced, Merced, CA, USA
- Kristin R Swanson
  - Mathematical NeuroOncology Laboratory, Precision Neurotherapeutics Innovation Program, Mayo Clinic, Phoenix, AZ, USA
- Kevin B Flores
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
