1. Miles CE, McKinley SA, Ding F, Lehoucq RB. Inferring Stochastic Rates from Heterogeneous Snapshots of Particle Positions. Bull Math Biol 2024; 86:74. [PMID: 38740619 DOI: 10.1007/s11538-024-01301-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/20/2024] [Indexed: 05/16/2024]
Abstract
Many imaging techniques for biological systems, like fixation of cells coupled with fluorescence microscopy, provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.
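As a rough illustration of inferring rates from snapshot counts, the sketch below drops the spatial part of the model entirely. It assumes a well-mixed immigration-death process (constant birth rate, per-particle death rate), whose stationary particle count is Poisson with mean equal to the birth-to-death ratio. This is a deliberate simplification and not the spatial Poisson process derived in the paper; the function names, the true ratio, and the number of cells are all hypothetical.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def log_likelihood(ratio, counts):
    """Poisson log-likelihood of snapshot counts, with ratio = birth_rate / death_rate."""
    return sum(n * math.log(ratio) - ratio - math.lgamma(n + 1) for n in counts)

def mle_ratio(counts):
    """The maximum-likelihood estimate of a Poisson mean is the sample mean."""
    return sum(counts) / len(counts)

# Hypothetical experiment: one particle count per fixed cell.
rng = random.Random(0)
true_ratio = 12.0
counts = [poisson_sample(true_ratio, rng) for _ in range(500)]
estimate = mle_ratio(counts)
print(f"estimated birth/death ratio: {estimate:.2f}")
```

In this simplified setting the counts constrain only the ratio of the two rates; separating them requires additional information, which is the role particle positions play in a spatial model.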
Affiliation(s)
- Scott A McKinley
  - Department of Mathematics, Tulane University, New Orleans, LA, USA
- Fangyuan Ding
  - Departments of Biomedical Engineering, Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Richard B Lehoucq
  - Discrete Math and Optimization, Sandia National Laboratories, Albuquerque, NM, USA
2. Kilic Z, Schweiger M, Moyer C, Pressé S. Monte Carlo samplers for efficient network inference. PLoS Comput Biol 2023; 19:e1011256. [PMID: 37463156 DOI: 10.1371/journal.pcbi.1011256] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Accepted: 06/09/2023] [Indexed: 07/20/2023] [Open Access]
Abstract
Accessing information on an underlying network driving a biological process often involves interrupting the process and collecting snapshot data. When snapshot data are stochastic, the data's structure necessitates a probabilistic description to infer underlying reaction networks. As an example, we may imagine wanting to learn gene state networks from the type of data collected in single molecule RNA fluorescence in situ hybridization (RNA-FISH). In the networks we consider, nodes represent network states, and edges represent biochemical reaction rates linking states. Simultaneously estimating the number of nodes and constituent parameters from snapshot data remains a challenging task, in part on account of data uncertainty and timescale separations between kinetic parameters mediating the network. While parametric Bayesian methods learn parameters given a network structure (with known node numbers) with rigorously propagated measurement uncertainty, learning the number of nodes and parameters with potentially large timescale separations remains an open question. Here, we propose a Bayesian nonparametric framework and describe a hybrid Bayesian Markov Chain Monte Carlo (MCMC) sampler directly addressing these challenges. In particular, in our hybrid method, Hamiltonian Monte Carlo (HMC) leverages local posterior geometries in inference to explore the parameter space; Adaptive Metropolis-Hastings (AMH) learns correlations between plausible parameter sets to efficiently propose probable models; and Parallel Tempering takes into account multiple models simultaneously with tempered information content to augment sampling efficiency. We apply our method to synthetic data mimicking single molecule RNA-FISH, a popular snapshot method for probing transcriptional networks, to illustrate the identified challenges inherent to learning dynamical models from these snapshots and how our method addresses them.
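For orientation, the simplest building block behind such samplers can be sketched as follows. This is plain random-walk Metropolis-Hastings on a single Poisson rate, not the hybrid HMC/AMH/parallel-tempering sampler the abstract describes. The snapshot counts, the flat prior on the log rate, and the step size are all illustrative assumptions.

```python
import math
import random

def log_posterior(log_rate, counts):
    """Unnormalized log posterior of a Poisson rate, parameterized by its logarithm.
    Assumes a flat prior on log_rate (an illustrative choice, not from the paper)."""
    return sum(counts) * log_rate - len(counts) * math.exp(log_rate)

def metropolis(counts, n_steps=5000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal in log space."""
    rng = random.Random(seed)
    current = 0.0
    current_lp = log_posterior(current, counts)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(math.exp(current))
    return samples

# Hypothetical RNA counts, one per fixed cell.
counts = [3, 5, 4, 6, 2, 5, 4, 3, 7, 4]
samples = metropolis(counts)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean rate: {posterior_mean:.2f}")
```

Under these assumptions the exact posterior is a Gamma distribution with mean `sum(counts) / len(counts)`, which gives a simple check on the sampler. A single chain of this kind mixes poorly when parameters differ by orders of magnitude in timescale, which is the difficulty the tempered and gradient-based components are meant to address.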
Affiliation(s)
- Zeliha Kilic
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America
- Max Schweiger
  - Center for Biological Physics, ASU, Tempe, Arizona, United States of America
  - Department of Physics, ASU, Tempe, Arizona, United States of America
- Camille Moyer
  - Center for Biological Physics, ASU, Tempe, Arizona, United States of America
  - School of Mathematics and Statistical Sciences, ASU, Tempe, Arizona, United States of America
- Steve Pressé
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America
  - Center for Biological Physics, ASU, Tempe, Arizona, United States of America
  - School of Molecular Sciences, ASU, Tempe, Arizona, United States of America
3. Vo HD, Forero-Quintero LS, Aguilera LU, Munsky B. Analysis and design of single-cell experiments to harvest fluctuation information while rejecting measurement noise. Front Cell Dev Biol 2023; 11:1133994. [PMID: 37305680 PMCID: PMC10250612 DOI: 10.3389/fcell.2023.1133994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/29/2022] [Accepted: 05/10/2023] [Indexed: 06/13/2023] [Open Access]
Abstract
Introduction: Despite continued technological improvements, measurement errors always reduce or distort the information that any real experiment can provide to quantify cellular dynamics. This problem is particularly serious for cell signaling studies to quantify heterogeneity in single-cell gene regulation, where important RNA and protein copy numbers are themselves subject to the inherently random fluctuations of biochemical reactions. Until now, it has not been clear how measurement noise should be managed in addition to other experiment design variables (e.g., sampling size, measurement times, or perturbation levels) to ensure that collected data will provide useful insights on signaling or gene expression mechanisms of interest. Methods: We propose a computational framework that takes explicit consideration of measurement errors to analyze single-cell observations, and we derive Fisher Information Matrix (FIM)-based criteria to quantify the information value of distorted experiments. Results and Discussion: We apply this framework to analyze multiple models in the context of simulated and experimental single-cell data for a reporter gene controlled by an HIV promoter. We show that the proposed approach quantitatively predicts how different types of measurement distortions affect the accuracy and precision of model identification, and we demonstrate that the effects of these distortions can be mitigated through explicit consideration during model inference. We conclude that this reformulation of the FIM could be used effectively to design single-cell experiments to optimally harvest fluctuation information while mitigating the effects of image distortion.
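To make the idea of a distortion-aware Fisher information concrete, here is a minimal sketch under strong assumptions: the true RNA count per cell is Poisson with mean `rate`, and each molecule is detected independently with probability `detection_prob`, so the observed count is Poisson with mean `detection_prob * rate`. This toy distortion model, the function names, and the numbers are hypothetical and are not taken from the paper.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability mass, computed in log space for numerical stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def fisher_information(rate, detection_prob, k_max=200):
    """Fisher information about `rate` carried by one cell's observed count."""
    mean = detection_prob * rate
    info = 0.0
    for k in range(k_max + 1):
        # Score: derivative of log P(k | rate) with respect to rate.
        score = k / rate - detection_prob
        info += score * score * poisson_pmf(k, mean)
    return info

def cells_needed(rate, detection_prob, target_se):
    """Cramer-Rao estimate of how many cells reach a target standard error."""
    return math.ceil(1.0 / (fisher_information(rate, detection_prob) * target_se ** 2))

ideal = fisher_information(10.0, 1.0)
distorted = fisher_information(10.0, 0.5)
print(f"information per cell: ideal {ideal:.3f}, with 50% detection {distorted:.3f}")
```

In this toy model the information per cell equals `detection_prob / rate`, so halving the detection probability doubles the number of cells needed for the same precision. Richer distortion models would change that relationship, which is the kind of trade-off an FIM-based design criterion is meant to quantify.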
Affiliation(s)
- Huy D. Vo
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Linda S. Forero-Quintero
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Luis U. Aguilera
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Brian Munsky
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO, United States
4. Kilic Z, Schweiger M, Moyer C, Shepherd D, Pressé S. Gene expression model inference from snapshot RNA data using Bayesian non-parametrics. Nature Computational Science 2023; 3:174-183. [PMID: 38125199 PMCID: PMC10732567 DOI: 10.1038/s43588-022-00392-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/16/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2023]
Abstract
Gene expression models, which are key towards understanding cellular regulatory response, underlie observations of single-cell transcriptional dynamics. Although RNA expression data encode information on gene expression models, existing computational frameworks do not perform simultaneous Bayesian inference of gene expression models and parameters from such data. Rather, gene expression models (composed of gene states, their connectivities and associated parameters) are currently deduced by pre-specifying gene state numbers and connectivity before learning associated rate parameters. Here we propose a method to learn full distributions over gene states, state connectivities and associated rate parameters, simultaneously and self-consistently from single-molecule RNA counts. We propagate noise from fluctuating RNA counts over models by treating models themselves as random variables. We achieve this within a Bayesian non-parametric paradigm. We demonstrate our method on the Escherichia coli lacZ pathway and the Saccharomyces cerevisiae STL1 pathway, and verify its robustness on synthetic data.
Affiliation(s)
- Zeliha Kilic
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN, USA
  - These authors contributed equally: Zeliha Kilic, Max Schweiger
- Max Schweiger
  - Center for Biological Physics, ASU, Tempe, AZ, USA
  - Department of Physics, ASU, Tempe, AZ, USA
  - These authors contributed equally: Zeliha Kilic, Max Schweiger
- Camille Moyer
  - Center for Biological Physics, ASU, Tempe, AZ, USA
  - School of Mathematics and Statistical Sciences, ASU, Tempe, AZ, USA
- Douglas Shepherd
  - Center for Biological Physics, ASU, Tempe, AZ, USA
  - Department of Physics, ASU, Tempe, AZ, USA
- Steve Pressé
  - Center for Biological Physics, ASU, Tempe, AZ, USA
  - Department of Physics, ASU, Tempe, AZ, USA
  - School of Molecular Sciences, ASU, Tempe, AZ, USA
5. Analytic solutions for stochastic hybrid models of gene regulatory networks. J Math Biol 2021; 82:9. [PMID: 33496854 DOI: 10.1007/s00285-021-01549-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/16/2019] [Revised: 09/16/2020] [Accepted: 10/16/2020] [Indexed: 10/22/2022]
Abstract
Discrete-state stochastic models are a popular approach to describe the inherent stochasticity of gene expression in single cells. The analysis of such models is hindered by the fact that the underlying discrete state space is extremely large. Therefore hybrid models, in which protein counts are replaced by average protein concentrations, have become a popular alternative. The evolution of the corresponding probability density functions is given by a coupled system of hyperbolic PDEs. This system has Markovian nature but its hyperbolic structure makes it difficult to apply standard functional analytical methods. We are able to prove convergence towards the stationary solution and determine such equilibrium explicitly by combining abstract methods from the theory of positive operators and elementary ideas from potential analysis.
6. Catanach TA, Vo HD, Munsky B. Bayesian inference of stochastic reaction networks using multifidelity sequential tempered Markov chain Monte Carlo. International Journal for Uncertainty Quantification 2020; 10:515-542. [PMID: 34007522 PMCID: PMC8127724 DOI: 10.1615/int.j.uncertaintyquantification.2020033241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Stochastic reaction network models are often used to explain and predict the dynamics of gene regulation in single cells. These models usually involve several parameters, such as the kinetic rates of chemical reactions, that are not directly measurable and must be inferred from experimental data. Bayesian inference provides a rigorous probabilistic framework for identifying these parameters by finding a posterior parameter distribution that captures their uncertainty. Traditional computational methods for solving inference problems, such as Markov Chain Monte Carlo methods based on the classical Metropolis-Hastings algorithm, involve numerous serial evaluations of the likelihood function, which in turn requires expensive forward solutions of the chemical master equation (CME). We propose an alternate approach based on a multifidelity extension of the Sequential Tempered Markov Chain Monte Carlo (ST-MCMC) sampler. This algorithm is built upon Sequential Monte Carlo and solves the Bayesian inference problem by decomposing it into a sequence of efficiently solved subproblems that gradually increase both model fidelity and the influence of the observed data. We reformulate the finite state projection (FSP) algorithm, a well-known method for solving the CME, to produce a hierarchy of surrogate master equations to be used in this multifidelity scheme. To determine the appropriate fidelity, we introduce a novel information-theoretic criterion that seeks to extract the most information about the ultimate Bayesian posterior from each model in the hierarchy without inducing significant bias. This novel sampling scheme is tested with high performance computing resources using biologically relevant problems.
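The finite state projection idea can be sketched on a one-species birth-death model: truncate the CME to copy numbers 0 through `n_max`, integrate the truncated system, and read the probability that leaked out of the truncated state space as the approximation error. The sketch below uses explicit Euler time stepping purely for brevity (practical solvers use matrix exponentials or stiff ODE integrators), and the rates, truncation sizes, and function name are hypothetical rather than taken from the paper.

```python
def fsp_birth_death(birth, death, n_max, t_end, dt=1e-3):
    """Integrate the CME truncated to states 0..n_max, starting from zero molecules.
    Returns the truncated distribution and the probability mass lost to truncation.
    Explicit Euler needs dt * (birth + death * n_max) well below 1."""
    p = [0.0] * (n_max + 1)
    p[0] = 1.0
    for _ in range(int(round(t_end / dt))):
        new_p = p[:]
        for n in range(n_max + 1):
            # All probability leaving state n, including birth out of n_max.
            new_p[n] -= dt * (birth + death * n) * p[n]
            if n + 1 <= n_max:
                new_p[n + 1] += dt * birth * p[n]      # birth: n -> n + 1
            if n >= 1:
                new_p[n - 1] += dt * death * n * p[n]  # death: n -> n - 1
        p = new_p
    return p, 1.0 - sum(p)

# Two fidelity levels for the same model: a coarse and a fine truncation.
p_coarse, err_coarse = fsp_birth_death(birth=5.0, death=1.0, n_max=6, t_end=2.0)
p_fine, err_fine = fsp_birth_death(birth=5.0, death=1.0, n_max=30, t_end=2.0)
mean_fine = sum(n * q for n, q in enumerate(p_fine))
print(f"truncation error: coarse {err_coarse:.3e}, fine {err_fine:.3e}")
```

Varying `n_max` gives a family of cheaper and more expensive surrogates of the same master equation, with the leaked probability acting as a computable error measure. That is the sense in which a truncation hierarchy can serve as a set of fidelity levels.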
Affiliation(s)
- Thomas A. Catanach
  - Sandia National Laboratories, Livermore, CA, 94550
  - Address all correspondence to: Thomas A. Catanach,
- Huy D. Vo
  - Dept. of Chemical and Biological Engr., Colorado State University, Fort Collins, CO, 80521
- Brian Munsky
  - Dept. of Chemical and Biological Engr., Colorado State University, Fort Collins, CO, 80521
7. Mitra ED, Hlavacek WS. Parameter Estimation and Uncertainty Quantification for Systems Biology Models. Current Opinion in Systems Biology 2019; 18:9-18. [PMID: 32719822 PMCID: PMC7384601 DOI: 10.1016/j.coisb.2019.10.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 02/07/2023]
Abstract
Mathematical models can provide quantitative insights into immunoreceptor signaling, and other biological processes, but require parameterization and uncertainty quantification before reliable predictions become possible. We review currently available methods and software tools to address these problems. We consider gradient-based and gradient-free methods for point estimation of parameter values, and methods of profile likelihood, bootstrapping, and Bayesian inference for uncertainty quantification. We consider recent and potential future applications of these methods to systems-level modeling of immune-related phenomena.
Affiliation(s)
- Eshan D. Mitra
  - Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- William S. Hlavacek
  - Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA