1
Gorin G, Carilli M, Chari T, Pachter L. Spectral neural approximations for models of transcriptional dynamics. Biophys J 2024; 123:2892-2901. [PMID: 38715358 PMCID: PMC11393700 DOI: 10.1016/j.bpj.2024.04.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 03/22/2024] [Accepted: 04/30/2024] [Indexed: 05/18/2024] Open
Abstract
The advent of high-throughput transcriptomics provides an opportunity to advance mechanistic understanding of transcriptional processes and their connections to cellular function at an unprecedented, genome-wide scale. These transcriptional systems, which involve discrete stochastic events, are naturally modeled using chemical master equations (CMEs), which can be solved for probability distributions to fit biophysical rates that govern system dynamics. While CME models have been used as standards in fluorescence transcriptomics for decades to analyze single-species RNA distributions, there are often no closed-form solutions to CMEs that model multiple species, such as nascent and mature RNA transcript counts. This has prevented the application of standard likelihood-based statistical methods for analyzing high-throughput, multi-species transcriptomic datasets using biophysical models. Inspired by recent work in machine learning to learn solutions to complex dynamical systems, we leverage neural networks and statistical understanding of system distributions to produce accurate approximations to a steady-state bivariate distribution for a model of the RNA life cycle that includes nascent and mature molecules. The steady-state distribution of this simple model has no closed-form solution and requires intensive numerical solving techniques: our approach reduces likelihood evaluation time by several orders of magnitude. We demonstrate two approaches, whereby solutions are approximated by 1) learning the weights of kernel distributions with constrained parameters or 2) learning both weights and scaling factors for parameters of kernel distributions. We show that our strategies, denoted by kernel weight regression and parameter-scaled kernel weight regression, respectively, enable broad exploration of parameter space and can be used in existing likelihood frameworks to infer transcriptional burst sizes, RNA splicing rates, and mRNA degradation rates from experimental transcriptomic data.
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California
- Maria Carilli
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Tara Chari
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
2
Miles CE, McKinley SA, Ding F, Lehoucq RB. Inferring Stochastic Rates from Heterogeneous Snapshots of Particle Positions. Bull Math Biol 2024; 86:74. [PMID: 38740619 DOI: 10.1007/s11538-024-01301-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/20/2024] [Indexed: 05/16/2024]
Abstract
Many imaging techniques for biological systems - like fixation of cells coupled with fluorescence microscopy - provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.
Affiliation(s)
- Scott A McKinley
  - Department of Mathematics, Tulane University, New Orleans, LA, USA
- Fangyuan Ding
  - Departments of Biomedical Engineering, Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Richard B Lehoucq
  - Discrete Math and Optimization, Sandia National Laboratories, Albuquerque, NM, USA
3
Miles CE, McKinley SA, Ding F, Lehoucq RB. Inferring stochastic rates from heterogeneous snapshots of particle positions. arXiv 2023:arXiv:2311.04880v1. [PMID: 37986720 PMCID: PMC10659442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Many imaging techniques for biological systems - like fixation of cells coupled with fluorescence microscopy - provide sharp spatial resolution in reporting locations of individuals at a single moment in time but also destroy the dynamics they intend to capture. These snapshot observations contain no information about individual trajectories, but still encode information about movement and demographic dynamics, especially when combined with a well-motivated biophysical model. The relationship between spatially evolving populations and single-moment representations of their collective locations is well-established with partial differential equations (PDEs) and their inverse problems. However, experimental data is commonly a set of locations whose number is insufficient to approximate a continuous-in-space PDE solution. Here, motivated by popular subcellular imaging data of gene expression, we embrace the stochastic nature of the data and investigate the mathematical foundations of parametrically inferring demographic rates from snapshots of particles undergoing birth, diffusion, and death in a nuclear or cellular domain. Toward inference, we rigorously derive a connection between individual particle paths and their presentation as a Poisson spatial process. Using this framework, we investigate the properties of the resulting inverse problem and study factors that affect quality of inference. One pervasive feature of this experimental regime is the presence of cell-to-cell heterogeneity. Rather than being a hindrance, we show that cell-to-cell geometric heterogeneity can increase the quality of inference on dynamics for certain parameter regimes. Altogether, the results serve as a basis for more detailed investigations of subcellular spatial patterns of RNA molecules and other stochastically evolving populations that can only be observed for single instants in their time evolution.
Affiliation(s)
- Fangyuan Ding
  - Department of Biomedical Engineering, University of California, Irvine
4
Gorin G, Vastola JJ, Pachter L. Studying stochastic systems biology of the cell with single-cell genomics data. Cell Syst 2023; 14:822-843.e22. [PMID: 37751736 PMCID: PMC10725240 DOI: 10.1016/j.cels.2023.08.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/29/2023] [Revised: 08/16/2023] [Accepted: 08/25/2023] [Indexed: 09/28/2023]
Abstract
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- John J Vastola
  - Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125, USA
5
Sahbani M, Das S, Green JR. Classical Fisher information for differentiable dynamical systems. Chaos 2023; 33:103139. [PMID: 37889952 DOI: 10.1063/5.0165484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 10/04/2023] [Indexed: 10/29/2023]
Abstract
Fisher information is a lower bound on the uncertainty in the statistical estimation of classical and quantum mechanical parameters. While some deterministic dynamical systems are not subject to random fluctuations, they do still have a form of uncertainty. Infinitesimal perturbations to the initial conditions can grow exponentially in time, a signature of deterministic chaos. As a measure of this uncertainty, we introduce another classical information, specifically for the deterministic dynamics of isolated, closed, or open classical systems not subject to noise. This classical measure of information is defined with Lyapunov vectors in tangent space, making it less akin to the classical Fisher information and more akin to the quantum Fisher information defined with wavevectors in Hilbert space. Our analysis of the local state space structure and linear stability leads to upper and lower bounds on this information, giving it an interpretation as the net stretching action of the flow. Numerical calculations of this information for illustrative mechanical examples show that it depends directly on the phase space curvature and speed of the flow.
Affiliation(s)
- Mohamed Sahbani
  - Department of Chemistry, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
  - Department of Physics, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
- Swetamber Das
  - Department of Chemistry, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
  - Department of Physics, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
- Jason R Green
  - Department of Chemistry, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
  - Department of Physics, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
6
Grabowski F, Nałęcz-Jawecki P, Lipniacki T. Predictive power of non-identifiable models. Sci Rep 2023; 13:11143. [PMID: 37429934 DOI: 10.1038/s41598-023-37939-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 06/29/2023] [Indexed: 07/12/2023] Open
Abstract
Resolving practical non-identifiability of computational models typically requires either additional data or non-algorithmic model reduction, which frequently results in models containing parameters lacking direct interpretation. Here, instead of reducing models, we explore an alternative, Bayesian approach, and quantify the predictive power of non-identifiable models. We considered an example biochemical signalling cascade model as well as its mechanical analogue. For these models, we demonstrated that by measuring a single variable in response to a properly chosen stimulation protocol, the dimensionality of the parameter space is reduced, which allows for predicting the measured variable's trajectory in response to different stimulation protocols even if all model parameters remain unidentified. Moreover, one can predict how such a trajectory will transform in the case of a multiplicative change of an arbitrary model parameter. Successive measurements of remaining variables further reduce the dimensionality of the parameter space and enable new predictions. We analysed potential pitfalls of the proposed approach that can arise when the investigated model is oversimplified, incorrect, or when the training protocol is inadequate. The main advantage of the suggested iterative approach is that the predictive power of the model can be assessed and practically utilised at each step.
Affiliation(s)
- Frederic Grabowski
  - Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland
- Paweł Nałęcz-Jawecki
  - Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland
- Tomasz Lipniacki
  - Institute of Fundamental Technological Research, Polish Academy of Sciences, Warsaw, Poland
7
Gorin G, Vastola JJ, Pachter L. Studying stochastic systems biology of the cell with single-cell genomics data. bioRxiv 2023:2023.05.17.541250. [PMID: 37292934 PMCID: PMC10245677 DOI: 10.1101/2023.05.17.541250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, 91125
- John J. Vastola
  - Department of Neurobiology, Harvard Medical School, Boston, MA, 02115
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125
8
Vo HD, Forero-Quintero LS, Aguilera LU, Munsky B. Analysis and design of single-cell experiments to harvest fluctuation information while rejecting measurement noise. Front Cell Dev Biol 2023; 11:1133994. [PMID: 37305680 PMCID: PMC10250612 DOI: 10.3389/fcell.2023.1133994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/29/2022] [Accepted: 05/10/2023] [Indexed: 06/13/2023] Open
Abstract
Introduction: Despite continued technological improvements, measurement errors always reduce or distort the information that any real experiment can provide to quantify cellular dynamics. This problem is particularly serious for cell signaling studies to quantify heterogeneity in single-cell gene regulation, where important RNA and protein copy numbers are themselves subject to the inherently random fluctuations of biochemical reactions. Until now, it has not been clear how measurement noise should be managed in addition to other experiment design variables (e.g., sampling size, measurement times, or perturbation levels) to ensure that collected data will provide useful insights on signaling or gene expression mechanisms of interest. Methods: We propose a computational framework that takes explicit consideration of measurement errors to analyze single-cell observations, and we derive Fisher Information Matrix (FIM)-based criteria to quantify the information value of distorted experiments. Results and Discussion: We apply this framework to analyze multiple models in the context of simulated and experimental single-cell data for a reporter gene controlled by an HIV promoter. We show that the proposed approach quantitatively predicts how different types of measurement distortions affect the accuracy and precision of model identification, and we demonstrate that the effects of these distortions can be mitigated through explicit consideration during model inference. We conclude that this reformulation of the FIM could be used effectively to design single-cell experiments to optimally harvest fluctuation information while mitigating the effects of image distortion.
Affiliation(s)
- Huy D. Vo
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Linda S. Forero-Quintero
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Luis U. Aguilera
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
- Brian Munsky
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, United States
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO, United States
9
Coulier A, Singh P, Sturrock M, Hellander A. Systematic comparison of modeling fidelity levels and parameter inference settings applied to negative feedback gene regulation. PLoS Comput Biol 2022; 18:e1010683. [PMID: 36520957 PMCID: PMC9799300 DOI: 10.1371/journal.pcbi.1010683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 12/29/2022] [Accepted: 10/25/2022] [Indexed: 12/23/2022] Open
Abstract
Quantitative stochastic models of gene regulatory networks are important tools for studying cellular regulation. Such models can be formulated at many different levels of fidelity. A practical challenge is to determine what model fidelity to use in order to get accurate and representative results. The choice is important, because models of successively higher fidelity come at a rapidly increasing computational cost. In some situations, the level of detail is clearly motivated by the question under study. In many situations however, many model options could qualitatively agree with available data, depending on the amount of data and the nature of the observations. Here, an important distinction is whether we are interested in inferring the true (but unknown) physical parameters of the model or if it is sufficient to be able to capture and explain available data. The situation becomes complicated from a computational perspective because inference needs to be approximate. Most often it is based on likelihood-free Approximate Bayesian Computation (ABC) and here determining which summary statistics to use, as well as how much data is needed to reach the desired level of accuracy, are difficult tasks. Ultimately, all of these aspects - the model fidelity, the available data, and the numerical choices for inference - interplay in a complex manner. In this paper we develop a computational pipeline designed to systematically evaluate inference accuracy for a wide range of true known parameters. We then use it to explore inference settings for negative feedback gene regulation. In particular, we compare a detailed spatial stochastic model, a coarse-grained compartment-based multiscale model, and the standard well-mixed model, across several data-scenarios and for multiple numerical options for parameter inference. Practically speaking, this pipeline can be used as a preliminary step to guide modelers prior to gathering experimental data. By training Gaussian processes to approximate the distance function values, we are able to substantially reduce the computational cost of running the pipeline.
Affiliation(s)
- Adrien Coulier
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
- Prashant Singh
  - Science for Life Laboratory, Department of Information Technology, Uppsala University, Uppsala, Sweden
- Marc Sturrock
  - Department of Physiology, Royal College of Surgeons in Ireland, Dublin, Ireland
- Andreas Hellander
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
10
Gorin G, Vastola JJ, Fang M, Pachter L. Interpretable and tractable models of transcriptional noise for the rational design of single-molecule quantification experiments. Nat Commun 2022; 13:7620. [PMID: 36494337 PMCID: PMC9734650 DOI: 10.1038/s41467-022-34857-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/25/2022] [Accepted: 11/09/2022] [Indexed: 12/13/2022] Open
Abstract
The question of how cell-to-cell differences in transcription rate affect RNA count distributions is fundamental for understanding biological processes underlying transcription. Answering this question requires quantitative models that are both interpretable (describing concrete biophysical phenomena) and tractable (amenable to mathematical analysis). This enables the identification of experiments which best discriminate between competing hypotheses. As a proof of principle, we introduce a simple but flexible class of models involving a continuous stochastic transcription rate driving a discrete RNA transcription and splicing process, and compare and contrast two biologically plausible hypotheses about transcription rate variation. One assumes variation is due to DNA experiencing mechanical strain, while the other assumes it is due to regulator number fluctuations. We introduce a framework for numerically and analytically studying such models, and apply Bayesian model selection to identify candidate genes that show signatures of each model in single-cell transcriptomic data from mouse glutamatergic neurons.
Affiliation(s)
- Gennady Gorin
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- John J Vastola
  - Department of Neurobiology, Harvard Medical School, Boston, MA, 02115, USA
- Meichen Fang
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Lior Pachter
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA
11
Browning AP, Drovandi C, Turner IW, Jenner AL, Simpson MJ. Efficient inference and identifiability analysis for differential equation models with random parameters. PLoS Comput Biol 2022; 18:e1010734. [PMID: 36441811 PMCID: PMC9731444 DOI: 10.1371/journal.pcbi.1010734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 12/08/2022] [Accepted: 11/14/2022] [Indexed: 11/29/2022] Open
Abstract
Heterogeneity is a dominant factor in the behaviour of many biological processes. Despite this, it is common for mathematical and statistical analyses to ignore biological heterogeneity as a source of variability in experimental data. Therefore, methods for exploring the identifiability of models that explicitly incorporate heterogeneity through variability in model parameters are relatively underdeveloped. We develop a new likelihood-based framework, based on moment matching, for inference and identifiability analysis of differential equation models that capture biological heterogeneity through parameters that vary according to probability distributions. As our novel method is based on an approximate likelihood function, it is highly flexible; we demonstrate identifiability analysis using both a frequentist approach based on profile likelihood, and a Bayesian approach based on Markov-chain Monte Carlo. Through three case studies, we demonstrate our method by providing a didactic guide to inference and identifiability analysis of hyperparameters that relate to the statistical moments of model parameters from independent observed data. Our approach has a computational cost comparable to analysis of models that neglect heterogeneity, a significant improvement over many existing alternatives. We demonstrate how analysis of random parameter models can aid better understanding of the sources of heterogeneity from biological data.
Affiliation(s)
- Alexander P. Browning
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - QUT Centre for Data Science, Queensland University of Technology, Brisbane, Australia
  - Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Christopher Drovandi
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - QUT Centre for Data Science, Queensland University of Technology, Brisbane, Australia
- Ian W. Turner
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- Adrianne L. Jenner
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - QUT Centre for Data Science, Queensland University of Technology, Brisbane, Australia
- Matthew J. Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - QUT Centre for Data Science, Queensland University of Technology, Brisbane, Australia
12
Mikhalychev A, Zhevno K, Vlasenko S, Benediktovitch A, Ulyanenkova T, Ulyanenkov A. Fisher information for optimal planning of X-ray diffraction experiments. J Appl Crystallogr 2021. [DOI: 10.1107/s1600576721009869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Fisher information is a powerful mathematical tool suitable for quantification of data 'informativity' and optimization of the experimental setup and measurement conditions. Here, it is applied to X-ray diffraction and an informational approach to choosing the optimal measurement configuration is proposed. The core idea is maximization of the information which can be extracted from the measured data set by the selected analysis technique, over the sets of accessible reflections and measurement geometries. The developed approach is applied to high-resolution X-ray diffraction measurements and microstructure analysis of multilayer samples, and its efficiency and consistency are demonstrated with the results of more straightforward Monte Carlo simulations.
13
Jashnsaz H, Fox ZR, Hughes JJ, Li G, Munsky B, Neuert G. Diverse Cell Stimulation Kinetics Identify Predictive Signal Transduction Models. iScience 2020; 23:101565. [PMID: 33083733 PMCID: PMC7549069 DOI: 10.1016/j.isci.2020.101565] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/25/2019] [Revised: 08/18/2020] [Accepted: 09/11/2020] [Indexed: 11/28/2022] Open
Abstract
Computationally understanding the molecular mechanisms that give rise to cell signaling responses upon different environmental, chemical, and genetic perturbations is a long-standing challenge that requires models that fit and predict quantitative responses for new biological conditions. Overcoming this challenge depends not only on good models and detailed experimental data but also on the rigorous integration of both. We propose a quantitative framework to perturb and model generic signaling networks using multiple and diverse changing environments (hereafter "kinetic stimulations") resulting in distinct pathway activation dynamics. We demonstrate that utilizing multiple diverse kinetic stimulations better constrains model parameters and enables predictions of signaling dynamics that would be impossible using traditional dose-response or individual kinetic stimulations. To demonstrate our approach, we use experimentally identified models to predict signaling dynamics in normal, mutated, and drug-treated conditions upon multitudes of kinetic stimulations and quantify which proteins and reaction rates are most sensitive to which extracellular stimulations.
Affiliation(s)
- Hossein Jashnsaz
  - Department of Molecular Physiology and Biophysics, School of Medicine, Vanderbilt University, Nashville, TN 37232, USA
- Zachary R. Fox
  - Inria Saclay Ile-de-France, Palaiseau 91120, France
  - Institut Pasteur, USR 3756 IP CNRS, Paris 75015, France
  - Keck Scholars, School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
- Jason J. Hughes
  - Department of Molecular Physiology and Biophysics, School of Medicine, Vanderbilt University, Nashville, TN 37232, USA
- Guoliang Li
  - Department of Molecular Physiology and Biophysics, School of Medicine, Vanderbilt University, Nashville, TN 37232, USA
- Brian Munsky
  - Keck Scholars, School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, USA
- Gregor Neuert
  - Department of Molecular Physiology and Biophysics, School of Medicine, Vanderbilt University, Nashville, TN 37232, USA
  - Department of Biomedical Engineering, School of Engineering, Vanderbilt University, Nashville, TN 37232, USA
  - Department of Pharmacology, School of Medicine, Vanderbilt University, Nashville, TN 37232, USA
14
Fox ZR, Neuert G, Munsky B. Optimal Design of Single-Cell Experiments within Temporally Fluctuating Environments. Complexity 2020; 2020:8536365. [PMID: 32982137 PMCID: PMC7515449 DOI: 10.1155/2020/8536365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Modern biological experiments are becoming increasingly complex, and designing these experiments to yield the greatest possible quantitative insight is an open challenge. Increasingly, computational models of complex stochastic biological systems are being used to understand and predict biological behaviors or to infer biological parameters. Such quantitative analyses can also help to improve experiment designs for particular goals, such as to learn more about specific model mechanisms or to reduce prediction errors in certain situations. A classic approach to experiment design is to use the Fisher information matrix (FIM), which quantifies the expected information a particular experiment will reveal about model parameters. The Finite State Projection based FIM (FSP-FIM) was recently developed to compute the FIM for discrete stochastic gene regulatory systems, whose complex response distributions do not satisfy standard assumptions of Gaussian variations. In this work, we develop the FSP-FIM analysis for a stochastic model of stress response genes in S. cerevisiae under time-varying MAPK induction. We verify this FSP-FIM analysis and use it to optimize the number of cells that should be quantified at particular times to learn as much as possible about the model parameters. We then extend the FSP-FIM approach to explore how different measurement times or genetic modifications help to minimize uncertainty in the sensing of extracellular environments, and we experimentally validate the FSP-FIM to rank single-cell experiments for their abilities to minimize estimation uncertainty of NaCl concentrations during yeast osmotic shock. This work demonstrates the potential of quantitative models not only to make sense of modern biological data sets, but also to close the loop between quantitative modeling and experimental data collection.
Affiliation(s)
- Zachary R Fox
  - Inria Saclay Ile-de-France, Palaiseau 91120, France
  - Institut Pasteur, USR 3756 IP CNRS, Paris 75015, France
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
- Gregor Neuert
  - Department of Molecular Physiology and Biophysics, School of Medicine, Vanderbilt University, Nashville, TN 37232, USA
- Brian Munsky
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, USA
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
|
15
|
Catanach TA, Vo HD, Munsky B. Bayesian Inference of Stochastic Reaction Networks Using Multifidelity Sequential Tempered Markov Chain Monte Carlo. International Journal for Uncertainty Quantification 2020; 10:515-542. [PMID: 34007522 PMCID: PMC8127724 DOI: 10.1615/int.j.uncertaintyquantification.2020033241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Stochastic reaction network models are often used to explain and predict the dynamics of gene regulation in single cells. These models usually involve several parameters, such as the kinetic rates of chemical reactions, that are not directly measurable and must be inferred from experimental data. Bayesian inference provides a rigorous probabilistic framework for identifying these parameters by finding a posterior parameter distribution that captures their uncertainty. Traditional computational methods for solving inference problems, such as Markov Chain Monte Carlo methods based on the classical Metropolis-Hastings algorithm, involve numerous serial evaluations of the likelihood function, which in turn requires expensive forward solutions of the chemical master equation (CME). We propose an alternate approach based on a multifidelity extension of the Sequential Tempered Markov Chain Monte Carlo (ST-MCMC) sampler. This algorithm is built upon Sequential Monte Carlo and solves the Bayesian inference problem by decomposing it into a sequence of efficiently solved subproblems that gradually increase both model fidelity and the influence of the observed data. We reformulate the finite state projection (FSP) algorithm, a well-known method for solving the CME, to produce a hierarchy of surrogate master equations to be used in this multifidelity scheme. To determine the appropriate fidelity, we introduce a novel information-theoretic criterion that seeks to extract the most information about the ultimate Bayesian posterior from each model in the hierarchy without inducing significant bias. This novel sampling scheme is tested with high performance computing resources using biologically relevant problems.
Affiliation(s)
- Thomas A. Catanach
  - Sandia National Laboratories, Livermore, CA, 94550
  - Address all correspondence to: Thomas A. Catanach,
- Huy D. Vo
  - Dept. of Chemical and Biological Engr., Colorado State University, Fort Collins, CO, 80521
- Brian Munsky
  - Dept. of Chemical and Biological Engr., Colorado State University, Fort Collins, CO, 80521
|
16
|
Vo HD, Fox Z, Baetica A, Munsky B. Bayesian Estimation for Stochastic Gene Expression Using Multifidelity Models. J Phys Chem B 2019; 123:2217-2234. [PMID: 30777763 DOI: 10.1021/acs.jpcb.8b10946] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
The finite state projection (FSP) approach to solving the chemical master equation has enabled successful inference of discrete stochastic models to predict single-cell gene regulation dynamics. Unfortunately, the FSP approach is highly computationally intensive for all but the simplest models, an issue that is highly problematic when parameter inference and uncertainty quantification take enormous numbers of parameter evaluations. To address this issue, we propose two new computational methods for the Bayesian inference of stochastic gene expression parameters given single-cell experiments. We formulate and verify an adaptive delayed acceptance Metropolis-Hastings (ADAMH) algorithm for use with reduced Krylov-basis projections of the FSP. We then introduce an extension of the ADAMH into a hybrid scheme that consists of an initial phase to construct a reduced model and a faster second phase to sample from the approximate posterior distribution determined by the constructed model. We test and compare both algorithms to an adaptive Metropolis algorithm with full FSP-based likelihood evaluations on three example models and simulated data to show that the new ADAMH variants achieve substantial speedup in comparison to the full FSP approach. By reducing the computational costs of parameter estimation, we expect the ADAMH approach to enable efficient data-driven estimation for more complex gene regulation models.
Affiliation(s)
- Huy D Vo
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado 80523, United States
- Zachary Fox
  - Keck Scholars, School of Biomedical Engineering, Colorado State University, Fort Collins, Colorado 80523, United States
- Ania Baetica
  - Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California 94158, United States
- Brian Munsky
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado 80523, United States
  - Keck Scholars, School of Biomedical Engineering, Colorado State University, Fort Collins, Colorado 80523, United States
|