1
Kilic Z, Schweiger M, Moyer C, Pressé S. Monte Carlo samplers for efficient network inference. PLoS Comput Biol 2023; 19:e1011256. [PMID: 37463156 DOI: 10.1371/journal.pcbi.1011256] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Accepted: 06/09/2023] [Indexed: 07/20/2023]
Abstract
Accessing information on an underlying network driving a biological process often involves interrupting the process and collecting snapshot data. When snapshot data are stochastic, the data's structure necessitates a probabilistic description to infer underlying reaction networks. As an example, we may imagine wanting to learn gene state networks from the type of data collected in single molecule RNA fluorescence in situ hybridization (RNA-FISH). In the networks we consider, nodes represent network states, and edges represent biochemical reaction rates linking states. Simultaneously estimating the number of nodes and constituent parameters from snapshot data remains a challenging task, in part on account of data uncertainty and timescale separations between kinetic parameters mediating the network. While parametric Bayesian methods learn parameters given a network structure (with known node numbers) with rigorously propagated measurement uncertainty, learning the number of nodes and parameters with potentially large timescale separations remains an open question. Here, we propose a Bayesian nonparametric framework and describe a hybrid Bayesian Markov Chain Monte Carlo (MCMC) sampler directly addressing these challenges. In particular, in our hybrid method, Hamiltonian Monte Carlo (HMC) leverages local posterior geometries in inference to explore the parameter space; Adaptive Metropolis Hastings (AMH) learns correlations between plausible parameter sets to efficiently propose probable models; and Parallel Tempering takes into account multiple models simultaneously with tempered information content to augment sampling efficiency. We apply our method to synthetic data mimicking single molecule RNA-FISH, a popular snapshot method for probing transcriptional networks, to illustrate the identified challenges inherent to learning dynamical models from these snapshots and how our method addresses them.
Affiliation(s)
- Zeliha Kilic
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America
- Max Schweiger
- Center for Biological Physics, ASU, Tempe, Arizona, United States of America
- Department of Physics, ASU, Tempe, Arizona, United States of America
- Camille Moyer
- Center for Biological Physics, ASU, Tempe, Arizona, United States of America
- School of Mathematics and Statistical Sciences, ASU, Tempe, Arizona, United States of America
- Steve Pressé
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, United States of America
- Center for Biological Physics, ASU, Tempe, Arizona, United States of America
- School of Molecular Sciences, ASU, Tempe, Arizona, United States of America
2
Kilic Z, Schweiger M, Moyer C, Shepherd D, Pressé S. Gene expression model inference from snapshot RNA data using Bayesian non-parametrics. Nat Comput Sci 2023; 3:174-183. [PMID: 38125199 PMCID: PMC10732567 DOI: 10.1038/s43588-022-00392-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/16/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2023]
Abstract
Gene expression models, which are key towards understanding cellular regulatory response, underlie observations of single-cell transcriptional dynamics. Although RNA expression data encode information on gene expression models, existing computational frameworks do not perform simultaneous Bayesian inference of gene expression models and parameters from such data. Rather, gene expression models (composed of gene states, their connectivities and associated parameters) are currently deduced by pre-specifying gene state numbers and connectivity before learning associated rate parameters. Here we propose a method to learn full distributions over gene states, state connectivities and associated rate parameters, simultaneously and self-consistently from single-molecule RNA counts. We propagate noise from fluctuating RNA counts over models by treating models themselves as random variables. We achieve this within a Bayesian non-parametric paradigm. We demonstrate our method on the Escherichia coli lacZ pathway and the Saccharomyces cerevisiae STL1 pathway, and verify its robustness on synthetic data.
Affiliation(s)
- Zeliha Kilic
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- These authors contributed equally: Zeliha Kilic, Max Schweiger
- Max Schweiger
- Center for Biological Physics, ASU, Tempe, AZ, USA
- Department of Physics, ASU, Tempe, AZ, USA
- These authors contributed equally: Zeliha Kilic, Max Schweiger
- Camille Moyer
- Center for Biological Physics, ASU, Tempe, AZ, USA
- School of Mathematics and Statistical Sciences, ASU, Tempe, AZ, USA
- Douglas Shepherd
- Center for Biological Physics, ASU, Tempe, AZ, USA
- Department of Physics, ASU, Tempe, AZ, USA
- Steve Pressé
- Center for Biological Physics, ASU, Tempe, AZ, USA
- Department of Physics, ASU, Tempe, AZ, USA
- School of Molecular Sciences, ASU, Tempe, AZ, USA
3
Yasui K. Merits and Demerits of ODE Modeling of Physicochemical Systems for Numerical Simulations. Molecules 2022; 27:5860. [PMID: 36144593 PMCID: PMC9505051 DOI: 10.3390/molecules27185860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 09/02/2022] [Accepted: 09/07/2022] [Indexed: 11/25/2022]
Abstract
In comparison with first-principles calculations, which mostly use partial differential equations (PDEs), numerical simulations with modeling by ordinary differential equations (ODEs) are sometimes superior in that they are computationally more economical and that important factors are more easily traced. However, a demerit of ODE modeling is the need for model validation through comparison with experimental data or the results of first-principles calculations. In the present review, examples of ODE modeling are reviewed, such as sonochemical reactions inside a cavitation bubble, oriented attachment of nanocrystals, dynamic response of flexoelectric polarization, ultrasound-assisted sintering, and dynamics of a gas parcel in a thermoacoustic engine.
Affiliation(s)
- Kyuichi Yasui
- National Institute of Advanced Industrial Science and Technology (AIST), Nagoya 463-8560, Japan
4
Solving the chemical master equation for monomolecular reaction systems and beyond: a Doi-Peliti path integral view. J Math Biol 2021; 83:48. [PMID: 34635944 DOI: 10.1007/s00285-021-01670-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2019] [Revised: 09/01/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022]
Abstract
The chemical master equation (CME) is a fundamental description of interacting molecules commonly used to model chemical kinetics and noisy gene regulatory networks. Exact time-dependent solutions of the CME, which typically consists of infinitely many coupled differential equations, are rare, and are valuable for numerical benchmarking and getting intuition for the behavior of more complicated systems. Jahnke and Huisinga's landmark calculation of the exact time-dependent solution of the CME for monomolecular reaction systems is one of the most general analytic results known; however, it is hard to generalize, because it relies crucially on special properties of monomolecular reactions. In this paper, we rederive Jahnke and Huisinga's result on the time-dependent probability distribution and moments of monomolecular reaction systems using the Doi-Peliti path integral approach, which reduces solving the CME to evaluating many integrals. While the Doi-Peliti approach is less intuitive, it is also more mechanical, and hence easier to generalize. To illustrate how the Doi-Peliti approach can go beyond the method of Jahnke and Huisinga, we also find an explicit and exact time-dependent solution to a problem involving an autocatalytic reaction that Jahnke and Huisinga identified as not solvable using their method. Most interestingly, we are able to find a formal exact time-dependent solution for any CME whose list of reactions involves only zero and first order reactions, which may be the most general result currently known. This formal solution also yields a useful algorithm for efficiently computing numerical solutions to CMEs of this type.
5
Dinh T, Sidje RB. An adaptive solution to the chemical master equation using quantized tensor trains with sliding windows. Phys Biol 2020; 17:065014. [PMID: 32610302 DOI: 10.1088/1478-3975/aba1d2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
To cope with an extremely large or even infinite state space when solving the chemical master equation in biological problems, a potent strategy is to restrict to a finite state projection (FSP) and represent the transition matrix and probability vector in quantized tensor train (QTT) format, leading to savings in storage while retaining accuracy. In an earlier adaptive FSP-QTT algorithm, the multidimensional state space was downsized and kept in the form of a hyper rectangle that was updated when needed by selectively doubling some of its side dimensions. However, this could result in a much larger state space than necessary, with the effect of hampering both the execution time and stepping scheme. In this work, we improve the algorithm by enabling sliding windows that can dynamically slide, shrink or expand, with updates driven by a number of stochastic simulation algorithm trajectories. The ensuing state space is a considerably reduced hyper rectangle containing only the most probable states at each time step. Three numerical experiments of varying difficulty are performed to compare our approach with the original adaptive FSP-QTT algorithm.
6
Gorin G, Pachter L. Special function methods for bursty models of transcription. Phys Rev E 2020; 102:022409. [PMID: 32942485 DOI: 10.1103/physreve.102.022409] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/04/2020] [Accepted: 08/10/2020] [Indexed: 11/07/2022]
Abstract
We explore a Markov model used in the analysis of gene expression, involving the bursty production of pre-mRNA, its conversion to mature mRNA, and its consequent degradation. We demonstrate that the integration used to compute the solution of the stochastic system can be approximated by the evaluation of special functions. Furthermore, the form of the special function solution generalizes to a broader class of burst distributions. In light of the broader goal of biophysical parameter inference from transcriptomics data, we apply the method to simulated data, demonstrating effective control of precision and runtime. Finally, we propose and validate a non-Bayesian approach for parameter estimation based on the characteristic function of the target joint distribution of pre-mRNA and mRNA.
Affiliation(s)
- Gennady Gorin
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
- Lior Pachter
- Division of Biology and Biological Engineering & Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125, USA
7
Catanach TA, Vo HD, Munsky B. Bayesian inference of stochastic reaction networks using multifidelity sequential tempered Markov chain Monte Carlo. Int J Uncertain Quantif 2020; 10:515-542. [PMID: 34007522 PMCID: PMC8127724 DOI: 10.1615/int.j.uncertaintyquantification.2020033241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Stochastic reaction network models are often used to explain and predict the dynamics of gene regulation in single cells. These models usually involve several parameters, such as the kinetic rates of chemical reactions, that are not directly measurable and must be inferred from experimental data. Bayesian inference provides a rigorous probabilistic framework for identifying these parameters by finding a posterior parameter distribution that captures their uncertainty. Traditional computational methods for solving inference problems, such as Markov Chain Monte Carlo methods based on the classical Metropolis-Hastings algorithm, involve numerous serial evaluations of the likelihood function, which in turn requires expensive forward solutions of the chemical master equation (CME). We propose an alternate approach based on a multifidelity extension of the Sequential Tempered Markov Chain Monte Carlo (ST-MCMC) sampler. This algorithm is built upon Sequential Monte Carlo and solves the Bayesian inference problem by decomposing it into a sequence of efficiently solved subproblems that gradually increase both model fidelity and the influence of the observed data. We reformulate the finite state projection (FSP) algorithm, a well-known method for solving the CME, to produce a hierarchy of surrogate master equations to be used in this multifidelity scheme. To determine the appropriate fidelity, we introduce a novel information-theoretic criterion that seeks to extract the most information about the ultimate Bayesian posterior from each model in the hierarchy without inducing significant bias. This novel sampling scheme is tested with high performance computing resources using biologically relevant problems.
Affiliation(s)
- Thomas A. Catanach
- Sandia National Laboratories, Livermore, CA, 94550
- Address all correspondence to: Thomas A. Catanach,
- Huy D. Vo
- Dept. of Chemical and Biological Engr., Colorado State University, Fort Collins, CO, 80521
- Brian Munsky
- Dept. of Chemical and Biological Engr., Colorado State University, Fort Collins, CO, 80521
8
Pichon X, Lagha M, Mueller F, Bertrand E. A Growing Toolbox to Image Gene Expression in Single Cells: Sensitive Approaches for Demanding Challenges. Mol Cell 2018; 71:468-480. [DOI: 10.1016/j.molcel.2018.07.022] [Citation(s) in RCA: 112] [Impact Index Per Article: 18.7] [Received: 05/10/2018] [Revised: 07/19/2018] [Accepted: 07/20/2018] [Indexed: 12/21/2022]