1
|
Bobrovnikov M, Chai JT, Dinov ID. Interactive Visualization and Computation of 2D and 3D Probability Distributions. SN Comput Sci 2022; 3:327. [PMID: 37483660 PMCID: PMC10361712 DOI: 10.1007/s42979-022-01206-w] [Received: 11/19/2021] [Accepted: 05/13/2022]
Abstract
Purpose Mathematical modeling, probability estimation, and statistical inference represent core elements of modern artificial intelligence (AI) approaches for data-driven prediction, forecasting, classification, risk-estimation, and prognosis. Currently, there are many tools that help calculate and visualize univariate probability distributions; however, very few resources venture beyond them into multivariate distributions, which are commonly used in advanced statistical inference and AI decision-making. This article presents a new web calculator that enables some calculation and visualization of bivariate and trivariate probability distributions. Methods Several methods are explored to compute the joint bivariate and trivariate probability densities, including optimal multivariate modeling using the Gaussian copula. We developed an interactive webapp to visually illustrate the parallels between the mathematical formulation, computational implementation, and graphical depiction of multivariate probability density and cumulative distribution functions. To ensure that the interface and functionality are hardware-platform independent, scalable, and functional, the app and its component widgets are implemented using HTML5 and JavaScript. Results We validated the webapp by testing the multivariate copula models under different experimental conditions and inspecting the accuracy and reliability of the estimated multivariate probability densities and distribution function values. Conclusion This article demonstrates the construction, implementation, and utilization of multivariate probability calculators. The proposed webapp implementation is freely available online (https://socr.umich.edu/HTML5/BivariateNormal/BVN2/) and can be used to assist with education and research of a diverse array of data scientists, STEM instructors, and AI learners.
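The Gaussian copula named in the Methods builds a joint density from two arbitrary continuous marginals via Sklar's theorem, f(a, b) = c(F1(a), F2(b); rho) f1(a) f2(b). The following is a minimal Python sketch of the bivariate case, not the SOCR webapp's code (which is written in JavaScript). The function names and the (cdf, pdf) argument convention are assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal used for the copula transform


def gaussian_copula_density(u: float, v: float, rho: float) -> float:
    """Density c(u, v; rho) of the bivariate Gaussian copula for u, v in (0, 1)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    x = _STD_NORMAL.inv_cdf(u)  # map uniforms back to standard-normal scores
    y = _STD_NORMAL.inv_cdf(v)
    one_minus_r2 = 1.0 - rho * rho
    # bivariate normal pdf divided by the product of its two standard-normal marginals
    expo = (2.0 * rho * x * y - rho * rho * (x * x + y * y)) / (2.0 * one_minus_r2)
    return math.exp(expo) / math.sqrt(one_minus_r2)


def joint_density(a, b, marg1, marg2, rho):
    """Sklar's theorem: f(a, b) = c(F1(a), F2(b)) * f1(a) * f2(b).

    marg1 and marg2 are hypothetical (cdf, pdf) pairs of callables, one per marginal.
    """
    cdf1, pdf1 = marg1
    cdf2, pdf2 = marg2
    return gaussian_copula_density(cdf1(a), cdf2(b), rho) * pdf1(a) * pdf2(b)
```

For example, two exponential marginals with rate 2 can be coupled by passing `(lambda t: 1 - math.exp(-2 * t), lambda t: 2 * math.exp(-2 * t))` for both `marg1` and `marg2`. With standard-normal marginals the construction reduces to the ordinary bivariate normal density, which is a convenient sanity check. With rho = 0 the copula density is identically 1, so the marginals are independent.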
Collapse
Affiliation(s)
- Mark Bobrovnikov
- Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI 48109, USA https://socr.umich.edu
- Jared Tianyi Chai
- Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI 48109, USA https://socr.umich.edu
- Ivo D. Dinov
- Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI 48109, USA https://socr.umich.edu
| |
Collapse
|
2
|
Quantifying biochemical reaction rates from static population variability within incompletely observed complex networks. PLoS Comput Biol 2022; 18:e1010183. [PMID: 35731728 PMCID: PMC9216546 DOI: 10.1371/journal.pcbi.1010183] [Received: 09/21/2021] [Accepted: 05/07/2022]
Abstract
Quantifying biochemical reaction rates within complex cellular processes remains a key challenge of systems biology even as high-throughput single-cell data have become available to characterize snapshots of population variability. That is because complex systems with stochastic and non-linear interactions are difficult to analyze when not all components can be observed simultaneously and systems cannot be followed over time. Instead of using descriptive statistical models, we show that incompletely specified mechanistic models can be used to translate qualitative knowledge of interactions into reaction rate functions from covariability data between pairs of components. This promises to turn a globally intractable problem into a sequence of solvable inference problems to quantify complex interaction networks from incomplete snapshots of their stochastic fluctuations.
Collapse
|
3
|
Davidović A, Chait R, Batt G, Ruess J. Parameter inference for stochastic biochemical models from perturbation experiments parallelised at the single cell level. PLoS Comput Biol 2022; 18:e1009950. [PMID: 35303737 PMCID: PMC8967023 DOI: 10.1371/journal.pcbi.1009950] [Received: 08/30/2021] [Revised: 03/30/2022] [Accepted: 02/21/2022]
Abstract
Understanding and characterising biochemical processes inside single cells requires experimental platforms that allow one to perturb and observe the dynamics of such processes as well as computational methods to build and parameterise models from the collected data. Recent progress with experimental platforms and optogenetics has made it possible to expose each cell in an experiment to an individualised input and automatically record cellular responses over days with fine time resolution. However, methods to infer parameters of stochastic kinetic models from single-cell longitudinal data have generally been developed under the assumption that experimental data is sparse and that responses of cells to at most a few different input perturbations can be observed. Here, we investigate and compare different approaches for calculating parameter likelihoods of single-cell longitudinal data based on approximations of the chemical master equation (CME) with a particular focus on coupling the linear noise approximation (LNA) or moment closure methods to a Kalman filter. We show that, as long as cells are measured sufficiently frequently, coupling the LNA to a Kalman filter allows one to accurately approximate likelihoods and to infer model parameters from data even in cases where the LNA provides poor approximations of the CME. Furthermore, the computational cost of filtering-based iterative likelihood evaluation scales advantageously in the number of measurement times and different input perturbations and is thus ideally suited for data obtained from modern experimental platforms. To demonstrate the practical usefulness of these results, we perform an experiment in which single cells, equipped with an optogenetic gene expression system, are exposed to various different light-input sequences and measured at several hundred time points and use parameter inference based on iterative likelihood evaluation to parameterise a stochastic model of the system.
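The idea of coupling the LNA to a Kalman filter can be sketched on a toy model. The sketch below assumes a simple birth-death process (production at rate k, degradation at rate gamma times the copy number) and Gaussian measurement noise of variance sigma2. It does not reproduce the authors' optogenetic model. Between measurements, the LNA mean and variance ODEs are integrated by forward Euler (the prediction step). At each measurement, a standard Kalman update is applied and the Gaussian predictive density of the observation is added to the log-likelihood. For a linear network like this one, the LNA mean and variance equations coincide with the exact first two moments. All function names and parameters are hypothetical.

```python
import math


def lna_predict(m, v, k, gamma, dt, steps=1000):
    """Propagate the LNA mean m and variance v of a birth-death process over dt.

    Assumed toy model: 0 -> X at rate k, X -> 0 at rate gamma * x.
    Forward-Euler integration of
        dm/dt = k - gamma * m
        dv/dt = -2 * gamma * v + k + gamma * m
    """
    h = dt / steps
    for _ in range(steps):
        dm = k - gamma * m
        dv = -2.0 * gamma * v + k + gamma * m
        m += h * dm
        v += h * dv
    return m, v


def kalman_log_likelihood(ys, dt, k, gamma, sigma2, m0, v0):
    """Log-likelihood of one cell's trajectory ys, observed every dt time units,
    under the LNA transition plus Gaussian measurement noise of variance sigma2."""
    m, v = m0, v0
    log_lik = 0.0
    for y in ys:
        m, v = lna_predict(m, v, k, gamma, dt)   # prediction step
        s = v + sigma2                           # innovation (predictive) variance
        resid = y - m
        log_lik += -0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        gain = v / s                             # Kalman gain for y = x + noise
        m += gain * resid                        # update step: filtered mean
        v *= 1.0 - gain                          # filtered variance
    return log_lik
```

Each additional measurement adds one predict/update pair, so the cost grows linearly in the number of measurement times. The log-likelihoods of independently measured cells, each with its own input, simply add.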
Collapse
Affiliation(s)
- Anđela Davidović
- Department of Computational Biology, Institut Pasteur, Paris, France
- Remy Chait
- Biosciences, Living Systems Institute, University of Exeter, Exeter, United Kingdom
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- Gregory Batt
- Department of Computational Biology, Institut Pasteur, Paris, France
- Inria Paris, Paris, France
- Jakob Ruess
- Department of Computational Biology, Institut Pasteur, Paris, France
- Inria Paris, Paris, France
| |
Collapse
|
4
|
Unosson M, Brancaccio M, Hastings M, Johansen AM, Finkenstädt B. A spatio-temporal model to reveal oscillator phenotypes in molecular clocks: Parameter estimation elucidates circadian gene transcription dynamics in single-cells. PLoS Comput Biol 2021; 17:e1009698. [PMID: 34919546 PMCID: PMC8719734 DOI: 10.1371/journal.pcbi.1009698] [Received: 05/17/2021] [Revised: 12/31/2021] [Accepted: 11/29/2021]
Abstract
We propose a stochastic distributed delay model together with a Markov random field prior and a measurement model for bioluminescence-reporting to analyse spatio-temporal gene expression in intact networks of cells. The model describes the oscillating time evolution of molecular mRNA counts through a negative transcriptional-translational feedback loop encoded in a chemical Langevin equation with a probabilistic delay distribution. The model is extended spatially by means of a multiplicative random effects model with a first order Markov random field prior distribution. Our methodology effectively separates intrinsic molecular noise, measurement noise, and extrinsic noise and phenotypic variation driving cell heterogeneity, while being amenable to parameter identification and inference. Based on the single-cell model we propose a novel computational stability analysis that allows us to infer two key characteristics, namely the robustness of the oscillations, i.e. whether the reaction network exhibits sustained or damped oscillations, and the profile of the regulation, i.e. whether the inhibition occurs over time in a more distributed versus a more direct manner, which affects the cells' ability to phase-shift to new schedules. We show how insight into the spatio-temporal characteristics of the circadian feedback loop in the suprachiasmatic nucleus (SCN) can be gained by applying the methodology to bioluminescence-reported expression of the circadian core clock gene Cry1 across mouse SCN tissue. 
We find that while (almost) all SCN neurons exhibit robust cell-autonomous oscillations, the parameters that are associated with the regulatory transcription profile give rise to a spatial division of the tissue between the central region whose oscillations are resilient to perturbation in the sense that they maintain a high degree of synchronicity, and the dorsal region which appears to phase shift in a more diversified way as a response to large perturbations and thus could be more amenable to entrainment.
Collapse
Affiliation(s)
- Måns Unosson
- Department of Statistics, University of Warwick, Coventry, United Kingdom
- Marco Brancaccio
- UK Dementia Research Institute at Imperial College London, Department of Brain Sciences, Faculty of Medicine, London, United Kingdom
- Michael Hastings
- MRC Laboratory of Molecular Biology, Division of Neurobiology, Cambridge, United Kingdom
- Adam M. Johansen
- Department of Statistics, University of Warwick, Coventry, United Kingdom
- Bärbel Finkenstädt
- Department of Statistics, University of Warwick, Coventry, United Kingdom
- The Zeeman Institute for Systems Biology & Infectious Disease Epidemiology Research, University of Warwick, Coventry, United Kingdom
| |
Collapse
|
5
|
Ion IG, Wildner C, Loukrezis D, Koeppl H, De Gersem H. Tensor-train approximation of the chemical master equation and its application for parameter inference. J Chem Phys 2021; 155:034102. [PMID: 34293878 DOI: 10.1063/5.0045521]
Abstract
In this work, we perform Bayesian inference tasks for the chemical master equation in the tensor-train format. The tensor-train approximation has been proven to be very efficient in representing high-dimensional data arising from the explicit representation of the chemical master equation solution. An additional advantage of representing the probability mass function in the tensor-train format is that parametric dependency can be easily incorporated by introducing a tensor product basis expansion in the parameter space. Time is treated as an additional dimension of the tensor and a linear system is derived to solve the chemical master equation in time. We exemplify the tensor-train method by performing inference tasks such as smoothing and parameter inference using the tensor-train framework. A very high compression ratio is observed for storing the probability mass function of the solution. Since all linear algebra operations are performed in the tensor-train format, a significant reduction in the computational time is observed as well.
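The compression claimed for the tensor-train (TT) format comes from storing a d-dimensional array as d three-way cores G_k of shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1. Each entry is then the product of the matrix slices G_1[:, i_1, :] ... G_d[:, i_d, :]. The following minimal Python sketch counts stored values in both formats and evaluates a single entry from the cores. The helper names and the sizes used in the check (five species, 64 copy-number states each, all TT ranks 10) are hypothetical and are not results from the paper.

```python
from math import prod


def tt_storage(mode_sizes, ranks):
    """Number of stored values for the full tensor and for its TT cores.

    ranks must have length d + 1 with ranks[0] == ranks[-1] == 1.
    """
    if len(ranks) != len(mode_sizes) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("need d + 1 TT ranks with boundary ranks equal to 1")
    full = prod(mode_sizes)
    tt = sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))
    return full, tt


def tt_entry(cores, index):
    """Evaluate one tensor entry as a product of core slices.

    cores[k][a][i][b] is core k with shape (r_{k-1}, n_k, r_k).
    """
    row = [1.0]  # running 1 x r_k row vector, starting with r_0 = 1
    for core, i in zip(cores, index):
        r_next = len(core[0][i])
        row = [sum(row[a] * core[a][i][b] for a in range(len(row)))
               for b in range(r_next)]
    return row[0]  # r_d = 1, so a single number remains
```

With the hypothetical sizes above, `tt_storage([64] * 5, [1, 10, 10, 10, 10, 1])` compares 64^5 (about 1.07e9) stored values for the full tensor with 20,480 for the cores. Because linear algebra can act on the cores directly, the work scales with the ranks rather than with the full state-space size.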
Collapse
Affiliation(s)
- Ion Gabriel Ion
- Centre for Computational Engineering, Technische Universität Darmstadt, Darmstadt, Germany
- Christian Wildner
- Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Darmstadt, Germany
- Dimitrios Loukrezis
- Centre for Computational Engineering, Technische Universität Darmstadt, Darmstadt, Germany
- Heinz Koeppl
- Centre for Computational Engineering, Technische Universität Darmstadt, Darmstadt, Germany
- Herbert De Gersem
- Centre for Computational Engineering, Technische Universität Darmstadt, Darmstadt, Germany
| |
Collapse
|
6
|
Mikelson J, Khammash M. Likelihood-free nested sampling for parameter inference of biochemical reaction networks. PLoS Comput Biol 2020; 16:e1008264. [PMID: 33035218 PMCID: PMC7577508 DOI: 10.1371/journal.pcbi.1008264] [Received: 11/15/2019] [Revised: 10/21/2020] [Accepted: 08/16/2020]
Abstract
The development of mechanistic models of biological systems is a central part of Systems Biology. One major challenge in developing these models is the accurate inference of model parameters. In recent years, nested sampling methods have gained increased attention in the Systems Biology community because they are parallelizable and provide error estimates with no additional computations. One drawback that severely limits the usability of these methods, however, is that they require the likelihood function to be available, and thus cannot be applied to systems with intractable likelihoods, such as stochastic models. Here we present a likelihood-free nested sampling method for parameter inference which overcomes these drawbacks. This method gives an unbiased estimator of the Bayesian evidence as well as samples from the posterior. We derive a lower bound on the estimator's variance which we use to formulate a novel termination criterion for nested sampling. The presented method enables not only the reliable inference of the posterior of parameters for stochastic systems of a size and complexity that is challenging for traditional methods, but it also provides an estimate of the obtained variance. We illustrate our approach by applying it to several realistically sized models with simulated data as well as recently published biological data. We also compare our method with the two other most popular likelihood-free approaches: pMCMC and ABC-SMC. The C++ code of the proposed methods, together with test data, is available at the github web page https://github.com/Mijan/LFNS_paper.
The behaviour of mathematical models of biochemical reactions is governed by model parameters encoding various reaction rates, molecule concentrations and other biochemical quantities. As the general purpose of these models is to reproduce and predict the true biological response to different stimuli, the inference of these parameters, given experimental observations, is a crucial part of Systems Biology. While plenty of methods have been published for the inference of model parameters, most of them require the availability of the likelihood function and thus cannot be applied to models that do not allow for the computation of the likelihood. Further, most established methods do not provide an estimate of the variance of the obtained estimator. In this paper, we present a novel inference method that accurately approximates the posterior distribution of parameters and does not require the evaluation of the likelihood function. Our method is based on the nested sampling algorithm and approximates the likelihood with a particle filter. We show that the resulting posterior estimates are unbiased and provide a way to estimate not just the posterior distribution, but also an error estimate of the final estimator. We illustrate our method on several stochastic models with simulated data as well as one model of transcription with real biological data.
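For orientation, the following is a minimal Python sketch of *standard*, likelihood-based nested sampling (Skilling's scheme). It is not the likelihood-free variant described above, which replaces the exact likelihood with a particle-filter estimate. The sketch assumes the usual deterministic shrinkage X_i = exp(-i / N) of the prior volume for N live points, and it draws constrained replacements by naive rejection from the prior, which is practical only for cheap, low-dimensional toy problems. All function names are hypothetical.

```python
import math
import random


def _logaddexp(a, b):
    """log(exp(a) + exp(b)), computed stably and tolerant of -inf."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def _logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def nested_sampling_log_evidence(log_likelihood, sample_prior,
                                 n_live=100, n_iter=600, seed=0):
    """Basic nested-sampling estimate of log Z.

    log_likelihood(theta) -> float; sample_prior(rng) -> theta.
    """
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(th) for th in live]
    log_z = -math.inf
    x_prev = 1.0                              # remaining prior volume
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        l_min = live_logl[worst]
        x_i = math.exp(-i / n_live)           # expected prior-volume shrinkage
        log_z = _logaddexp(log_z, l_min + math.log(x_prev - x_i))
        x_prev = x_i
        while True:                           # replace worst point with L > L_min
            th = sample_prior(rng)
            lth = log_likelihood(th)
            if lth > l_min:
                break
        live[worst], live_logl[worst] = th, lth
    # contribution of the final live points: X_final * mean(L_live)
    log_mean_live = _logsumexp(live_logl) - math.log(n_live)
    return _logaddexp(log_z, log_mean_live + math.log(x_prev))
```

As a check, a uniform prior on [-5, 5] combined with the likelihood exp(-x^2 / 2) has evidence sqrt(2 pi) erf(5 / sqrt 2) / 10, which is approximately 0.2507. The statistical error of log Z shrinks roughly like the square root of (information / number of live points), which is why such methods come with error estimates at no extra cost.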
Collapse
|
7
|
Holehouse J, Cao Z, Grima R. Stochastic Modeling of Autoregulatory Genetic Feedback Loops: A Review and Comparative Study. Biophys J 2020; 118:1517-1525. [PMID: 32155410 PMCID: PMC7136347 DOI: 10.1016/j.bpj.2020.02.016] [Received: 10/17/2019] [Revised: 01/27/2020] [Accepted: 02/11/2020]
Abstract
Autoregulatory feedback loops are one of the most common network motifs. A wide variety of stochastic models have been constructed to understand how the fluctuations in protein numbers in these loops are influenced by the kinetic parameters of the main biochemical steps. These models differ according to 1) which subcellular processes are explicitly modeled, 2) the modeling methodology employed (discrete, continuous, or hybrid), and 3) whether they can be analytically solved for the steady-state distribution of protein numbers. We discuss the assumptions and properties of the main models in the literature, summarize our current understanding of the relationship between them, and highlight some of the insights gained through modeling.
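A discrete stochastic model of a negative autoregulatory loop can be simulated exactly with Gillespie's direct method. The following minimal Python sketch assumes one commonly used formulation with a single gene copy. The free promoter G produces protein P at rate rho_u, and the bound promoter G* produces it at rate rho_b (leakage). Protein decays at rate d per molecule, binds the free promoter at rate sigma_b per molecule, and is released again at rate sigma_u. The model choice and all parameter names are assumptions for illustration and are not taken from a specific model in the review.

```python
import random

# Assumed reactions and their state changes (delta_free_gene, delta_protein):
#   G  -> G + P      propensity rho_u * g
#   G* -> G* + P     propensity rho_b * (1 - g)
#   P  -> 0          propensity d * n
#   G + P -> G*      propensity sigma_b * g * n
#   G* -> G + P      propensity sigma_u * (1 - g)
_CHANGES = [(0, +1), (0, +1), (0, -1), (-1, -1), (+1, +1)]


def ssa_time_averaged_protein(rho_u, rho_b, d, sigma_b, sigma_u, t_end, seed=0):
    """Gillespie direct method; returns the time-averaged protein number on [0, t_end]."""
    rng = random.Random(seed)
    t, g, n = 0.0, 1, 0            # g = 1: promoter free, g = 0: promoter bound
    area = 0.0                     # running integral of n(t) dt
    while True:
        props = [rho_u * g, rho_b * (1 - g), d * n,
                 sigma_b * g * n, sigma_u * (1 - g)]
        a0 = sum(props)
        if a0 <= 0.0:              # absorbing state: no reaction can fire
            area += n * (t_end - t)
            break
        tau = rng.expovariate(a0)  # waiting time to the next reaction
        if t + tau >= t_end:
            area += n * (t_end - t)
            break
        area += n * tau
        t += tau
        # pick reaction j with probability props[j] / a0
        threshold = rng.random() * a0
        j = max(i for i, a in enumerate(props) if a > 0.0)  # round-off fallback
        acc = 0.0
        for i, a in enumerate(props):
            acc += a
            if threshold < acc:
                j = i
                break
        dg, dn = _CHANGES[j]
        g += dg
        n += dn
    return area / t_end
```

With sigma_b = 0 the promoter never binds, so the protein number is Poisson with mean rho_u / d. This limit is a quick check of the simulator, and switching on binding should lower the average protein level.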
Collapse
Affiliation(s)
- James Holehouse
- School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Zhixing Cao
- School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom; The Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, People's Republic of China
- Ramon Grima
- School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
| |
Collapse
|
8
|
Calderazzo S, Brancaccio M, Finkenstädt B. Filtering and inference for stochastic oscillators with distributed delays. Bioinformatics 2020; 35:1380-1387. [PMID: 30202930 PMCID: PMC6477979 DOI: 10.1093/bioinformatics/bty782] [Received: 04/12/2018] [Revised: 08/08/2018] [Accepted: 09/06/2018]
Abstract
Motivation The time evolution of molecular species involved in biochemical reaction networks often arises from complex stochastic processes involving many species and reaction events. Inference for such systems is profoundly challenged by the relative sparseness of experimental data, as measurements are often limited to a small subset of the participating species measured at discrete time points. For oscillatory dynamics resulting from negative translational and transcriptional feedback loops, model reduction can realistically be achieved by introducing probabilistic time-delays. Although this approach yields a simplified model, inference is challenging and subject to ongoing research. The linear noise approximation (LNA) has recently been proposed to address such systems in stochastic form and will be exploited here. Results We develop a novel filtering approach for the LNA in stochastic systems with distributed delays, which allows the parameter values and unobserved states of a stochastic negative feedback model to be inferred from univariate time-series data. The performance of the methods is tested for simulated data. Results are obtained for real data when the model is fitted to imaging data on Cry1, a key gene involved in the mammalian central circadian clock, observed via a luciferase reporter construct in a mouse suprachiasmatic nucleus. Availability and implementation Programmes are written in MATLAB and Statistics Toolbox Release 2016b, The MathWorks, Inc., Natick, Massachusetts, USA. Sample code and Cry1 data are available on GitHub https://github.com/scalderazzo/FLNADD. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Silvia Calderazzo
- Department of Statistics, University of Warwick, Coventry, UK; Division of Biostatistics, German Cancer Research Center, Heidelberg, Germany
- Marco Brancaccio
- Division of Neurobiology, Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
Collapse
|
9
|
Analytical distributions for detailed models of stochastic gene expression in eukaryotic cells. Proc Natl Acad Sci U S A 2020; 117:4682-4692. [PMID: 32071224 PMCID: PMC7060679 DOI: 10.1073/pnas.1910888117]
Abstract
The stochasticity of gene expression presents significant challenges to the modeling of genetic networks. A two-state model describing promoter switching, transcription, and messenger RNA (mRNA) decay is the standard model of stochastic mRNA dynamics in eukaryotic cells. Here, we extend this model to include mRNA maturation, cell division, gene replication, dosage compensation, and growth-dependent transcription. We derive expressions for the time-dependent distributions of nascent mRNA and mature mRNA numbers, provided two assumptions hold: 1) nascent mRNA dynamics are much faster than those of mature mRNA; and 2) gene-inactivation events occur far more frequently than gene-activation events. We confirm that thousands of eukaryotic genes satisfy these assumptions by using data from yeast, mouse, and human cells. We use the expressions to perform a sensitivity analysis of the coefficient of variation of mRNA fluctuations averaged over the cell cycle, for a large number of genes in mouse embryonic stem cells, identifying degradation and gene-activation rates as the most sensitive parameters. Furthermore, it is shown that, despite the model's complexity, the time-dependent distributions predicted by our model are generally well approximated by the negative binomial distribution. Finally, we extend our model to include translation, protein decay, and auto-regulatory feedback, and derive expressions for the approximate time-dependent protein-number distributions, assuming slow protein decay. Our expressions enable us to study how complex biological processes contribute to the fluctuations of gene products in eukaryotic cells, as well as allowing a detailed quantitative comparison with experimental data via maximum-likelihood methods.
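The observation that mRNA distributions are well approximated by a negative binomial ties in with the second assumption above (gene inactivation much faster than activation). In that bursty limit, the classic two-state model has a steady-state negative binomial distribution with shape r = k_on / d and mean burst size b = rho / k_off. Its mean is r b and its squared coefficient of variation is 1/(r b) + 1/r. The following minimal Python sketch encodes only that textbook limit. It does not include the maturation, division, replication or dosage-compensation extensions derived in the paper, and the function and parameter names are hypothetical.

```python
import math


def nb_pmf(n, r, b):
    """Negative binomial probability of n molecules, shape r, mean burst size b (mean = r * b)."""
    log_p = (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
             + n * math.log(b / (1.0 + b)) - r * math.log(1.0 + b))
    return math.exp(log_p)


def bursty_telegraph_nb(k_on, k_off, rho, d):
    """NB parameters of the two-state model in the bursty limit k_off >> k_on.

    k_on: gene activation rate, k_off: inactivation rate,
    rho: transcription rate while active, d: mRNA degradation rate.
    """
    r = k_on / d      # burst frequency measured in units of the mRNA lifetime
    b = rho / k_off   # mean number of transcripts per burst
    return r, b


def nb_cv_squared(r, b):
    """Squared coefficient of variation, variance / mean^2 = (1 + b) / (r * b)."""
    return (1.0 + b) / (r * b)
```

Because CV^2 = 1/(r b) + 1/r, the noise is controlled separately by the burst frequency and the burst size. This is one simple way to see why activation and degradation rates can dominate a sensitivity analysis of mRNA fluctuations.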
Collapse
|
10
|
Lötstedt P. The Linear Noise Approximation for Spatially Dependent Biochemical Networks. Bull Math Biol 2019; 81:2873-2901. [PMID: 29644520 PMCID: PMC6677697 DOI: 10.1007/s11538-018-0428-0] [Received: 01/09/2018] [Accepted: 03/29/2018]
Abstract
An algorithm for computing the linear noise approximation (LNA) of the reaction-diffusion master equation (RDME) is developed and tested. The RDME is often used as a model for biochemical reaction networks. The LNA is derived for a general discretization of the spatial domain of the problem. If M is the number of chemical species in the network and N is the number of nodes in the discretization in space, then the computational work to determine approximations of the mean and the covariances of the probability distributions is proportional to [Formula: see text] in a straightforward implementation. In our LNA algorithm, the work is proportional to [Formula: see text]. Since N usually is larger than M, this is a significant reduction. The accuracy of the approximation in the algorithm is estimated analytically and evaluated in numerical experiments.
Collapse
Affiliation(s)
- Per Lötstedt
- Division of Scientific Computing, Department of Information Technology, Uppsala University, SE-75105, Uppsala, Sweden.
| |
Collapse
|
11
|
Cao Z, Grima R. Accuracy of parameter estimation for auto-regulatory transcriptional feedback loops from noisy data. J R Soc Interface 2019; 16:20180967. [PMID: 30940028 PMCID: PMC6505555 DOI: 10.1098/rsif.2018.0967]
Abstract
Bayesian and non-Bayesian moment-based inference methods are commonly used to estimate the parameters defining stochastic models of gene regulatory networks from noisy single cell or population snapshot data. However, a systematic investigation of the accuracy of the predictions of these methods remains missing. Here, we present the results of such a study using synthetic noisy data of a negative auto-regulatory transcriptional feedback loop, one of the most common building blocks of complex gene regulatory networks. We study the error in parameter estimation as a function of (i) number of cells in each sample; (ii) the number of time points; (iii) the highest-order moment of protein fluctuations used for inference; (iv) the moment-closure method used for likelihood approximation. We find that for sample sizes typical of flow cytometry experiments, parameter estimation by maximizing the likelihood is as accurate as using Bayesian methods but with a much reduced computational time. We also show that the choice of moment-closure method is the crucial factor determining the maximum achievable accuracy of moment-based inference methods. Common likelihood approximation methods based on the linear noise approximation or the zero cumulants closure perform poorly for feedback loops with large protein-DNA binding rates or large protein bursts; this is exacerbated for highly heterogeneous cell populations. By contrast, approximating the likelihood using the linear-mapping approximation or conditional derivative matching leads to highly accurate parameter estimates for a wide range of conditions.
Collapse
|
12
|
Kuzmanovska I, Milias-Argeitis A, Mikelson J, Zechner C, Khammash M. Parameter inference for stochastic single-cell dynamics from lineage tree data. BMC Syst Biol 2017; 11:52. [PMID: 28446158 PMCID: PMC5406901 DOI: 10.1186/s12918-017-0425-1] [Received: 09/11/2016] [Accepted: 04/12/2017]
Abstract
Background With the advance of experimental techniques such as time-lapse fluorescence microscopy, the availability of single-cell trajectory data has vastly increased, and so has the demand for computational methods suitable for parameter inference with this type of data. Most currently available methods treat single-cell trajectories independently, ignoring the mother-daughter relationships and the information provided by the population structure. However, this information is essential if a process of interest happens at cell division, or if it evolves slowly compared to the duration of the cell cycle. Results In this work, we propose a Bayesian framework for parameter inference on single-cell time-lapse data from lineage trees. Our method relies on a combination of Sequential Monte Carlo for approximating the parameter likelihood function and Markov Chain Monte Carlo for parameter exploration. We demonstrate our inference framework on two simple examples in which the lineage tree information is crucial: one in which the cell phenotype can only switch at cell division and another where the cell state fluctuates slowly over timescales that extend well beyond the cell-cycle duration. Conclusion There exist several examples of biological processes, such as stem cell fate decisions or epigenetically controlled phase variation in bacteria, where the cell ancestry is expected to contain important information about the underlying system dynamics. Parameter inference methods that discard this information are expected to perform poorly for this type of process. Our method provides a simple and computationally efficient way to take into account single-cell lineage tree data for the purpose of parameter inference and serves as a starting point for the development of more sophisticated and powerful approaches in the future.
Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0425-1) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Irena Kuzmanovska
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel, 4058, Switzerland
- Andreas Milias-Argeitis
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel, 4058, Switzerland; Groningen Biomolecular Sciences and Biotechnology, University of Groningen, Nijenborgh 4, Groningen, 9747 AG, Netherlands
- Jan Mikelson
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel, 4058, Switzerland
- Christoph Zechner
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel, 4058, Switzerland; Max Planck Institute of Molecular Cell Biology and Genetics and Center for Systems Biology, Pfotenhauerstrasse 108, Dresden, 01307, Germany
- Mustafa Khammash
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel, 4058, Switzerland
| |
Collapse
|
13
|
Ocone A, Haghverdi L, Mueller NS, Theis FJ. Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics 2015; 31:i89-96. [PMID: 26072513 PMCID: PMC4765871 DOI: 10.1093/bioinformatics/btv257]
Abstract
Motivation: High-dimensional single-cell snapshot data are becoming widespread in the systems biology community as a means to understand biological processes at the cellular level. However, as temporal information is lost with such data, mathematical models have been limited to capturing only static features of the underlying cellular mechanisms. Results: Here, we present a modular framework which allows one to recover the temporal behaviour from single-cell snapshot data and reverse engineer the dynamics of gene expression. The framework combines a dimensionality reduction method with a cell time-ordering algorithm to generate pseudo time-series observations. These are in turn used to learn transcriptional ODE models and perform model selection on structural network features. We apply it to synthetic data and then to real hematopoietic stem cell data, to reconstruct gene expression dynamics during differentiation pathways and infer the structure of a key gene regulatory network. Availability and implementation: C++ and Matlab code available at https://www.helmholtz-muenchen.de/fileadmin/ICB/software/inferenceSnapshot.zip. Contact: fabian.theis@helmholtz-muenchen.de Supplementary information: Supplementary data are available at Bioinformatics online.
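The abstract names only "a dimensionality reduction method" and "a cell time-ordering algorithm", so the following is a deliberately crude stand-in rather than the authors' pipeline. It projects each cell onto the first principal component (found by power iteration on the gene-gene covariance matrix) and sorts cells by that score to obtain a pseudo time-ordering. The function name and the choice of PCA are assumptions for illustration only.

```python
import math


def pca_pseudotime_order(cells, n_iter=200):
    """Return cell indices ordered by their score on the first principal component.

    cells: list of cells, each a list of expression values over the same genes.
    The sign of a principal component is arbitrary, so the order may come out
    reversed relative to real time; a known start cell can be used to orient it.
    """
    n, p = len(cells), len(cells[0])
    means = [sum(c[j] for c in cells) / n for j in range(p)]
    centred = [[c[j] - means[j] for j in range(p)] for c in cells]
    cov = [[sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p                             # power-iteration start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                       # no variance along v
            break
        v = [x / norm for x in w]
    scores = [sum(row[j] * v[j] for j in range(p)) for row in centred]
    return sorted(range(n), key=scores.__getitem__)
```

A linear projection like this recovers ordering only when the dominant variation is a single monotone trend. Branching differentiation paths, such as those in hematopoiesis, generally need non-linear embeddings and branch-aware ordering.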
Collapse
Affiliation(s)
- Andrea Ocone
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany and Department of Mathematics, Technical University Munich, 85747 Garching, Germany
- Laleh Haghverdi
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany and Department of Mathematics, Technical University Munich, 85747 Garching, Germany
- Nikola S Mueller
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany and Department of Mathematics, Technical University Munich, 85747 Garching, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany and Department of Mathematics, Technical University Munich, 85747 Garching, Germany
14
Bronstein L, Zechner C, Koeppl H. Bayesian inference of reaction kinetics from single-cell recordings across a heterogeneous cell population. Methods 2015; 85:22-35. [PMID: 25986935 DOI: 10.1016/j.ymeth.2015.05.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 02/09/2015] [Revised: 04/19/2015] [Accepted: 05/10/2015] [Indexed: 11/30/2022]
Abstract
Single-cell experimental techniques provide informative data to help uncover dynamical processes inside a cell. Making full use of such data requires dedicated computational methods to estimate biophysical process parameters and states in a model-based manner. In particular, the treatment of heterogeneity or cell-to-cell variability deserves special attention. The present article provides an introduction to one particular class of algorithms which employ marginalization in order to take heterogeneity into account. An overview of alternative approaches is provided for comparison. We treat two frequently encountered scenarios in single-cell experiments, namely, single-cell trajectory data and single-cell distribution data.
Affiliation(s)
- L Bronstein
- Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Darmstadt, Germany
- C Zechner
- Department of Biosystems Sciences and Engineering, ETH Zürich, Basel, Switzerland
- H Koeppl
- Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt, Darmstadt, Germany.
15
Hey KL, Momiji H, Featherstone K, Davis JRE, White MRH, Rand DA, Finkenstädt B. A stochastic transcriptional switch model for single cell imaging data. Biostatistics 2015; 16:655-69. [PMID: 25819987 PMCID: PMC4570576 DOI: 10.1093/biostatistics/kxv010] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 08/12/2014] [Accepted: 02/21/2015] [Indexed: 12/03/2022]
Abstract
Gene expression is made up of inherently stochastic processes within single cells and can be modeled through stochastic reaction networks (SRNs). In particular, SRNs capture the features of intrinsic variability arising from intracellular biochemical processes. We extend current models for gene expression to allow the transcriptional process within an SRN to follow a random step or switch function which may be estimated using reversible jump Markov chain Monte Carlo (MCMC). This stochastic switch model provides a generic framework to capture many different dynamic features observed in single cell gene expression. Inference for such SRNs is challenging due to the intractability of the transition densities. We derive a model-specific birth–death approximation and study its use for inference in comparison with the linear noise approximation where both approximations are considered within the unifying framework of state-space models. The methodology is applied to synthetic as well as experimental single cell imaging data measuring expression of the human prolactin gene in pituitary cells.
Affiliation(s)
- Kirsty L Hey
- Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Hiroshi Momiji
- Warwick Systems Biology, University of Warwick, Coventry CV4 7AL, UK
- Karen Featherstone
- Centre for Endocrinology and Diabetes, University of Manchester, Manchester M13 9PT, UK
- Julian R E Davis
- Centre for Endocrinology and Diabetes, University of Manchester, Manchester M13 9PT, UK
- Michael R H White
- Systems Biology Centre, University of Manchester, Manchester M13 9PL, UK
- David A Rand
- Warwick Systems Biology, University of Warwick, Coventry CV4 7AL, UK
16
17
Scalable inference of heterogeneous reaction kinetics from pooled single-cell recordings. Nat Methods 2014; 11:197-202. [PMID: 24412977 DOI: 10.1038/nmeth.2794] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 08/05/2013] [Accepted: 11/08/2013] [Indexed: 01/08/2023]
Abstract
Mathematical methods combined with measurements of single-cell dynamics provide a means to reconstruct intracellular processes that are only partly or indirectly accessible experimentally. To obtain reliable reconstructions, the pooling of measurements from several cells of a clonal population is mandatory. However, cell-to-cell variability originating from diverse sources poses computational challenges for such process reconstruction. We introduce a scalable Bayesian inference framework that properly accounts for population heterogeneity. The method allows inference of inaccessible molecular states and kinetic parameters; computation of Bayes factors for model selection; and dissection of intrinsic, extrinsic and technical noise. We show how additional single-cell readouts such as morphological features can be included in the analysis. We use the method to reconstruct the expression dynamics of a gene under an inducible promoter in yeast from time-lapse microscopy data.
18
Finkenstädt B, Woodcock DJ, Komorowski M, Harper CV, Davis JRE, White MRH, Rand DA. Quantifying intrinsic and extrinsic noise in gene transcription using the linear noise approximation: An application to single cell data. Ann Appl Stat 2013. [DOI: 10.1214/13-aoas669] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]
19
Jones NS, Maccarone TJ. Inference for the physical sciences. Philos Trans A Math Phys Eng Sci 2013; 371:20120493. [PMID: 23277613 PMCID: PMC3538443 DOI: 10.1098/rsta.2012.0493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
There is a disconnect between developments in modern data analysis and some parts of the physical sciences in which they could find ready use. This introduction, and this issue, provide resources to help experimental researchers access modern data analysis tools, and give analysts exposure to extant challenges in physical science. We include a table of resources connecting statistical and physical disciplines and point to appropriate books, journals, videos and articles. We conclude by highlighting the relevance of each of the articles in the associated issue.
Affiliation(s)
- Nick S Jones
- Department of Mathematics, Imperial College, London SW7 2AZ, UK.