1
Shuttleworth JG, Lei CL, Whittaker DG, Windley MJ, Hill AP, Preston SP, Mirams GR. Empirical Quantification of Predictive Uncertainty Due to Model Discrepancy by Training with an Ensemble of Experimental Designs: An Application to Ion Channel Kinetics. Bull Math Biol 2023; 86:2. [PMID: 37999811 PMCID: PMC10673765 DOI: 10.1007/s11538-023-01224-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 10/09/2023] [Indexed: 11/25/2023]
Abstract
When using mathematical models to make quantitative predictions for clinical or industrial use, it is important that predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises: models fail to perfectly recapitulate the true data-generating process. This presents a particular challenge for making accurate predictions, and especially for accurately quantifying uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce the data used to train models. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments that investigate the properties of hERG potassium channels. Here, 'information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. In this case, we simulate data with one model and fit it with a different (discrepant) one. For any individual experimental protocol, parameter estimates vary little under repeated samples from the assumed additive independent Gaussian noise model. Yet parameter sets arising from the same model applied to different experiments conflict, highlighting model discrepancy. Our methods will help select more suitable ion channel models for future studies, and will be widely applicable to a range of biological modelling problems.
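The ensemble idea in this abstract can be illustrated with a toy problem. The sketch below is not the authors' hERG setup: it assumes a hypothetical two-exponential "true" process, a deliberately discrepant straight-line model, and three hypothetical "protocols" that observe different time windows. It compares how much the fitted slope moves under repeated noise draws within one protocol with how much it moves between protocols, and reports the range of ensemble predictions at unseen time points.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_process(t):
    # Hypothetical "true" data-generating process (a sum of two exponentials).
    return np.exp(-t) + 0.3 * np.exp(-0.2 * t)

# Hypothetical "protocols": each one observes the process over a different time window.
protocols = {
    "early": np.linspace(0.0, 2.0, 200),
    "middle": np.linspace(2.0, 5.0, 200),
    "late": np.linspace(5.0, 10.0, 200),
}
noise_sd = 0.01  # additive independent Gaussian noise

def fit_line(t, y):
    # Discrepant model: a straight line y = a + b*t, fitted by least squares.
    b, a = np.polyfit(t, y, deg=1)
    return a, b

# One parameter set per protocol forms the ensemble.
ensemble = {}
for name, t in protocols.items():
    y = true_process(t) + rng.normal(0.0, noise_sd, t.size)
    ensemble[name] = fit_line(t, y)

# Within a single protocol, repeated noise draws barely move the fitted slope...
t_early = protocols["early"]
slopes = [fit_line(t_early, true_process(t_early) + rng.normal(0.0, noise_sd, t_early.size))[1]
          for _ in range(200)]
within_sd = np.std(slopes)
# ...whereas slopes fitted to different protocols conflict.
between_sd = np.std([b for _, b in ensemble.values()])

# Spread of ensemble predictions at unseen time points (a stand-in for an unseen protocol).
t_new = np.array([3.0, 6.0])
preds = np.array([a + b * t_new for a, b in ensemble.values()])
spread = preds.max(axis=0) - preds.min(axis=0)

print(f"within-protocol slope SD:  {within_sd:.4f}")
print(f"between-protocol slope SD: {between_sd:.4f}")
print("ensemble prediction range at t_new:", np.round(spread, 3))
```

The straight line is chosen only because its discrepancy with the true curve is obvious; the point is the contrast between a tiny within-protocol spread and a large between-protocol spread.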
Affiliation(s)
- Joseph G Shuttleworth
- Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Chon Lok Lei
- Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau, China
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Macau, China
- Dominic G Whittaker
- Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Systems Modeling & Translational Biology, Stevenage, GSK, UK
- Monique J Windley
- Computational Cardiology Laboratory, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- Adam P Hill
- Computational Cardiology Laboratory, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- Simon P Preston
- Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Gary R Mirams
- Centre for Mathematical Medicine & Biology, School of Mathematical Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
2
Wu Y, Judge MT, Edison AS, Arnold J. Uncovering in vivo biochemical patterns from time-series metabolic dynamics. PLoS One 2022; 17:e0268394. [PMID: 35550643 PMCID: PMC9098013 DOI: 10.1371/journal.pone.0268394] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/10/2021] [Accepted: 04/28/2022] [Indexed: 11/19/2022]
Abstract
Systems biology relies on holistic biomolecule measurements, and untangling biochemical networks requires time-series metabolomics profiling. With current metabolomic approaches, time-series measurements can be taken for hundreds of metabolic features, which can help decode the underlying metabolic regulation. Such metabolomic datasets are untargeted, with most features unannotated and inaccessible to statistical analysis and computational modeling. The high dimensionality of the metabolic space also makes mechanistic modeling computationally cumbersome. We implemented a faster exploratory workflow to visualize and extract chemical and biochemical dependencies. Time-series metabolic features (about 300 per dataset) were extracted by Ridge Tracking-based Extract (RTExtract) from measurements obtained by continuous in vivo monitoring of metabolism by NMR (CIVM-NMR) in Neurospora crassa under different conditions. The metabolic profiles were then smoothed and projected into lower dimensions, enabling a comparison of metabolic trends across the cultures. Next, we expanded incomplete metabolite annotation using a correlation network. Lastly, we uncovered meaningful metabolic clusters by estimating dependencies between smoothed metabolic profiles. We thus sidestepped time-consuming mechanistic modeling, difficult global optimization, and labor-intensive annotation. Multiple clusters guided insights into central energy metabolism and membrane synthesis. Dense connections with glucose 1-phosphate indicated its central position in N. crassa metabolism. Our approach was benchmarked on simulated random network dynamics and provides a novel exploratory approach to analyzing high-dimensional metabolic dynamics.
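A rough sketch of the smoothing, projection and correlation-network steps described above, using synthetic profiles rather than CIVM-NMR data. Everything here is an assumption made for illustration: the three latent trends, the moving-average smoother (this abstract does not specify the smoothing method), the 0.9 correlation threshold, and the use of connected components as "clusters". It is not RTExtract or the authors' workflow.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 100)

# Hypothetical latent trends; each synthetic "metabolic feature" follows one of them.
trends = [np.sin(t), t / 10.0, np.exp(-t / 2.0)]
labels = np.repeat(np.arange(len(trends)), 10)
profiles = np.array([rng.uniform(1.0, 2.0) * trends[k] + rng.normal(0.0, 0.05, t.size)
                     for k in labels])

# 1) Smooth each profile with a simple moving average (window of 5 samples).
window = np.ones(5) / 5.0
smoothed = np.array([np.convolve(p, window, mode="valid") for p in profiles])

# 2) Project the features into two dimensions with PCA (SVD of the column-centred data).
centred = smoothed - smoothed.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt[:2].T  # one 2-D point per feature

# 3) Correlation network: connect features whose smoothed profiles correlate strongly.
corr = np.corrcoef(smoothed)
adjacency = corr > 0.9
np.fill_diagonal(adjacency, False)

# 4) Clusters = connected components of the network (found by graph traversal).
def connected_components(adj):
    n = adj.shape[0]
    comp = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if comp[start] >= 0:
            continue
        stack = [start]
        comp[start] = current
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if comp[nb] < 0:
                    comp[nb] = current
                    stack.append(nb)
        current += 1
    return comp

clusters = connected_components(adjacency)
print("number of clusters:", clusters.max() + 1)
```

A signed threshold is used so that anticorrelated trends (here the rising and decaying ones) are not merged into one cluster.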
Affiliation(s)
- Yue Wu
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Michael T. Judge
- Department of Genetics, University of Georgia, Athens, GA, United States of America
- Arthur S. Edison
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Department of Genetics, University of Georgia, Athens, GA, United States of America
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA, United States of America
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States of America
- Jonathan Arnold
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
- Department of Genetics, University of Georgia, Athens, GA, United States of America
- Department of Statistics, University of Georgia, Athens, GA, United States of America
- Department of Physics and Astronomy, University of Georgia, Athens, GA, United States of America
3
Raharinirina NA, Peppert F, von Kleist M, Schütte C, Sunkara V. Inferring gene regulatory networks from single-cell RNA-seq temporal snapshot data requires higher-order moments. Patterns 2021; 2:100332. [PMID: 34553172 PMCID: PMC8441581 DOI: 10.1016/j.patter.2021.100332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Revised: 02/23/2021] [Accepted: 07/22/2021] [Indexed: 11/30/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become ubiquitous in biology. Recently, there has been a push for using scRNA-seq snapshot data to infer the underlying gene regulatory networks (GRNs) steering cellular function. To date, this aspiration remains unrealized due to technical and computational challenges. In this work we focus on the latter, which are under-represented in the literature. We took a systematic approach by subdividing GRN inference into three fundamental components: data pre-processing, feature extraction, and inference. We observed that the regulatory signature is captured in the statistical moments of scRNA-seq data and that extracting it requires computationally intensive minimization solvers. Furthermore, current data pre-processing might not conserve these statistical moments. Although our moment-based approach is a didactic tool for understanding the different components of GRN inference, this line of thinking (finding computationally feasible multi-dimensional statistics of data) is imperative for designing GRN inference methods.
Highlights
- Single-cell RNA-seq temporal snapshot data for detecting regulation
- Challenges in data pre-processing, feature extraction, and network inference for GRNs
- Encoding of regulatory information in higher-order raw moments
- Non-linear least-squares inference for temporal scRNA-seq snapshot data
Single-cell RNA sequencing (scRNA-seq) has become ubiquitous in biology. Recently, there has been a push for using scRNA-seq snapshot data to infer the underlying gene regulatory networks (GRNs) steering cellular function. A recent benchmark of 12 GRN methods demonstrated that the algorithms struggled to predict the ground-truth GRNs and speculated that the low performance was due to the insufficient resolution in the scRNA-seq data. Rather than proposing another method, this paper focuses on how to decompose a GRN problem into three subproblems (pre-processing, feature extraction, and inference), so that the gene regulatory information is preserved in each step. Subsequently, we discuss how to best approach each of the three subproblems.
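The statement that regulatory information sits in higher-order raw moments can be made concrete by computing those moments from snapshot data. A minimal sketch, assuming one cells-by-genes count matrix per time point and hypothetical Poisson-distributed counts; the helper `raw_moments` is hypothetical, and the subsequent non-linear least-squares fit of a model's predicted moments to these empirical ones is not shown.

```python
import numpy as np

def raw_moments(counts):
    """Empirical raw moments of a cells-by-genes count matrix at one time point.

    Returns first-order E[x_i], second-order E[x_i x_j] and third-order E[x_i x_j x_k].
    """
    x = np.asarray(counts, dtype=float)
    n_cells = x.shape[0]
    m1 = x.mean(axis=0)
    m2 = np.einsum("ci,cj->ij", x, x) / n_cells
    m3 = np.einsum("ci,cj,ck->ijk", x, x, x) / n_cells
    return m1, m2, m3

# Hypothetical snapshot data: 3 genes measured in 500 cells at each of 4 time points.
rng = np.random.default_rng(2)
snapshots = [rng.poisson(lam=[2.0 + t, 5.0, 1.0 + 0.5 * t], size=(500, 3)) for t in range(4)]

# One feature set per time point; a model's predicted moments would be fitted to these.
features = [raw_moments(s) for s in snapshots]
m1, m2, m3 = features[0]
print(m1.shape, m2.shape, m3.shape)  # (3,) (3, 3) (3, 3, 3)
```

The number of moment entries grows as genes**order, which is one reason why computational feasibility of such statistics matters for larger networks.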
Affiliation(s)
- Felix Peppert
- Explainable A.I. for Biology, Zuse Institute Berlin, 14195 Berlin, Germany
- Max von Kleist
- MF1 Bioinformatics, Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Christof Schütte
- Mathematics of Complex Systems, Zuse Institute Berlin, 14195 Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Vikram Sunkara
- Mathematics of Complex Systems, Zuse Institute Berlin, 14195 Berlin, Germany
- Explainable A.I. for Biology, Zuse Institute Berlin, 14195 Berlin, Germany
5
Taylor-King JP, Riseth AN, Macnair W, Claassen M. Dynamic distribution decomposition for single-cell snapshot time series identifies subpopulations and trajectories during iPSC reprogramming. PLoS Comput Biol 2020; 16:e1007491. [PMID: 31923173 PMCID: PMC6953770 DOI: 10.1371/journal.pcbi.1007491] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/12/2018] [Accepted: 10/14/2019] [Indexed: 11/24/2022]
Abstract
Recent high-dimensional single-cell technologies such as mass cytometry are enabling time series experiments that monitor the temporal evolution of cell state distributions and identify dynamically important cell states, such as fate decision states in differentiation. However, these technologies are destructive and require analysis approaches that temporally map between cell state distributions across time points. Current approaches that approximate the single-cell time series as a dynamical system either make too restrictive assumptions about the type of kinetics or link together pairs of sequential measurements in a discontinuous fashion. We propose Dynamic Distribution Decomposition (DDD), an operator approximation approach to infer a continuous distribution map between time points. On the basis of single-cell snapshot time series data, DDD approximates the continuous-time Perron-Frobenius operator by means of a finite set of basis functions. This procedure can be interpreted as a continuous-time Markov chain over a continuum of states. By assuming only a memoryless Markov (autonomous) process, DDD can represent more general types of dynamics than other common models, such as chemical reaction networks and stochastic differential equations. Furthermore, we can check a posteriori whether the autonomy assumption is valid by calculating the prediction error, which we show gives a measure of autonomy within the studied system. The continuity and autonomy assumptions ensure that the same dynamical system maps between all time points, rather than changing arbitrarily at each time point. We demonstrate the ability of DDD to reconstruct dynamically important cell states and their transitions on both synthetic data and mass cytometry time series of iPSC reprogramming of a fibroblast system. We use DDD to find previously identified subpopulations of cells and to visualise differentiation trajectories.
Dynamic Distribution Decomposition allows interpretation of high-dimensional snapshot time series data as a low-dimensional Markov process, thereby enabling an interpretable dynamics analysis for a variety of biological processes by means of identifying their dynamically important cell states.
Affiliation(s)
- Jake P. Taylor-King
- Institute of Molecular Systems Biology, Department of Biology, ETHZ, Zurich, Switzerland
- Juvenescence AI, Viking House, Nelson Street, Douglas, Isle of Man, United Kingdom
- Asbjørn N. Riseth
- Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Will Macnair
- Institute of Molecular Systems Biology, Department of Biology, ETHZ, Zurich, Switzerland
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Manfred Claassen
- Institute of Molecular Systems Biology, Department of Biology, ETHZ, Zurich, Switzerland
6
Pantazis Y, Tsamardinos I. A unified approach for sparse dynamical system inference from temporal measurements. Bioinformatics 2019; 35:3387-3396. [PMID: 30715136 PMCID: PMC6748758 DOI: 10.1093/bioinformatics/btz065] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/08/2017] [Revised: 01/17/2019] [Accepted: 01/28/2019] [Indexed: 11/12/2022]
Abstract
Motivation: Temporal variations in biological systems, and more generally in the natural sciences, are typically modeled as a set of ordinary, partial or stochastic differential or difference equations. Algorithms for learning the structure and the parameters of a dynamical system are distinguished by whether time is discrete or continuous, whether observations are time-series or time-course, and whether the system is deterministic or stochastic. However, there is no approach able to handle these various types of dynamical systems simultaneously.
Results: In this paper, we present a unified approach to infer both the structure and the parameters of non-linear dynamical systems of any type, under the restriction that they are linear with respect to the unknown parameters. Our approach, named Unified Sparse Dynamics Learning (USDL), consists of two steps. First, an atemporal system of equations is derived through the application of the weak formulation. Then, assuming a sparse representation for the dynamical system, we show that the inference problem can be expressed as a sparse signal recovery problem, allowing the application of an extensive body of algorithms and theoretical results. Results on simulated data demonstrate the efficacy and superiority of the USDL algorithm under multiple interventions and/or stochasticity. Additionally, USDL's accuracy significantly correlates with theoretical metrics such as the exact recovery coefficient. On real single-cell data, the proposed approach is able to induce high-confidence subgraphs of the signaling pathway.
Availability and implementation: Source code is available at Bioinformatics online. The USDL algorithm has also been integrated into SCENERY (http://scenery.csd.uoc.gr/), an online tool for single-cell mass cytometry analytics.
Supplementary information: Supplementary data are available at Bioinformatics online.
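The two USDL steps (weak formulation, then sparse recovery) can be sketched on a noise-free toy system. This is an illustration under stated assumptions, not the USDL source code: the logistic test system, the sine test functions, the three-term library and the solver are all choices made here. In particular, sequentially thresholded least squares (a simple sparse solver from the SINDy literature) stands in for the sparse signal recovery algorithms the paper refers to.

```python
import numpy as np

# Hypothetical test system: logistic growth dx/dt = x - x**2 with x(0) = 0.1,
# whose closed-form solution is sampled densely and without noise.
t = np.linspace(0.0, 10.0, 10001)
x = 1.0 / (1.0 + 9.0 * np.exp(-t))

# Candidate library; the dynamics are assumed linear in the unknown coefficients.
library = [np.ones_like(t), x, x**2]
names = ["1", "x", "x^2"]

def integrate(y):
    # Composite trapezoidal rule on the sampling grid.
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))

# Sine test functions phi_j vanish at both ends of the window, so integration by parts gives
#   integral(phi_j * dx/dt) = -integral(phi_j' * x),
# i.e. an atemporal linear system A @ theta = b with no numerical derivative of the data.
T = t[-1] - t[0]
A, b = [], []
for j in range(1, 11):
    arg = j * np.pi * (t - t[0]) / T
    phi, dphi = np.sin(arg), (j * np.pi / T) * np.cos(arg)
    A.append([integrate(phi * f) for f in library])
    b.append(-integrate(dphi * x))
A, b = np.array(A), np.array(b)

def stlsq(A, b, threshold=0.1, n_iter=10):
    # Sequentially thresholded least squares: zero small coefficients, refit the rest.
    theta = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(theta) < threshold
        theta[small] = 0.0
        if (~small).any():
            theta[~small] = np.linalg.lstsq(A[:, ~small], b, rcond=None)[0]
    return theta

theta = stlsq(A, b)
print(" + ".join(f"{c:.3f}*{n}" for c, n in zip(theta, names) if c != 0.0))
```

The recovered support {x, x^2} and coefficients close to (1, -1) correspond to the logistic right-hand side used to generate the data.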
Affiliation(s)
- Yannis Pantazis
- Institute of Applied and Computational Mathematics, Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece
- To whom correspondence should be addressed.
- Ioannis Tsamardinos
- Institute of Applied and Computational Mathematics, Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece
- Department of Computer Science, University of Crete, Heraklion, Greece
- Gnosis Data Analysis PC, Heraklion, Greece
- To whom correspondence should be addressed.
7
Hoang DT, Song J, Periwal V, Jo J. Network inference in stochastic systems from neurons to currencies: Improved performance at small sample size. Phys Rev E 2019; 99:023311. [PMID: 30934224 PMCID: PMC7459391 DOI: 10.1103/physreve.99.023311] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/31/2018] [Indexed: 12/13/2022]
Abstract
The fundamental problem in modeling complex phenomena such as human perception using probabilistic methods is that of deducing a stochastic model of interactions between the constituents of a system from observed configurations. Even in this era of big data, the complexity of the systems being modeled implies that inference methods must be effective in the difficult regimes of small sample sizes and large coupling variability. Thus, model inference by means of minimization of a cost function requires additional assumptions such as sparsity of interactions to avoid overfitting. In this paper, we completely divorce iterative model updates from the value of a cost function quantifying goodness of fit. This separation enables the use of goodness of fit as a natural rationale for terminating model updates, thereby avoiding overfitting. We do this within the mathematical formalism of statistical physics by defining a formal free energy of observations from a partition function with an energy function chosen precisely to enable an iterative model update. Minimizing this free energy, we demonstrate coupling strength inference in nonequilibrium kinetic Ising models, and show that our method outperforms other existing methods in the regimes of interest. Our method has no tunable learning rate, scales to large system sizes, and has a systematic expansion to obtain higher-order interactions. As applications, we infer a functional connectivity network in the salamander retina and a currency exchange rate network from time-series data of neuronal spiking and currency exchange rates, respectively. Accurate small sample size inference is critical for devising a profitable currency hedging strategy.
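To make the inference setting concrete, the sketch below simulates a small nonequilibrium kinetic Ising model with synchronous updates and recovers its couplings from the spin time series. It does not implement the authors' free-energy method. Instead it uses plain maximum-likelihood gradient ascent, a standard baseline that, unlike the method described in the abstract, needs a learning rate. System size, sample length, coupling scale and learning rate are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, steps = 10, 20000

# Hypothetical ground-truth couplings for a kinetic Ising model with parallel updates.
W_true = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))

# Simulate: P(s_i(t+1) = +1) = (1 + tanh(h_i(t))) / 2, with local field h(t) = W s(t).
s = np.empty((steps, n))
s[0] = rng.choice([-1.0, 1.0], size=n)
for k in range(steps - 1):
    h = W_true @ s[k]
    p_up = 0.5 * (1.0 + np.tanh(h))
    s[k + 1] = np.where(rng.random(n) < p_up, 1.0, -1.0)

# Maximum-likelihood inference by gradient ascent on the mean log-likelihood, whose gradient is
#   dL/dW_ij = < (s_i(t+1) - tanh(h_i(t))) * s_j(t) >_t
past, future = s[:-1], s[1:]
W = np.zeros((n, n))
learning_rate = 0.5
for _ in range(500):
    H = past @ W.T  # H[t, i] = sum_j W[i, j] * s_j(t)
    grad = (future - np.tanh(H)).T @ past / past.shape[0]
    W += learning_rate * grad

corr = np.corrcoef(W.ravel(), W_true.ravel())[0, 1]
print(f"correlation between inferred and true couplings: {corr:.3f}")
```

With 20,000 transitions and only 100 couplings this baseline does well; the small-sample, large-coupling-variability regime that the abstract targets is where such baselines are expected to struggle.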
Affiliation(s)
- Danh-Tai Hoang
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Department of Natural Sciences, Quang Binh University, Dong Hoi, Quang Binh 510000, Vietnam
- Juyong Song
- Asia Pacific Center for Theoretical Physics, Pohang, Gyeongbuk 37673, Korea
- Department of Physics, Pohang University of Science and Technology, Pohang, Gyeongbuk 37673, Korea
- Abdus Salam International Centre for Theoretical Physics, Strada Costiera 11, 34014 Trieste, Italy
- Vipul Periwal
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
- Junghyo Jo
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Statistics, Keimyung University, Daegu 42601, Korea
8
Fröhlich F, Loos C, Hasenauer J. Scalable Inference of Ordinary Differential Equation Models of Biochemical Processes. Methods Mol Biol 2019; 1883:385-422. [PMID: 30547409 DOI: 10.1007/978-1-4939-8882-2_16] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
Ordinary differential equation models have become a standard tool for the mechanistic description of biochemical processes. If parameters are inferred from experimental data, such mechanistic models can provide accurate predictions about the behavior of latent variables or the process under new experimental conditions. Complementarily, inference of model structure can be used to identify the most plausible model structure from a set of candidates, and, thus, gain novel biological insight. Several toolboxes can infer model parameters and structure for small- to medium-scale mechanistic models out of the box. However, models for highly multiplexed datasets can require hundreds to thousands of state variables and parameters. For the analysis of such large-scale models, most algorithms require intractably high computation times. This chapter provides an overview of the state-of-the-art methods for parameter and model inference, with an emphasis on scalability.
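Gradient-based optimisers for ODE parameter estimation need derivatives of the objective with respect to the parameters. A minimal sketch of forward sensitivity analysis follows, assuming a one-parameter decay model dx/dt = -k x, hypothetical noise-free data and a fixed-step RK4 integrator; production toolboxes generally rely on adaptive (often stiff) solvers instead. The sensitivity-based gradient is checked against a central finite difference.

```python
import numpy as np

def rhs(x, s, k):
    # State equation dx/dt = -k x and its forward sensitivity equation ds/dt = -k s - x,
    # where s = dx/dk (obtained by differentiating the state equation with respect to k).
    return -k * x, -k * s - x

def simulate(k, t_end, dt=0.01, x0=1.0):
    # Fixed-step RK4 integration of state and sensitivity together.
    n = int(round(t_end / dt))
    xs, ss = np.empty(n + 1), np.empty(n + 1)
    x, s = x0, 0.0  # x0 does not depend on k, so s(0) = 0
    xs[0], ss[0] = x, s
    for i in range(n):
        k1x, k1s = rhs(x, s, k)
        k2x, k2s = rhs(x + 0.5 * dt * k1x, s + 0.5 * dt * k1s, k)
        k3x, k3s = rhs(x + 0.5 * dt * k2x, s + 0.5 * dt * k2s, k)
        k4x, k4s = rhs(x + dt * k3x, s + dt * k3s, k)
        x += dt / 6.0 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x)
        s += dt / 6.0 * (k1s + 2.0 * k2s + 2.0 * k3s + k4s)
        xs[i + 1], ss[i + 1] = x, s
    return xs, ss

dt = 0.01
t_obs = np.arange(1, 11) * 0.5
idx = np.rint(t_obs / dt).astype(int)
y_obs = np.exp(-0.7 * t_obs)  # hypothetical noise-free data generated with k = 0.7

def objective_and_gradient(k):
    # Least-squares objective J = 0.5 * sum(r**2) and dJ/dk = sum(r * dx/dk).
    xs, ss = simulate(k, t_obs[-1], dt)
    r = xs[idx] - y_obs
    return 0.5 * np.sum(r**2), np.sum(r * ss[idx])

k = 1.2
J, grad = objective_and_gradient(k)
eps = 1e-6
fd = (objective_and_gradient(k + eps)[0] - objective_and_gradient(k - eps)[0]) / (2 * eps)
print(f"sensitivity gradient {grad:.6f}, finite difference {fd:.6f}")
```

Forward sensitivities add one extra system of equations per parameter, which is why adjoint sensitivity analysis is generally preferred when a model has many parameters, the large-scale setting this chapter emphasises.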
Affiliation(s)
- Fabian Fröhlich
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
- Carolin Loos
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
- Jan Hasenauer
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Center for Mathematics, Technische Universität München, Garching, Germany
9
Backenkohler M, Bortolussi L, Wolf V. Moment-Based Parameter Estimation for Stochastic Reaction Networks in Equilibrium. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1180-1192. [PMID: 29990108 DOI: 10.1109/tcbb.2017.2775219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Calibrating parameters is a crucial problem within quantitative modeling approaches to reaction networks. Existing methods for stochastic models rely either on statistical sampling or can only be applied to small systems. Here, we present an inference procedure for stochastic models in equilibrium that is based on a moment matching scheme with optimal weighting and that can be used with high-throughput data such as those collected by flow cytometry. Our method does not require an approximation of the underlying equilibrium probability distribution and, if reaction rate constants have to be learned, the optimal values can be computed by solving a linear system of equations. We discuss important practical issues such as the selection of the moments and evaluate the effectiveness of the proposed approach on three case studies.
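For a toy birth-death network the moment-matching idea reduces to a small linear system in the rate constants. A sketch, assuming hypothetical equilibrium samples drawn from the Poisson(k1/k2) stationary distribution of the reactions 0 -> X (rate k1) and X -> 0 (rate k2*x), and using plain unweighted least squares rather than the optimal weighting described in the abstract.

```python
import numpy as np

# Hypothetical equilibrium data: the birth-death process (0 -> X at rate k1, X -> 0 at rate k2*x)
# has a Poisson(k1/k2) stationary distribution; here k1 = 5 and k2 = 1.
rng = np.random.default_rng(4)
samples = rng.poisson(5.0, size=100_000).astype(float)
m1, m2 = samples.mean(), np.mean(samples**2)

# Stationary moment equations (time derivatives set to zero) are linear in (k1, k2):
#   dE[X]/dt   = k1 - k2*m1                       = 0
#   dE[X^2]/dt = k1*(2*m1 + 1) + k2*(m1 - 2*m2)   = 0
# Equilibrium data only determine the ratio k1/k2, so k2 is fixed to 1 here.
k2 = 1.0
A = np.array([[1.0], [2.0 * m1 + 1.0]])
b = np.array([k2 * m1, k2 * (2.0 * m2 - m1)])
k1_hat = np.linalg.lstsq(A, b, rcond=None)[0][0]
print(f"estimated k1 = {k1_hat:.3f} (true value 5.0)")
```

Because both equations are linear in the unknown rate, no simulation or approximation of the stationary distribution is needed; a weighting matrix (as in the paper's optimal weighting) would change only how the two equations are balanced in the least-squares solve.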