1
Sherlock BD, Boon MAA, Vlasiou M, Coster ACF. The Distance Between: An Algorithmic Approach to Comparing Stochastic Models to Time-Series Data. Bull Math Biol 2024; 86:111. [PMID: 39060776 PMCID: PMC11282162 DOI: 10.1007/s11538-024-01331-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 06/21/2024] [Indexed: 07/28/2024]
Abstract
While mean-field models of cellular operations have identified dominant processes at the macroscopic scale, stochastic models may provide further insight into mechanisms at the molecular scale. In order to identify plausible stochastic models, quantitative comparisons between the models and the experimental data are required. The data for these systems have small sample sizes and time-evolving distributions. The aim of this study is to identify appropriate distance metrics for the quantitative comparison of stochastic model outputs and time-evolving stochastic measurements of a system. We identify distance metrics with features suitable for driving parameter inference, model comparison, and model validation, constrained by data from multiple experimental protocols. In this study, stochastic model outputs are compared to synthetic data across three scales: that of the data at the points the system is sampled during the time course of each type of experiment; a combined distance across the time course of each experiment; and a combined distance across all the experiments. Two broad categories of comparators at each point were considered, based on the empirical cumulative distribution function (ECDF) of the data and of the model outputs: discrete-based measures such as the Kolmogorov-Smirnov distance, and integrated measures such as the Wasserstein-1 distance between the ECDFs. It was found that the discrete-based measures were highly sensitive to parameter changes near the synthetic data parameters, but were largely insensitive otherwise, whereas the integrated distances had smoother transitions as the parameters approached the true values. The integrated measures were also found to be robust to noise added to the synthetic data, replicating experimental error. The characteristics of the identified distances provide the basis for the design of an algorithm suitable for fitting stochastic models to real-world stochastic data.
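The two comparator families named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `ecdf_distances` and the use of NumPy are assumptions made here for illustration. It evaluates both ECDFs on the pooled sample points, takes the largest vertical gap (Kolmogorov-Smirnov) and the area between the two step functions (Wasserstein-1).

```python
import numpy as np

def ecdf_distances(x, y):
    """Return (KS, Wasserstein-1) distances between the ECDFs of samples x and y.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # Every point where either ECDF can jump.
    grid = np.sort(np.concatenate([x, y]))
    # Right-continuous ECDF values on the pooled grid.
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    gap = np.abs(fx - fy)
    ks = gap.max()                          # largest vertical gap
    w1 = np.sum(gap[:-1] * np.diff(grid))   # area between the step functions
    return ks, w1

ks_shift, w1_shift = ecdf_distances([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
ks_far, w1_far = ecdf_distances([0.0, 0.0], [1.0, 1.0])
```

In this toy case the Kolmogorov-Smirnov value saturates at 1 once the samples stop overlapping, while the Wasserstein-1 value keeps growing with the separation, which is consistent with the smoother behaviour of integrated measures described in the abstract.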
Affiliation(s)
- Brock D Sherlock
  - School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, 2052, Australia
  - Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB, Eindhoven, The Netherlands
- Marko A A Boon
  - Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB, Eindhoven, The Netherlands
- Maria Vlasiou
  - Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, P.O. Box 217, 7500 AE, Enschede, The Netherlands
- Adelle C F Coster
  - School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, 2052, Australia
2
Liu Y, Zhang SY, Kleijn IT, Stumpf MPH. Approximate Bayesian computation for inferring Waddington landscapes from single-cell data. ROYAL SOCIETY OPEN SCIENCE 2024; 11:231697. [PMID: 39076359 PMCID: PMC11285904 DOI: 10.1098/rsos.231697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Accepted: 05/01/2024] [Indexed: 07/31/2024]
Abstract
Single-cell technologies allow us to gain insights into cellular processes at unprecedented resolution. In stem cell and developmental biology, snapshot data allow us to characterize how the transcriptional states of cells change between successive cell types. Here, we show how approximate Bayesian computation (ABC) can be employed to calibrate mathematical models against single-cell data. In our simulation study, we demonstrate the pivotal role of an adequate choice of distance measure for single-cell data. We show that for good distance measures, notably optimal transport with the Sinkhorn divergence, we can infer parameters for mathematical models from simulated single-cell data. We show that the ABC posteriors can be used (i) to characterize parameter sensitivity and identify dependencies between different parameters and (ii) to construct representations of the Waddington or epigenetic landscape, which forms a popular and interpretable representation of the developmental dynamics. In summary, these results pave the way for fitting mechanistic models of stem cell differentiation to single-cell data.
Affiliation(s)
- Yujing Liu
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Stephen Y. Zhang
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
- Michael P. H. Stumpf
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, Australia
  - School of BioScience, University of Melbourne, Melbourne, Australia
3
Cheng C, Wen L, Li J. Parameter estimation from aggregate observations: a Wasserstein distance-based sequential Monte Carlo sampler. ROYAL SOCIETY OPEN SCIENCE 2023; 10:230275. [PMID: 37564064 PMCID: PMC10410207 DOI: 10.1098/rsos.230275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 06/21/2023] [Indexed: 08/12/2023]
Abstract
In this work, we study systems consisting of a group of moving particles. In such systems, often some important parameters are unknown and have to be estimated from observed data. Such parameter estimation problems can often be solved via a Bayesian inference framework. However, in many practical problems, only data at the aggregate level is available and as a result the likelihood function is not available, which poses a challenge for Bayesian methods. In particular, we consider the situation where the distributions of the particles are observed. We propose a Wasserstein distance (WD)-based sequential Monte Carlo sampler to solve the problem: the WD is used to measure the similarity between the observed and the simulated particle distributions, and the sequential Monte Carlo sampler is used to deal with the sequentially available observations. Two real-world examples are provided to demonstrate the performance of the proposed method.
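The idea of scoring a parameter by the Wasserstein distance between observed and simulated samples can be sketched as follows. This is a deliberately simplified illustration, not the method of the paper: it swaps the sequential Monte Carlo sampler for plain rejection sampling, and the toy Gaussian model, the uniform prior, and the names `w1_equal_size` and `abc_rejection` are all assumptions introduced here.

```python
import numpy as np

def w1_equal_size(x, y):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal sample sizes this is the mean absolute gap between order statistics.
    """
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def abc_rejection(observed, simulate, prior_sample, n_draws, keep_fraction, rng):
    """Keep the prior draws whose simulated samples lie closest to the data.

    Plain rejection ABC (hypothetical helper); the cited paper uses a
    sequential Monte Carlo sampler instead.
    """
    thetas = np.array([prior_sample(rng) for _ in range(n_draws)])
    dists = np.array(
        [w1_equal_size(observed, simulate(t, observed.size, rng)) for t in thetas]
    )
    cutoff = np.quantile(dists, keep_fraction)  # distance threshold
    return thetas[dists <= cutoff]

rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, size=200)  # toy data with true location 2.0
posterior = abc_rejection(
    observed,
    simulate=lambda theta, n, r: r.normal(theta, 1.0, size=n),  # assumed toy model
    prior_sample=lambda r: r.uniform(-5.0, 5.0),                # assumed prior
    n_draws=2000,
    keep_fraction=0.05,
    rng=rng,
)
```

The accepted draws in `posterior` concentrate around the location used to generate the toy data; a sequential scheme would refine this by shrinking the threshold over several rounds rather than drawing everything from the prior.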
Affiliation(s)
- Chen Cheng
  - School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Linjie Wen
  - School of Earth and Space Sciences, Peking University, 5 Yiheyuan Rd, Beijing 100871, People’s Republic of China
- Jinglai Li
  - School of Mathematics, University of Birmingham, Birmingham B15 2TT, UK
4
Schälte Y, Hasenauer J. Informative and adaptive distances and summary statistics in sequential approximate Bayesian computation. PLoS One 2023; 18:e0285836. [PMID: 37216372 DOI: 10.1371/journal.pone.0285836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 05/02/2023] [Indexed: 05/24/2023] Open
Abstract
Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data, and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.
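Scale normalization of heterogeneous data, mentioned in this abstract, can be sketched in a few lines. The version below is a generic illustration and makes no claim about the pyABC API or the authors' exact scheme: it weights each data coordinate by the inverse median absolute deviation (MAD) of a batch of simulations, and the names `mad_weights` and `weighted_distance` are hypothetical.

```python
import numpy as np

def mad_weights(simulated):
    """Inverse-MAD weight per coordinate from an (n_sim, n_stats) array.

    Hypothetical helper; coordinates with zero spread get weight 1.0.
    """
    med = np.median(simulated, axis=0)
    mad = np.median(np.abs(simulated - med), axis=0)
    return 1.0 / np.where(mad > 0, mad, 1.0)

def weighted_distance(sim, obs, weights):
    """Weighted Euclidean distance between one simulation and the data."""
    return np.sqrt(np.sum((weights * (sim - obs)) ** 2))

# Two coordinates on very different scales.
batch = np.array([[0.0, 0.0], [1.0, 100.0], [2.0, 200.0]])
weights = mad_weights(batch)  # one weight per coordinate
d = weighted_distance(batch[1], np.array([0.0, 0.0]), weights)
```

After weighting, both coordinates contribute equally to `d` even though their raw scales differ by a factor of 100. As the abstract notes, scale correction alone does not identify which coordinates are informative, which is the gap the sensitivity weights are meant to address.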
Affiliation(s)
- Yannik Schälte
  - Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
- Jan Hasenauer
  - Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
  - Center for Mathematics, Technische Universität München, Garching, Germany
5
Järvenpää M, Corander J. On predictive inference for intractable models via approximate Bayesian computation. STATISTICS AND COMPUTING 2023; 33:42. [PMID: 36785730 PMCID: PMC9911513 DOI: 10.1007/s11222-022-10163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/02/2022] [Indexed: 06/18/2023]
Abstract
Approximate Bayesian computation (ABC) is commonly used for parameter estimation and model comparison for intractable simulator-based statistical models whose likelihood function cannot be evaluated. In this paper we instead investigate the feasibility of ABC as a generic approximate method for predictive inference, in particular, for computing the posterior predictive distribution of future observations or missing data of interest. We consider three complementary ABC approaches for this goal, each based on different assumptions regarding which predictive density of the intractable model can be sampled from. The case where only simulation from the joint density of the observed and future data given the model parameters can be used for inference is given particular attention and it is shown that the ideal summary statistic in this setting is minimal predictive sufficient instead of merely minimal sufficient (in the ordinary sense). An ABC prediction approach that takes advantage of a certain latent variable representation is also investigated. We additionally show how common ABC sampling algorithms can be used in the predictive settings considered. Our main results are first illustrated by using simple time-series models that facilitate analytical treatment, and later by using two common intractable dynamic models. Supplementary information: The online version contains supplementary material available at 10.1007/s11222-022-10163-6.
Affiliation(s)
- Marko Järvenpää
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), University of Helsinki, Helsinki, Finland
  - Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
6
Martin GM, Frazier DT, Robert CP. Approximating Bayes in the 21st Century. Stat Sci 2023. [DOI: 10.1214/22-sts875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Affiliation(s)
- Gael M. Martin
  - Gael M. Martin is Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
- David T. Frazier
  - David T. Frazier is Associate Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
7
Kaji T, Ročková V. Metropolis-Hastings via Classification. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2060836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
8
Dutta R, Zouaoui Boudjeltia K, Kotsalos C, Rousseau A, Ribeiro de Sousa D, Desmet JM, Van Meerhaeghe A, Mira A, Chopard B. Personalized pathology test for Cardio-vascular disease: Approximate Bayesian computation with discriminative summary statistics learning. PLoS Comput Biol 2022; 18:e1009910. [PMID: 35271585 PMCID: PMC8939803 DOI: 10.1371/journal.pcbi.1009910] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Revised: 03/22/2022] [Accepted: 02/09/2022] [Indexed: 11/19/2022] Open
Abstract
Cardio/cerebrovascular diseases (CVD) have become one of the major health issues in our societies. But recent studies show that the present pathology tests to detect CVD are ineffectual, as they do not consider different stages of platelet activation or the molecular dynamics involved in platelet interactions and are incapable of considering inter-individual variability. Here we propose a stochastic platelet deposition model and an inferential scheme to estimate the biologically meaningful model parameters using approximate Bayesian computation with a summary statistic that maximally discriminates between different types of patients. Inferred parameters from data collected on healthy volunteers and different patient types help us to identify specific biological parameters and hence the biological reasoning behind the dysfunction for each type of patient. This work opens up an unprecedented opportunity for a personalized pathology test for CVD detection and medical treatment.
Affiliation(s)
- Karim Zouaoui Boudjeltia
  - Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
- Alexandre Rousseau
  - Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
- Daniel Ribeiro de Sousa
  - Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
- Jean-Marc Desmet
  - Nephrology Department, ISPPC CHU de Charleroi, Charleroi, Belgium
- Antonietta Mira
  - Università della Svizzera italiana, Lugano, Switzerland
  - University of Insubria, Varese, Italy
9
Manole T, Balakrishnan S, Wasserman L. Minimax confidence intervals for the Sliced Wasserstein distance. Electron J Stat 2022. [DOI: 10.1214/22-ejs2001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Tudor Manole
  - Department of Statistics and Data Science, Carnegie Mellon University
- Larry Wasserman
  - Department of Statistics and Data Science, Carnegie Mellon University
10
Ceresa L, Guadagnini A, Porta GM, Riva M. Formulation and probabilistic assessment of reversible biodegradation pathway of Diclofenac in groundwater. WATER RESEARCH 2021; 204:117466. [PMID: 34530227 DOI: 10.1016/j.watres.2021.117466] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/04/2021] [Revised: 07/14/2021] [Accepted: 07/25/2021] [Indexed: 06/13/2023]
Abstract
We present a conceptual and mathematical framework leading to the development of a biodegradation model capable of interpreting the observed reversibility of the pharmaceutical Sodium Diclofenac along its biological degradation pathway in groundwater. Diclofenac occurrence in water bodies poses major concerns due to its persistent (and bioactive) nature and its detection in surface waters and aquifer systems. Despite some evidence of its biodegradability under given reducing conditions, Diclofenac attenuation is often interpreted with models which are too streamlined, thus potentially hampering appropriate quantification of its fate. In this context, we propose a modeling framework based on the conceptualization of the molecular mechanisms of Diclofenac biodegradation which we then embed in a stochastic context, thus enabling one to quantify predictive uncertainty. We consider reference environmental conditions (biotic and denitrifying) associated with a set of batch experiments that evidence the occurrence of a reversible biotransformation pathway, a feature that is fully captured by our model. The latter is then calibrated in the context of a Bayesian modeling framework through an Acceptance-Rejection Sampling approach. By doing so, we quantify the uncertainty associated with model parameters and predicted Diclofenac concentrations. We discuss the probabilistic nature of uncertain model parameters and the challenges posed by their calibration with the available data. Our results are consistent with the recalcitrant behavior exhibited by Diclofenac in groundwater and documented through experimental data and support the observation that unbiased estimates of the hazard posed by Diclofenac to water resources should be assessed through a modeling strategy which fully embeds uncertainty quantification.
Affiliation(s)
- Laura Ceresa
  - Department of Civil and Environmental Engineering (DICA), Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano 20133, Italy
- Alberto Guadagnini
  - Department of Civil and Environmental Engineering (DICA), Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano 20133, Italy
- Giovanni M Porta
  - Department of Civil and Environmental Engineering (DICA), Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano 20133, Italy
- Monica Riva
  - Department of Civil and Environmental Engineering (DICA), Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano 20133, Italy
11
Curve Registration of Functional Data for Approximate Bayesian Computation. STATS 2021. [DOI: 10.3390/stats4030045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Approximate Bayesian computation is a likelihood-free inference method which relies on comparing model realisations to observed data with informative distance measures. We obtain functional data that are not only subject to noise along their y axis but also to a random warping along their x axis, which we refer to as the time axis. Conventional distances on functions, such as the L2 distance, are not informative under these conditions. The Fisher–Rao metric, previously generalised from the space of probability distributions to the space of functions, is an ideal objective function for aligning one function to another by warping the time axis. We assess the usefulness of alignment with the Fisher–Rao metric for approximate Bayesian computation with four examples: two simulation examples, an example about passenger flow at an international airport, and an example of hydrological flow modelling. We find that the Fisher–Rao metric works well as the objective function to minimise for alignment; however, once the functions are aligned, it is not necessarily the most informative distance for inference. This means that likelihood-free inference may require two distances: one for alignment and one for parameter inference.
12
Pilgrim C, Hills TT. Bias in Zipf's law estimators. Sci Rep 2021; 11:17309. [PMID: 34453066 PMCID: PMC8397718 DOI: 10.1038/s41598-021-96214-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2020] [Accepted: 01/12/2021] [Indexed: 11/21/2022] Open
Abstract
The prevailing maximum likelihood estimators for inferring power law models from rank-frequency data are biased. The source of this bias is an inappropriate likelihood function. The correct likelihood function is derived and shown to be computationally intractable. A more computationally efficient method of approximate Bayesian computation (ABC) is explored. This method is shown to have less bias for data generated from idealised rank-frequency Zipfian distributions. However, the existing estimators and the ABC estimator described here assume that words are drawn from a simple probability distribution, while language is a much more complex process. We show that this false assumption leads to continued biases when applying any of these methods to natural language to estimate Zipf exponents. We recommend that researchers be aware of the bias when investigating power laws in rank-frequency data.
Affiliation(s)
- Charlie Pilgrim
  - Mathematics for Real-World Systems Centre for Doctoral Training, The University of Warwick, Coventry, CV4 7AL, UK
- Thomas T Hills
  - Department of Psychology, The University of Warwick, Coventry, CV4 7AL, UK
  - The Alan Turing Institute, British Library, 96 Euston Road, London, NW1 2DB, UK
13
Ebert A, Dutta R, Mengersen K, Mira A, Ruggeri F, Wu P. Likelihood‐free parameter estimation for dynamic queueing networks: Case study of passenger flow in an international airport terminal. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12487] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Anthony Ebert
  - Università della Svizzera italiana, Lugano, Switzerland
  - Queensland University of Technology, Brisbane, Australia
- Antonietta Mira
  - Università della Svizzera italiana, Lugano, Switzerland
  - Università dell’Insubria, Como, Italy
- Fabrizio Ruggeri
  - Queensland University of Technology, Brisbane, Australia
  - CNR‐IMATI, Milano, Italy
- Paul Wu
  - Queensland University of Technology, Brisbane, Australia
14
Weighted approximate Bayesian computation via Sanov’s theorem. Comput Stat 2021. [DOI: 10.1007/s00180-021-01093-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
We consider the problem of sample degeneracy in Approximate Bayesian Computation. It arises when proposed values of the parameters, once given as input to the generative model, rarely lead to simulations resembling the observed data and are hence discarded. Such “poor” parameter proposals do not contribute at all to the representation of the parameter’s posterior distribution. This leads to a very large number of required simulations and/or a waste of computational resources, as well as to distortions in the computed posterior distribution. To mitigate this problem, we propose an algorithm, referred to as the Large Deviations Weighted Approximate Bayesian Computation algorithm, where, via Sanov’s Theorem, strictly positive weights are computed for all proposed parameters, thus avoiding the rejection step altogether. In order to derive a computable asymptotic approximation from Sanov’s result, we adopt the information theoretic “method of types” formulation of the method of Large Deviations, thus restricting our attention to models for i.i.d. discrete random variables. Finally, we experimentally evaluate our method through a proof-of-concept implementation.
15
Vihrs N, Møller J, Gelfand AE. Approximate Bayesian inference for a spatial point process model exhibiting regularity and random aggregation. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12509] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Affiliation(s)
- Ninna Vihrs
  - Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
- Jesper Møller
  - Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
- Alan E. Gelfand
  - Department of Statistical Science, Duke University, Durham, North Carolina, USA
16
Le Mire E, Burger E, Iooss B, Mai C. Prediction of crack propagation kinetics through multipoint stochastic simulations of microscopic fields. EPJ NUCLEAR SCIENCES & TECHNOLOGIES 2021. [DOI: 10.1051/epjn/2021001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Prediction of crack propagation kinetics in the components of nuclear plant primary circuits undergoing Stress Corrosion Cracking (SCC) can be improved by a refinement of the SCC models. One of the steps in the estimation of the time to rupture is the crack propagation criterion. Current models make use of macroscopic measures (e.g. stress, strain) obtained for instance using the Finite Element Method. To go down to the microscopic scale and use local measures, a two-step approach is proposed. First, synthetic microstructures representing the material under specific loadings are simulated, and their quality is validated using statistical measures. Second, the shortest path to rupture in terms of propagation time is computed, and the distribution of those synthetic times to rupture is compared with the time to rupture estimated only from macroscopic values. The first step is realized with the cross-correlation-based simulation (CCSIM), a multipoint simulation algorithm that produces synthetic stochastic fields from a training field. The Earth Mover’s Distance is the metric used to assess the quality of the realizations. The computation of shortest paths is realized using Dijkstra’s algorithm. This approach refines the prediction of the kinetics of crack propagation compared to the macroscopic approach. An influence of the loading conditions on the distribution of the computed synthetic times to rupture was observed, which could be reduced through a more robust use of the CCSIM.
17
Chae M, De Blasi P, Walker SG. Posterior asymptotics in Wasserstein metrics on the real line. Electron J Stat 2021. [DOI: 10.1214/21-ejs1869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Minwoo Chae
  - Department of Industrial and Management Engineering, Pohang University of Science and Technology, South Korea
18
Alquier P. Approximate Bayesian Inference. ENTROPY 2020; 22:e22111272. [PMID: 33287041 PMCID: PMC7711853 DOI: 10.3390/e22111272] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/28/2020] [Accepted: 11/06/2020] [Indexed: 11/16/2022]
Abstract
This is the Editorial article summarizing the scope of the Special Issue: Approximate Bayesian Inference.
Affiliation(s)
- Pierre Alquier
  - Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo 103-0027, Japan
19
Harrison JU, Baker RE. An automatic adaptive method to combine summary statistics in approximate Bayesian computation. PLoS One 2020; 15:e0236954. [PMID: 32760106 PMCID: PMC7410215 DOI: 10.1371/journal.pone.0236954] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/02/2020] [Accepted: 07/16/2020] [Indexed: 11/18/2022] Open
Abstract
To infer the parameters of mechanistic models with intractable likelihoods, techniques such as approximate Bayesian computation (ABC) are increasingly being adopted. One of the main disadvantages of ABC in practical situations, however, is that parameter inference must generally rely on summary statistics of the data. This is particularly the case for problems involving high-dimensional data, such as biological imaging experiments. However, some summary statistics contain more information about parameters of interest than others, and it is not always clear how to weight their contributions within the ABC framework. We address this problem by developing an automatic, adaptive algorithm that chooses weights for each summary statistic. Our algorithm aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function. Computationally, we use a nearest neighbour estimator of the distance between distributions. We justify the algorithm theoretically based on properties of the nearest neighbour distance estimator. To demonstrate the effectiveness of our algorithm, we apply it to a variety of test problems, including several stochastic models of biochemical reaction networks, and a spatial model of diffusion, and compare our results with existing algorithms.
Affiliation(s)
- Jonathan U. Harrison
  - Mathematical Institute, Mathematical Sciences Building, University of Warwick, Coventry, United Kingdom
- Ruth E. Baker
  - Mathematical Institute, Andrew Wiles Building, University of Oxford, Oxford, United Kingdom
20
Distance-learning For Approximate Bayesian Computation To Model a Volcanic Eruption. SANKHYA B 2020. [DOI: 10.1007/s13571-019-00208-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]
Abstract
Approximate Bayesian computation (ABC) provides us with a way to infer parameters of models, for which the likelihood function is not available, from an observation. Using ABC, which depends on many simulations from the considered model, we develop an inferential framework to learn parameters of a stochastic numerical simulator of volcanic eruption. Moreover, the model itself is parallelized using Message Passing Interface (MPI). Thus, we develop a nested-parallelized MPI communicator to handle the expensive numerical model with ABC algorithms. ABC usually relies on summary statistics of the data in order to measure the discrepancy between model output and observation. However, informative summary statistics cannot be found for the considered model. We therefore develop a technique to learn a distance between model outputs based on deep metric-learning. We use this framework to learn the plume characteristics (e.g. initial plume velocity) of the volcanic eruption from the tephra deposits collected by fieldwork associated with the 2450 BP Pululagua (Ecuador) volcanic eruption.
21
Frazier DT, Robert CP, Rousseau J. Model misspecification in approximate Bayesian computation: consequences and diagnostics. J R Stat Soc Series B Stat Methodol 2020. [DOI: 10.1111/rssb.12356] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/30/2022]
Affiliation(s)
| | - Christian P. Robert
- Université Paris Dauphine, Ceremade, Paris, France, and University of Warwick, Coventry, UK
| | - Judith Rousseau
- University of Oxford, UK, and Université Paris Dauphine, Ceremade, Paris, France
| |
|
22
|
Buckwar E, Tamborrino M, Tubikanec I. Spectral density-based and measure-preserving ABC for partially observed diffusion processes. An illustration on Hamiltonian SDEs. Statistics and Computing 2020; 30:627-648. [PMID: 32132771 PMCID: PMC7026277 DOI: 10.1007/s11222-019-09909-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/03/2019] [Accepted: 10/17/2019] [Indexed: 05/15/2023]
Abstract
Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise: First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build up the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian-type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler-Maruyama discretisation) fails.
The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.
Affiliation(s)
- Evelyn Buckwar
- Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
| | - Massimiliano Tamborrino
- Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
| | - Irene Tubikanec
- Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
| |
|
23
|
Arbel J, Crispino M, Girard S. Dependence properties and Bayesian inference for asymmetric multivariate copulas. J Multivariate Anal 2019. [DOI: 10.1016/j.jmva.2019.06.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
|
24
|
Abstract
Summary
We consider the problem of approximating the product of $n$ expectations with respect to a common probability distribution $\mu$. Such products routinely arise in statistics as values of the likelihood in latent variable models. Motivated by pseudo-marginal Markov chain Monte Carlo schemes, we focus on unbiased estimators of such products. The standard approach is to sample $N$ particles from $\mu$ and assign each particle to one of the expectations; this is wasteful and typically requires the number of particles to grow quadratically with the number of expectations. We propose an alternative estimator that approximates each expectation using most of the particles while preserving unbiasedness, which is computationally more efficient when the cost of simulations greatly exceeds the cost of likelihood evaluations. We carefully study the properties of our proposed estimator, showing that in latent variable contexts it needs only $O(n)$ particles to match the performance of the standard approach with $O(n^{2})$ particles. We demonstrate the procedure on two latent variable examples from approximate Bayesian computation and single-cell gene expression analysis, observing computational gains by factors of about 25 and 450, respectively.
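The standard approach mentioned in the abstract can be sketched as follows: the $N$ particles are split into $n$ disjoint blocks, each block is averaged under its own function, and the block averages are multiplied. This is only a sketch of that baseline, not of the proposed alternative estimator; the function name `standard_product_estimator`, the Gaussian choice of $\mu$, and the test functions are assumptions made for illustration.

```python
import numpy as np

def standard_product_estimator(functions, particles):
    """Estimate prod_i E_mu[f_i(X)] by giving each f_i its own disjoint block of particles.

    Independence between blocks makes the product of block averages unbiased.
    Requires at least as many particles as functions so that no block is empty.
    """
    blocks = np.array_split(particles, len(functions))
    return float(np.prod([np.mean(f(block)) for f, block in zip(functions, blocks)]))

# Hypothetical example: mu = N(0, 1) and f_i(x) = exp(-(y_i - x)^2 / 2),
# for which E_mu[f_i(X)] = exp(-y_i^2 / 4) / sqrt(2) in closed form.
rng = np.random.default_rng(1)
ys = [0.3, -0.5, 1.0]
functions = [lambda x, y=y: np.exp(-0.5 * (y - x) ** 2) for y in ys]

particles = rng.standard_normal(3000)
estimate = standard_product_estimator(functions, particles)
exact = float(np.prod([np.exp(-y ** 2 / 4) / np.sqrt(2) for y in ys]))
```

Each expectation here sees only $N/n$ of the particles, which is the inefficiency the abstract describes as wasteful.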
Affiliation(s)
- A Lee
- School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK
| | - S Tiberi
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland
| | - G Zanella
- Department of Decision Sciences, BIDSA and IGIER, Bocconi University, Via Roentgen 1, 20136 Milan, Italy
| |
|