1
Cheng C, Wen L, Li J. Parameter estimation from aggregate observations: a Wasserstein distance-based sequential Monte Carlo sampler. R Soc Open Sci 2023; 10:230275. [PMID: 37564064 PMCID: PMC10410207 DOI: 10.1098/rsos.230275] [Received: 03/09/2023] [Accepted: 06/21/2023] [Indexed: 08/12/2023]
Abstract
In this work, we study systems consisting of a group of moving particles. In such systems, some important parameters are often unknown and have to be estimated from observed data. Such parameter estimation problems can often be solved via a Bayesian inference framework. However, in many practical problems, only data at the aggregate level are available and, as a result, the likelihood function is not available, which poses a challenge for Bayesian methods. In particular, we consider the situation where the distributions of the particles are observed. We propose a Wasserstein distance (WD)-based sequential Monte Carlo sampler to solve the problem: the WD is used to measure the similarity between the observed and the simulated particle distributions, and the sequential Monte Carlo sampler is used to deal with the sequentially available observations. Two real-world examples are provided to demonstrate the performance of the proposed method.
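As a rough illustration of how a Wasserstein distance can stand in for an unavailable likelihood, the sketch below compares observed and simulated particle positions in one dimension. It is not the authors' sampler: plain rejection sampling is swapped in for the sequential Monte Carlo scheme, and the drift model, the uniform prior, the tolerance and all function names are assumptions made for this example only.

```python
import random

def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1D empirical samples.

    For equal sample sizes this reduces to the mean absolute
    difference between the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def simulate_particles(drift, n, rng, t=1.0):
    """Hypothetical toy model: positions at time t under a constant drift
    plus unit-variance Gaussian noise."""
    return [drift * t + rng.gauss(0.0, 1.0) for _ in range(n)]

def abc_rejection(observed, n_draws, tol, rng):
    """Keep prior draws whose simulated particle cloud lies within
    `tol` of the observed cloud in Wasserstein distance."""
    accepted = []
    for _ in range(n_draws):
        drift = rng.uniform(-5.0, 5.0)  # assumed uniform prior on the drift
        simulated = simulate_particles(drift, len(observed), rng)
        if wasserstein_1d(observed, simulated) < tol:
            accepted.append(drift)
    return accepted

rng = random.Random(0)
observed = simulate_particles(2.0, 200, rng)  # synthetic "data", true drift 2.0
posterior = abc_rejection(observed, 2000, 0.3, rng)
estimate = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean drift {estimate:.2f}")
```

Only the particle distribution at a single time is compared here. Handling observations that arrive sequentially, as the abstract describes, would require reweighting and resampling the accepted draws at each new observation time.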
Affiliation(s)
- Chen Cheng
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Linjie Wen
- School of Earth and Space Sciences, Peking University, 5 Yiheyuan Rd, Beijing 100871, People’s Republic of China
- Jinglai Li
- School of Mathematics, University of Birmingham, Birmingham B15 2TT, UK
5
Radev ST, Mertens UK, Voss A, Köthe U. Towards end-to-end likelihood-free inference with convolutional neural networks. Br J Math Stat Psychol 2020; 73:23-43. [PMID: 30793299 DOI: 10.1111/bmsp.12159] [Received: 07/07/2018] [Revised: 12/11/2018] [Indexed: 06/09/2023]
Abstract
Complex simulator-based models with non-standard sampling distributions require sophisticated design choices for reliable approximate parameter inference. We introduce a fast, end-to-end approach for approximate Bayesian computation (ABC) based on fully convolutional neural networks. The method enables users of ABC to derive simultaneously the posterior mean and variance of multidimensional posterior distributions directly from raw simulated data. Once trained on simulated data, the convolutional neural network is able to map real data samples of variable size to the first two posterior moments of the relevant parameter's distributions. Thus, in contrast to other machine learning approaches to ABC, our approach allows us to generate reusable models that can be applied by different researchers employing the same model. We verify the utility of our method on two common statistical models (i.e., a multivariate normal distribution and a multiple regression scenario), for which the posterior parameter distributions can be derived analytically. We then apply our method to recover the parameters of the leaky competing accumulator (LCA) model and we reference our results to the current state-of-the-art technique, which is the probability density estimation (PDA). Results show that our method exhibits a lower approximation error compared with other machine learning approaches to ABC. It also performs similarly to PDA in recovering the parameters of the LCA model.
Affiliation(s)
- Ulf K Mertens
- Institute of Psychology, Heidelberg University, Germany
- Andreas Voss
- Institute of Psychology, Heidelberg University, Germany
- Ullrich Köthe
- Heidelberg Collaboratory for Image Processing (HCI), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Germany
6
Abstract
Likelihood-free inference for simulator-based models is an emerging methodological branch of statistics which has attracted considerable attention in applications across diverse fields such as population genetics, astronomy and economics. Recently, the power of statistical classifiers has been harnessed in likelihood-free inference to obtain either point estimates or even posterior distributions of model parameters. Here we introduce PYLFIRE, an open-source Python implementation of the inference method LFIRE (likelihood-free inference by ratio estimation) that uses penalised logistic regression. PYLFIRE is made available as part of the general ELFI inference software http://elfi.ai to benefit both the user and developer communities for likelihood-free inference.
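The ratio-estimation idea behind LFIRE can be sketched in a few lines: a classifier is trained to separate data simulated at a fixed parameter value from data simulated from the marginal, and its log-odds then approximate the log-ratio of likelihood to marginal. The sketch below does not use the PYLFIRE or ELFI API. The Gaussian toy simulator, the uniform prior, the quadratic summary features and the hand-written L2-penalised gradient descent are all assumptions chosen to keep the example self-contained.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def features(x):
    # Hypothetical summary features of a scalar observation.
    return [x, x * x]

def fit_logistic(rows, labels, penalty=0.01, lr=0.5, steps=200):
    """L2-penalised logistic regression fitted by gradient descent.

    Features are standardised internally; returns a log-odds function."""
    dim, n = len(rows[0]), len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(dim)]
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(dim)] for r in rows]
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for r, y in zip(scaled, labels):
            err = sigmoid(b + sum(wj * rj for wj, rj in zip(w, r))) - y
            gb += err
            for j in range(dim):
                gw[j] += err * r[j]
        b -= lr * gb / n
        w = [wj - lr * (gwj / n + penalty * wj) for wj, gwj in zip(w, gw)]

    def log_odds(row):
        return b + sum(wj * (row[j] - means[j]) / stds[j]
                       for j, wj in enumerate(w))
    return log_odds

def log_ratio(theta, x_obs, n, rng):
    """Classifier-based estimate of log p(x_obs | theta) - log p(x_obs)."""
    from_theta = [features(rng.gauss(theta, 1.0)) for _ in range(n)]
    # Marginal samples: draw theta from the assumed U(-3, 3) prior first.
    from_marginal = [features(rng.gauss(rng.uniform(-3.0, 3.0), 1.0))
                     for _ in range(n)]
    log_odds = fit_logistic(from_theta + from_marginal, [1] * n + [0] * n)
    # With equal class sizes, the fitted log-odds estimate the log-ratio.
    return log_odds(features(x_obs))

rng = random.Random(1)
x_obs = 1.0
grid = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
scores = {theta: log_ratio(theta, x_obs, 400, rng) for theta in grid}
best_theta = max(scores, key=scores.get)
print("grid point with the largest estimated log-ratio:", best_theta)
```

Because the assumed prior is uniform on the grid, the estimated log-ratio is proportional to the log-posterior up to a constant, so the grid point with the largest score serves as a crude posterior mode.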
Affiliation(s)
- Jan Kokko
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Ulpu Remes
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Owen Thomas
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Henri Pesonen
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Jukka Corander
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Parasites and Microbes, Wellcome Trust Sanger Institute, Hinxton, UK
7
Economou P, Batsidis A, Tzavelas G, Alexopoulos P. Berkson's paradox and weighted distributions: An application to Alzheimer's disease. Biom J 2019; 62:238-249. [PMID: 31696967 DOI: 10.1002/bimj.201900046] [Received: 01/15/2019] [Revised: 09/11/2019] [Accepted: 09/18/2019] [Indexed: 11/07/2022]
Abstract
One reason for observing in practice a false positive or negative correlation between two random variables that are either uncorrelated or correlated in a different direction is the overrepresentation in the sample of individuals satisfying specific properties. In 1946, Berkson first illustrated a false correlation arising for this reason, which is known as Berkson's paradox and is one of the most famous paradoxes in probability and statistics. In this paper, the concept of weighted distributions is utilized to describe Berkson's paradox. Moreover, a proper procedure is suggested for making inference about the population given a biased sample that possesses all the characteristics of Berkson's paradox. A real data application for patients with dementia due to Alzheimer's disease demonstrates that the proposed method reveals characteristics of the population that are masked by the sampling procedure.
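The selection effect described above can be reproduced with a small simulation. The sketch below draws two independent variables, keeps only individuals passing a selection rule, and compares the correlation before and after selection. The indicator weight function, its threshold and the standard normal population are hypothetical choices for illustration. This is not the inference procedure proposed in the paper.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def selection_weight(x, y, threshold=1.0):
    """Hypothetical weight function: an individual can enter the sample
    only when x + y exceeds the threshold. The sampled density is then
    proportional to selection_weight(x, y) times the population density."""
    return 1.0 if x + y > threshold else 0.0

rng = random.Random(42)

# Population: two independent standard normal traits.
population = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]

# Biased sample: keep each individual with probability equal to its weight.
sample = [(x, y) for x, y in population
          if rng.random() < selection_weight(x, y)]

corr_population = pearson(*zip(*population))
corr_sample = pearson(*zip(*sample))
print(f"population correlation: {corr_population:+.3f}")
print(f"selected-sample correlation: {corr_sample:+.3f}")
```

The traits are independent in the population, yet the selected sample shows a clearly negative correlation, because a low value of one trait must be offset by a high value of the other to pass the threshold.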
Affiliation(s)
- George Tzavelas
- Department of Statistics and Insurance Science, University of Piraeus, Piraeus, Greece
- Panagiotis Alexopoulos
- Department of Psychiatry, Faculty of Medicine, University of Patras, University Hospital of Rion, Rion Patras, Greece
- Department of Psychiatry and Psychotherapy, Faculty of Medicine, Technical University of Munich, Klinikum rechts der Isar, Munich, Germany
8
Abstract
Plant root systems play vital roles in the biosphere, environment and agriculture, but the quantitative principles governing their growth and architecture remain poorly understood. The 'forward problem' of what root forms can arise from given models and parameters has been well studied through modelling and simulation, but comparatively little attention has been given to the 'inverse problem': what models and parameters are responsible for producing an experimentally observed root system? Here, we propose the use of approximate Bayesian computation (ABC) to infer mechanistic parameters governing root growth and architecture, allowing us to learn and quantify uncertainty in parameters and model structures using observed root architectures. We demonstrate the use of this platform on synthetic and experimental root data and show how it may be used to identify growth mechanisms and characterize growth parameters in different mutants. Our highly adaptable framework can be used to gain mechanistic insight into the generation of observed root system architectures.
Affiliation(s)
- Clare Ziegler
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Institute of Forest Research, University of Birmingham, Birmingham, UK
- Rosemary J Dyson
- Birmingham Institute of Forest Research, University of Birmingham, Birmingham, UK
- School of Mathematics, University of Birmingham, Birmingham, UK
- Iain G Johnston
- School of Biosciences, University of Birmingham, Birmingham, UK
- Birmingham Institute of Forest Research, University of Birmingham, Birmingham, UK
- Alan Turing Institute, London, UK
- Faculty of Mathematics and Natural Sciences, University of Bergen, Bergen, Norway
9
Baudet C, Donati B, Sinaimeri B, Crescenzi P, Gautier C, Matias C, Sagot MF. Cophylogeny reconstruction via an approximate Bayesian computation. Syst Biol 2014; 64:416-31. [PMID: 25540454 PMCID: PMC4395844 DOI: 10.1093/sysbio/syu129] [Received: 07/22/2014] [Accepted: 12/18/2014] [Indexed: 12/27/2022]
Abstract
Despite an increasingly vast literature on cophylogenetic reconstructions for studying host–parasite associations, understanding the common evolutionary history of such systems remains a problem that is far from being solved. Most algorithms for host–parasite reconciliation use an event-based model, where the events include in general (a subset of) cospeciation, duplication, loss, and host switch. All known parsimonious event-based methods then assign a cost to each type of event in order to find a reconstruction of minimum cost. The main problem with this approach is that the cost of the events strongly influences the reconciliation obtained. Some earlier approaches attempt to avoid this problem by finding a Pareto set of solutions and hence by considering event costs under some minimization constraints. To deal with this problem, we developed an algorithm, called Coala, for estimating the frequency of the events based on an approximate Bayesian computation approach. The benefits of this method are 2-fold: (i) it provides more confidence in the set of costs to be used in a reconciliation, and (ii) it allows estimation of the frequency of the events in cases where the data set consists of trees with a large number of taxa. We evaluate our method on simulated and on biological data sets. We show that in both cases, for the same pair of host and parasite trees, different sets of frequencies for the events lead to equally probable solutions. Moreover, often these solutions differ greatly in terms of the number of inferred events. It appears crucial to take this into account before attempting any further biological interpretation of such reconciliations. More generally, we also show that the set of frequencies can vary widely depending on the input host and parasite trees. Indiscriminately applying a standard vector of costs may thus not be a good strategy.
Affiliation(s)
- C Baudet
- INRIA Grenoble Rhône-Alpes, 38330 Montbonnot Saint-Martin, France; Université de Lyon, F-69000 Lyon; Université Lyon 1; CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622 Villeurbanne, France; Università di Firenze, Dipartimento di Sistemi e Informatica, I-50134 Firenze, Italy; and Laboratoire Statistique et Génome, UMR CNRS 8071 & USC INRA, Université d'Évry
- B Donati
- INRIA Grenoble Rhône-Alpes, 38330 Montbonnot Saint-Martin, France; Université de Lyon, F-69000 Lyon; Université Lyon 1; CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622 Villeurbanne, France; Università di Firenze, Dipartimento di Sistemi e Informatica, I-50134 Firenze, Italy; and Laboratoire Statistique et Génome, UMR CNRS 8071 & USC INRA, Université d'Évry
- B Sinaimeri
- INRIA Grenoble Rhône-Alpes, 38330 Montbonnot Saint-Martin, France; Université de Lyon, F-69000 Lyon; Université Lyon 1; CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622 Villeurbanne, France; Università di Firenze, Dipartimento di Sistemi e Informatica, I-50134 Firenze, Italy; and Laboratoire Statistique et Génome, UMR CNRS 8071 & USC INRA, Université d'Évry
- P Crescenzi
- INRIA Grenoble Rhône-Alpes, 38330 Montbonnot Saint-Martin, France; Université de Lyon, F-69000 Lyon; Université Lyon 1; CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622 Villeurbanne, France; Università di Firenze, Dipartimento di Sistemi e Informatica, I-50134 Firenze, Italy; and Laboratoire Statistique et Génome, UMR CNRS 8071 & USC INRA, Université d'Évry
- C Gautier
- INRIA Grenoble Rhône-Alpes, 38330 Montbonnot Saint-Martin, France; Université de Lyon, F-69000 Lyon; Université Lyon 1; CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622 Villeurbanne, France; Università di Firenze, Dipartimento di Sistemi e Informatica, I-50134 Firenze, Italy; and Laboratoire Statistique et Génome, UMR CNRS 8071 & USC INRA, Université d'Évry
- C Matias
- INRIA Grenoble Rhône-Alpes, 38330 Montbonnot Saint-Martin, France; Université de Lyon, F-69000 Lyon; Université Lyon 1; CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622 Villeurbanne, France; Università di Firenze, Dipartimento di Sistemi e Informatica, I-50134 Firenze, Italy; and Laboratoire Statistique et Génome, UMR CNRS 8071 & USC INRA, Université d'Évry
- M-F Sagot
- INRIA Grenoble Rhône-Alpes, 38330 Montbonnot Saint-Martin, France; Université de Lyon, F-69000 Lyon; Université Lyon 1; CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622 Villeurbanne, France; Università di Firenze, Dipartimento di Sistemi e Informatica, I-50134 Firenze, Italy; and Laboratoire Statistique et Génome, UMR CNRS 8071 & USC INRA, Université d'Évry