1
|
Serradell JM, Lorenzo-Salazar JM, Flores C, Lao O, Comas D. Modelling the demographic history of human North African genomes points to a recent soft split divergence between populations. Genome Biol 2024; 25:201. [PMID: 39080715 PMCID: PMC11290046 DOI: 10.1186/s13059-024-03341-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 07/22/2024] [Indexed: 08/02/2024] Open
Abstract
BACKGROUND: North African human populations present a complex demographic scenario due to the presence of an autochthonous genetic component and population substructure, plus extensive gene flow from the Middle East, Europe, and sub-Saharan Africa.
RESULTS: We conducted a comprehensive analysis of 364 genomes to construct detailed demographic models for the North African region, encompassing its two primary ethnic groups, the Arab and Amazigh populations. This was achieved through an Approximate Bayesian Computation with Deep Learning (ABC-DL) framework and a novel algorithm called Genetic Programming for Population Genetics (GP4PG). This innovative approach enabled us to effectively model intricate demographic scenarios, utilizing a subset of 16 whole genomes at >30X coverage. The demographic model suggested by GP4PG exhibited a closer alignment with the observed data compared to the ABC-DL model. Both point to a back-to-Africa origin of North African individuals and a close relationship with Eurasian populations. Results support different origins for Amazigh and Arab populations, with Amazigh populations originating back in Epipaleolithic times, while GP4PG supports Arabization as the main source of Middle Eastern ancestry. The GP4PG model includes population substructure in surrounding populations (sub-Saharan Africa and Middle East) with continuous decaying gene flow after population split. Contrary to ABC-DL, the best GP4PG model does not require pulses of admixture from surrounding populations into North Africa pointing to soft splits as drivers of divergence in North Africa.
CONCLUSIONS: We have built a demographic model on North Africa that points to a back-to-Africa expansion and a differential origin between Arab and Amazigh populations.
Affiliation(s)
- Jose M Serradell
  - Departament de Medicina i Ciències de la Vida, Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Carrer del Doctor Aiguader 88, Barcelona, 08003, Spain
- Jose M Lorenzo-Salazar
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona s/n, Santa Cruz de Tenerife, 38600, Spain
- Carlos Flores
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona s/n, Santa Cruz de Tenerife, 38600, Spain
  - Plataforma Genómica de Alto Rendimiento para el Estudio de la Biodiversidad, Instituto de Productos Naturales y Agrobiología (IPNA), Consejo Superior de Investigaciones Científicas, San Cristóbal de La Laguna, Santa Cruz de Tenerife, 38206, Spain
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Carretera del Rosario 145, Santa Cruz de Tenerife, 38010, Spain
  - CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Av. de Monforte de Lemos, 3-5, Madrid, 28029, Spain
  - Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, Calle de La Juventud S/N, Santa María de Guía, Las Palmas de Gran Canaria, 35450, Spain
- Oscar Lao
  - Departament de Medicina i Ciències de la Vida, Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Carrer del Doctor Aiguader 88, Barcelona, 08003, Spain
- David Comas
  - Departament de Medicina i Ciències de la Vida, Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Carrer del Doctor Aiguader 88, Barcelona, 08003, Spain
|
2
|
Wang MH, Onnela JP. Flexible Bayesian inference on partially observed epidemics. Journal of Complex Networks 2024; 12:cnae017. [PMID: 38533184 PMCID: PMC10962317 DOI: 10.1093/comnet/cnae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 03/02/2024] [Indexed: 03/28/2024]
Abstract
Individual-based models of contagious processes are useful for predicting epidemic trajectories and informing intervention strategies. In such models, the incorporation of contact network information can capture the non-randomness and heterogeneity of realistic contact dynamics. In this article, we consider Bayesian inference on the spreading parameters of an SIR contagion on a known, static network, where information regarding individual disease status is known only from a series of tests (positive or negative disease status). When the contagion model is complex or information such as infection and removal times is missing, the posterior distribution can be difficult to sample from. Previous work has considered the use of Approximate Bayesian Computation (ABC), which allows for simulation-based Bayesian inference on complex models. However, ABC methods usually require the user to select reasonable summary statistics. Here, we consider an inference scheme based on the Mixture Density Network compressed ABC, which minimizes the expected posterior entropy in order to learn informative summary statistics. This allows us to conduct Bayesian inference on the parameters of a partially observed contagious process while also circumventing the need for manual summary statistic selection. This methodology can be extended to incorporate additional simulation complexities, including behavioural change after positive tests or false test results.
Affiliation(s)
- Maxwell H Wang
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Jukka-Pekka Onnela
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
|
3
|
Henley L, Jones O, Mathews F, Woolley TE. Bat Motion can be Described by Leap Frogging. Bull Math Biol 2024; 86:16. [PMID: 38197980 PMCID: PMC10781826 DOI: 10.1007/s11538-023-01233-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 11/01/2023] [Indexed: 01/11/2024]
Abstract
We present models of bat motion derived from radio-tracking data collected over 14 nights. The data presents an initial dispersal period and a return to roost period. Although a simple diffusion model fits the initial dispersal motion we show that simple convection cannot provide a description of the bats returning to their roost. By extending our model to include non-autonomous parameters, or a leap frogging form of motion, where bats on the exterior move back first, we find we are able to accurately capture the bat's motion. We discuss ways of distinguishing between the two movement descriptions and, finally, consider how the different motion descriptions would impact a bat's hunting strategy.
Affiliation(s)
- Lucy Henley
  - Cardiff School of Mathematics, Cardiff University, Senghennydd Road, Cardiff, CF24 4AG, UK
- Owen Jones
  - Cardiff School of Mathematics, Cardiff University, Senghennydd Road, Cardiff, CF24 4AG, UK
- Fiona Mathews
  - University of Sussex, John Maynard Smith Building, Falmer, Brighton, BN1 9RH, UK
- Thomas E Woolley
  - Cardiff School of Mathematics, Cardiff University, Senghennydd Road, Cardiff, CF24 4AG, UK
|
4
|
Järvenpää M, Corander J. On predictive inference for intractable models via approximate Bayesian computation. Statistics and Computing 2023; 33:42. [PMID: 36785730 PMCID: PMC9911513 DOI: 10.1007/s11222-022-10163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/02/2022] [Indexed: 06/18/2023]
Abstract
Approximate Bayesian computation (ABC) is commonly used for parameter estimation and model comparison for intractable simulator-based statistical models whose likelihood function cannot be evaluated. In this paper we instead investigate the feasibility of ABC as a generic approximate method for predictive inference, in particular, for computing the posterior predictive distribution of future observations or missing data of interest. We consider three complementary ABC approaches for this goal, each based on different assumptions regarding which predictive density of the intractable model can be sampled from. The case where only simulation from the joint density of the observed and future data given the model parameters can be used for inference is given particular attention and it is shown that the ideal summary statistic in this setting is minimal predictive sufficient instead of merely minimal sufficient (in the ordinary sense). An ABC prediction approach that takes advantage of a certain latent variable representation is also investigated. We additionally show how common ABC sampling algorithms can be used in the predictive settings considered. Our main results are first illustrated by using simple time-series models that facilitate analytical treatment, and later by using two common intractable dynamic models.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11222-022-10163-6.
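As a rough illustration of the idea in this abstract, the sketch below runs plain rejection ABC and then simulates future observations from each accepted parameter value. It is not the authors' method: the Normal(theta, 1) simulator, the flat prior on [-5, 5], the sample-mean summary, the tolerance `eps`, and all function names are assumptions chosen to keep the example small, and it covers only the simplest case, where future data are independent of the observed data given the parameter.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumption): n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_posterior_predictive(observed, n_future, n_draws, eps, rng):
    """Rejection ABC followed by forward simulation of future data.

    Returns (accepted_thetas, predictive_draws), one predictive
    draw per accepted parameter value.
    """
    s_obs = statistics.fmean(observed)  # summary statistic: sample mean
    accepted, predictive = [], []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # flat prior (assumption)
        s_sim = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= eps:  # keep theta if summaries are close
            accepted.append(theta)
            # Given theta, future data do not depend on the observed data
            # in this toy model, so a fresh simulation is a predictive draw.
            predictive.append(simulate(theta, n_future, rng))
    return accepted, predictive


rng = random.Random(0)  # fixed seed so the run is reproducible
observed = simulate(2.0, 50, rng)
thetas, future = abc_posterior_predictive(
    observed, n_future=1, n_draws=20000, eps=0.1, rng=rng
)
```

Here `thetas` approximates the ABC posterior and `future` holds the matching approximate posterior predictive draws. The settings the abstract treats in detail, where future and observed data are dependent given the parameters, need more than this sketch provides.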
Affiliation(s)
- Marko Järvenpää
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), University of Helsinki, Helsinki, Finland
  - Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
|
5
|
Martin GM, Frazier DT, Robert CP. Approximating Bayes in the 21st Century. Stat Sci 2023. [DOI: 10.1214/22-sts875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Affiliation(s)
- Gael M. Martin
  - Gael M. Martin is Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
- David T. Frazier
  - David T. Frazier is Associate Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
|
6
|
Pesonen H, Simola U, Köhn‐Luque A, Vuollekoski H, Lai X, Frigessi A, Kaski S, Frazier DT, Maneesoonthorn W, Martin GM, Corander J. ABC of the future. Int Stat Rev 2022. [DOI: 10.1111/insr.12522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Henri Pesonen
  - Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
- Umberto Simola
  - Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Alvaro Köhn‐Luque
  - Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
- Henri Vuollekoski
  - Helsinki Institute of Information Technology, Department of Computer Science, Aalto University, Helsinki, Finland
- Xiaoran Lai
  - Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
- Arnoldo Frigessi
  - Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
- Samuel Kaski
  - Helsinki Institute of Information Technology, Department of Computer Science, Aalto University, Helsinki, Finland
  - Department of Computer Science, University of Manchester, Manchester, UK
- David T. Frazier
  - Department of Econometrics & Business Statistics, Monash University, Clayton, Victoria, Australia
- Gael M. Martin
  - Department of Econometrics & Business Statistics, Monash University, Clayton, Victoria, Australia
- Jukka Corander
  - Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
  - Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  - Parasites and Microbes, Wellcome Sanger Institute, Hinxton, UK
|
7
|
Zhu Y, Shin HM, Jiang L, Bartell SM. Retrospective exposure reconstruction using approximate Bayesian computation: A case study on perfluorooctanoic acid and preeclampsia. Environmental Research 2022; 209:112892. [PMID: 35149111 DOI: 10.1016/j.envres.2022.112892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/19/2021] [Revised: 01/28/2022] [Accepted: 02/02/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND: In environmental epidemiology, measurements of toxicants in biological samples are often used as individual exposure assignments. It is common to obtain only one or a few exposure biomarkers per person and use those measurements to represent each person's relevant toxicant exposure for a given health outcome, even though most exposure biomarkers can fluctuate over time. When the timing of the exposure reflected by the biomarker measurement is misaligned with disease development especially if it occurs after the disease outcome, results could be subject to reverse causality or exposure measurement error.
OBJECTIVE: This study aimed to use an approximate Bayesian computation (ABC) method to improve PFOA exposure estimates and characterize the effects of PFOA on preeclampsia in the C8 Studies.
METHODS: Serum PFOA concentrations were measured in blood samples collected during 2005-2006 in West Virginia and Ohio (the C8 Studies), and residential and water use histories and pregnancy outcomes were obtained from self-reports. Our previous results may have been influenced by the choice of methods for characterizing PFOA exposures. Here we use an ABC method to combine measured PFOA serum concentrations and environmentally modeled PFOA concentrations to reconstruct historical PFOA exposures. We also expanded our previous work by assuming more realistic lognormal distributions for key input parameters in the exposure and pharmacokinetic models.
RESULTS: Compared to using fixed values of model parameters and Monte Carlo simulations, ABC produced similar Spearman correlations between estimated and measured serum PFOA concentrations, yet substantially reduced the mean squared error by over 50%. Based on ABC, compared to previous studies, we found a similar adjusted odds ratio (AOR) for the association between PFOA and preeclampsia.
CONCLUSIONS: Bayesian combination of modeled exposure and measured biomarker concentrations can reduce exposure measurement error compared to modeled exposure.
Affiliation(s)
- Yachen Zhu
  - Program in Public Health, University of California, Irvine, CA, 92697-3957, USA
- Hyeong-Moo Shin
  - Department of Earth and Environmental Science, University of Texas, Arlington, TX, 76019-0049, USA
- Luohua Jiang
  - Program in Public Health, University of California, Irvine, CA, 92697-3957, USA; Department of Epidemiology and Biostatistics, University of California, Irvine, CA, 92697-3957, USA
- Scott M Bartell
  - Program in Public Health, University of California, Irvine, CA, 92697-3957, USA; Department of Statistics, University of California, Irvine, CA, 92697-1250, USA; Department of Environmental and Occupational Health, University of California, Irvine, CA, 92697-1250, USA
|
8
|
Kaji T, Ročková V. Metropolis-Hastings via Classification. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2060836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
9
|
Hainy M, Price DJ, Restif O, Drovandi C. Optimal Bayesian design for model discrimination via classification. Statistics and Computing 2022; 32:25. [PMID: 35310544 PMCID: PMC8924111 DOI: 10.1007/s11222-022-10078-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2019] [Accepted: 01/20/2022] [Indexed: 06/14/2023]
Abstract
Performing optimal Bayesian design for discriminating between competing models is computationally intensive as it involves estimating posterior model probabilities for thousands of simulated data sets. This issue is compounded further when the likelihood functions for the rival models are computationally expensive. A new approach using supervised classification methods is developed to perform Bayesian optimal model discrimination design. This approach requires considerably fewer simulations from the candidate models than previous approaches using approximate Bayesian computation. Further, it is easy to assess the performance of the optimal design through the misclassification error rate. The approach is particularly useful in the presence of models with intractable likelihoods but can also provide computational advantages when the likelihoods are manageable.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11222-022-10078-2.
Affiliation(s)
- Markus Hainy
  - Department of Applied Statistics, Johannes Kepler University, 4040 Linz, Austria
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD 4000 Australia
- David J. Price
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3010 Australia
  - The Department of Infectious Diseases at The Peter Doherty Institute for Infection and Immunity, The University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC 3000 Australia
  - Department of Veterinary Medicine, University of Cambridge, Cambridgeshire, CB3 0ES United Kingdom
- Olivier Restif
  - Department of Veterinary Medicine, University of Cambridge, Cambridgeshire, CB3 0ES United Kingdom
- Christopher Drovandi
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD 4000 Australia
  - ARC Centre of Excellence for Mathematical & Statistical Frontiers, Melbourne, Australia
  - QUT Centre for Data Science, Brisbane, Australia
|
10
|
McCullough K, Dmitrieva T, Ebrahimi N. New approximate Bayesian computation algorithm for censored data. Comput Stat 2021. [DOI: 10.1007/s00180-021-01167-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
11
|
Bi J, Shen W, Zhu W. Random Forest Adjustment for Approximate Bayesian Computation. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1981341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Jiefeng Bi
  - Wang Yanan Institute for Studies in Economics (WISE), Xiamen University, Xiamen, China
- Weining Shen
  - Department of Statistics, University of California, Irvine, CA
- Weixuan Zhu
  - Wang Yanan Institute for Studies in Economics (WISE), Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen, China
|
12
|
Priddle JW, Sisson SA, Frazier DT, Turner I, Drovandi C. Efficient Bayesian Synthetic Likelihood With Whitening Transformations. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1979012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/24/2022]
Affiliation(s)
- Jacob W. Priddle
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- David T. Frazier
  - Department of Econometrics and Business Statistics, Monash University, Clayton, Australia
- Ian Turner
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- Christopher Drovandi
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
|
13
|
Zhu B, Pei Y, Li C. An improved approximate Bayesian computation scheme for parameter inference based on a recalibration post-processing method. Commun Stat-Theor M 2021. [DOI: 10.1080/03610926.2021.1963456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Bin Zhu
  - School of Computer Science and Technology, Tiangong University, Tianjin, China
- Yongzhen Pei
  - School of Mathematical Sciences, Tiangong University, Tianjin, China
- Changguo Li
  - Department of Basic Science, Military Traffic University, Tianjin, China
|
14
|
Dmitrieva T, McCullough K, Ebrahimi N. Improved approximate Bayesian computation methods via empirical likelihood. Comput Stat 2021. [DOI: 10.1007/s00180-020-00985-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
15
|
Yu X, Nott DJ, Tran MN, Klein N. Assessment and Adjustment of Approximate Inference Algorithms Using the Law of Total Variance. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1880921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- Xuejun Yu
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore
- David J. Nott
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore
  - Institute of Operations Research and Analytics, National University of Singapore, Singapore
- Minh-Ngoc Tran
  - Discipline of Business Analytics, The University of Sydney Business School, Australian Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Sydney, NSW, Australia
- Nadja Klein
  - School of Business and Economics, Statistics and Data Science, Humboldt-Universität zu Berlin, Berlin, Germany
|
16
|
Lepers C, Billiard S, Porte M, Méléard S, Tran VC. Inference with selection, varying population size, and evolving population structure: application of ABC to a forward-backward coalescent process with interactions. Heredity (Edinb) 2021; 126:335-350. [PMID: 33128035 PMCID: PMC8027416 DOI: 10.1038/s41437-020-00381-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/24/2019] [Accepted: 10/15/2020] [Indexed: 11/08/2022] Open
Abstract
Genetic data are often used to infer demographic history and changes or detect genes under selection. Inferential methods are commonly based on models making various strong assumptions: demography and population structures are supposed a priori known, the evolution of the genetic composition of a population does not affect demography nor population structure, and there is no selection nor interaction between and within genetic strains. In this paper, we present a stochastic birth-death model with competitive interactions and asexual reproduction. We develop an inferential procedure for ecological, demographic, and genetic parameters. We first show how genetic diversity and genealogies are related to birth and death rates, and to how individuals compete within and between strains. This leads us to propose an original model of phylogenies, with trait structure and interactions, that allows multiple merging. Second, we develop an Approximate Bayesian Computation framework to use our model for analyzing genetic data. We apply our procedure to simulated data from a toy model, and to real data by analyzing the genetic diversity of microsatellites on Y-chromosomes sampled from Central Asia human populations in order to test whether different social organizations show significantly different fertilities.
Affiliation(s)
- Sylvain Billiard
  - Univ. Lille, CNRS, UMR 8198 - Evo-Eco-Paleo, F-59000, Lille, France
- Matthieu Porte
  - IGN, Institut National de l'Information Géographique et Forestière, F-94165, Saint-Mandé, France
- Sylvie Méléard
  - CMAP, CNRS, Ecole Polytechnique, Institut polytechnique de Paris, route de Saclay, 91128, Palaiseau Cedex, France
- Viet Chi Tran
  - LAMA, Univ Gustave Eiffel, Univ Paris Est Creteil, CNRS, F-77454, Marne-la-Vallée, France
|
17
|
Simola U, Cisewski-Kehe J, Wolpert RL. Approximate Bayesian computation for finite mixture models. J Stat Comput Sim 2020. [DOI: 10.1080/00949655.2020.1843169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/23/2022]
Affiliation(s)
- Umberto Simola
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
18
|
Sanchez T, Cury J, Charpiat G, Jay F. Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation. Mol Ecol Resour 2020; 21:2645-2660. [DOI: 10.1111/1755-0998.13224] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 01/17/2020] [Revised: 06/19/2020] [Accepted: 07/02/2020] [Indexed: 12/28/2022]
Affiliation(s)
- Théophile Sanchez
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Jean Cury
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Guillaume Charpiat
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
- Flora Jay
  - Laboratoire de Recherche en Informatique, CNRS UMR 8623, Université Paris‐Saclay, Orsay, France
|
19
|
Warne DJ, Baker RE, Simpson MJ. Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art. J R Soc Interface 2019; 16:20180943. [PMID: 30958205 DOI: 10.1098/rsif.2018.0943] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 11/12/2022] Open
Abstract
Stochasticity is a key characteristic of intracellular processes such as gene regulation and chemical signalling. Therefore, characterizing stochastic effects in biochemical systems is essential to understand the complex dynamics of living things. Mathematical idealizations of biochemically reacting systems must be able to capture stochastic phenomena. While robust theory exists to describe such stochastic models, the computational challenges in exploring these models can be a significant burden in practice since realistic models are analytically intractable. Determining the expected behaviour and variability of a stochastic biochemical reaction network requires many probabilistic simulations of its evolution. Using a biochemical reaction network model to assist in the interpretation of time-course data from a biological experiment is an even greater challenge due to the intractability of the likelihood function for determining observation probabilities. These computational challenges have been subjects of active research for over four decades. In this review, we present an accessible discussion of the major historical developments and state-of-the-art computational techniques relevant to simulation and inference problems for stochastic biochemical reaction network models. Detailed algorithms for particularly important methods are described and complemented with Matlab® implementations. As a result, this review provides a practical and accessible introduction to computational methods for stochastic models within the life sciences community.
Affiliation(s)
- David J Warne
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland 4001, Australia
- Ruth E Baker
  - Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Matthew J Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland 4001, Australia
|
20
|
|
21
|
Radev ST, Mertens UK, Voss A, Köthe U. Towards end-to-end likelihood-free inference with convolutional neural networks. The British Journal of Mathematical and Statistical Psychology 2020; 73:23-43. [PMID: 30793299 DOI: 10.1111/bmsp.12159] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/07/2018] [Revised: 12/11/2018] [Indexed: 06/09/2023]
Abstract
Complex simulator-based models with non-standard sampling distributions require sophisticated design choices for reliable approximate parameter inference. We introduce a fast, end-to-end approach for approximate Bayesian computation (ABC) based on fully convolutional neural networks. The method enables users of ABC to derive simultaneously the posterior mean and variance of multidimensional posterior distributions directly from raw simulated data. Once trained on simulated data, the convolutional neural network is able to map real data samples of variable size to the first two posterior moments of the relevant parameter's distributions. Thus, in contrast to other machine learning approaches to ABC, our approach allows us to generate reusable models that can be applied by different researchers employing the same model. We verify the utility of our method on two common statistical models (i.e., a multivariate normal distribution and a multiple regression scenario), for which the posterior parameter distributions can be derived analytically. We then apply our method to recover the parameters of the leaky competing accumulator (LCA) model and we reference our results to the current state-of-the-art technique, which is the probability density estimation (PDA). Results show that our method exhibits a lower approximation error compared with other machine learning approaches to ABC. It also performs similarly to PDA in recovering the parameters of the LCA model.
Affiliation(s)
- Ulf K Mertens
  - Institute of Psychology, Heidelberg University, Germany
- Andreas Voss
  - Institute of Psychology, Heidelberg University, Germany
- Ullrich Köthe
  - Heidelberg Collaboratory for Image Processing (HCI), Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Germany
|
22
|
Jay F, Boitard S, Austerlitz F. An ABC Method for Whole-Genome Sequence Data: Inferring Paleolithic and Neolithic Human Expansions. Mol Biol Evol 2019; 36:1565-1579. [PMID: 30785202 DOI: 10.1093/molbev/msz038] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/24/2022] Open
Abstract
Species generally undergo a complex demographic history consisting, in particular, of multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing this demographic history. A crucial point is to extract the relevant information from these very large data sets. Here, we design an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our new approach uses Approximate Bayesian Computation, a simulation-based statistical framework that allows 1) identifying the best demographic scenario among several competing scenarios and 2) estimating the best-fitting parameters under the chosen scenario. Approximate Bayesian Computation relies on the computation of summary statistics. Using a cross-validation approach, we show that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity and Tajima's D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrate the importance of simultaneously estimating the genotyping error rate. Applying our method on genome-wide human-sequence databases, we finally show that a model consisting in a bottleneck followed by a Paleolithic and a Neolithic expansion is the most relevant for Eurasian populations.
Affiliation(s)
- Flora Jay
- Laboratoire EcoAnthropologie et Ethnobiologie, CNRS/MNHN/Université Paris Diderot, Paris, France; Laboratoire de Recherche en Informatique, CNRS/Université Paris-Sud/Université Paris-Saclay, Orsay, France
| | - Simon Boitard
- GenPhySE, Université de Toulouse, INRA, INPT, INP-ENVT, Castanet Tolosan, France
| | - Frédéric Austerlitz
- Laboratoire EcoAnthropologie et Ethnobiologie, CNRS/MNHN/Université Paris Diderot, Paris, France
| |
|
23
|
Buckwar E, Tamborrino M, Tubikanec I. Spectral density-based and measure-preserving ABC for partially observed diffusion processes. An illustration on Hamiltonian SDEs. Statistics and Computing 2020; 30:627-648. [PMID: 32132771 PMCID: PMC7026277 DOI: 10.1007/s11222-019-09909-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/03/2019] [Accepted: 10/17/2019] [Indexed: 05/15/2023]
Abstract
Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed to an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise: First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build up the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler-Maruyama discretisation) fails. The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.
Affiliation(s)
- Evelyn Buckwar
- Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
| | - Massimiliano Tamborrino
- Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
| | - Irene Tubikanec
- Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
| |
|
24
|
Lintusaari J, Blomstedt P, Rose B, Sivula T, Gutmann MU, Kaski S, Corander J. Resolving outbreak dynamics using approximate Bayesian computation for stochastic birth-death models. Wellcome Open Res 2019; 4:14. [PMID: 37744419 PMCID: PMC10514576 DOI: 10.12688/wellcomeopenres.15048.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/16/2019] [Indexed: 09/26/2023] Open
Abstract
Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters, such as the reproductive number R, may remain poorly identifiable with these models. Here we show that this identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with its own distinct dynamics and clear epidemiological interpretation. We show that our model allows for accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. As a byproduct of the inference, the model provides an estimate of the infectious population size at the time the data were collected. The acquired estimate is approximately two orders of magnitude smaller than assumed in earlier related studies, and it is much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three times larger than previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.
Affiliation(s)
- Jarno Lintusaari
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
| | - Paul Blomstedt
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
| | - Brittany Rose
- Department of Infectious Diseases Epidemiology and Modelling, Norwegian Institute of Public Health, Oslo, Norway
- Helsinki Institute for Information Technology (HIIT), Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Tuomas Sivula
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
| | | | - Samuel Kaski
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
| | - Jukka Corander
- Helsinki Institute for Information Technology (HIIT), Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Infection Genomics, The Wellcome Trust Sanger Institute, Hinxton, UK
| |
|
25
|
Tancredi A. Approximate Bayesian inference for discretely observed continuous-time multi-state models. Biometrics 2019; 75:966-977. [PMID: 30648730 DOI: 10.1111/biom.13019] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/12/2018] [Accepted: 12/21/2018] [Indexed: 11/30/2022]
Abstract
Inference for continuous time multi-state models presents considerable computational difficulties when the process is only observed at discrete time points with no additional information about the state transitions. In fact, for a general multi-state Markov model, evaluation of the likelihood function is possible only via intensive numerical approximations. Moreover, in real applications, transitions between states may depend on the time since entry into the current state, and semi-Markov models, where the likelihood function is not available in closed form, should be fitted to the data. Approximate Bayesian Computation (ABC) methods, which make use only of comparisons between simulated and observed summary statistics, represent a solution to intractable likelihood problems and provide alternative algorithms when the likelihood calculation is computationally too costly. In this article we investigate the potential of ABC techniques for multi-state models both to obtain the posterior distributions of the model parameters and to compare Markov and semi-Markov models. In addition, we will also exploit ABC methods to estimate and compare hidden Markov and semi-Markov models when observed states are subject to classification errors. We illustrate the performance of the ABC methodology both with simulated data and with a real data example.
Affiliation(s)
- Andrea Tancredi
- Department of Methods and Models for Economics Territory and Finance, Sapienza University of Rome, Via del Castro Laurenziano 9, 00161, Rome, Italy
| |
|
26
|
Martin GM, McCabe BPM, Frazier DT, Maneesoonthorn W, Robert CP. Auxiliary Likelihood-Based Approximate Bayesian Computation in State Space Models. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2018.1552154] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
Affiliation(s)
- Gael M. Martin
- Department of Econometrics and Business Statistics, Monash University, Clayton, VIC, Australia
| | | | - David T. Frazier
- Department of Econometrics and Business Statistics, Monash University, Clayton, VIC, Australia
| | | | | |
|
27
|
Izbicki R, Lee AB, Pospisil T. ABC–CDE: Toward Approximate Bayesian Computation With Complex High-Dimensional Data and Limited Simulations. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2018.1546594] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]
Affiliation(s)
- Rafael Izbicki
- Department of Statistics, Federal University of São Carlos, São Carlos, Brazil
| | - Ann B. Lee
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA
| | - Taylor Pospisil
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA
| |
|
28
|
An Z, South LF, Nott DJ, Drovandi CC. Accelerating Bayesian Synthetic Likelihood With the Graphical Lasso. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2018.1537928] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Affiliation(s)
- Ziwen An
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- Australian Research Council Centre of Excellence for Mathematical and Statistics Frontiers (ACEMS), Australia
| | - Leah F. South
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- Australian Research Council Centre of Excellence for Mathematical and Statistics Frontiers (ACEMS), Australia
| | - David J. Nott
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
| | - Christopher C. Drovandi
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
- Australian Research Council Centre of Excellence for Mathematical and Statistics Frontiers (ACEMS), Australia
| |
|
29
|
Lintusaari J, Blomstedt P, Sivula T, Gutmann MU, Kaski S, Corander J. Resolving outbreak dynamics using approximate Bayesian computation for stochastic birth-death models. Wellcome Open Res 2019. [DOI: 10.12688/wellcomeopenres.15048.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022] Open
Abstract
Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters such as the reproductive number R may remain poorly identifiable with these models. Here we show that the identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case-study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with their distinct dynamics and clear epidemiological interpretation. We show that our model allows for accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. As a by-product of the inference, the model provides an estimate of the infectious population size at the time the data was collected. The acquired estimate is approximately two orders of magnitude smaller compared to the assumptions made in the earlier related studies, and much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three-fold compared with the previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.
|
30
|
Cisewski-Kehe J, Weller G, Schafer C. A preferential attachment model for the stellar initial mass function. Electron J Stat 2019. [DOI: 10.1214/19-ejs1556] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
|
31
|
Järvenpää M, Gutmann MU, Vehtari A, Marttinen P. Gaussian process modelling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Ann Appl Stat 2018. [DOI: 10.1214/18-aoas1150] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
|
32
|
ABC model selection for spatial extremes models applied to South Australian maximum temperature data. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2018.06.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
|
33
|
|
34
|
Hoey JA, Pinsky ML. Genomic signatures of environmental selection despite near-panmixia in summer flounder. Evol Appl 2018; 11:1732-1747. [PMID: 30344639 PMCID: PMC6183468 DOI: 10.1111/eva.12676] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/15/2017] [Revised: 06/13/2018] [Accepted: 06/16/2018] [Indexed: 01/01/2023] Open
Abstract
Rapid environmental change is altering the selective pressures experienced by marine species. While adaptation to local environmental conditions depends on a balance between dispersal and natural selection across the seascape, the spatial scale of adaptation and the relative importance of mechanisms maintaining adaptation in the ocean are not well understood. Here, using population assignment tests, Approximate Bayesian Computation (ABC), and genome scans with double-digest restriction-site associated DNA sequencing data, we evaluated population structure and locus-environment associations in a commercially important species, summer flounder (Paralichthys dentatus), along the U.S. east coast. Based on 1,137 single nucleotide polymorphisms across 232 individuals spanning nearly 1,900 km, we found no indication of population structure across Cape Hatteras, North Carolina (F_ST = 0.0014) or of isolation by distance along the coast using individual relatedness. ABC estimated the probability of dispersal across the biogeographic break at Cape Hatteras to be high (95% credible interval: 7%-50% migration). However, we found 15 loci whose allele frequencies were associated with at least one of four environmental variables. Of those, 11 were correlated with bottom temperature. For summer flounder, our results suggest continued fisheries management as a single population and identify likely response mechanisms to climate change. Broadly speaking, our findings suggest that spatial balancing selection can manifest in adaptive divergence on regional scales in marine fish despite high dispersal, and that these conditions likely result in the widespread distribution of adaptive alleles and a high potential for future genetic adaptation in response to changing environmental conditions. In the context of a rapidly changing world, a landscape genomics perspective offers a useful approach for understanding the causes and consequences of genetic differentiation.
Affiliation(s)
- Jennifer A. Hoey
- Department of Ecology, Evolution, & Natural Resources, Rutgers University, New Brunswick, New Jersey, USA
| | - Malin L. Pinsky
- Department of Ecology, Evolution, & Natural Resources, Rutgers University, New Brunswick, New Jersey, USA
| |
|
35
|
Frazier DT, Martin GM, Robert CP, Rousseau J. Asymptotic properties of approximate Bayesian computation. Biometrika 2018. [DOI: 10.1093/biomet/asy027] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022] Open
Affiliation(s)
- D T Frazier
- Department of Econometrics and Business Statistics, Monash University, Scenic Boulevard, Clayton, Victoria, Australia
| | - G M Martin
- Department of Econometrics and Business Statistics, Monash University, Scenic Boulevard, Clayton, Victoria, Australia
| | - C P Robert
- Université Paris Dauphine, Place du Maréchal de Lattre de Tassigny, Paris cedex 16, France
| | - J Rousseau
- Department of Statistics, University of Oxford, 24–29 St Giles’, Oxford, U.K
| |
|
36
|
Lintusaari J, Gutmann MU, Dutta R, Kaski S, Corander J. Fundamentals and Recent Developments in Approximate Bayesian Computation. Syst Biol 2018; 66:e66-e82. [PMID: 28175922 PMCID: PMC5837704 DOI: 10.1093/sysbio/syw077] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 10/30/2015] [Revised: 08/09/2016] [Accepted: 08/09/2016] [Indexed: 12/16/2022] Open
Abstract
Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.]
Affiliation(s)
- Jarno Lintusaari
- Department of Computer Science, Aalto University, Espoo, Finland; Helsinki Institute for Information Technology HIIT, Espoo, Finland
| | - Michael U Gutmann
- Department of Computer Science, Aalto University, Espoo, Finland; Helsinki Institute for Information Technology HIIT, Espoo, Finland; Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Ritabrata Dutta
- Department of Computer Science, Aalto University, Espoo, Finland; Helsinki Institute for Information Technology HIIT, Espoo, Finland
| | - Samuel Kaski
- Department of Computer Science, Aalto University, Espoo, Finland; Helsinki Institute for Information Technology HIIT, Espoo, Finland
| | - Jukka Corander
- Helsinki Institute for Information Technology HIIT, Espoo, Finland; Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland; Department of Biostatistics, University of Oslo, Oslo, Norway
| |
|
37
|
Tataru P, Simonsen M, Bataillon T, Hobolth A. Statistical Inference in the Wright-Fisher Model Using Allele Frequency Data. Syst Biol 2018; 66:e30-e46. [PMID: 28173553 PMCID: PMC5837693 DOI: 10.1093/sysbio/syw056] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 12/04/2015] [Revised: 05/31/2016] [Accepted: 06/06/2016] [Indexed: 11/14/2022] Open
Abstract
The Wright–Fisher model provides an elegant mathematical framework for understanding allele frequency data. In particular, the model can be used to infer the demographic history of species and identify loci under selection. A crucial quantity for inference under the Wright–Fisher model is the distribution of allele frequencies (DAF). Despite the apparent simplicity of the model, the calculation of the DAF is challenging. We review and discuss strategies for approximating the DAF, and how these are used in methods that perform inference from allele frequency data. Various evolutionary forces can be incorporated in the Wright–Fisher model, and we consider these in turn. We begin our review with the basic bi-allelic Wright–Fisher model where random genetic drift is the only evolutionary force. We then consider mutation, migration, and selection. In particular, we compare diffusion-based and moment-based methods in terms of accuracy, computational efficiency, and analytical tractability. We conclude with a brief overview of the multi-allelic process with a general mutation model.
Affiliation(s)
- Paula Tataru
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Maria Simonsen
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Asger Hobolth
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| |
|
38
|
Li W, Fearnhead P. Convergence of regression-adjusted approximate Bayesian computation. Biometrika 2018. [DOI: 10.1093/biomet/asx081] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022] Open
Affiliation(s)
- Wentao Li
- School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K
| | - Paul Fearnhead
- Department of Mathematics and Statistics, Lancaster University, Bailrigg, Lancaster LA1 4YF, U.K
| |
|
39
|
Li W, Fearnhead P. On the asymptotic efficiency of approximate Bayesian computation estimators. Biometrika 2018. [DOI: 10.1093/biomet/asx078] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/14/2022] Open
Affiliation(s)
- Wentao Li
- School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne NE1 7RU, U.K
| | - Paul Fearnhead
- Department of Mathematics and Statistics, Lancaster University, Bailrigg, Lancaster LA1 4YF, U.K
| |
|
40
|
Karabatsos G, Leisen F. An approximate likelihood perspective on ABC methods. Statistics Surveys 2018. [DOI: 10.1214/18-ss120] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
|
41
|
Fasiolo M, Wood SN, Hartig F, Bravington MV. An extended empirical saddlepoint approximation for intractable likelihoods. Electron J Stat 2018. [DOI: 10.1214/18-ejs1433] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
|
42
|
Inferring responses to climate dynamics from historical demography in neotropical forest lizards. Proc Natl Acad Sci U S A 2017; 113:7978-85. [PMID: 27432951 DOI: 10.1073/pnas.1601063113] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Indexed: 01/25/2023] Open
Abstract
We apply a comparative framework to test for concerted demographic changes in response to climate shifts in the neotropical lowland forests, learning from the past to inform projections of the future. Using reduced genomic (SNP) data from three lizard species codistributed in Amazonia and the Atlantic Forest (Anolis punctatus, Anolis ortonii, and Polychrus marmoratus), we first reconstruct former population history and test for assemblage-level responses to cycles of moisture transport recently implicated in changes of forest distribution during the Late Quaternary. We find support for population shifts within the time frame of inferred precipitation fluctuations (the last 250,000 y) but detect idiosyncratic responses across species and uniformity of within-species responses across forest regions. These results are incongruent with expectations of concerted population expansion in response to increased rainfall and fail to detect out-of-phase demographic syndromes (expansions vs. contractions) across forest regions. Using reduced genomic data to infer species-specific demographical parameters, we then model the plausible spatial distribution of genetic diversity in the Atlantic Forest into future climates (2080) under a medium carbon emission trajectory. The models forecast very distinct trajectories for the lizard species, reflecting unique estimated population densities and dispersal abilities. Ecological and demographic constraints seemingly lead to distinct and asynchronous responses to climatic regimes in the tropics, even among similarly distributed taxa. Incorporating such constraints is key to improve modeling of the distribution of biodiversity in the past and future.
|
43
|
Affiliation(s)
- L. F. Price
- School of Mathematical Sciences, Queensland University of Technology, Australia and Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
| | - C. C. Drovandi
- School of Mathematical Sciences, Queensland University of Technology, Australia and Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS)
| | - A. Lee
- Department of Statistics, University of Warwick, Coventry, UK
| | - D. J. Nott
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
| |
|
44
|
Smith ML, Ruffley M, Espíndola A, Tank DC, Sullivan J, Carstens BC. Demographic model selection using random forests and the site frequency spectrum. Mol Ecol 2017; 26:4562-4573. [PMID: 28665011 DOI: 10.1111/mec.14223] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 02/01/2017] [Revised: 05/16/2017] [Accepted: 05/22/2017] [Indexed: 01/18/2023]
Abstract
Phylogeographic data sets have grown from tens to thousands of loci in recent years, but extant statistical methods do not take full advantage of these large data sets. For example, approximate Bayesian computation (ABC) is a commonly used method for the explicit comparison of alternate demographic histories, but it is limited by the "curse of dimensionality" and issues related to the simulation and summarization of data when applied to next-generation sequencing (NGS) data sets. We implement here several improvements to overcome these difficulties. We use a Random Forest (RF) classifier for model selection to circumvent the curse of dimensionality and apply a binned representation of the multidimensional site frequency spectrum (mSFS) to address issues related to the simulation and summarization of large SNP data sets. We evaluate the performance of these improvements using simulation and find low overall error rates (~7%). We then apply the approach to data from Haplotrema vancouverense, a land snail endemic to the Pacific Northwest of North America. Fifteen demographic models were compared, and our results support a model of recent dispersal from coastal to inland rainforests. Our results demonstrate that binning is an effective strategy for the construction of a mSFS and imply that the statistical power of RF when applied to demographic model selection is at least comparable to traditional ABC algorithms. Importantly, by combining these strategies, large sets of models with differing numbers of populations can be evaluated.
Affiliation(s)
- Megan L Smith
- Department of Evolution, Ecology & Organismal Biology, The Ohio State University, Columbus, OH, USA
| | - Megan Ruffley
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA; Biological Sciences, Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID, USA
| | - Anahí Espíndola
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA; Biological Sciences, Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID, USA
| | - David C Tank
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA; Biological Sciences, Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID, USA
| | - Jack Sullivan
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA; Biological Sciences, Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, ID, USA
| | - Bryan C Carstens
- Department of Evolution, Ecology & Organismal Biology, The Ohio State University, Columbus, OH, USA
| |
|
45
|
Li J, Nott D, Fan Y, Sisson S. Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2016.07.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/29/2022]
|
46
|
Rodrigues G, Nott DJ, Sisson S. Functional regression approximate Bayesian computation for Gaussian process density estimation. Comput Stat Data Anal 2016. [DOI: 10.1016/j.csda.2016.05.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
|
47
|
Kypraios T, Neal P, Prangle D. A tutorial introduction to Bayesian inference for stochastic epidemic models using Approximate Bayesian Computation. Math Biosci 2016; 287:42-53. [PMID: 27444577 DOI: 10.1016/j.mbs.2016.07.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 04/28/2016] [Revised: 06/30/2016] [Accepted: 07/01/2016] [Indexed: 10/21/2022]
Abstract
Likelihood-based inference for disease outbreak data can be very challenging due to the inherent dependence of the data and the fact that they are usually incomplete. In this paper we review recent Approximate Bayesian Computation (ABC) methods for the analysis of such data by fitting stochastic epidemic models to them without having to calculate the likelihood of the observed data. We consider both non-temporal and temporal data and illustrate the methods with a number of examples featuring different models and datasets. In addition, we present extensions to existing algorithms which are easy to implement and provide an improvement to the existing methodology. Finally, R code to implement the algorithms presented in the paper is available on https://github.com/kypraios/epiABC.
Affiliation(s)
- Peter Neal
- Department of Mathematics and Statistics, Lancaster University, UK
- Dennis Prangle
- School of Mathematics and Statistics, Newcastle University, UK
|
48
|
Kousathanas A, Leuenberger C, Helfer J, Quinodoz M, Foll M, Wegmann D. Likelihood-Free Inference in High-Dimensional Models. Genetics 2016; 203:893-904. [PMID: 27052569 PMCID: PMC4896201 DOI: 10.1534/genetics.116.187567] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 01/26/2016] [Accepted: 04/04/2016] [Indexed: 11/18/2022] Open
Abstract
Methods that bypass analytical evaluations of the likelihood function have become an indispensable tool for statistical inference in many fields of science. These so-called likelihood-free methods rely on accepting and rejecting simulations based on summary statistics, which limits them to low-dimensional models for which the value of the likelihood is large enough to result in manageable acceptance rates. To get around these issues, we introduce a novel, likelihood-free Markov chain Monte Carlo (MCMC) method combining two key innovations: updating only one parameter per iteration and accepting or rejecting this update based on subsets of statistics approximately sufficient for this parameter. This increases acceptance rates dramatically, rendering this approach suitable even for models of very high dimensionality. We further derive that for linear models, a one-dimensional combination of statistics per parameter is sufficient and can be found empirically with simulations. Finally, we demonstrate that our method readily scales to models of very high dimensionality, using toy models as well as by jointly inferring the effective population size, the distribution of fitness effects (DFE) of segregating mutations, and selection coefficients for each locus from data of a recent experiment on the evolution of drug resistance in influenza.
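The two ideas named in this abstract, updating one parameter per iteration and judging each update only on statistics informative for that parameter, can be sketched on a toy model. The sketch below is not the authors' implementation. The Gaussian toy model, the choice of sample mean and sample standard deviation as per-parameter statistics, the uniform prior bounds, the step sizes, the tolerances, the initialization at the observed statistics, and the name `lf_mcmc` are all assumptions made for illustration.

```python
import random
import statistics


def simulate(mu, sigma, n, rng):
    """Toy model: n independent draws from Normal(mu, sigma)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


# One summary statistic per parameter (assumed approximately sufficient here).
STATS = {"mu": statistics.fmean, "sigma": statistics.stdev}
PRIOR = {"mu": (-10.0, 10.0), "sigma": (0.1, 10.0)}  # assumed uniform prior bounds
STEP = {"mu": 0.5, "sigma": 0.5}                     # proposal standard deviations
TOL = {"mu": 0.2, "sigma": 0.2}                      # per-statistic tolerances


def lf_mcmc(observed, n_iter, rng):
    """Likelihood-free MCMC that updates a single parameter per iteration."""
    s_obs = {k: f(observed) for k, f in STATS.items()}
    theta = dict(s_obs)  # assumed starting point: the observed statistics
    names = list(theta)
    chain = []
    for i in range(n_iter):
        k = names[i % len(names)]  # cycle through the parameters
        proposal = dict(theta)
        proposal[k] = theta[k] + rng.gauss(0.0, STEP[k])
        lo, hi = PRIOR[k]
        if lo <= proposal[k] <= hi:
            sim = simulate(proposal["mu"], proposal["sigma"], len(observed), rng)
            # Uniform prior and symmetric proposal: the Metropolis-Hastings ratio
            # is 1 inside the prior support, so only the tolerance check remains,
            # and it uses just the statistic tied to parameter k.
            if abs(STATS[k](sim) - s_obs[k]) <= TOL[k]:
                theta = proposal
        chain.append(dict(theta))
    return chain


rng = random.Random(7)
observed = simulate(2.0, 1.5, 100, rng)  # hypothetical data
chain = lf_mcmc(observed, 4000, rng)
print(statistics.fmean(s["mu"] for s in chain))
```

Because each update is compared on a single statistic, the acceptance rate does not collapse as more parameters are added, which is the scaling property the abstract emphasizes.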
Affiliation(s)
- Athanasios Kousathanas
- Department of Biology and Biochemistry, University of Fribourg, 1700 Fribourg, Switzerland; Swiss Institute of Bioinformatics, 1700 Fribourg, Switzerland
- Jonas Helfer
- Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
- Mathieu Quinodoz
- Department of Computational Biology, University of Lausanne, 1200 Lausanne, Switzerland
- Matthieu Foll
- International Agency for Research on Cancer, 69372 Lyon, France
- Daniel Wegmann
- Department of Biology and Biochemistry, University of Fribourg, 1700 Fribourg, Switzerland; Swiss Institute of Bioinformatics, 1700 Fribourg, Switzerland
|
49
|
Bertin K, Lacour C, Rivoirard V. Adaptive pointwise estimation of conditional density function. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 2016. [DOI: 10.1214/14-aihp665] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]
|
50
|
|