1
Rmus M, Pan TF, Xia L, Collins AGE. Artificial neural networks for model identification and parameter estimation in computational cognitive models. bioRxiv 2024:2023.09.14.557793. [PMID: 37767088 PMCID: PMC10521012 DOI: 10.1101/2023.09.14.557793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2023]
Abstract
Computational cognitive models have been used extensively to formalize cognitive processes. Model parameters offer a simple way to quantify individual differences in how humans process information. Similarly, model comparison allows researchers to identify which theories, embedded in different models, provide the best accounts of the data. Cognitive modeling uses statistical tools to quantitatively relate models to data that often rely on computing/estimating the likelihood of the data under the model. However, this likelihood is computationally intractable for a substantial number of models. These relevant models may embody reasonable theories of cognition, but are often under-explored due to the limited range of tools available to relate them to data. We contribute to filling this gap in a simple way using artificial neural networks (ANNs) to map data directly onto model identity and parameters, bypassing the likelihood estimation. We test our instantiation of an ANN as a cognitive model fitting tool on classes of cognitive models with strong inter-trial dependencies (such as reinforcement learning models), which offer unique challenges to most methods. We show that we can adequately perform both parameter estimation and model identification using our ANN approach, including for models that cannot be fit using traditional likelihood-based methods. We further discuss our work in the context of the ongoing research leveraging simulation-based approaches to parameter estimation and model identification, and how these approaches broaden the class of cognitive models researchers can quantitatively investigate.
2
Krüger M, Mishra A, Spichtinger P, Pöschl U, Berkemeier T. A numerical compass for experiment design in chemical kinetics and molecular property estimation. J Cheminform 2024; 16:34. [PMID: 38520014 PMCID: PMC10960421 DOI: 10.1186/s13321-024-00825-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 03/10/2024] [Indexed: 03/25/2024] Open
Abstract
Kinetic process models are widely applied in science and engineering, including atmospheric, physiological and technical chemistry, reactor design, or process optimization. These models rely on numerous kinetic parameters such as reaction rate, diffusion or partitioning coefficients. Determining these properties by experiments can be challenging, especially for multiphase systems, and researchers often face the task of intuitively selecting experimental conditions to obtain insightful results. We developed a numerical compass (NC) method that integrates computational models, global optimization, ensemble methods, and machine learning to identify experimental conditions with the greatest potential to constrain model parameters. The approach is based on the quantification of model output variance in an ensemble of solutions that agree with experimental data. The utility of the NC method is demonstrated for the parameters of a multi-layer model describing the heterogeneous ozonolysis of oleic acid aerosols. We show how neural network surrogate models of the multiphase chemical reaction system can be used to accelerate the application of the NC for a comprehensive mapping and analysis of experimental conditions. The NC can also be applied for uncertainty quantification of quantitative structure-activity relationship (QSAR) models. We show that the uncertainty calculated for molecules that are used to extend training data correlates with the reduction of QSAR model error. The code is openly available as the Julia package KineticCompass.
Affiliation(s)
- Matteo Krüger
  - Multiphase Chemistry Department, Max Planck Institute for Chemistry, Hahn-Meitner-Weg 1, Mainz, 55128, Rhineland Palatinate, Germany
- Ashmi Mishra
  - Multiphase Chemistry Department, Max Planck Institute for Chemistry, Hahn-Meitner-Weg 1, Mainz, 55128, Rhineland Palatinate, Germany
- Peter Spichtinger
  - Institute for Atmospheric Physics, Johannes Gutenberg University, Johann-Joachim-Becher-Weg 21, Mainz, 55128, Rhineland Palatinate, Germany
- Ulrich Pöschl
  - Multiphase Chemistry Department, Max Planck Institute for Chemistry, Hahn-Meitner-Weg 1, Mainz, 55128, Rhineland Palatinate, Germany
- Thomas Berkemeier
  - Multiphase Chemistry Department, Max Planck Institute for Chemistry, Hahn-Meitner-Weg 1, Mainz, 55128, Rhineland Palatinate, Germany
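The selection rule summarized in the abstract of this entry (rank candidate experimental conditions by how much the model output varies across an ensemble of parameter sets that all agree with existing data) can be illustrated with a minimal Python sketch. The toy first-order decay model, the ensemble values, and all function names below are assumptions made for illustration; they are not taken from the KineticCompass package or from the paper.

```python
import math
import statistics


def toy_model(rate, time):
    """Hypothetical stand-in model: first-order decay of a normalized concentration."""
    return math.exp(-rate * time)


def ensemble_variance(ensemble, condition, model):
    """Variance of the model output across the ensemble at one experimental condition."""
    outputs = [model(params, condition) for params in ensemble]
    return statistics.pvariance(outputs)


def most_informative_condition(ensemble, conditions, model):
    """Return the candidate condition at which the ensemble members disagree most."""
    return max(conditions, key=lambda c: ensemble_variance(ensemble, c, model))


# Hypothetical ensemble of rate constants, assumed to fit existing data equally well.
ensemble = [0.8, 1.0, 1.2]
# Candidate measurement times in arbitrary units (also an assumption).
conditions = [0.1, 1.0, 10.0]
best = most_informative_condition(ensemble, conditions, toy_model)
```

In this toy setting the intermediate time is selected: at very early times every ensemble member still predicts almost no decay, and at very late times every member predicts almost complete decay, so only a measurement in between would separate the candidate rate constants.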
3
Luo M, Zhu J, Jia J, Zhang H, Zhao J. Progress on network modeling and analysis of gut microecology: a review. Appl Environ Microbiol 2024; 90:e0009224. [PMID: 38415584 PMCID: PMC11207142 DOI: 10.1128/aem.00092-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024] Open
Abstract
The gut microecological network is a complex microbial community within the human body that plays a key role in linking dietary nutrition and host physiology. To understand the complex relationships among microbes and their functions within this community, network analysis has emerged as a powerful tool. By representing the interactions between microbes and their associated omics data as a network, we can gain a comprehensive understanding of the ecological mechanisms that drive the human gut microbiota. In addition, the network-based approach provides a more intuitive analysis of the gut microbiota, simplifying the study of its complex dynamics and interdependencies. This review provides a comprehensive overview of the methods used to construct and analyze networks in the context of gut microecological background. We discuss various types of network modeling approaches, including co-occurrence networks, causal networks, dynamic networks, and multi-omics networks, and describe the analytical techniques used to identify important network properties. We also highlight the challenges and limitations of network modeling in this area, such as data scarcity and heterogeneity, and provide future research directions to overcome these limitations. By exploring these network-based methods, researchers can gain valuable insights into the intricate relationships and functional roles of microbial communities within the gut, ultimately advancing our understanding of the gut microbiota's impact on human health.
Affiliation(s)
- Meng Luo
  - State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
  - School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- Jinlin Zhu
  - State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
  - School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- Jiajia Jia
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, China
- Hao Zhang
  - State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
  - School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
  - National Engineering Research Center for Functional Food, Jiangnan University, Wuxi, Jiangsu, China
  - Wuxi Translational Medicine Research Center, Jiangsu Translational Medicine Research Institute Wuxi Branch, Wuxi, China
  - (Yangzhou) Institute of Food Biotechnology, Jiangnan University, Yangzhou, China
- Jianxin Zhao
  - State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
  - School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
  - Wuxi Translational Medicine Research Center, Jiangsu Translational Medicine Research Institute Wuxi Branch, Wuxi, China
  - (Yangzhou) Institute of Food Biotechnology, Jiangnan University, Yangzhou, China
4
Valentin S, Kleinegesse S, Bramley NR, Seriès P, Gutmann MU, Lucas CG. Designing optimal behavioral experiments using machine learning. eLife 2024; 13:e86224. [PMID: 38261382 PMCID: PMC10805374 DOI: 10.7554/elife.86224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 11/19/2023] [Indexed: 01/24/2024] Open
Abstract
Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about what models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best account for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code to replicate all analyses as well as tutorial notebooks and pointers to adapt the methodology to different experimental settings.
Affiliation(s)
- Simon Valentin
  - School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Neil R Bramley
  - Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom
- Peggy Seriès
  - School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Michael U Gutmann
  - School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
5
Hung KL, Jones MG, Wong ITL, Lange JT, Luebeck J, Scanu E, He BJ, Brückner L, Li R, González RC, Schmargon R, Dörr JR, Belk JA, Bafna V, Werner B, Huang W, Henssen AG, Mischel PS, Chang HY. Coordinated inheritance of extrachromosomal DNA species in human cancer cells. bioRxiv 2023:2023.07.18.549597. [PMID: 37503111 PMCID: PMC10371175 DOI: 10.1101/2023.07.18.549597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The chromosomal theory of inheritance has dominated human genetics, including cancer genetics. Genes on the same chromosome segregate together while genes on different chromosomes assort independently, providing a fundamental tenet of Mendelian inheritance. Extrachromosomal DNA (ecDNA) is a frequent event in cancer that drives oncogene amplification, dysregulated gene expression and intratumoral heterogeneity, including through random segregation during cell division. Distinct ecDNA sequences, herein termed ecDNA species, can co-exist to facilitate intermolecular cooperation in cancer cells. However, how multiple ecDNA species within a tumor cell are assorted and maintained across somatic cell generations to drive cancer cell evolution is not known. Here we show that cooperative ecDNA species can be coordinately inherited through mitotic co-segregation. Imaging and single-cell analyses show that multiple ecDNAs encoding distinct oncogenes co-occur and are correlated in copy number in human cancer cells. EcDNA species are coordinately segregated asymmetrically during mitosis, resulting in daughter cells with simultaneous copy number gains in multiple ecDNA species prior to any selection. Computational modeling reveals the quantitative principles of ecDNA co-segregation and co-selection, predicting their observed distributions in cancer cells. Finally, we show that coordinated inheritance of ecDNAs enables co-amplification of specialized ecDNAs containing only enhancer elements and guides therapeutic strategies to jointly deplete cooperating ecDNA oncogenes. Coordinated inheritance of ecDNAs confers stability to oncogene cooperation and novel gene regulatory circuits, allowing winning combinations of epigenetic states to be transmitted across cell generations.
Affiliation(s)
- King L. Hung
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
- Matthew G. Jones
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
- Ivy Tsz-Lo Wong
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Joshua T. Lange
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Jens Luebeck
  - Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, 92093, USA
- Elisa Scanu
  - Department of Mathematics, Queen Mary University of London, London, UK
- Britney Jiayu He
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
- Lotte Brückner
  - Max-Delbrück-Centrum für Molekulare Medizin (BIMSB/BIH), Berlin, Germany
  - Experimental and Clinical Research Center (ECRC), Max Delbrück Center for Molecular Medicine and Charité – Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany
- Rui Li
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
- Rocío Chamorro González
  - Experimental and Clinical Research Center (ECRC), Max Delbrück Center for Molecular Medicine and Charité – Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany
  - Department of Pediatric Oncology/Hematology, Charité – Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- Rachel Schmargon
  - Experimental and Clinical Research Center (ECRC), Max Delbrück Center for Molecular Medicine and Charité – Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany
  - Department of Pediatric Oncology/Hematology, Charité – Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- Jan R. Dörr
  - Experimental and Clinical Research Center (ECRC), Max Delbrück Center for Molecular Medicine and Charité – Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany
  - Department of Pediatric Oncology/Hematology, Charité – Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
- Julia A. Belk
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
- Vineet Bafna
  - Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA, 92093, USA
- Benjamin Werner
  - Evolutionary Dynamics Group, Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
- Weini Huang
  - Department of Mathematics, Queen Mary University of London, London, UK
  - Group of Theoretical Biology, The State Key Laboratory of Biocontrol, School of Life Science, Sun Yat-sen University, Guangzhou, China
- Anton G. Henssen
  - Experimental and Clinical Research Center (ECRC), Max Delbrück Center for Molecular Medicine and Charité – Universitätsmedizin Berlin, Lindenberger Weg 80, 13125, Berlin, Germany
  - Department of Pediatric Oncology/Hematology, Charité – Universitätsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
  - German Cancer Consortium (DKTK), partner site Berlin, and German Cancer Research Center DKFZ, Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
  - Berlin Institute of Health, Anna-Louisa-Karsch-Str. 2, 10178, Berlin, Germany
- Paul S. Mischel
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Howard Y. Chang
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
6
Gilabert A, Rieux A, Robert S, Vitalis R, Zapater M, Abadie C, Carlier J, Ravigné V. Revisiting the historical scenario of a disease dissemination using genetic data and Approximate Bayesian Computation methodology: The case of Pseudocercospora fijiensis invasion in Africa. Ecol Evol 2023; 13:e10013. [PMID: 37091563 PMCID: PMC10116021 DOI: 10.1002/ece3.10013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2021] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 04/25/2023] Open
Abstract
The reconstruction of geographic and demographic scenarios of dissemination for invasive pathogens of crops is a key step toward improving the management of emerging infectious diseases. Nowadays, the reconstruction of biological invasions typically uses both genetic and historical information to test different hypotheses of colonization. The Approximate Bayesian Computation framework and its recent Random Forest development (ABC-RF) have been successfully used in evolutionary biology to decipher multiple histories of biological invasions. Yet, for some organisms, typically plant pathogens, historical data may not be reliable, notably because of the difficulty of identifying the organism and the delay between the introduction and the first mention. We investigated the history of the invasion of Africa by the fungal pathogen of banana Pseudocercospora fijiensis, by testing the historical hypothesis against other plausible hypotheses. We analyzed the genetic structure of eight populations from six eastern and western African countries, using 20 microsatellite markers, and tested competing scenarios of population foundation using the ABC-RF methodology. We do find evidence for an invasion front consistent with the historical hypothesis, but also for the existence of another front never mentioned in historical records. We question the historical introduction point of the disease on the continent. Crucially, our results illustrate that even if ABC-RF inferences may sometimes fail to infer a single, well-supported scenario of invasion, they can be helpful in rejecting unlikely scenarios, which can prove very useful in shedding light on disease dissemination routes.
Affiliation(s)
- A. Gilabert
  - Université de la Réunion, UMR PVBMT, Saint‐Pierre, France
  - CIRAD, UMR PHIM, Montpellier, France
  - PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
  - Present address: CIRAD, UMR AGAP Institut, Montpellier, France
  - Present address: UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- A. Rieux
  - CIRAD, UMR PVBMT, Saint‐Pierre, France
- S. Robert
  - CIRAD, UMR PHIM, Montpellier, France
  - PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
- R. Vitalis
  - CBGP, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
- M.‐F. Zapater
  - CIRAD, UMR PHIM, Montpellier, France
  - PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
- C. Abadie
  - CIRAD, UMR PHIM, Montpellier, France
  - PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
- J. Carlier
  - CIRAD, UMR PHIM, Montpellier, France
  - PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
- V. Ravigné
  - CIRAD, UMR PHIM, Montpellier, France
  - PHIM Plant Health Institute, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
7
Järvenpää M, Corander J. On predictive inference for intractable models via approximate Bayesian computation. Statistics and Computing 2023; 33:42. [PMID: 36785730 PMCID: PMC9911513 DOI: 10.1007/s11222-022-10163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/02/2022] [Indexed: 06/18/2023]
Abstract
Approximate Bayesian computation (ABC) is commonly used for parameter estimation and model comparison for intractable simulator-based statistical models whose likelihood function cannot be evaluated. In this paper we instead investigate the feasibility of ABC as a generic approximate method for predictive inference, in particular, for computing the posterior predictive distribution of future observations or missing data of interest. We consider three complementary ABC approaches for this goal, each based on different assumptions regarding which predictive density of the intractable model can be sampled from. The case where only simulation from the joint density of the observed and future data given the model parameters can be used for inference is given particular attention and it is shown that the ideal summary statistic in this setting is minimal predictive sufficient instead of merely minimal sufficient (in the ordinary sense). An ABC prediction approach that takes advantage of a certain latent variable representation is also investigated. We additionally show how common ABC sampling algorithms can be used in the predictive settings considered. Our main results are first illustrated by using simple time-series models that facilitate analytical treatment, and later by using two common intractable dynamic models. Supplementary information: The online version contains supplementary material available at 10.1007/s11222-022-10163-6.
Affiliation(s)
- Marko Järvenpää
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), University of Helsinki, Helsinki, Finland
  - Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
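The general idea behind this entry (accept parameter draws whose simulated data resemble the observations, then simulate future observations from the accepted draws) can be shown with a minimal rejection-ABC sketch in Python. This is a plain rejection sampler on a toy Gaussian model, not one of the three approaches analyzed in the paper; the simulator, the uniform prior, the sample-mean summary statistic, and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (an assumption): n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tol, rng):
    """Keep prior draws whose simulated sample mean is within tol of the observed one."""
    s_obs = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # hypothetical uniform prior
        s_sim = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tol:
            accepted.append(theta)
    return accepted


def abc_predictive(accepted, rng):
    """One simulated future observation per accepted parameter draw."""
    return [rng.gauss(theta, 1.0) for theta in accepted]


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # synthetic "observed" data with true theta = 2
posterior = abc_rejection(observed, n_draws=5000, tol=0.1, rng=rng)
predictive = abc_predictive(posterior, rng)
```

The list `predictive` approximates draws from the posterior predictive distribution of a single future observation. In this toy model the sample mean is a sufficient statistic, so the only approximation comes from the tolerance; for intractable models the choice of summary statistic matters much more, which is the question the abstract addresses.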
8
Martin GM, Frazier DT, Robert CP. Approximating Bayes in the 21st Century. Stat Sci 2023. [DOI: 10.1214/22-sts875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Affiliation(s)
- Gael M. Martin
  - Gael M. Martin is Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
- David T. Frazier
  - David T. Frazier is Associate Professor, Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia
9
Dekermanjian JP, Shaddox E, Nandy D, Ghosh D, Kechris K. Mechanism-aware imputation: a two-step approach in handling missing values in metabolomics. BMC Bioinformatics 2022; 23:179. [PMID: 35578165 PMCID: PMC9109373 DOI: 10.1186/s12859-022-04659-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/25/2021] [Accepted: 03/23/2022] [Indexed: 11/19/2022] Open
Abstract
When analyzing large datasets from high-throughput technologies, researchers often encounter missing quantitative measurements, which are particularly frequent in metabolomics datasets. Metabolomics, the comprehensive profiling of metabolite abundances, are typically measured using mass spectrometry technologies that often introduce missingness via multiple mechanisms: (1) the metabolite signal may be smaller than the instrument limit of detection; (2) the conditions under which the data are collected and processed may lead to missing values; (3) missing values can be introduced randomly. Missingness resulting from mechanism (1) would be classified as Missing Not At Random (MNAR), that from mechanism (2) would be Missing At Random (MAR), and that from mechanism (3) would be classified as Missing Completely At Random (MCAR). Two common approaches for handling missing data are the following: (1) omit missing data from the analysis; (2) impute the missing values. Both approaches may introduce bias and reduce statistical power in downstream analyses such as testing metabolite associations with clinical variables. Further, standard imputation methods in metabolomics often ignore the mechanisms causing missingness and inaccurately estimate missing values within a data set. We propose a mechanism-aware imputation algorithm that leverages a two-step approach in imputing missing values. First, we use a random forest classifier to classify the missing mechanism for each missing value in the data set. Second, we impute each missing value using imputation algorithms that are specific to the predicted missingness mechanism (i.e., MAR/MCAR or MNAR). Using complete data, we conducted simulations, where we imposed different missingness patterns within the data and tested the performance of combinations of imputation algorithms. Our proposed algorithm provided imputations closer to the original data than those using only one imputation algorithm for all the missing values. Consequently, our two-step approach was able to reduce bias for improved downstream analyses.
Affiliation(s)
- Jonathan P Dekermanjian
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Elin Shaddox
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Debmalya Nandy
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Debashis Ghosh
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Katerina Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
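The two-step structure described in this entry's abstract (first predict the missingness mechanism, then impute with a mechanism-specific rule) can be sketched in a few lines of Python. This is only an illustration of the routing idea: the threshold rule below is a hypothetical stand-in for the paper's random forest classifier and works per column rather than per missing value, and the half-minimum and mean fills are simple placeholder imputers, not the algorithms evaluated by the authors.

```python
import statistics


def classify_missing(observed_values, lod, margin):
    """Stand-in classifier (assumption): columns observed near the detection
    limit are treated as MNAR, all others as MAR/MCAR."""
    return "MNAR" if min(observed_values) <= lod * margin else "MAR/MCAR"


def impute_column(column, lod, margin=2.0):
    """Step 1: predict the mechanism. Step 2: fill with a mechanism-specific value.
    Assumes the column has at least one observed (non-None) value."""
    observed = [v for v in column if v is not None]
    mechanism = classify_missing(observed, lod, margin)
    if mechanism == "MNAR":
        fill = min(observed) / 2  # half-minimum fill for values below detection
    else:
        fill = statistics.fmean(observed)  # mean fill as a simple MAR/MCAR placeholder
    return [fill if v is None else v for v in column], mechanism


# Hypothetical metabolite abundances; None marks a missing measurement.
low_abundance = [1.2, None, 1.0, 1.6]
high_abundance = [50.0, 54.0, None, 46.0]

imputed_low, mech_low = impute_column(low_abundance, lod=1.0)
imputed_high, mech_high = impute_column(high_abundance, lod=1.0)
```

Routing the two columns differently is the point of the sketch: a mean fill on the low-abundance column would overestimate a value that was probably missing because it fell below the detection limit, while a half-minimum fill on the high-abundance column would badly underestimate a value that was probably missing for unrelated reasons.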
10
Cappello L, Kim J, Liu S, Palacios JA. Statistical Challenges in Tracking the Evolution of SARS-CoV-2. Stat Sci 2022; 37:162-182. [PMID: 36034090 PMCID: PMC9409356 DOI: 10.1214/22-sts853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
Abstract
Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, has provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of governments, laboratories, and researchers to collect and sequence molecular data, many challenges remain in analyzing and interpreting the data collected. Here, we describe the models and methods currently used to monitor the spread of SARS-CoV-2, discuss long-standing and new statistical challenges, and propose a method for tracking the rise of novel variants during the epidemic.
Affiliation(s)
- Lorenzo Cappello
  - Departments of Economics and Business, Universitat Pompeu Fabra, 08005, Spain
- Jaehee Kim
  - Department of Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Sifan Liu
  - Department of Statistics, Stanford University, Stanford, California 94305, USA
- Julia A Palacios
  - Departments of Statistics and Biomedical Data Sciences, Stanford University, Stanford, California 94305, USA
11
Approximate Bayesian computation using asymptotically normal point estimates. Comput Stat 2022. [DOI: 10.1007/s00180-022-01226-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
12
Raynal L, Chen S, Mira A, Onnela JP. Scalable Approximate Bayesian Computation for Growing Network Models via Extrapolated and Sampled Summaries. Bayesian Analysis 2022; 17:165-192. [PMID: 36213769 PMCID: PMC9541316 DOI: 10.1214/20-ba1248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Approximate Bayesian computation (ABC) is a simulation-based likelihood-free method applicable to both model selection and parameter estimation. ABC parameter estimation requires the ability to forward simulate datasets from a candidate model, but because the sizes of the observed and simulated datasets usually need to match, this can be computationally expensive. Additionally, since ABC inference is based on comparisons of summary statistics computed on the observed and simulated data, using computationally expensive summary statistics can lead to further losses in efficiency. ABC has recently been applied to the family of mechanistic network models, an area that has traditionally lacked tools for inference and model choice. Mechanistic models of network growth repeatedly add nodes to a network until it reaches the size of the observed network, which may be of the order of millions of nodes. With ABC, this process can quickly become computationally prohibitive due to the resource intensive nature of network simulations and evaluation of summary statistics. We propose two methodological developments to enable the use of ABC for inference in models for large growing networks. First, to save time needed for forward simulating model realizations, we propose a procedure to extrapolate (via both least squares and Gaussian processes) summary statistics from small to large networks. Second, to reduce computation time for evaluating summary statistics, we use sample-based rather than census-based summary statistics. We show that the ABC posterior obtained through this approach, which adds two additional layers of approximation to the standard ABC, is similar to a classic ABC posterior. Although we deal with growing network models, both extrapolated summaries and sampled summaries are expected to be relevant in other ABC settings where the data are generated incrementally.
Affiliation(s)
- Louis Raynal
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA, USA 02115
- Sixing Chen
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA, USA 02115
- Antonietta Mira
  - Data Science Lab, Institute of Computational Science, Università della Svizzera italiana, Via Buffi 6, 6900 Lugano, Switzerland
  - Dipartimento di Scienza e Alta Tecnologia, Università degli Studi dell’Insubria, Via Valleggio, 11 - 22100 Como, Italy
- Jukka-Pekka Onnela
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA, USA 02115
13
Dutta R, Zouaoui Boudjeltia K, Kotsalos C, Rousseau A, Ribeiro de Sousa D, Desmet JM, Van Meerhaeghe A, Mira A, Chopard B. Personalized pathology test for Cardio-vascular disease: Approximate Bayesian computation with discriminative summary statistics learning. PLoS Comput Biol 2022; 18:e1009910. [PMID: 35271585 PMCID: PMC8939803 DOI: 10.1371/journal.pcbi.1009910] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Revised: 03/22/2022] [Accepted: 02/09/2022] [Indexed: 11/19/2022] Open
Abstract
Cardio/cerebrovascular diseases (CVD) have become one of the major health issues in our societies. However, recent studies show that the present pathology tests to detect CVD are ineffectual, as they do not consider the different stages of platelet activation or the molecular dynamics involved in platelet interactions and are incapable of accounting for inter-individual variability. Here we propose a stochastic platelet deposition model and an inferential scheme to estimate the biologically meaningful model parameters using approximate Bayesian computation with a summary statistic that maximally discriminates between different types of patients. Parameters inferred from data collected on healthy volunteers and different patient types help us to identify specific biological parameters and hence the biological reasoning behind the dysfunction for each type of patient. This work opens up an unprecedented opportunity for personalized pathology tests for CVD detection and medical treatment.
Affiliation(s)
- Karim Zouaoui Boudjeltia
  - Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
- Alexandre Rousseau
  - Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
- Daniel Ribeiro de Sousa
  - Laboratory of Experimental Medicine (ULB 222), Medicine Faculty, Université Libre de Bruxelles, ISPPC CHU de Charleroi, Charleroi, Belgium
- Jean-Marc Desmet
  - Nephrology Department, ISPPC CHU de Charleroi, Charleroi, Belgium
- Antonietta Mira
  - Università della Svizzera italiana, Lugano, Switzerland
  - University of Insubria, Varese, Italy
14
Pray IW, Pizzitutti F, Bonnet G, Gonzales-Gustavson E, Wakeland W, Pan WK, Lambert WE, Gonzalez AE, Garcia HH, O’Neal SE. Validation of a spatial agent-based model for Taenia solium transmission ("CystiAgent") against a large prospective trial of control strategies in northern Peru. PLoS Negl Trop Dis 2021; 15:e0009885. [PMID: 34705827 PMCID: PMC8575314 DOI: 10.1371/journal.pntd.0009885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2021] [Revised: 11/08/2021] [Accepted: 10/08/2021] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The pork tapeworm (Taenia solium) is a parasitic helminth that imposes a major health and economic burden on poor rural populations around the world. As recognized by the World Health Organization, a key barrier for achieving control of T. solium is the lack of an accurate and validated simulation model with which to study transmission and evaluate available control and elimination strategies. CystiAgent is a spatially-explicit agent-based model for T. solium that is unique among T. solium models in its ability to represent key spatial and environmental features of transmission and simulate spatially targeted interventions, such as ring strategy. METHODS/PRINCIPAL FINDINGS We validated CystiAgent against results from the Ring Strategy Trial (RST), a large cluster-randomized trial conducted in northern Peru that evaluated six unique interventions for T. solium control in 23 villages. For the validation, each intervention strategy was replicated in CystiAgent, and the simulated prevalences of human taeniasis, porcine cysticercosis, and porcine seroincidence were compared against prevalence estimates from the trial. Results showed that CystiAgent produced declines in transmission in response to each of the six intervention strategies, but overestimated the effect of interventions in the majority of villages; simulated prevalences for human taeniasis and porcine cysticercosis at the end of the trial were a median of 0.53 and 5.0 percentage points less than prevalence observed at the end of the trial, respectively. CONCLUSIONS/SIGNIFICANCE The validation of CystiAgent represented an important step towards developing an accurate and reliable T. solium transmission model that can be deployed to fill critical gaps in our understanding of T. solium transmission and control. To improve model accuracy, future versions would benefit from improved data on pig immunity and resistance, field effectiveness of anti-helminthic treatment, and factors driving spatial clustering of T. solium infections including dispersion and contact with T. solium eggs in the environment.
Affiliation(s)
- Ian W. Pray
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, Oregon, United States of America
- Francesco Pizzitutti
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, Oregon, United States of America
- Gabrielle Bonnet
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, Oregon, United States of America
- Eloy Gonzales-Gustavson
  - Tropical and Highlands Veterinary Research Institute, School of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, El Mantaro, Peru
  - School of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Wayne Wakeland
  - Systems Science Program, Portland State University, Portland, Oregon, United States of America
- William K. Pan
  - Duke Global Health Institute & Nicholas School of Environment, Duke University, Durham, North Carolina, United States of America
- William E. Lambert
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, Oregon, United States of America
- Armando E. Gonzalez
  - School of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Hector H. Garcia
  - School of Sciences, Department of Microbiology, Universidad Peruana Cayetano Heredia, Lima, Peru
  - Center for Global Health Tumbes, Universidad Peruana Cayetano Heredia, Tumbes, Peru
- Seth E. O’Neal
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, Oregon, United States of America
  - Center for Global Health Tumbes, Universidad Peruana Cayetano Heredia, Tumbes, Peru
15
Zhu B, Pei Y, Li C. An improved approximate Bayesian computation scheme for parameter inference based on a recalibration post-processing method. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.1963456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Bin Zhu
  - School of Computer Science and Technology, Tiangong University, Tianjin, China
- Yongzhen Pei
  - School of Mathematical Sciences, Tiangong University, Tianjin, China
- Changguo Li
  - Department of Basic Science, Military Traffic University, Tianjin, China
16
Dutta R, Gomes SN, Kalise D, Pacchiardi L. Using mobility data in the design of optimal lockdown strategies for the COVID-19 pandemic. PLoS Comput Biol 2021; 17:e1009236. [PMID: 34383756 PMCID: PMC8360388 DOI: 10.1371/journal.pcbi.1009236] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/02/2020] [Accepted: 07/02/2021] [Indexed: 01/29/2023] Open
Abstract
A mathematical model for the COVID-19 pandemic spread, which integrates age-structured Susceptible-Exposed-Infected-Recovered-Deceased dynamics with real mobile phone data accounting for the population mobility, is presented. The dynamical model adjustment is performed via Approximate Bayesian Computation. Optimal lockdown and exit strategies are determined based on nonlinear model predictive control, constrained to public-health and socio-economic factors. Through an extensive computational validation of the methodology, it is shown that it is possible to compute robust exit strategies with realistic reduced mobility values to inform public policy making, and we exemplify the applicability of the methodology using datasets from England and France. In many countries, the COVID-19 pandemic has revealed a gap between public policy making and the use of advanced technological tools to inform such a process. In the big data era, decisions concerning the implementation of quarantines and travel restrictions are still being taken based on incomplete public health data, despite the myriad of information our society provides in real time, such as mobility data, commuting network structures, and financial patterns, to name a few. To advance towards an effective data-driven, quantitative policy making, we propose a computational framework where a predictive epidemiological model is fitted by feeding both public health and Google mobility data. The resulting model is then used as a basis for designing mobility reduction strategies which are optimised taking into account both the healthcare system capacity, and the economic impact of an extended lockdown. For the COVID-19 pandemic in England and France, we show that it is possible to design lockdown policies allowing a partial return to workplaces and schools, while maintaining the epidemic under control.
Affiliation(s)
- Ritabrata Dutta
  - Department of Statistics, Warwick University, Coventry, United Kingdom
- Susana N. Gomes
  - Department of Mathematics, Warwick University, Coventry, United Kingdom
- Dante Kalise
  - School of Mathematical Sciences, University of Nottingham, Nottingham, United Kingdom
17
West TO, Berthouze L, Farmer SF, Cagnan H, Litvak V. Inference of brain networks with approximate Bayesian computation - assessing face validity with an example application in Parkinsonism. Neuroimage 2021; 236:118020. [PMID: 33839264 PMCID: PMC8270890 DOI: 10.1016/j.neuroimage.2021.118020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2020] [Revised: 03/16/2021] [Accepted: 03/21/2021] [Indexed: 11/21/2022] Open
Abstract
This paper describes and validates a novel framework using the Approximate Bayesian Computation (ABC) algorithm for parameter estimation and model selection in models of mesoscale brain network activity. We provide a proof-of-principle, first-pass validation of this framework using a set of neural mass models of the cortico-basal ganglia thalamic circuit inverted upon spectral features from experimental, in vivo recordings. This optimization scheme relaxes an assumption of fixed-form posteriors (i.e. the Laplace approximation) taken in previous approaches to inverse modelling of spectral features. This enables the exploration of model dynamics beyond that approximated from local linearity assumptions and so fit to explicit, numerical solutions of the underlying non-linear system of equations. In this first paper, we establish a face validation of the optimization procedures in terms of: (i) the ability to approximate posterior densities over parameters that are plausible given the known causes of the data; (ii) the ability of the model comparison procedures to yield posterior model probabilities that can identify the model structure known to generate the data; and (iii) the robustness of these procedures to local minima in the face of different starting conditions. Finally, as an illustrative application we show (iv) that model comparison can yield plausible conclusions given the known neurobiology of the cortico-basal ganglia-thalamic circuit in Parkinsonism. These results lay the groundwork for future studies utilizing highly nonlinear or brittle models that can explain time-dependent dynamics, such as oscillatory bursts, in terms of the underlying neural circuits.
Affiliation(s)
- Timothy O West
  - Nuffield Department of Clinical Neurosciences, Medical Sciences Division, University of Oxford, Oxford OX3 9DU, United Kingdom; Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford OX1 3TH, United Kingdom; Wellcome Trust Centre for Human Neuroimaging, UCL Institute of Neurology, Queen Square, London WC1N 3BG, United Kingdom
- Luc Berthouze
  - Centre for Computational Neuroscience and Robotics, University of Sussex, Falmer, United Kingdom; UCL Great Ormond Street Institute of Child Health, Guildford St., London WC1N 1EH, United Kingdom
- Simon F Farmer
  - Department of Neurology, National Hospital for Neurology & Neurosurgery, Queen Square, London WC1N 3BG, United Kingdom; Department of Clinical and Movement Neurosciences, Institute of Neurology, Queen Square, UCL, London WC1N 3BG, United Kingdom
- Hayriye Cagnan
  - Nuffield Department of Clinical Neurosciences, Medical Sciences Division, University of Oxford, Oxford OX3 9DU, United Kingdom; Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford OX1 3TH, United Kingdom; Wellcome Trust Centre for Human Neuroimaging, UCL Institute of Neurology, Queen Square, London WC1N 3BG, United Kingdom
- Vladimir Litvak
  - Wellcome Trust Centre for Human Neuroimaging, UCL Institute of Neurology, Queen Square, London WC1N 3BG, United Kingdom
18
Hendry JA, Kwiatkowski D, McVean G. Elucidating relationships between P.falciparum prevalence and measures of genetic diversity with a combined genetic-epidemiological model of malaria. PLoS Comput Biol 2021; 17:e1009287. [PMID: 34411093 PMCID: PMC8407561 DOI: 10.1371/journal.pcbi.1009287] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/07/2020] [Revised: 08/31/2021] [Accepted: 07/19/2021] [Indexed: 12/05/2022] Open
Abstract
There is an abundance of malaria genetic data being collected from the field, yet using these data to understand the drivers of regional epidemiology remains a challenge. A key issue is the lack of models that relate parasite genetic diversity to epidemiological parameters. Classical models in population genetics characterize changes in genetic diversity in relation to demographic parameters, but fail to account for the unique features of the malaria life cycle. In contrast, epidemiological models, such as the Ross-Macdonald model, capture malaria transmission dynamics but do not consider genetics. Here, we have developed an integrated model encompassing both parasite evolution and regional epidemiology. We achieve this by combining the Ross-Macdonald model with an intra-host continuous-time Moran model, thus explicitly representing the evolution of individual parasite genomes in a traditional epidemiological framework. Implemented as a stochastic simulation, we use the model to explore relationships between measures of parasite genetic diversity and parasite prevalence, a widely-used metric of transmission intensity. First, we explore how varying parasite prevalence influences genetic diversity at equilibrium. We find that multiple genetic diversity statistics are correlated with prevalence, but the strength of the relationships depends on whether variation in prevalence is driven by host- or vector-related factors. Next, we assess the responsiveness of a variety of statistics to malaria control interventions, finding that those related to mixed infections respond quickly (∼months) whereas other statistics, such as nucleotide diversity, may take decades to respond. These findings provide insights into the opportunities and challenges associated with using genetic data to monitor malaria epidemiology.
Affiliation(s)
- Jason A. Hendry
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Dominic Kwiatkowski
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
  - Wellcome Sanger Institute, Cambridge, United Kingdom
- Gil McVean
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
  - Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
19
Auzina IA, Tomczak JM. Approximate Bayesian Computation for Discrete Spaces. ENTROPY 2021; 23:e23030312. [PMID: 33800743 PMCID: PMC7998962 DOI: 10.3390/e23030312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2021] [Revised: 02/27/2021] [Accepted: 03/02/2021] [Indexed: 11/23/2022]
Abstract
Many real-life processes are black-box problems, i.e., the internal workings are inaccessible or a closed-form mathematical expression of the likelihood function cannot be defined. For continuous random variables, likelihood-free inference problems can be solved via Approximate Bayesian Computation (ABC). However, an optimal alternative for discrete random variables is yet to be formulated. Here, we aim to fill this research gap. We propose an adjusted population-based MCMC ABC method by re-defining the standard ABC parameters to discrete ones and by introducing a novel Markov kernel that is inspired by differential evolution. We first assess the proposed Markov kernel on a likelihood-based inference problem, namely discovering the underlying diseases based on a QMR-DT network and, subsequently, the entire method on three likelihood-free inference problems: (i) the QMR-DT network with the unknown likelihood function, (ii) the learning binary neural network, and (iii) neural architecture search. The obtained results indicate the high potential of the proposed framework and the superiority of the new Markov kernel.
20
Suzuki Y, Nakamura A, Milosevic M, Nomura K, Tanahashi T, Endo T, Sakoda S, Morasso P, Nomura T. Postural instability via a loss of intermittent control in elderly and patients with Parkinson's disease: A model-based and data-driven approach. CHAOS (WOODBURY, N.Y.) 2020; 30:113140. [PMID: 33261318 DOI: 10.1063/5.0022319] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/21/2020] [Accepted: 10/28/2020] [Indexed: 06/12/2023]
Abstract
Postural instability is one of the major symptoms of Parkinson's disease. Here, we assimilated a model of intermittent delay feedback control during quiet standing into postural sway data from healthy young and elderly individuals as well as patients with Parkinson's disease to elucidate the possible mechanisms of instability. Specifically, we estimated the joint probability distribution of a set of parameters in the model using the Bayesian parameter inference such that the model with the inferred parameters can best-fit sway data for each individual. It was expected that the parameter values for three populations would distribute differently in the parameter space depending on their balance capability. Because the intermittent control model is parameterized by a parameter associated with the degree of intermittency in the control, it can represent not only the intermittent model but also the traditional continuous control model with no intermittency. We showed that the inferred parameter values for the three groups of individuals are classified into two major groups in the parameter space: one represents the intermittent control mostly for healthy people and patients with mild postural symptoms and the other the continuous control mostly for some elderly and patients with severe postural symptoms. The results of this study may be interpreted by postulating that increased postural instability in most Parkinson's patients and some elderly persons might be characterized as a dynamical disease.
Affiliation(s)
- Yasuyuki Suzuki
  - Graduate School of Engineering Science, Osaka University, Osaka 5608531, Japan
- Akihiro Nakamura
  - Graduate School of Engineering Science, Osaka University, Osaka 5608531, Japan
- Matija Milosevic
  - Graduate School of Engineering Science, Osaka University, Osaka 5608531, Japan
- Kunihiko Nomura
  - Department of Information Technology and Social Sciences, Osaka University of Economics, Osaka 5338533, Japan
- Takao Tanahashi
  - Department of Neurology, Osaka Rosai Hospital, Osaka 5918025, Japan
- Takuyuki Endo
  - Department of Neurology, Osaka Toneyama Medical Center, Osaka 5608552, Japan
- Saburo Sakoda
  - Department of Neurology, Osaka Toneyama Medical Center, Osaka 5608552, Japan
- Pietro Morasso
  - Center for Human Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Taishin Nomura
  - Graduate School of Engineering Science, Osaka University, Osaka 5608531, Japan
21
Oesterle J, Behrens C, Schröder C, Hermann T, Euler T, Franke K, Smith RG, Zeck G, Berens P. Bayesian inference for biophysical neuron models enables stimulus optimization for retinal neuroprosthetics. eLife 2020; 9:e54997. [PMID: 33107821 PMCID: PMC7673784 DOI: 10.7554/elife.54997] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/08/2020] [Accepted: 10/26/2020] [Indexed: 01/02/2023] Open
Abstract
While multicompartment models have long been used to study the biophysics of neurons, it is still challenging to infer the parameters of such models from data including uncertainty estimates. Here, we performed Bayesian inference for the parameters of detailed neuron models of a photoreceptor and an OFF- and an ON-cone bipolar cell from the mouse retina based on two-photon imaging data. We obtained multivariate posterior distributions specifying plausible parameter ranges consistent with the data and allowing us to identify parameters poorly constrained by the data. To demonstrate the potential of such mechanistic data-driven neuron models, we created a simulation environment for external electrical stimulation of the retina and optimized stimulus waveforms to target OFF- and ON-cone bipolar cells, a current major problem of retinal neuroprosthetics.
Affiliation(s)
- Jonathan Oesterle
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Christian Behrens
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Cornelius Schröder
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Thoralf Hermann
  - Naturwissenschaftliches und Medizinisches Institut an der Universität Tübingen, Reutlingen, Germany
- Thomas Euler
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
  - Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany
- Katrin Franke
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany
- Robert G Smith
  - Department of Neuroscience, University of Pennsylvania, Philadelphia, United States
- Günther Zeck
  - Naturwissenschaftliches und Medizinisches Institut an der Universität Tübingen, Reutlingen, Germany
- Philipp Berens
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
  - Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
  - Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
22
Memory Alone Does Not Account for the Way Rats Learn a Simple Spatial Alternation Task. J Neurosci 2020; 40:7311-7317. [PMID: 32753514 PMCID: PMC7534917 DOI: 10.1523/jneurosci.0972-20.2020] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/24/2020] [Revised: 07/01/2020] [Accepted: 07/08/2020] [Indexed: 01/21/2023] Open
Abstract
Animal behavior provides context for understanding disease models and physiology. However, that behavior is often characterized subjectively, creating opportunity for misinterpretation and misunderstanding. For example, spatial alternation tasks are treated as paradigmatic tools for examining memory; however, that link is actually an assumption. To test this assumption, we simulated a reinforcement learning (RL) agent equipped with a perfect memory process. We found that it learns a simple spatial alternation task more slowly and makes different errors than a group of male rats, illustrating that memory alone may not be sufficient to capture the behavior. We demonstrate that incorporating spatial biases permits rapid learning and enables the model to fit rodent behavior accurately. Our results suggest that even simple spatial alternation behaviors reflect multiple cognitive processes that need to be taken into account when studying animal behavior. SIGNIFICANCE STATEMENT Memory is a critical function for cognition whose impairment has significant clinical consequences. Experimental systems aimed at testing various sorts of memory are therefore also central. However, experimental designs to test memory are typically based on intuition about the underlying processes. We tested this using a popular behavioral paradigm: a spatial alternation task. Using behavioral modeling, we show that the straightforward intuition that these tasks just probe spatial memory fails to account for the speed at which rats learn or the types of errors they make. Only when memory-independent dynamic spatial preferences are added can the model learn like the rats. This highlights the importance of respecting the complexity of animal behavior to interpret neural function and validate disease models.
23
Pray IW, Wakeland W, Pan W, Lambert WE, Garcia HH, Gonzalez AE, O'Neal SE. Understanding transmission and control of the pork tapeworm with CystiAgent: a spatially explicit agent-based model. Parasit Vectors 2020; 13:372. [PMID: 32709250 PMCID: PMC7379812 DOI: 10.1186/s13071-020-04226-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/27/2020] [Accepted: 07/14/2020] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND The pork tapeworm, Taenia solium, is a serious public health problem in rural low-resource areas of Latin America, Africa and Asia, where the associated conditions of neurocysticercosis (NCC) and porcine cysticercosis cause substantial health and economic harms. An accurate and validated transmission model for T. solium would serve as an important new tool for control and elimination, as it would allow for comparison of available intervention strategies, and prioritization of the most effective strategies for control and elimination efforts. METHODS We developed a spatially-explicit agent-based model (ABM) for T. solium ("CystiAgent") that differs from prior T. solium models by including a spatial framework and behavioral parameters such as pig roaming, open human defecation, and human travel. In this article, we introduce the structure and function of the model, describe the data sources used to parameterize the model, and apply sensitivity analyses (Latin hypercube sampling-partial rank correlation coefficient (LHS-PRCC)) to evaluate model parameters. RESULTS LHS-PRCC analysis of CystiAgent found that the parameters with the greatest impact on model uncertainty were the roaming range of pigs, the infectious duration of human taeniasis, use of latrines, and the set of "tuning" parameters defining the probabilities of infection in humans and pigs given exposure to T. solium. CONCLUSIONS CystiAgent is a novel ABM that has the ability to model spatial and behavioral features of T. solium transmission not available in other models. There is a small set of impactful model parameters that contribute uncertainty to the model and may impact the accuracy of model projections. Field and laboratory studies to better understand these key components of transmission may help reduce uncertainty, while current applications of CystiAgent may consider calibration of these parameters to improve model performance. These results will ultimately allow for improved interpretation of model validation results, and usage of the model to compare available control and elimination strategies for T. solium.
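The LHS-PRCC sensitivity analysis named in this abstract can be illustrated with a short sketch: draw parameter sets by Latin hypercube sampling, run the model on each, then compute the partial rank correlation between every parameter and the output. The code below is a generic illustration under stated assumptions and is not the CystiAgent analysis; the helper names (`latin_hypercube`, `to_ranks`, `prcc`), the three unnamed parameters, and the linear toy response standing in for a simulation model are all hypothetical.

```python
import numpy as np

def latin_hypercube(n_samples, n_params, rng):
    """Draw one point from each of n_samples equal-width strata per parameter (unit cube)."""
    strata = np.column_stack([rng.permutation(n_samples) for _ in range(n_params)])
    return (strata + rng.random((n_samples, n_params))) / n_samples

def to_ranks(a):
    """Rank-transform each column; continuous draws make ties vanishingly unlikely."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def prcc(X, y):
    """Partial rank correlation between each column of X and y, given the other columns."""
    Xr = to_ranks(X)
    yr = to_ranks(y[:, None])[:, 0]
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(yr)), np.delete(Xr, j, axis=1)])
        # Residuals after regressing out the remaining (ranked) parameters.
        res_x = Xr[:, j] - others @ np.linalg.lstsq(others, Xr[:, j], rcond=None)[0]
        res_y = yr - others @ np.linalg.lstsq(others, yr, rcond=None)[0]
        scores[j] = np.corrcoef(res_x, res_y)[0, 1]
    return scores

# Hypothetical stand-in for a simulation model: output rises with p0,
# falls with p1, and ignores p2.
rng = np.random.default_rng(1)
X = latin_hypercube(500, 3, rng)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=500)
scores = prcc(X, y)
print(np.round(scores, 2))
```

Parameters with a PRCC far from zero are the ones whose uncertainty most affects the output, which is how an analysis of this kind singles out a small set of impactful parameters.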
Affiliation(s)
- Ian W Pray
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, OR, USA
- Wayne Wakeland
  - Systems Science Program, Portland State University, Portland, OR, USA
- William Pan
  - Global Health Institute, Duke University, Durham, NC, USA
- William E Lambert
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, OR, USA
- Hector H Garcia
  - School of Sciences, Department of Microbiology, Universidad Peruana Cayetano Heredia, Lima, Peru
- Armando E Gonzalez
  - School of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Seth E O'Neal
  - School of Public Health, Oregon Health & Science University and Portland State University, Portland, OR, USA
24
Hazelbag CM, Dushoff J, Dominic EM, Mthombothi ZE, Delva W. Calibration of individual-based models to epidemiological data: A systematic review. PLoS Comput Biol 2020; 16:e1007893. [PMID: 32392252 PMCID: PMC7241852 DOI: 10.1371/journal.pcbi.1007893] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/05/2019] [Revised: 05/21/2020] [Accepted: 04/21/2020] [Indexed: 01/24/2023] Open
Abstract
Individual-based models (IBMs) informing public health policy should be calibrated to data and provide estimates of uncertainty. Two main components of model-calibration methods are the parameter-search strategy and the goodness-of-fit (GOF) measure; many options exist for each of these. This review provides an overview of calibration methods used in IBMs modelling infectious disease spread. We identified articles on PubMed employing simulation-based methods to calibrate IBMs informing public health policy in HIV, tuberculosis, and malaria epidemiology published between 1 January 2013 and 31 December 2018. Articles were included if models stored individual-specific information, and calibration involved comparing model output to population-level targets. We extracted information on parameter-search strategies, GOF measures, and model validation. The PubMed search identified 653 candidate articles, of which 84 met the review criteria. Of the included articles, 40 (48%) combined a quantitative GOF measure with an algorithmic parameter-search strategy, either an optimisation algorithm (14/40) or a sampling algorithm (26/40). These 40 articles varied widely in their choices of parameter-search strategies and GOF measures. For the remaining 44 (52%) articles, the parameter-search strategy could either not be identified (32/44) or was described as an informal, non-reproducible method (12/44). Of these 44 articles, the majority (25/44) were unclear about the GOF measure used; of the rest, only five quantitatively evaluated GOF. Only a minority of the included articles, 14 (17%), provided a rationale for their choice of model-calibration method. Model validation was reported in 31 (37%) articles. Reporting on calibration methods is far from optimal in epidemiological modelling studies of HIV, malaria and TB transmission dynamics. The adoption of better documented, algorithmic calibration methods could improve both reproducibility and the quality of inference in model-based epidemiology. There is a need for research comparing the performance of calibration methods to inform decisions about the parameter-search strategies and GOF measures.
Calibration, that is, “fitting” the model to data, is a crucial part of using mathematical models to better forecast and control the population-level spread of infectious diseases. Evidence that the mathematical model is well-calibrated improves confidence that the model provides a realistic picture of the consequences of health policy decisions. To make informed decisions, policymakers need information about uncertainty, i.e., the range of likely outcomes rather than just a single prediction. Thus, modellers should also strive to provide accurate measurements of uncertainty, both for their model parameters and for their predictions. This systematic review provides an overview of the methods used to calibrate individual-based models (IBMs) of the spread of HIV, malaria, and tuberculosis. We found that less than half of the reviewed articles used reproducible, non-subjective calibration methods. For the remaining articles, the method could either not be identified or was described as an informal, non-reproducible method. Only one-third of the articles obtained estimates of parameter uncertainty. We conclude that the adoption of better-documented, algorithmic calibration methods could improve both reproducibility and the quality of inference in model-based epidemiology.
Affiliation(s)
- C. Marijn Hazelbag
  - South African DSI-NRF Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
- Jonathan Dushoff
  - South African DSI-NRF Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
  - Department of Biology, Department of Mathematics and Statistics, Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Emanuel M. Dominic
  - South African DSI-NRF Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
- Zinhle E. Mthombothi
  - South African DSI-NRF Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
- Wim Delva
  - South African DSI-NRF Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
  - School for Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
  - Center for Statistics, I-BioStat, Hasselt University, Diepenbeek, Belgium
  - Department of Global Health, Faculty of Medicine and Health, Stellenbosch University, Stellenbosch, South Africa
  - International Centre for Reproductive Health, Ghent University, Ghent, Belgium
  - Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
|
25
|
Chen S, Mira A, Onnela JP. Flexible model selection for mechanistic network models. JOURNAL OF COMPLEX NETWORKS 2020; 8:cnz024. [PMID: 32765880 PMCID: PMC7391990 DOI: 10.1093/comnet/cnz024] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/25/2019] [Accepted: 06/24/2019] [Indexed: 05/25/2023]
Abstract
Network models are applied across many domains where data can be represented as a network. Two prominent paradigms for modelling networks are statistical models (probabilistic models for the observed network) and mechanistic models (models for network growth and/or evolution). Mechanistic models are better suited for incorporating domain knowledge, for studying the effects of interventions (such as changes to specific mechanisms) and for forward simulation, but they typically have intractable likelihoods. As such, and in stark contrast to statistical models, there is a relative dearth of research on model selection for such models despite the otherwise large body of extant work. In this article, we propose a simulator-based procedure for mechanistic network model selection that borrows aspects from Approximate Bayesian Computation along with a means to quantify the uncertainty in the selected model. To select the most suitable network model, we consider and assess the performance of several learning algorithms, most notably the so-called Super Learner, which makes our framework less sensitive to the choice of a particular learning algorithm. Our approach takes advantage of the ease of forward simulation from mechanistic network models to circumvent their intractable likelihoods. The overall process is flexible and widely applicable. Our simulation results demonstrate the approach's ability to accurately discriminate between competing mechanistic models. Finally, we showcase our approach with a protein-protein interaction network model from the literature for yeast (Saccharomyces cerevisiae).
Affiliation(s)
- Sixing Chen
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 655 Huntington Avenue, Building 2, 4th Floor, Boston, MA 02115, USA
- Antonietta Mira
  - Data Science Lab, Institute of Computational Science, Università della Svizzera italiana, Via Buffi 6, 6900 Lugano, Switzerland and Dipartimento di Scienza e Alta Tecnologia, Università degli Studi dell'Insubria, Via Valleggio, 11 - 22100 Como, Italy
|
26
|
Distance-learning For Approximate Bayesian Computation To Model a Volcanic Eruption. SANKHYA B 2020. [DOI: 10.1007/s13571-019-00208-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]
Abstract
Approximate Bayesian computation (ABC) provides us with a way to infer parameters of models, for which the likelihood function is not available, from an observation. Using ABC, which depends on many simulations from the considered model, we develop an inferential framework to learn parameters of a stochastic numerical simulator of volcanic eruption. Moreover, the model itself is parallelized using Message Passing Interface (MPI). Thus, we develop a nested-parallelized MPI communicator to handle the expensive numerical model with ABC algorithms. ABC usually relies on summary statistics of the data in order to measure the discrepancy between model output and observation. However, informative summary statistics cannot be found for the considered model. We therefore develop a technique to learn a distance between model outputs based on deep metric-learning. We use this framework to learn the plume characteristics (e.g., initial plume velocity) of the volcanic eruption from the tephra deposits collected by field-work associated with the 2450 BP Pululagua (Ecuador) volcanic eruption.
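The ABC recipe summarized in this abstract (simulate from the model, reduce the output to summary statistics, keep the parameters whose simulations fall close to the observation) can be illustrated with a minimal rejection-sampling sketch in Python. This is not the paper's method, which learns a distance by deep metric learning and runs an expensive MPI-parallel simulator. Here the Gaussian simulator, the uniform prior, the sample-mean summary, and the function names `simulate`, `summary`, and `abc_rejection` are all hypothetical.

```python
import numpy as np

def simulate(theta, n, rng):
    # Hypothetical stand-in for an expensive simulator: n draws from Normal(theta, 1).
    return rng.normal(theta, 1.0, size=n)

def summary(x):
    # Assumed summary statistic: the sample mean.
    return np.mean(x)

def abc_rejection(observed, n_draws, quantile, rng):
    s_obs = summary(observed)
    # Assumed prior: theta ~ Uniform(-5, 5).
    thetas = rng.uniform(-5.0, 5.0, size=n_draws)
    # Discrepancy between each simulated summary and the observed summary.
    dists = np.array(
        [abs(summary(simulate(t, observed.size, rng)) - s_obs) for t in thetas]
    )
    # Accept the draws whose discrepancy falls in the smallest `quantile` fraction.
    eps = np.quantile(dists, quantile)
    return thetas[dists <= eps]

rng = np.random.default_rng(0)
observed = simulate(2.0, 50, rng)  # synthetic "observation" with true theta = 2
posterior = abc_rejection(observed, 5000, 0.02, rng)
print(posterior.mean(), posterior.std())
```

The accepted draws approximate the posterior over `theta`; tightening `quantile` trades a larger number of required simulations for a closer approximation, which is why the choice of summary statistic or distance matters so much for costly simulators.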
|
27
|
Buckwar E, Tamborrino M, Tubikanec I. Spectral density-based and measure-preserving ABC for partially observed diffusion processes. An illustration on Hamiltonian SDEs. STATISTICS AND COMPUTING 2020; 30:627-648. [PMID: 32132771 PMCID: PMC7026277 DOI: 10.1007/s11222-019-09909-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/03/2019] [Accepted: 10/17/2019] [Indexed: 05/15/2023]
Abstract
Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise: First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler-Maruyama discretisation) fails. 
The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.
Affiliation(s)
- Evelyn Buckwar
  - Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
- Massimiliano Tamborrino
  - Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
- Irene Tubikanec
  - Institute for Stochastics, Johannes Kepler University Linz, Altenberger Straße 69, 4040 Linz, Austria
|
28
|
Kokko J, Remes U, Thomas O, Pesonen H, Corander J. PYLFIRE: Python implementation of likelihood-free inference by ratio estimation. Wellcome Open Res 2019; 4:197. [PMID: 32133422 PMCID: PMC7041362 DOI: 10.12688/wellcomeopenres.15583.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Accepted: 11/13/2019] [Indexed: 11/21/2022] Open
Abstract
Likelihood-free inference for simulator-based models is an emerging methodological branch of statistics which has attracted considerable attention in applications across diverse fields such as population genetics, astronomy and economics. Recently, the power of statistical classifiers has been harnessed in likelihood-free inference to obtain either point estimates or even posterior distributions of model parameters. Here we introduce PYLFIRE, an open-source Python implementation of the inference method LFIRE (likelihood-free inference by ratio estimation) that uses penalised logistic regression. PYLFIRE is made available as part of the general ELFI inference software http://elfi.ai to benefit both the user and developer communities for likelihood-free inference.
Affiliation(s)
- Jan Kokko
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Ulpu Remes
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Owen Thomas
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Henri Pesonen
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Jukka Corander
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Parasites and Microbes, Wellcome Trust Sanger Institute, Hinxton, UK
|
29
|
Filipe JA, Kyriazakis I. Bayesian, Likelihood-Free Modelling of Phenotypic Plasticity and Variability in Individuals and Populations. Front Genet 2019; 10:727. [PMID: 31616460 PMCID: PMC6764410 DOI: 10.3389/fgene.2019.00727] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/15/2018] [Accepted: 07/11/2019] [Indexed: 12/17/2022] Open
Abstract
There is a paradigm shift from the traditional focus on the "average" individual towards the definition and analysis of trait variation within individual life-history and among individuals in populations. This is a result of increasing availability of individual phenotypic data. The shift allows the use of genetic and environment-driven variations to assess robustness to challenge, gain greater understanding of organismal biological processes, or deliver individual-targeted treatments or genetic selection. These consequences apply, in particular, to variation in ontogenetic growth. We propose an approach to parameterise mathematical models of individual traits (e.g., reaction norms, growth curves) that addresses two challenges: 1) Estimation of individual traits while making minimal assumptions about data distribution and correlation, addressed via Approximate Bayesian Computation (a form of nonparametric inference). We are motivated by the fact that available information on distribution of biological data is often less precise than assumed by conventional likelihood functions. 2) Scaling-up to population phenotype distributions while facilitating unbiased use of individual data; this is addressed via a probabilistic framework where population distributions build on separately-inferred individual distributions and individual-trait interpretability is preserved. The approach is tested against Bayesian likelihood-based inference, by fitting weight and energy intake growth models to animal data and normal- and skewed-distributed simulated data. i) Individual inferences were accurate and robust to changes in data distribution and sample size; in particular, median-based predictions were more robust than maximum-likelihood-based curves. These results suggest that the approach gives reliable inferences using few observations and monitoring resources. 
ii) At the population level, each individual contributed via a specific data distribution, and population phenotype estimates were not disproportionally influenced by outlier individuals. Indices measuring population phenotype variation can be derived for study comparisons. The approach offers an alternative for estimating trait variability in biological systems that may be reliable for various applications, for example, in genetics, health, and individualised nutrition, while using fewer assumptions and fewer empirical observations. In livestock breeding, the potentially greater accuracy of trait estimation (without specification of multitrait variance-covariance parameters) could lead to improved selection and to more decisive estimates of trait heritability.
Affiliation(s)
- Joao A.N. Filipe
  - Agriculture, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
|
30
|
Lintusaari J, Blomstedt P, Rose B, Sivula T, Gutmann MU, Kaski S, Corander J. Resolving outbreak dynamics using approximate Bayesian computation for stochastic birth-death models. Wellcome Open Res 2019; 4:14. [PMID: 37744419 PMCID: PMC10514576 DOI: 10.12688/wellcomeopenres.15048.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/16/2019] [Indexed: 09/26/2023] Open
Abstract
Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters, such as the reproductive number R, may remain poorly identifiable with these models. Here we show that this identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with its own distinct dynamics and clear epidemiological interpretation. We show that our model allows for accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. As a byproduct of the inference, the model provides an estimate of the infectious population size at the time the data were collected. The acquired estimate is approximately two orders of magnitude smaller than assumed in earlier related studies, and it is much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three times larger than previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.
Affiliation(s)
- Jarno Lintusaari
  - Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
- Paul Blomstedt
  - Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
- Brittany Rose
  - Department of Infectious Diseases Epidemiology and Modelling, Norwegian Institute of Public Health, Oslo, Norway
  - Helsinki Institute for Information Technology (HIIT), Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Tuomas Sivula
  - Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
- Samuel Kaski
  - Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
- Jukka Corander
  - Helsinki Institute for Information Technology (HIIT), Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Infection Genomics, The Wellcome Trust Sanger Institute, Hinxton, UK
|
31
|
Kangasrääsiö A, Jokinen JPP, Oulasvirta A, Howes A, Kaski S. Parameter Inference for Computational Cognitive Models with Approximate Bayesian Computation. Cogn Sci 2019; 43:e12738. [PMID: 31204797 PMCID: PMC6593436 DOI: 10.1111/cogs.12738] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/13/2017] [Revised: 04/09/2019] [Accepted: 04/11/2019] [Indexed: 11/28/2022]
Abstract
This paper addresses a common challenge with computational cognitive models: identifying parameter values that are both theoretically plausible and generate predictions that match well with empirical data. While computational models can offer deep explanations of cognition, they are computationally complex and often out of reach of traditional parameter fitting methods. Weak methodology may lead to premature rejection of valid models or to acceptance of models that might otherwise be falsified. Mathematically robust fitting methods are, therefore, essential to the progress of computational modeling in cognitive science. In this article, we investigate the capability and role of modern fitting methods—including Bayesian optimization and approximate Bayesian computation—and contrast them to some more commonly used methods: grid search and Nelder–Mead optimization. Our investigation consists of a reanalysis of the fitting of two previous computational models: an Adaptive Control of Thought—Rational model of skill acquisition and a computational rationality model of visual search. The results contrast the efficiency and informativeness of the methods. A key advantage of the Bayesian methods is the ability to estimate the uncertainty of fitted parameter values. We conclude that approximate Bayesian computation is (a) efficient, (b) informative, and (c) offers a path to reproducible results.
Affiliation(s)
- Andrew Howes
  - School of Computer Science, University of Birmingham
- Samuel Kaski
  - Department of Computer Science, Aalto University
|
32
|
Sard N, Robinson J, Kanefsky J, Herbst S, Scribner K. Coalescent models characterize sources and demographic history of recent round goby colonization of Great Lakes and inland waters. Evol Appl 2019; 12:1034-1049. [PMID: 31080513 PMCID: PMC6503821 DOI: 10.1111/eva.12779] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/11/2018] [Accepted: 01/15/2019] [Indexed: 12/25/2022] Open
Abstract
The establishment and spread of aquatic invasive species are ecologically and economically harmful and a source of conservation concern internationally. Processes of species invasion have traditionally been inferred from observational data of species presence/absence and relative abundance. However, genetic-based approaches can provide valuable sources of inference. Restriction site-associated DNA sequencing was used to identify and genotype single nucleotide polymorphism (SNP) loci for Round Gobies (Neogobius melanostomus) (N = 440) from 18 sampling locations in the Great Lakes and in three Michigan, USA, drainages (Flint, Au Sable, and Cheboygan River basins). Sampled rivers differed in size, accessibility, and physical characteristics including man-made dispersal barriers. Population levels of genetic diversity and interpopulation variance in SNP allele frequency were used in coalescence-based approximate Bayesian computation (ABC) to statistically compare models representing competing hypotheses regarding source population, postcolonization dispersal, and demographic history in the Great Lakes and inland waters. Results indicate different patterns of colonization across the three drainages. In the Flint River, models indicate a strong population bottleneck (<3% of contemporary effective population size) and a single founding event from Saginaw Bay led to the colonization of inland river segments. In the Au Sable River, analyses could not distinguish potential source populations, but supported models indicated multiple introductions from one source population. In the Cheboygan River, supported models indicated that colonization likely proceeded from east (Lake Huron source) to west among inland locales sampled in the system. 
Despite the recent occupancy of Great Lakes and inland habitats, large numbers of loci analyzed in an ABC framework enable statistically supported identification of source populations and reconstruction of the direction of inland spread and demographic history following establishment. Information from analyses can direct management actions to limit the spread of invasive species from identified sources and most probable vectors into additional inland aquatic habitats.
Affiliation(s)
- Nicholas Sard
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan
  - Present address: Biology Department, SUNY Oswego, Oswego, New York
- John Robinson
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan
- Jeannette Kanefsky
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan
- Seth Herbst
  - Michigan Department of Natural Resources, East Lansing, Michigan
- Kim Scribner
  - Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan
  - Department of Integrative Biology, Michigan State University, East Lansing, Michigan
|
33
|
Järvenpää M, Sater MRA, Lagoudas GK, Blainey PC, Miller LG, McKinnell JA, Huang SS, Grad YH, Marttinen P. A Bayesian model of acquisition and clearance of bacterial colonization incorporating within-host variation. PLoS Comput Biol 2019; 15:e1006534. [PMID: 31009452 PMCID: PMC6497309 DOI: 10.1371/journal.pcbi.1006534] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/24/2018] [Revised: 05/02/2019] [Accepted: 02/22/2019] [Indexed: 11/19/2022] Open
Abstract
Bacterial populations that colonize a host can play important roles in host health, including serving as a reservoir that transmits to other hosts and from which invasive strains emerge, thus emphasizing the importance of understanding rates of acquisition and clearance of colonizing populations. Studies of colonization dynamics have been based on assessment of whether serial samples represent a single population or distinct colonization events. With the use of whole genome sequencing to determine genetic distance between isolates, a common solution to estimate acquisition and clearance rates has been to assume a fixed genetic distance threshold below which isolates are considered to represent the same strain. However, this approach is often inadequate to account for the diversity of the underlying within-host evolving population, the time intervals between consecutive measurements, and the uncertainty in the estimated acquisition and clearance rates. Here, we present a fully Bayesian model that provides probabilities of whether two strains should be considered the same, allowing us to determine bacterial clearance and acquisition from genomes sampled over time. Our method explicitly models the within-host variation using population genetic simulation, and the inference is done using a combination of Approximate Bayesian Computation (ABC) and Markov Chain Monte Carlo (MCMC). We validate the method with multiple carefully conducted simulations and demonstrate its use in practice by analyzing a collection of methicillin resistant Staphylococcus aureus (MRSA) isolates from a large recently completed longitudinal clinical study. An R-code implementation of the method is freely available at: https://github.com/mjarvenpaa/bacterial-colonization-model.
Affiliation(s)
- Marko Järvenpää
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
  - Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, MA, USA
- Mohamad R. Abdul Sater
  - Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, MA, USA
- Georgia K. Lagoudas
  - Department of Biological Engineering, MIT, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Paul C. Blainey
  - Department of Biological Engineering, MIT, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Loren G. Miller
  - Infectious Disease Clinical Outcomes Research Unit, Division of Infectious Diseases, LA Biomed Research Institute at Harbor–UCLA Medical Center, Torrance, CA, USA
- James A. McKinnell
  - Infectious Disease Clinical Outcomes Research Unit, Division of Infectious Diseases, LA Biomed Research Institute at Harbor–UCLA Medical Center, Torrance, CA, USA
- Susan S. Huang
  - Division of Infectious Diseases and Health Policy Research Institute, University of California, Irvine School of Medicine, Irvine, CA, USA
- Yonatan H. Grad
  - Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, MA, USA
- Pekka Marttinen
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
|
34
|
Moens V, Zénon A. Learning and forgetting using reinforced Bayesian change detection. PLoS Comput Biol 2019; 15:e1006713. [PMID: 30995214 PMCID: PMC6488101 DOI: 10.1371/journal.pcbi.1006713] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/05/2018] [Revised: 04/29/2019] [Accepted: 12/09/2018] [Indexed: 12/17/2022] Open
Abstract
Agents living in volatile environments must be able to detect changes in contingencies while refraining from adapting to unexpected events that are caused by noise. In Reinforcement Learning (RL) frameworks, this requires learning rates that adapt to past reliability of the model. The observation that behavioural flexibility in animals tends to decrease following prolonged training in a stable environment provides experimental evidence for such adaptive learning rates. However, in classical RL models, the learning rate is either fixed or scheduled and thus cannot adapt dynamically to environmental changes. Here, we propose a new Bayesian learning model, using variational inference, that achieves adaptive change detection by the use of Stabilized Forgetting, updating its current belief based on a mixture of fixed, initial priors and previous posterior beliefs. The weight given to these two sources is optimized alongside the other parameters, allowing the model to adapt dynamically to changes in environmental volatility and to unexpected observations. This approach is used to implement the "critic" of an actor-critic RL model, while the actor samples the resulting value distributions to choose which action to undertake. We show that our model can emulate different adaptation strategies to contingency changes, depending on its prior assumptions of environmental stability, and that model parameters can be fit to real data with high accuracy. The model also exhibits trade-offs between flexibility and computational costs that mirror those observed in real data. Overall, the proposed method provides a general framework to study learning flexibility and decision making in RL contexts.
Affiliation(s)
- Vincent Moens
  - CoAction Lab, Institute of Neuroscience, Université Catholique de Louvain, Bruxelles, Belgium
- Alexandre Zénon
  - CoAction Lab, Institute of Neuroscience, Université Catholique de Louvain, Bruxelles, Belgium
  - INCIA, Université de Bordeaux, Bordeaux, France
|
35
|
Lintusaari J, Blomstedt P, Sivula T, Gutmann MU, Kaski S, Corander J. Resolving outbreak dynamics using approximate Bayesian computation for stochastic birth-death models. Wellcome Open Res 2019. [DOI: 10.12688/wellcomeopenres.15048.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022] Open
Abstract
Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth-death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters such as the reproductive number R may remain poorly identifiable with these models. Here we show that the identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case-study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with their distinct dynamics and clear epidemiological interpretation. We show that our model allows for accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. As a by-product of the inference, the model provides an estimate of the infectious population size at the time the data was collected. The acquired estimate is approximately two orders of magnitude smaller compared to the assumptions made in the earlier related studies, and much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three-fold compared with the previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.
|
36
|
Shen P, Lees JA, Bee GCW, Brown SP, Weiser JN. Pneumococcal quorum sensing drives an asymmetric owner-intruder competitive strategy during carriage via the competence regulon. Nat Microbiol 2018; 4:198-208. [PMID: 30546100 DOI: 10.1038/s41564-018-0314-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 06/15/2018] [Accepted: 10/30/2018] [Indexed: 11/09/2022]
Abstract
Competition among microorganisms is a key determinant of successful host colonization and persistence. For Streptococcus pneumoniae, lower than predicted rates of co-colonizing strains suggest a competitive advantage for resident bacteria over newcomers. In light of evolutionary theory, we hypothesized that S. pneumoniae use owner-intruder asymmetries to settle contests, leading to the disproportionate success of the initial resident 'owner', regardless of the genetic identity of the 'intruder'. We investigated the determinants of within-host competitive success utilizing S. pneumoniae colonization of the upper respiratory tract of infant mice. Within 6 h, colonization by the resident inhibited colonization by an isogenic challenger. The competitive advantage of the resident was dependent on quorum sensing via the competence (Com) regulon and downstream choline binding protein D (CbpD) and on the competence-induced bacteriocins A and B (CibAB) implicated in fratricide. CbpD and CibAB are highly conserved across pneumococcal lineages, indicating evolutionary advantages for asymmetric competitive strategies within the species. Mathematical modelling supported a significant role for quorum sensing via the Com regulon in competition, even for strains with different competitive advantages. Our study suggests that asymmetric owner-intruder competitive strategies do not require complex cognition and are used by a major human pathogen to determine 'ownership' of human hosts.
Affiliation(s)
- Pamela Shen
  - Department of Microbiology, New York University School of Medicine, New York, NY, USA
- John A Lees
  - Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Gavyn Chern Wei Bee
  - Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Sam P Brown
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Jeffrey N Weiser
  - Department of Microbiology, New York University School of Medicine, New York, NY, USA
|
37
|
Järvenpää M, Gutmann MU, Vehtari A, Marttinen P. Gaussian process modelling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Ann Appl Stat 2018. [DOI: 10.1214/18-aoas1150] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
|
38
|
Dutta R, Brotzakis ZF, Mira A. Bayesian calibration of force-fields from experimental data: TIP4P water. J Chem Phys 2018; 149:154110. [DOI: 10.1063/1.5030950] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/30/2023] Open
Affiliation(s)
- Ritabrata Dutta
  - Institute of Computational Science, Università della Svizzera Italiana, Lugano, Switzerland
- Zacharias Faidon Brotzakis
  - Institute of Computational Science, Università della Svizzera Italiana, Lugano, Switzerland
  - Department of Chemistry and Applied Bioscience, ETH Zürich, Zürich, Switzerland
- Antonietta Mira
  - Institute of Computational Science, Università della Svizzera Italiana, Lugano, Switzerland
  - Department of Science and High Technology, Università degli Studi dell’Insubria, Varese, Italy
|
39
|
Xu Y, Puranen S, Corander J, Kabashima Y. Inverse finite-size scaling for high-dimensional significance analysis. Phys Rev E 2018; 97:062112. [PMID: 30011500 DOI: 10.1103/physreve.97.062112] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/14/2017] [Indexed: 11/07/2022]
Abstract
We propose an efficient procedure for significance determination in high-dimensional dependence learning based on surrogate data testing, termed inverse finite-size scaling (IFSS). The IFSS method is based on our discovery of a universal scaling property of random matrices which enables inference about signal behavior from much smaller scale surrogate data than the dimensionality of the original data. As a motivating example, we demonstrate the procedure for ultra-high-dimensional Potts models with order of 10^{10} parameters. IFSS reduces the computational effort of the data-testing procedure by several orders of magnitude, making it very efficient for practical purposes. This approach thus holds considerable potential for generalization to other types of complex models.
Affiliation(s)
- Yingying Xu
  - Department of Computer Science, School of Science, Aalto University, 00076 Espoo, Finland
  - Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
- Santeri Puranen
  - Department of Computer Science, School of Science, Aalto University, 00076 Espoo, Finland
  - Department of Computer Science, University of Helsinki, 00014 Helsinki, Finland
  - Department of Biostatistics, University of Oslo, 0317 Oslo, Norway
- Jukka Corander
  - Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland
  - Department of Biostatistics, University of Oslo, 0317 Oslo, Norway
- Yoshiyuki Kabashima
  - Department of Mathematical and Computing Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8552, Japan
|
40
|
Dutta R, Mira A, Onnela JP. Bayesian inference of spreading processes on networks. Proc Math Phys Eng Sci 2018; 474:20180129. [PMID: 30100809 PMCID: PMC6083242 DOI: 10.1098/rspa.2018.0129] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/28/2018] [Accepted: 06/19/2018] [Indexed: 01/18/2023] Open
Abstract
Infectious diseases are studied to understand their spreading mechanisms, to evaluate control strategies and to predict the risk and course of future outbreaks. Because people only interact with a few other individuals, and the structure of these interactions influences spreading processes, the pairwise relationships between individuals can be usefully represented by a network. Although the underlying transmission processes are different, the network approach can be used to study the spread of pathogens in a contact network or the spread of rumours in a social network. We study simulated simple and complex epidemics on synthetic networks and on two empirical networks, a social/contact network in an Indian village and an online social network. Our goal is to learn simultaneously the spreading process parameters and the first infected node, given a fixed network structure and the observed state of nodes at several time points. Our inference scheme is based on approximate Bayesian computation, a likelihood-free inference technique. Our method is agnostic about the network topology and the spreading process. It generally performs well and, somewhat counter-intuitively, the inference problem appears to be easier on more heterogeneous network topologies, which enhances its future applicability to real-world settings where few networks have homogeneous topologies.
Affiliation(s)
- Ritabrata Dutta
  - Institute of Computational Science, Università della Svizzera italiana, Lugano, Switzerland
- Antonietta Mira
  - Institute of Computational Science, Università della Svizzera italiana, Lugano, Switzerland
  - Department of Science and High Technology, Università degli Studi dell'Insubria, Varese, Italy
|
41
|
Karabatsos G, Leisen F. An approximate likelihood perspective on ABC methods. STATISTICS SURVEYS 2018. [DOI: 10.1214/18-ss120] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
|
42
|
Corander J, Fraser C, Gutmann MU, Arnold B, Hanage WP, Bentley SD, Lipsitch M, Croucher NJ. Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nat Ecol Evol 2017; 1:1950-1960. [PMID: 29038424 PMCID: PMC5708525 DOI: 10.1038/s41559-017-0337-x] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 04/27/2017] [Accepted: 09/01/2017] [Indexed: 12/21/2022]
Abstract
Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions.
Affiliation(s)
- Jukka Corander
  - Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014, Helsinki, Finland
  - Department of Biostatistics, University of Oslo, 0317, Oslo, Norway
  - Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Christophe Fraser
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7LF, UK
- Michael U Gutmann
  - School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
- Brian Arnold
  - Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- William P Hanage
  - Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Stephen D Bentley
  - Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Marc Lipsitch
  - Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
  - Departments of Epidemiology and Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Nicholas J Croucher
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, W2 1PG, UK
|
43
|
Overcast I, Bagley JC, Hickerson MJ. Strategies for improving approximate Bayesian computation tests for synchronous diversification. BMC Evol Biol 2017; 17:203. [PMID: 28836959 PMCID: PMC5571621 DOI: 10.1186/s12862-017-1052-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/26/2017] [Accepted: 08/14/2017] [Indexed: 11/22/2022] Open
Abstract
Background: Estimating the variability in isolation times across co-distributed taxon pairs that may have experienced the same allopatric isolating mechanism is a core goal of comparative phylogeography. The use of hierarchical Approximate Bayesian Computation (ABC) and coalescent models to infer temporal dynamics of lineage co-diversification has been a contentious topic in recent years. Key issues that remain unresolved include the choice of an appropriate prior on the number of co-divergence events (Ψ), as well as the optimal strategies for data summarization.
Methods: Through simulation-based cross validation we explore the impact of the strategy for sorting summary statistics and the choice of prior on Ψ on the estimation of co-divergence variability. We also introduce a new setting (β) that can potentially improve estimation of Ψ by enforcing a minimal temporal difference between pulses of co-divergence. We apply this new method to three empirical datasets: one dataset each of co-distributed taxon pairs of Panamanian frogs and freshwater fishes, and a large set of Neotropical butterfly sister-taxon pairs.
Results: We demonstrate that the choice of prior on Ψ has little impact on inference, but that sorting summary statistics yields substantially more reliable estimates of co-divergence variability despite violations of assumptions about exchangeability. We find the implementation of β improves estimation of Ψ, with improvement being most dramatic given larger numbers of taxon pairs. We find equivocal support for synchronous co-divergence for both of the Panamanian groups, but we find considerable support for asynchronous divergence among the Neotropical butterflies.
Conclusions: Our simulation experiments demonstrate that using sorted summary statistics results in improved estimates of the variability in divergence times, whereas the choice of hyperprior on Ψ has negligible effect. Additionally, we demonstrate that estimating the number of pulses of co-divergence across co-distributed taxon-pairs is improved by applying a flexible buffering regime over divergence times. This improves the correlation between Ψ and the true variability in isolation times and allows for more meaningful interpretation of this hyperparameter. This will allow for more accurate identification of the number of temporally distinct pulses of co-divergence that generated the diversification pattern of a given regional assemblage of sister-taxon-pairs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12862-017-1052-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Isaac Overcast
  - Biology Department, City College of New York, New York, NY, 10031, USA
  - The Graduate Center, City University of New York, New York, NY, 10016, USA
- Justin C Bagley
  - Departamento de Zoologia, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
  - Departamento de Zoologia e Botânica, IBiLCE, Universidade Estadual Paulista, São José do Rio Preto, SP, 15054-000, Brazil
- Michael J Hickerson
  - Biology Department, City College of New York, New York, NY, 10031, USA
  - The Graduate Center, City University of New York, New York, NY, 10016, USA
|
44
|
Tietäväinen A, Gutmann MU, Keski-Vakkuri E, Corander J, Hæggström E. Bayesian inference of physiologically meaningful parameters from body sway measurements. Sci Rep 2017. [PMID: 28630413 PMCID: PMC5476665 DOI: 10.1038/s41598-017-02372-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 02/01/2023] Open
Abstract
The control of human body sway by the central nervous system, muscles, and conscious brain is of interest since body sway carries information about the physiological status of a person. Several models have been proposed to describe body sway in an upright standing position; however, due to the statistical intractability of the more realistic models, no formal parameter inference has previously been conducted and the expressive power of such models for real human subjects remains unknown. Using the latest advances in Bayesian statistical inference for intractable models, we fitted a nonlinear control model to posturographic measurements, and we showed that it can accurately predict the sway characteristics of both simulated and real subjects. Our method provides a full statistical characterization of the uncertainty related to all model parameters as quantified by posterior probability density functions, which is useful for comparisons across subjects and test settings. The ability to infer intractable control models from sensor data opens new possibilities for monitoring and predicting body status in health applications.
Affiliation(s)
- A Tietäväinen
  - Department of Physics, University of Helsinki, FI-00014, Helsinki, Finland
- M U Gutmann
  - School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
- E Keski-Vakkuri
  - Department of Physics, University of Helsinki, FI-00014, Helsinki, Finland
- J Corander
  - Department of Mathematics and Statistics, University of Helsinki, FI-00014, Helsinki, Finland
  - Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, N-0317, Oslo, Norway
- E Hæggström
  - Department of Physics, University of Helsinki, FI-00014, Helsinki, Finland
|
45
|
Gutmann MU, Dutta R, Kaski S, Corander J. Likelihood-free inference via classification. STATISTICS AND COMPUTING 2017; 28:411-425. [PMID: 31997856 PMCID: PMC6956883 DOI: 10.1007/s11222-017-9738-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 06/27/2016] [Accepted: 02/28/2017] [Indexed: 06/10/2023]
Abstract
Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.
Affiliation(s)
- Ritabrata Dutta
  - InterDisciplinary Institute of Data Science, Università della Svizzera italiana, Lugano, Switzerland
- Samuel Kaski
  - Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Espoo, Finland
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|