1
Sarala O, Pyhäjärvi T, Sillanpää MJ. BELMM: Bayesian model selection and random walk smoothing in time-series clustering. Bioinformatics 2023; 39:btad686. [PMID: 37963057 PMCID: PMC10686958 DOI: 10.1093/bioinformatics/btad686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 09/22/2023] [Accepted: 11/13/2023] [Indexed: 11/16/2023]
Abstract
MOTIVATION Due to advances in measuring technology, many new phenotype, gene expression, and other omics time-course datasets are now commonly available. Cluster analysis may provide useful information about the structure of such data. RESULTS In this work, we propose BELMM (Bayesian Estimation of Latent Mixture Models): a flexible framework for analysing, clustering, and modelling time-series data in a Bayesian setting. The framework is built on mixture modelling: first, the mean curves of the mixture components are assumed to follow random walk smoothing priors. Second, we choose the most plausible model and the number of mixture components using reversible-jump Markov chain Monte Carlo. Last, we assign the individual time series to clusters based on their similarity to the cluster-specific trend curves determined by the latent random walk processes. We demonstrate fast and slow implementations of our approach on both simulated and real time-series data using the widely available software R, Stan, and CU-MSDSp. AVAILABILITY AND IMPLEMENTATION The French mortality dataset is available at http://www.mortality.org, the Drosophila melanogaster embryogenesis gene expression data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121160. Details on our simulated datasets are available in the Supplementary Material, and R scripts and a detailed tutorial on GitHub at https://github.com/ollisa/BELMM. The software CU-MSDSp is available on GitHub at https://github.com/jtchavisIII/CU-MSDSp.
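The first and last steps of this pipeline can be sketched in Python. This is a simplified illustration, not BELMM itself (which is implemented with R and Stan): it assumes a second-order Gaussian random walk as the smoothing prior for a mean curve and assigns each series to the curve with the smallest sum of squared errors. The abstract does not state the order of the random walk or the exact similarity measure, the reversible-jump model-selection step is omitted, and the function names are hypothetical.

```python
import numpy as np


def sample_rw2_curve(n_times, sigma, rng):
    """Draw one mean curve from a second-order Gaussian random walk prior:
    mu[t] = 2*mu[t-1] - mu[t-2] + eps_t, eps_t ~ N(0, sigma^2).
    Hypothetical initialisation: mu[0] = 0 and mu[1] ~ N(0, sigma^2)."""
    if n_times < 2:
        raise ValueError("need at least two time points")
    mu = np.zeros(n_times)
    mu[1] = rng.normal(0.0, sigma)
    for t in range(2, n_times):
        mu[t] = 2.0 * mu[t - 1] - mu[t - 2] + rng.normal(0.0, sigma)
    return mu


def assign_clusters(series, mean_curves):
    """Assign each row of `series` to the mean curve with the smallest squared error."""
    series = np.asarray(series, dtype=float)
    mean_curves = np.asarray(mean_curves, dtype=float)
    # sse[i, k] = sum over time of (series_i - curve_k)^2
    sse = ((series[:, None, :] - mean_curves[None, :, :]) ** 2).sum(axis=2)
    return sse.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = np.vstack([sample_rw2_curve(25, 0.05, rng) for _ in range(3)])
    # Hypothetical data: noisy copies of the three prior curves.
    truth = rng.integers(0, 3, size=12)
    series = means[truth] + rng.normal(0.0, 0.02, size=(12, 25))
    print(truth, assign_clusters(series, means))
```

A second-order walk penalises changes in slope rather than level, which is why it is a common choice for smooth trend curves; a first-order walk would be the obvious alternative.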
Affiliation(s)
- Olli Sarala
- Research Unit of Mathematical Sciences, University of Oulu, FI-90014 Oulu, Finland
- Tanja Pyhäjärvi
- Department of Forest Sciences, University of Helsinki, FI-00014 Helsinki, Finland
- Mikko J Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, FI-90014 Oulu, Finland
2
Jakaite L, Schetinin V. Adaptive Bayesian learning for making risk-aware decisions: A case of trauma survival prediction. Artif Intell Med 2023; 143:102634. [PMID: 37673555 DOI: 10.1016/j.artmed.2023.102634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 07/30/2023] [Accepted: 08/11/2023] [Indexed: 09/08/2023]
Abstract
Decision tree (DT) models provide a transparent approach to predicting patient outcomes within a probabilistic framework. Averaging over DT models can, under certain conditions, deliver reliable estimates of predictive posterior probability distributions, which is of critical importance when predicting an individual patient's outcome. Reliable estimates of the distribution can be obtained within the Bayesian framework using Markov chain Monte Carlo (MCMC) and its Reversible Jump extension, which enables DT models to grow to a reasonable size. Existing MCMC strategies, however, have limited ability to control DT structures and tend to sample overgrown DT models that make unreasonably small partitions, thus deteriorating the uncertainty calibration. This happens because the MCMC explores the DT model parameter space with limited knowledge of the distribution of data partitions. We propose a new adaptive strategy that overcomes this limitation, and show that in the case of predicting trauma outcomes the number of data partitions can be significantly reduced, so that unnecessary uncertainty in estimating the predictive posterior density is avoided. The proposed and existing strategies are compared in terms of entropy which, calculated for predicted posterior distributions, represents the uncertainty in decisions. In this framework, the proposed method outperformed the existing sampling strategies, efficiently avoiding unnecessary uncertainty in decisions.
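A minimal sketch of the entropy comparison, under stated assumptions: each sampled tree returns class probabilities for one patient, the model-averaged prediction is their mean over the MCMC samples, and decision uncertainty is measured as the Shannon entropy (in nats) of that averaged distribution. The abstract does not give the paper's exact entropy formula; the probabilities and function names below are hypothetical.

```python
import numpy as np


def bma_predictive(prob_samples):
    """Average per-class probabilities over sampled models (one row per sampled tree)."""
    probs = np.asarray(prob_samples, dtype=float)
    return probs.mean(axis=0)


def shannon_entropy(p):
    """Shannon entropy in nats of a discrete distribution; 0 * log 0 is taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


if __name__ == "__main__":
    # Hypothetical survival/death probabilities for one patient from three sampled trees.
    samples = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15]]
    p = bma_predictive(samples)
    print(p, shannon_entropy(p))
```

Lower entropy means a more confident averaged prediction; the maximum for two classes is ln 2, reached at probabilities (0.5, 0.5).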
Affiliation(s)
- Livija Jakaite
- Computer Science Department and Technology, University of Bedfordshire, UK
- Vitaly Schetinin
- Computer Science Department and Technology, University of Bedfordshire, UK
3
Geels V, Pratola MT, Herbei R. The taxicab sampler: MCMC for discrete spaces with application to tree models. J Stat Comput Simul 2022. [DOI: 10.1080/00949655.2022.2119972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Vincent Geels
- Department of Statistics, The Ohio State University, Columbus, OH, USA
- Radu Herbei
- Department of Statistics, The Ohio State University, Columbus, OH, USA
4
Mastrantonio G. Modeling animal movement with directional persistence and attractive points. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
5
Mastrantonio G. The modelling of movement of multiple animals that share behavioural features. J R Stat Soc Ser C Appl Stat 2022. [DOI: 10.1111/rssc.12561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
6
Madin OC, Boothroyd S, Messerly RA, Fass J, Chodera JD, Shirts MR. Bayesian-Inference-Driven Model Parametrization and Model Selection for 2CLJQ Fluid Models. J Chem Inf Model 2022; 62:874-889. [PMID: 35129974 PMCID: PMC9217127 DOI: 10.1021/acs.jcim.1c00829] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work, we demonstrate the use of Bayesian inference for molecular model selection, using Monte Carlo sampling techniques accelerated with surrogate modeling to evaluate the Bayes factor evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three nested levels of model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only for some chemistries. Through this process, we also get detailed information about the distributions and correlation of parameter values, enabling improved parametrization and parameter analysis. We also show how the choice of parameter priors, which encode previous model knowledge, can have substantial effects on the selection of models, penalizing careless introduction of additional complexity. We detail the computational techniques used in this analysis, providing a roadmap for future applications of molecular model selection via Bayesian inference and surrogate modeling.
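To make the point about priors and Bayes factors concrete, here is a toy nested comparison that is not the 2CLJQ analysis: M0 fixes a mean at 0, while M1 gives it a N(0, tau^2) prior, with Gaussian data and known noise sigma. Because everything is Gaussian, the evidence ratio has a closed form in the sample mean. The sketch shows how widening the prior on the extra parameter lowers the Bayes factor for the more complex model; the function names and numbers are assumptions for illustration.

```python
import math


def normal_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_bayes_factor_10(ybar: float, n: int, sigma: float, tau: float) -> float:
    """log BF_10 for M1: mu ~ N(0, tau^2) versus M0: mu = 0.

    Assumes y_i ~ N(mu, sigma^2) with sigma known. The data enter only through
    the sample mean ybar, whose marginal is N(0, sigma^2/n) under M0 and
    N(0, tau^2 + sigma^2/n) under M1; all other likelihood factors cancel.
    """
    se2 = sigma ** 2 / n
    return normal_logpdf(ybar, 0.0, tau ** 2 + se2) - normal_logpdf(ybar, 0.0, se2)


if __name__ == "__main__":
    # Same data, increasingly vague prior on the extra parameter.
    for tau in (0.5, 5.0, 50.0):
        print(tau, round(log_bayes_factor_10(ybar=0.3, n=50, sigma=1.0, tau=tau), 3))
```

With these numbers the log Bayes factor is positive (favouring M1) for tau = 0.5 but negative for tau = 5 and tau = 50: the same data favour the simpler model once the added parameter's prior is made careless enough.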
Affiliation(s)
- Owen C. Madin
- Department of Chemical & Biological Engineering, University of Colorado Boulder, Boulder, CO 80309
- Simon Boothroyd
- Boothroyd Scientific Consulting Ltd., 71-75 Shelton Street, London, Greater London, United Kingdom, WC2H 9JQ
- Josh Fass
- Computational & Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065
- John D. Chodera
- Department of Chemical & Biological Engineering, University of Colorado Boulder, Boulder, CO 80309
- Michael R. Shirts
- Department of Chemical & Biological Engineering, University of Colorado Boulder, Boulder, CO 80309
7
Li Q, Liu L, Li T, Yao K. Bayesian change-points detection assuming a power law process in the recurrent-event context. Commun Stat Simul Comput 2021. [DOI: 10.1080/03610918.2021.2006711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Qing Li
- Department of Industrial & Manufacturing Systems Engineering, Iowa State University, Ames, Iowa, USA
- Lijie Liu
- Department of Industrial & Manufacturing Systems Engineering, Iowa State University, Ames, Iowa, USA
- Tianqi Li
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Kehui Yao
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
8
Helekal D, Ledda A, Volz E, Wyllie D, Didelot X. Bayesian inference of clonal expansions in a dated phylogeny. Syst Biol 2021; 71:1073-1087. [PMID: 34893904 PMCID: PMC9366454 DOI: 10.1093/sysbio/syab095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Revised: 11/23/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022]
Abstract
Microbial population genetics models often assume that all lineages are constrained by the same population size dynamics over time. However, many neutral and selective events can invalidate this assumption and can contribute to the clonal expansion of a specific lineage relative to the rest of the population. Such differential phylodynamic properties between lineages result in asymmetries and imbalances in phylogenetic trees that are sometimes described informally but which are difficult to analyze formally. To this end, we developed a model of how clonal expansions occur and affect the branching patterns of a phylogeny. We show how the parameters of this model can be inferred from a given dated phylogeny using Bayesian statistics, which allows us to assess the probability that one or more clonal expansion events occurred. For each putative clonal expansion event, we estimate its date of emergence and subsequent phylodynamic trajectory, including its long-term evolutionary potential which is important to determine how much effort should be placed on specific control measures. We demonstrate the applicability of our methodology on simulated and real data sets. Inference under our clonal expansion model can reveal important features in the evolution and epidemiology of infectious disease pathogens. [Clonal expansion; genomic epidemiology; microbial population genomics; phylodynamics.]
Affiliation(s)
- David Helekal
- Centre for Doctoral Training in Mathematics for Real-World Systems, University of Warwick, United Kingdom
- Alice Ledda
- Healthcare Associated Infections and Antimicrobial Resistance Division, National Infection Service, Public Health England, United Kingdom
- Erik Volz
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- David Wyllie
- Field Service, East of England, National Infection Service, Public Health England, Cambridge, United Kingdom
- Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom
9
Marrelec G, Giron A. Automated Extraction of Mutual Independence Patterns Using Bayesian Comparison of Partition Models. IEEE Trans Pattern Anal Mach Intell 2021; 43:2299-2313. [PMID: 31985405 DOI: 10.1109/tpami.2020.2968065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Mutual independence is a key concept in statistics that characterizes the structural relationships between variables. Existing methods to investigate mutual independence rely on the definition of two competing models, one being nested into the other and used to generate a null distribution for a statistic of interest, usually under the asymptotic assumption of large sample size. As such, these methods have a very restricted scope of application. In this article, we propose to change the investigation of mutual independence from a hypothesis-driven task that can only be applied in very specific cases to a blind and automated search within patterns of mutual independence. To this end, we treat the issue as one of model comparison that we solve in a Bayesian framework. We show the relationship between such an approach and existing methods in the case of multivariate normal distributions as well as cross-classified multinomial distributions. We propose a general Markov chain Monte Carlo (MCMC) algorithm to numerically approximate the posterior distribution on the space of all patterns of mutual independence. The relevance of the method is demonstrated on synthetic data as well as two real datasets, showing the unique insight provided by this approach.
10
Beesley LJ, Taylor JMG. Bayesian variable selection and shrinkage strategies in a complicated modelling setting with missing data: A case study using multistate models. Stat Model 2020. [DOI: 10.1177/1471082x20920972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Multistate modelling is a strategy for jointly modelling related time-to-event outcomes that can handle complicated outcome relationships, has appealing interpretations, can provide insight into different aspects of disease development and can be useful for making individualized predictions. A challenge with using multistate modelling in practice is the large number of parameters, and variable selection and shrinkage strategies are needed in order for these models to gain wider adoption. Application of existing selection and shrinkage strategies in the multistate modelling setting can be challenging due to complicated patterns of data missingness, inclusion of highly correlated predictors and hierarchical parameter relationships. In this article, we discuss how to modify and implement several existing Bayesian variable selection and shrinkage methods in a general multistate modelling setting. We compare the performance of these methods in terms of parameter estimation and model selection in a multistate cure model of recurrence and death in patients treated for head and neck cancer. We can view this work as a case study of variable selection and shrinkage in a complicated modelling setting with missing data.
Affiliation(s)
- Lauren J Beesley
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Jeremy MG Taylor
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
11
Grosser K, Metzler D. Modeling methylation dynamics with simultaneous changes in CpG islands. BMC Bioinformatics 2020; 21:115. [PMID: 32183713 PMCID: PMC7079395 DOI: 10.1186/s12859-020-3438-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2019] [Accepted: 03/02/2020] [Indexed: 11/24/2022]
Abstract
Background In vertebrate genomes, CpG sites can be clustered into CpG islands, and the amount of methylation in a CpG island can change due to gene regulation processes. Thus, single regulatory events can simultaneously change the methylation states of many CpG sites within a CpG island. This should be taken into account when quantifying the amount of change in methylation, for example in the form of a branch length in a phylogeny of cell types. Results We propose a probabilistic model (the IWE-SSE model) of methylation dynamics that accounts for simultaneous methylation changes in multiple CpG sites belonging to the same CpG island. We further propose a Markov-chain Monte-Carlo (MCMC) method to fit this model to methylation data from cell type phylogenies and apply this method to available data from murine haematopoietic cells and from human cell lines. Combined with simulation studies, these analyses show that accounting for CpG island wide methylation changes has a strong effect on the inferred branch lengths and leads to a significantly better model fit for the methylation data from murine haematopoietic cells and human cell lines. Conclusion The MCMC-based parameter estimation method for the IWE-SSE model, in combination with our MCMC-based inference method, allows one to quantify the amount of methylation change at single CpG sites as well as on entire CpG islands. Accounting for changes affecting entire islands can lead to more accurate branch length estimation in the presence of simultaneous methylation change.
Affiliation(s)
- Konrad Grosser
- Department of Biology, Ludwig-Maximilians-Universität München, Großhaderner Straße 2, Planegg, 82152, Germany
- Dirk Metzler
- Department of Biology, Ludwig-Maximilians-Universität München, Großhaderner Straße 2, Planegg, 82152, Germany
12
Bayesian modeling and computation for analyte quantification in complex mixtures using Raman spectroscopy. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106846] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
13
Everitt RG, Culliford R, Medina-Aguayo F, Wilson DJ. Sequential Monte Carlo with transformations. Stat Comput 2019; 30:663-676. [PMID: 32116416 PMCID: PMC7026014 DOI: 10.1007/s11222-019-09903-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/17/2018] [Accepted: 09/03/2019] [Indexed: 06/10/2023]
Abstract
This paper examines methodology for performing Bayesian inference sequentially on a sequence of posteriors on spaces of different dimensions. For this, we use sequential Monte Carlo samplers, introducing the innovation of using deterministic transformations to move particles effectively between target distributions with different dimensions. This approach, combined with adaptive methods, yields an extremely flexible and general algorithm for Bayesian model comparison that is suitable for use in applications where the acceptance rate in reversible jump Markov chain Monte Carlo is low. We use this approach on model comparison for mixture models, and for inferring coalescent trees sequentially, as data arrives.
Affiliation(s)
- Richard Culliford
- Department of Mathematics and Statistics, University of Reading, Reading, UK
- Daniel J. Wilson
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
14
Bazarova A, Nieduszynski CA, Akerman I, Burroughs NJ. Bayesian inference of origin firing time distributions, origin interference and licencing probabilities from Next Generation Sequencing data. Nucleic Acids Res 2019; 47:2229-2243. [PMID: 30859196 PMCID: PMC6412128 DOI: 10.1093/nar/gkz094] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/02/2018] [Revised: 01/27/2019] [Accepted: 02/05/2019] [Indexed: 12/21/2022]
Abstract
DNA replication is a stochastic process with replication forks emanating from multiple replication origins. The origins must be licenced in G1, and the replisome activated at licenced origins in order to generate bi-directional replication forks in S-phase. Differential firing times lead to origin interference, where a replication fork from an origin can replicate through and inactivate neighbouring origins (origin obscuring). We developed a Bayesian algorithm to characterize origin firing statistics from Okazaki fragment (OF) sequencing data. Our algorithm infers the distributions of firing times and the licencing probabilities for three consecutive origins. We demonstrate that our algorithm can distinguish partial origin licencing and origin obscuring in OF sequencing data from Saccharomyces cerevisiae and human cell types. We used our method to analyse the decreased origin efficiency under loss of Rat1 activity in S. cerevisiae, demonstrating that both reduced licencing and increased obscuring contribute. Moreover, we show that robust analysis is possible using only local data (across three neighbouring origins), and analysis of the whole chromosome is not required. Our algorithm utilizes an approximate likelihood and a reversible jump sampling technique, a methodology that can be extended to analysis of other mechanistic processes measurable through Next Generation Sequencing data.
Affiliation(s)
- Alina Bazarova
- Centre for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Ildem Akerman
- Institute of Metabolism and Systems Research, Institute of Biomedical Research, University of Birmingham, Birmingham B15 2TT, UK
- Nigel J Burroughs
- Mathematics Institute and Zeeman Institute (SBIDER), University of Warwick, Coventry CV4 7AL, UK
15
Datta A, Zou H, Banerjee S. Bayesian High-Dimensional Regression for Change Point Analysis. Stat Interface 2019; 12:253-264. [PMID: 31543930 PMCID: PMC6753958 DOI: 10.4310/sii.2019.v12.n2.a6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
In many econometrics applications, the dataset under investigation spans heterogeneous regimes that are more appropriately modeled using piecewise components for each of the data segments separated by change points. We consider using Bayesian high-dimensional shrinkage priors in a change point setting to understand segment-specific relationships between the response and the covariates. Covariate selection before and after each change point can identify possibly different sets of relevant covariates, while the fully Bayesian approach ensures that posterior inference for the change points is also available. We demonstrate the flexibility of the approach for imposing different variable selection constraints, such as grouping or partial selection, and discuss strategies to detect an unknown number of change points. Simulation experiments reveal that this simple approach delivers accurate variable selection and inference on the location of the change points, and substantially outperforms a frequentist lasso-based approach uniformly across a wide range of scenarios. Application of our model to a Minnesota house price dataset reveals a change in the relationship between house and stock prices around the sub-prime mortgage crisis.
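As a much simpler relative of this setting, the sketch below computes the exact posterior over the location of a single change in mean. It assumes known noise sigma, an independent N(0, tau^2) prior on each segment mean (integrated out analytically), and a uniform prior on the change location. There are no covariates or shrinkage priors, so this is not the authors' method, and the function names are hypothetical.

```python
import numpy as np


def log_marginal_segment(y, sigma, tau):
    """log p(y_seg) for y_i ~ N(mu, sigma^2) with mu ~ N(0, tau^2) integrated out,
    i.e. y_seg ~ N(0, sigma^2 I + tau^2 1 1^T)."""
    m = y.size
    s = y.sum()
    return (-0.5 * m * np.log(2 * np.pi * sigma**2)
            - 0.5 * np.log1p(m * tau**2 / sigma**2)
            - np.sum(y**2) / (2 * sigma**2)
            + tau**2 * s**2 / (2 * sigma**2 * (sigma**2 + m * tau**2)))


def change_point_posterior(y, sigma, tau):
    """Posterior over one change location c (segments y[:c] and y[c:]),
    uniform prior on c = 1, ..., n-1. Entry i is P(change at c = i + 1)."""
    n = y.size
    logp = np.array([log_marginal_segment(y[:c], sigma, tau)
                     + log_marginal_segment(y[c:], sigma, tau)
                     for c in range(1, n)])
    logp -= logp.max()  # stabilise before exponentiating
    post = np.exp(logp)
    return post / post.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Hypothetical series with a jump in mean after the 30th observation.
    y = np.concatenate([rng.normal(0.0, 0.3, 30), rng.normal(3.0, 0.3, 20)])
    post = change_point_posterior(y, sigma=0.3, tau=5.0)
    print(int(post.argmax()) + 1, post.max())
```

Handling an unknown number of change points, as the abstract discusses, requires searching over sets of locations rather than a single one.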
Affiliation(s)
- Abhirup Datta
- Department of Biostatistics, Johns Hopkins University
- Hui Zou
- Department of Statistics, University of Minnesota
- Sudipto Banerjee
- Department of Biostatistics, University of California, Los Angeles
16
Tan BK, Panagiotelis A, Athanasopoulos G. Bayesian Inference for the One-Factor Copula Model. J Comput Graph Stat 2018. [DOI: 10.1080/10618600.2018.1482765] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/14/2022]
Affiliation(s)
- Ban Kheng Tan
- Department of Econometrics and Business Statistics, Monash University, Victoria, Australia
- George Athanasopoulos
- Department of Econometrics and Business Statistics, Monash University, Victoria, Australia
17
Tuyl F. A Method to Handle Zero Counts in the Multinomial Model. Am Stat 2018. [DOI: 10.1080/00031305.2018.1444673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Frank Tuyl
- School of Mathematical and Physical Sciences, University of Newcastle, Callaghan, Australia
18
Pooley CM, Marion G. Bayesian model evidence as a practical alternative to deviance information criterion. R Soc Open Sci 2018; 5:171519. [PMID: 29657762 PMCID: PMC5882686 DOI: 10.1098/rsos.171519] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/05/2017] [Accepted: 02/13/2018] [Indexed: 06/08/2023]
Abstract
While model evidence is considered by Bayesian statisticians as a gold standard for model selection (the ratio in model evidence between two models giving the Bayes factor), its calculation is often viewed as too computationally demanding for many applications. By contrast, the widely used deviance information criterion (DIC), a different measure that balances model accuracy against complexity, is commonly considered a much faster alternative. However, recent advances in computational tools for efficient multi-temperature Markov chain Monte Carlo algorithms, such as steppingstone sampling (SS) and thermodynamic integration schemes, enable efficient calculation of the Bayesian model evidence. This paper compares both the capability (i.e. ability to select the true model) and speed (i.e. CPU time to achieve a given accuracy) of DIC with model evidence calculated using SS. Three important model classes are considered: linear regression models, mixed models and compartmental models widely used in epidemiology. While DIC was found to correctly identify the true model when applied to linear regression models, it led to incorrect model choice in the other two cases. On the other hand, model evidence led to correct model choice in all cases considered. Importantly, and perhaps surprisingly, DIC and model evidence were found to run at similar computational speeds, a result reinforced by analytically derived expressions.
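Stepping-stone sampling writes the evidence as a product of ratios between successive power posteriors p_beta(theta), proportional to L(theta)^beta p(theta), with 0 = beta_0 < ... < beta_K = 1; each ratio is the expectation of L(theta)^(beta_k - beta_(k-1)) under the previous power posterior. A minimal sketch on a toy conjugate model (Gaussian data with known sigma and a normal prior on the mean), chosen so that each power posterior can be sampled exactly and the exact evidence is available as a check. The temperature schedule (k/K)^3, the number of draws, and all function names are assumptions for illustration, not the paper's settings.

```python
import numpy as np


def log_likelihood(mu, y, sigma):
    """Gaussian log-likelihood of data y evaluated at each value in the 1-D array mu."""
    resid2 = ((y[None, :] - mu[:, None]) ** 2).sum(axis=1)
    return -0.5 * y.size * np.log(2 * np.pi * sigma**2) - resid2 / (2 * sigma**2)


def sample_power_posterior(beta, y, sigma, m0, s0, size, rng):
    """Exact draws from p_beta(mu), proportional to L(mu)**beta * N(mu; m0, s0**2).

    Exact sampling works only because this toy model is conjugate; in general
    each temperature would need its own MCMC run.
    """
    prec = 1.0 / s0**2 + beta * y.size / sigma**2
    mean = (m0 / s0**2 + beta * y.sum() / sigma**2) / prec
    return rng.normal(mean, 1.0 / np.sqrt(prec), size=size)


def stepping_stone_log_evidence(y, sigma, m0, s0, n_temps=20, n_draws=2000, seed=0):
    """Sum over k of log E_{p_{beta_(k-1)}}[L(mu)**(beta_k - beta_(k-1))]."""
    rng = np.random.default_rng(seed)
    betas = (np.arange(n_temps + 1) / n_temps) ** 3  # assumed schedule, dense near 0
    log_z = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        mu = sample_power_posterior(b_prev, y, sigma, m0, s0, n_draws, rng)
        log_w = (b_next - b_prev) * log_likelihood(mu, y, sigma)
        m = log_w.max()
        log_z += m + np.log(np.mean(np.exp(log_w - m)))  # stable log-mean-exp
    return float(log_z)


def exact_log_evidence(y, sigma, m0, s0):
    """Closed form for comparison: y ~ N(m0 * 1, sigma**2 * I + s0**2 * 1 1^T)."""
    n = y.size
    cov = sigma**2 * np.eye(n) + s0**2 * np.ones((n, n))
    resid = y - m0
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (n * np.log(2 * np.pi) + logdet + resid @ np.linalg.solve(cov, resid)))


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(0.7, 1.0, size=20)
    print(stepping_stone_log_evidence(data, 1.0, 0.0, 1.0))
    print(exact_log_evidence(data, 1.0, 0.0, 1.0))
```

Concentrating temperatures near beta = 0 keeps each step's importance weights well behaved, since the power posteriors change fastest there.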
Affiliation(s)
- C. M. Pooley
- The Roslin Institute, The University of Edinburgh, Midlothian EH25 9RG, UK
- Biomathematics and Statistics Scotland, James Clerk Maxwell Building, The King's Buildings, Peter Guthrie Tait Road, Edinburgh EH9 3FD, UK
- G. Marion
- Biomathematics and Statistics Scotland, James Clerk Maxwell Building, The King's Buildings, Peter Guthrie Tait Road, Edinburgh EH9 3FD, UK
19
Zhao X, Ning Y, Chen MIC, Cook AR. Individual and Population Trajectories of Influenza Antibody Titers Over Multiple Seasons in a Tropical Country. Am J Epidemiol 2018; 187:135-143. [PMID: 29309522 PMCID: PMC5860523 DOI: 10.1093/aje/kwx201] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/20/2016] [Accepted: 03/06/2017] [Indexed: 01/15/2023]
Abstract
Seasonal influenza epidemics occur year-round in the tropics, complicating the planning of vaccination programs. We built an individual-level longitudinal model of baseline antibody levels, time of infection, and the subsequent rise and decay of antibodies postinfection using influenza A(H1N1)pdm09 data from 2 sources in Singapore: 1) a noncommunity cohort with real-time polymerase chain reaction–confirmed infections and at least 1 serological sample collected from each participant between May and October 2009 (n = 118) and 2) a community cohort with up to 6 serological samples collected between May 2009 and October 2010 (n = 760). The model was hierarchical, to account for interval censoring and interindividual variation. Model parameters were estimated via a reversible jump Markov chain Monte Carlo algorithm using custom-designed R (https://www.r-project.org/) and C++ (https://isocpp.org/) code. After infection, antibody levels peaked at 4–7 weeks, with a half-life of 26.5 weeks, followed by a slower decrease up to 1 year to approximately preinfection levels. After the third wave, the seropositivity rate and the population-level antibody titer dropped to the same level as they were at the end of the first pandemic wave. The results of this analysis are consistent with the hypothesis that the population-level effect of individuals’ waxing and waning antibodies influences influenza seasonality in the tropics.
Affiliation(s)
- Xiahong Zhao
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore
- Yilin Ning
- Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore
- Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
- Mark I-Cheng Chen
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore
- Department of Clinical Epidemiology, Institute of Infectious Diseases and Epidemiology, Tan Tock Seng Hospital, Singapore
- Alex R Cook
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore
20
Abstract
A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with symmetric Dirichlet weights and put a prior on the number of components, that is, to use a mixture of finite mixtures (MFM). The most commonly used method of inference for MFMs is reversible jump Markov chain Monte Carlo, but it can be nontrivial to design good reversible jump moves, especially in high-dimensional spaces. Meanwhile, there are samplers for Dirichlet process mixture (DPM) models that are relatively simple and are easily adapted to new applications. It turns out that many of the essential properties of DPMs are also exhibited by MFMs (an exchangeable partition distribution, restaurant process, random measure representation, and stick-breaking representation), and crucially, the MFM analogues are simple enough that they can be used much like the corresponding DPM properties. Consequently, many of the powerful methods developed for inference in DPMs can be directly applied to MFMs as well; this simplifies the implementation of MFMs and can substantially improve mixing. We illustrate with real and simulated data, including high-dimensional gene expression data used to discriminate cancer subtypes.
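The MFM prior described here can be simulated directly from its definition. A minimal sketch, assuming K - 1 ~ Poisson(lam) as the prior on the number of components (the abstract leaves this prior unspecified) and illustrative values for gamma and lam; the function name is hypothetical. It draws only from the prior and does not implement the restaurant-process samplers the abstract refers to.

```python
import numpy as np


def sample_mfm_prior(n, gamma=1.0, lam=3.0, seed=None):
    """Draw (K, weights, labels) from a mixture-of-finite-mixtures prior.

    Assumed prior on the number of components: K - 1 ~ Poisson(lam), so K >= 1.
    Given K, weights ~ Dirichlet(gamma, ..., gamma) (symmetric), and each of the
    n labels is drawn independently from Categorical(weights).
    """
    rng = np.random.default_rng(seed)
    k = 1 + int(rng.poisson(lam))
    weights = rng.dirichlet(np.full(k, gamma))
    labels = rng.choice(k, size=n, p=weights)
    return k, weights, labels


if __name__ == "__main__":
    k, w, z = sample_mfm_prior(n=50, seed=0)
    # The number of occupied components can be smaller than K.
    print(k, len(np.unique(z)))
```

The distinction between K and the number of occupied components is what the exchangeable partition distribution of an MFM describes: the prior on partitions is obtained by summing over K.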
21
Whittles LK, Didelot X. Epidemiological analysis of the Eyam plague outbreak of 1665-1666. Proc Biol Sci 2017; 283:rspb.2016.0618. [PMID: 27170724 PMCID: PMC4874723 DOI: 10.1098/rspb.2016.0618] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 03/18/2016] [Accepted: 04/13/2016] [Indexed: 01/14/2023]
Abstract
Plague, caused by the bacterium Yersinia pestis, is one of the deadliest infectious diseases in human history, and still causes worrying outbreaks in Africa and South America. Despite the historical and current importance of plague, several questions remain unanswered concerning its transmission routes and infection risk factors. The plague outbreak that started in September 1665 in the Derbyshire village of Eyam claimed 257 lives over 14 months, wiping out entire families. Since previous attempts at modelling the Eyam plague, new data have been unearthed from parish records revealing a much more complete record of the disease. Using a stochastic compartmental model and Bayesian analytical methods, we found that both rodent-to-human and human-to-human transmission played an important role in spreading the infection, and that they accounted, respectively, for a quarter and three-quarters of all infections, with a statistically significant seasonality effect. We also found that the force of infection was stronger for infectious individuals living in the same household compared with the rest of the village. Poverty significantly increased the risk of disease, whereas adulthood decreased the risk. These results on the Eyam outbreak contribute to the current debate on the relative importance of plague transmission routes.
Affiliation(s)
- Lilith K Whittles
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Xavier Didelot
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
|
22
|
Gamado K, Marion G, Porphyre T. Data-Driven Risk Assessment from Small Scale Epidemics: Estimation and Model Choice for Spatio-Temporal Data with Application to a Classical Swine Fever Outbreak. Front Vet Sci 2017; 4:16. [PMID: 28293559 PMCID: PMC5329025 DOI: 10.3389/fvets.2017.00016] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 11/01/2016] [Accepted: 01/30/2017] [Indexed: 11/30/2022]
Abstract
Livestock epidemics have the potential to give rise to significant economic, welfare, and social costs. Incursions of emerging and re-emerging pathogens may lead to small and repeated outbreaks. Analysis of the resulting data is statistically challenging but can inform disease preparedness, reducing potential future losses. We present a framework for spatial risk assessment of disease incursions based on data from small, localized historic outbreaks. We focus on between-farm spread of livestock pathogens and illustrate our methods by application to data on the small outbreak of Classical Swine Fever (CSF) that occurred in 2000 in East Anglia, UK. We apply models based on continuous-time semi-Markov processes, using data-augmentation Markov chain Monte Carlo techniques within a Bayesian framework to infer disease dynamics and detection from incompletely observed outbreaks. The spatial transmission kernel describing pathogen spread between farms and the distribution of times between infection and detection are estimated alongside unobserved exposure times. Our results demonstrate that inference is reliable even for relatively small outbreaks when the data-generating model is known. However, associated risk assessments depend strongly on the form of the fitted transmission kernel. Therefore, for real applications, methods are needed to select the most appropriate model in light of the data. We assess standard Deviance Information Criterion (DIC) model selection tools and recently introduced latent residual methods of model assessment in selecting the functional form of the spatial transmission kernel. These methods are applied to the CSF data and tested in simulated scenarios which represent field data but assume the data-generation mechanism is known. Analysis of simulated scenarios shows that latent residual methods enable reliable selection of the transmission kernel even for small outbreaks, whereas the DIC is less reliable. Moreover, compared with DIC, model choice based on latent residual assessment correlated better with predicted risk.
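The DIC assessed in this abstract is usually defined (Spiegelhalter et al., 2002) as DIC = D̄ + p_D, where D(θ) = -2 log p(y | θ), D̄ is its posterior mean, and p_D = D̄ - D(θ̄) is the effective number of parameters. The sketch below computes it from posterior draws for a toy normal model with known σ; it is a generic illustration, not the transmission-kernel models of the study, and the model and function names are assumptions.

```python
import math
import random


def normal_deviance(theta, y, sigma):
    """D(theta) = -2 log p(y | theta) for i.i.d. N(theta, sigma^2) observations."""
    n = len(y)
    rss = sum((yi - theta) ** 2 for yi in y)
    return n * math.log(2 * math.pi * sigma ** 2) + rss / sigma ** 2


def dic(draws, y, sigma):
    """Return (DIC, pD) with pD = Dbar - D(theta_bar) and DIC = Dbar + pD."""
    d_bar = sum(normal_deviance(t, y, sigma) for t in draws) / len(draws)
    theta_bar = sum(draws) / len(draws)
    p_d = d_bar - normal_deviance(theta_bar, y, sigma)
    return d_bar + p_d, p_d


rng = random.Random(42)
sigma = 2.0
y = [rng.gauss(1.0, sigma) for _ in range(50)]
y_mean = sum(y) / len(y)
# Under a flat prior the posterior of theta is N(mean(y), sigma^2 / n); sample it directly.
draws = [rng.gauss(y_mean, sigma / math.sqrt(len(y))) for _ in range(20000)]
value, p_d = dic(draws, y, sigma)
print(value, p_d)
```

With one free parameter and a flat prior, p_D should come out close to 1, which is a quick sanity check on the implementation; lower DIC indicates the preferred model when several are compared.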
Affiliation(s)
- Glenn Marion
- Biomathematics and Statistics Scotland, Edinburgh, UK
- Thibaud Porphyre
- Epidemiology Research Group, Center for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh, UK; The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, UK
|
23
|
Keenan DM, Veldhuis JD. Pulsatility of Hypothalamo-Pituitary Hormones: A Challenge in Quantification. Physiology (Bethesda) 2017; 31:34-50. [PMID: 26674550 DOI: 10.1152/physiol.00027.2015] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/22/2022]
Abstract
Neuroendocrine systems control many of the most fundamental physiological processes, e.g., reproduction, growth, adaptations to stress, and metabolism. Each such system involves the hypothalamus, the pituitary, and a specific target gland or organ. In the quantification of the interactions among these components, biostatistical modeling has played an important role. In the present article, five key challenges to an understanding of the interactions of these systems are illustrated and discussed critically.
Affiliation(s)
- Daniel M Keenan
- Department of Statistics, University of Virginia, Charlottesville, Virginia
- Johannes D Veldhuis
- Department of Medicine, Endocrine Research Unit, Mayo School of Graduate Medical Education, Clinical Translational Science Center, Mayo Clinic, Rochester, Minnesota
|
24
|
Ferraz do Nascimento F, Gamerman D, Davis R. A Bayesian semi-parametric approach to extreme regime identification. BRAZ J PROBAB STAT 2016. [DOI: 10.1214/15-bjps293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
|
25
|
Fernández-Durán JJ, Gregorio-Domínguez MM. Bayesian analysis of circular distributions based on non-negative trigonometric sums. J STAT COMPUT SIM 2016. [DOI: 10.1080/00949655.2016.1153641] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
|
26
|
Drovandi CC, McCutchan RA. Alive SMC2: Bayesian model selection for low-count time series models with intractable likelihoods. Biometrics 2015; 72:344-53. [DOI: 10.1111/biom.12449] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/01/2015] [Revised: 10/01/2015] [Accepted: 10/01/2015] [Indexed: 11/27/2022]
Affiliation(s)
- Christopher C. Drovandi
- Mathematical Sciences School; Queensland University of Technology; Brisbane, Queensland Australia
- Roy A. McCutchan
- Mathematical Sciences School; Queensland University of Technology; Brisbane, Queensland Australia
|
27
|
|
28
|
Mohedano R, Cavallaro A, García N. Camera Localization Using Trajectories and Maps. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2014; 36:684-697. [PMID: 26353193 DOI: 10.1109/tpami.2013.243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
We propose a new Bayesian framework for automatically determining the position (location and orientation) of an uncalibrated camera using observations of moving objects and a schematic map of the passable areas of the environment. Our approach takes advantage of static and dynamic information on the scene structures through prior probability distributions for object dynamics. The proposed approach restricts the plausible positions where the sensor can be located while taking into account the inherent ambiguity of the given setting. The proposed framework samples from the posterior probability distribution for the camera position via data-driven MCMC, guided by an initial geometric analysis that restricts the search space. A Kullback-Leibler divergence analysis then yields the final camera position estimate while explicitly isolating ambiguous settings. The proposed approach is evaluated in synthetic and real environments, showing satisfactory performance in both ambiguous and unambiguous settings.
|
29
|
Pandolfi S, Bartolucci F, Friel N. A generalized multiple-try version of the Reversible Jump algorithm. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.10.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
|
30
|
Hossain M, Lawson A, Cai B, Choi J, Liu J, Kirby RS. Space-Time Areal Mixture Model: Relabeling Algorithm and Model Selection Issues. ENVIRONMETRICS 2014; 25:84-96. [PMID: 25221430 PMCID: PMC4159962 DOI: 10.1002/env.2265] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
With the growing popularity of spatial mixture models in cluster analysis, model selection criteria have become an established tool in the search for parsimony. However, the label-switching problem is often inherent in Bayesian implementation of mixture models and a variety of relabeling algorithms have been proposed. We use a space-time mixture of Poisson regression models with homogeneous covariate effects to illustrate that the best model selected by using model selection criteria does not always support the model that is chosen by the optimal relabeling algorithm. The results are illustrated for real and simulated datasets. The objective is to make the reader aware that if the purpose of statistical modeling is to identify clusters, applying a relabeling algorithm to the model with the best fit may not generate the optimal relabeling.
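Label switching means that the component labels in MCMC output for a mixture model can be permuted between draws without changing the likelihood, so per-component summaries are meaningless until the draws are relabeled. One simple relabeling strategy, shown here as a sketch and not the algorithm evaluated in this paper, permutes each draw's components to minimise the squared distance to a fixed reference (pivot) draw. The input format and function name are hypothetical.

```python
from itertools import permutations


def relabel_to_pivot(draws, pivot):
    """Permute the component labels of each draw to best match a pivot draw.

    draws: list of MCMC draws, each a list of K component-specific values (e.g. means).
    pivot: list of K values defining the reference labelling.
    Returns the relabelled draws and the permutation chosen for each draw.
    """
    k = len(pivot)
    relabelled, chosen = [], []
    for draw in draws:
        # Brute force over all K! label permutations; fine for small K.
        best = min(permutations(range(k)),
                   key=lambda p, d=draw: sum((d[p[j]] - pivot[j]) ** 2 for j in range(k)))
        relabelled.append([draw[best[j]] for j in range(k)])
        chosen.append(best)
    return relabelled, chosen


# Three hypothetical draws of K = 3 component means with switched labels.
draws = [[0.1, 5.2, 9.8], [9.9, 0.0, 5.1], [5.0, 10.1, -0.2]]
aligned, perms = relabel_to_pivot(draws, pivot=[0.0, 5.0, 10.0])
print(aligned)
```

Enumerating all K! permutations is only practical for small K; for larger K the same assignment problem can be solved with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`.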
Affiliation(s)
- M.M. Hossain
- Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- A.B. Lawson
- Division of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, USA
- B. Cai
- Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
- J. Choi
- Department of Mathematics, Hanyang University, South Korea
- J. Liu
- Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA
- R. S. Kirby
- Department of Community and Family Health, University of South Florida, Tampa, FL, USA
|
31
|
Zhou Y, Aston JA, Johansen AM. Bayesian model comparison for compartmental models with applications in positron emission tomography. J Appl Stat 2013. [DOI: 10.1080/02664763.2013.772569] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 10/27/2022]
|
32
|
References. Comput Stat 2013. [DOI: 10.1002/9781118555552.refs] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
33
|
Ponciano JM, Burleigh JG, Braun EL, Taper ML. Assessing parameter identifiability in phylogenetic models using data cloning. Syst Biol 2012; 61:955-72. [PMID: 22649181 PMCID: PMC3478565 DOI: 10.1093/sysbio/sys055] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 11/17/2011] [Revised: 02/02/2012] [Accepted: 05/25/2012] [Indexed: 11/14/2022]
Abstract
The success of model-based methods in phylogenetics has motivated much research aimed at generating new, biologically informative models. This new computer-intensive approach to phylogenetics demands validation studies and sound measures of performance. To date there has been little practical guidance available as to when and why the parameters in a particular model can be identified reliably. Here, we illustrate how Data Cloning (DC), a recently developed methodology to compute the maximum likelihood estimates along with their asymptotic variance, can be used to diagnose structural parameter nonidentifiability (NI) and distinguish it from other parameter estimability problems, including when parameters are structurally identifiable, but are not estimable in a given data set (INE), and when parameters are identifiable, and estimable, but only weakly so (WE). The application of the DC theorem uses well-known and widely used Bayesian computational techniques. With the DC approach, practitioners can use Bayesian phylogenetics software to diagnose nonidentifiability. Theoreticians and practitioners alike now have a powerful, yet simple tool to detect nonidentifiability while investigating complex modeling scenarios, where getting closed-form expressions in a probabilistic study is complicated. Furthermore, here we also show how DC can be used as a tool to examine and eliminate the influence of the priors, in particular if the process of prior elicitation is not straightforward. Finally, when applied to phylogenetic inference, DC can be used to study at least two important statistical questions: assessing identifiability of discrete parameters, like the tree topology, and developing efficient sampling methods for computationally expensive posterior densities.
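The data cloning diagnostic relies on the fact that copying the data K times multiplies the likelihood precision by K: for identifiable parameters the largest eigenvalue of the posterior covariance shrinks towards zero as K grows, while for a nonidentifiable combination it stays bounded away from zero. The toy sketch below, which is not the phylogenetic setting of the paper, makes this concrete with a conjugate Gaussian model in which only θ1 + θ2 enters the likelihood; the model, the prior variance `tau2`, and the function names are assumptions.

```python
import math


def cloned_posterior_precision(k_clones, n_obs, tau2, identifiable):
    """Entries (a, b, c) of the 2x2 posterior precision [[a, b], [b, c]] for (theta1, theta2).

    Toy model with unit noise variance and prior theta ~ N(0, tau2 * I):
      identifiable=True : n_obs draws of y1 ~ N(theta1, 1) and of y2 ~ N(theta2, 1)
      identifiable=False: n_obs draws of y ~ N(theta1 + theta2, 1)
    Cloning the data k_clones times multiplies the likelihood precision by k_clones.
    """
    prior = 1.0 / tau2
    lik = k_clones * n_obs
    if identifiable:
        return prior + lik, 0.0, prior + lik
    return prior + lik, lik, prior + lik


def max_posterior_variance(a, b, c):
    """Largest eigenvalue of [[a, b], [b, c]]^{-1}, i.e. 1 / smallest eigenvalue."""
    smallest = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return 1.0 / smallest


for k in (1, 10, 100):
    v_id = max_posterior_variance(*cloned_posterior_precision(k, 20, 10.0, True))
    v_ni = max_posterior_variance(*cloned_posterior_precision(k, 20, 10.0, False))
    print(f"K={k:3d}  identifiable: {v_id:.5f}  nonidentifiable: {v_ni:.5f}")
```

In the nonidentifiable case the variance along the θ1 - θ2 direction never moves from the prior value, which is exactly the signature the data cloning approach looks for; in practice the eigenvalues would be computed from MCMC output rather than in closed form.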
|
34
|
Using mixture of Gamma distributions for Bayesian analysis in an M/G/1 queue with optional second service. Comput Stat 2012. [DOI: 10.1007/s00180-012-0323-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
|