1
Laloy E, Rogiers B, Bielen A, Borella A, Boden S. Improving Bayesian radiological profiling of waste drums using Dirichlet priors, Gaussian process priors, and hierarchical modeling. Appl Radiat Isot 2023;194:110691. PMID: 36716689. DOI: 10.1016/j.apradiso.2023.110691.
Abstract
We present three methodological improvements to our recently proposed approach for Bayesian inference of the radionuclide inventory in radioactive waste drums from radiological measurements. First, we adopt the Dirichlet distribution as the prior distribution of the isotopic vector. The Dirichlet distribution has the attractive property that the elements of its vector samples sum to 1. Second, we demonstrate that such Dirichlet priors can be incorporated within a hierarchical model of the prior uncertainty in the isotopic vector when prior information about isotopic composition is available. Our Bayesian hierarchical modeling framework makes use of this available information while acknowledging its uncertainty, by letting the information content of the indirect measurement data (i.e., gamma and neutron counts) shape the actual prior distribution of the isotopic vector to a controlled extent. Third, we propose to regularize the Bayesian inversion with Gaussian process (GP) prior modeling when inferring 1D spatially distributed mass or, equivalently, activity distributions. As for uncertainty in the efficiencies, we keep the stylized drum modeling approach proposed in our previous work to account for source distribution uncertainty along the vertical direction of the drum. A series of synthetic tests, followed by application to a real waste drum, shows that combining hierarchical modeling of the prior isotopic composition uncertainty with GP prior modeling of the vertical Pu profile across the drum works well. We also find that our GP prior can handle cases both with and without spatial correlation, although the GP prior modeling framework only makes sense in the context of spatial inference. Finally, our approach requires computational times on the order of a few hours (about two) to provide uncertainty estimates for all variables of interest in the considered inverse problem, which warrants further investigation into speeding up the inference.
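The sum-to-one property of Dirichlet samples that motivates their use as an isotopic-vector prior can be illustrated with a short sketch (not the authors' code; the three-component concentration vector is purely hypothetical), using the standard construction of a Dirichlet draw as normalized independent gamma draws:

```python
import random

def sample_dirichlet(alpha, rng=random):
    """Draw one sample from a Dirichlet(alpha) distribution by
    normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

# Hypothetical 3-isotope composition prior with concentration (8, 1, 1):
# most mass on the first isotope, yet every sample sums exactly to 1.
random.seed(0)
iso = sample_dirichlet([8.0, 1.0, 1.0])
```

Whatever the concentration parameters, each draw is a valid isotopic composition, which is exactly the constraint a component-wise prior would violate.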
Affiliation(s)
- Eric Laloy: Waste and Disposal, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
- Bart Rogiers: Waste and Disposal, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
- An Bielen: Dismantling, Decontamination and Waste, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
- Alessandro Borella: Society and Policy Support, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
- Sven Boden: Dismantling, Decontamination and Waste, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
2
Jafari B, Deardon R. Bias and bias-correction for individual-level models of infectious disease. Spat Spatiotemporal Epidemiol 2022;43:100524. PMID: 36460441. DOI: 10.1016/j.sste.2022.100524.
Abstract
Accurate infectious disease models can help scientists understand how an ongoing epidemic spreads and forecast its course more effectively. To account for the various factors that affect the spread of a disease (e.g., geographical, social, domestic, and genetic), a class of individual-level models (ILMs) was developed to incorporate population heterogeneity. In these models, inference is carried out within a Bayesian Markov chain Monte Carlo (MCMC) framework, yielding posterior estimates of model parameters. Bias in parameter estimates, and methods for bias correction, have been widely studied for many of the most established and commonly used statistical models and associated methods of parameter estimation; however, these methods are not directly applicable to infectious disease data. This paper investigates circumstances in which ILM parameter estimates may be biased in some simple disease system scenarios. Further, we use simulation to compare the performance of bias-corrected estimates of ILM parameters with that of the posterior estimates, and we discuss the factors that affect the performance of these estimators.
Affiliation(s)
- Behnaz Jafari: University of Calgary, Department of Mathematics and Statistics, 2500 University Dr. NW, Calgary, AB, Canada, T2N 1N4.
- Robert Deardon: University of Calgary, Department of Mathematics and Statistics, 2500 University Dr. NW, Calgary, AB, Canada, T2N 1N4; University of Calgary, Faculty of Veterinary Medicine, 2500 University Dr. NW, Calgary, AB, Canada, T2N 1N4.
3
Laloy E, Rogiers B, Bielen A, Boden S. Bayesian inference of 1D activity profiles from segmented gamma scanning of a heterogeneous radioactive waste drum. Appl Radiat Isot 2021;175:109803. PMID: 34118589. DOI: 10.1016/j.apradiso.2021.109803.
Abstract
We present a Bayesian approach to probabilistically infer vertical activity profiles within a radioactive waste drum from segmented gamma scanning (SGS) measurements. Our approach resorts to Markov chain Monte Carlo (MCMC) sampling using the state-of-the-art Hamiltonian Monte Carlo (HMC) technique and accounts for two important sources of uncertainty: the measurement uncertainty and the uncertainty in the source distribution within the drum. In addition, our efficiency model simulates the contributions of all considered segments to each count measurement. Our approach is first demonstrated on a synthetic example, after which it is used to resolve the vertical activity distribution of five nuclides in a real waste package.
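This kind of count-based Bayesian inversion can be sketched, in heavily simplified form, with a random-walk Metropolis sampler rather than the HMC the authors use; the single Poisson gamma count, the exponential prior, and every numeric value below are illustrative assumptions, not the paper's model:

```python
import math
import random

def log_post(lam, count, prior_rate=0.01):
    """Unnormalized log posterior: Poisson likelihood for one gamma count
    with an exponential prior on the activity-like rate lam (illustrative)."""
    if lam <= 0:
        return -math.inf
    return count * math.log(lam) - lam - prior_rate * lam

def metropolis(count, n_iter=5000, step=5.0, seed=1):
    """Random-walk Metropolis sampling of log_post (a stand-in for HMC)."""
    rng = random.Random(seed)
    lam = max(float(count), 1.0)       # start near the likelihood mode
    lp = log_post(lam, count)
    chain = []
    for _ in range(n_iter):
        prop = lam + rng.gauss(0.0, step)
        lp_prop = log_post(prop, count)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            lam, lp = prop, lp_prop    # accept
        chain.append(lam)
    return chain

chain = metropolis(count=120)
post_mean = sum(chain[1000:]) / len(chain[1000:])  # discard burn-in
```

The posterior mean lands near the observed count, as expected for a nearly flat prior; HMC replaces the random-walk proposal with gradient-guided trajectories but targets the same posterior.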
Affiliation(s)
- Eric Laloy: Waste and Disposal, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
- Bart Rogiers: Waste and Disposal, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
- An Bielen: Dismantling, Decontamination and Waste, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
- Sven Boden: Dismantling, Decontamination and Waste, Institute for Environment, Health and Safety, Belgian Nuclear Research Centre (SCK CEN), Belgium.
4
Fisher HF, Boys RJ, Gillespie CS, Proctor CJ, Golightly A. Parameter inference for a stochastic kinetic model of expanded polyglutamine proteins. Biometrics 2021;78:1195-1208. PMID: 33837525. DOI: 10.1111/biom.13467.
Abstract
The presence of protein aggregates in cells is a known feature of many human age-related diseases, such as Huntington's disease. Simulations using fixed parameter values in a model of the dynamic evolution of expanded polyglutamine (PolyQ) proteins in cells have been used to gain a better understanding of the biological system. However, there is considerable uncertainty about the values of some of the parameters governing the system. Currently, appropriate values are chosen by ad hoc attempts to tune the parameters so that the model output matches experimental data. The problem is further complicated by the fact that the data offer only a partial insight into the underlying biological process: the data consist only of the proportions of cell death and of cells with inclusion bodies at a few time points, corrupted by measurement error. Developing inference procedures to estimate the model parameters in this scenario is a significant task. The model probabilities corresponding to the observed proportions cannot be evaluated exactly, and so they are estimated within the inference algorithm by repeatedly simulating realizations from the model. In general, such an approach is computationally very expensive, so we construct Gaussian process emulators for the key quantities and reformulate our algorithm around these fast stochastic approximations. We conclude by highlighting appropriate values of the model parameters, leading to new insights into the underlying biological processes.
Affiliation(s)
- H F Fisher: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK; Population Health Sciences Institute, Newcastle University, Newcastle Upon Tyne, UK.
- R J Boys: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK.
- C S Gillespie: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK.
- C J Proctor: Institute of Cellular Medicine, Newcastle University, Newcastle Upon Tyne, UK.
- A Golightly: School of Mathematics, Statistics & Physics, Newcastle University, Newcastle Upon Tyne, UK.
5
Natesan Batley P, Hedges LV. Accurate models vs. accurate estimates: A simulation study of Bayesian single-case experimental designs. Behav Res Methods 2021;53:1782-1798. PMID: 33575987. DOI: 10.3758/s13428-020-01522-0.
Abstract
Although statistical practices to evaluate intervention effects in single-case experimental designs (SCEDs) have gained prominence in recent times, models have yet to incorporate and investigate all of their analytic complexities. Most of these statistical models incorporate slopes and autocorrelations, both of which contribute to trend in the data. The question that arises is whether, in SCED data that show trend, there is indeterminacy between estimating slope and autocorrelation, because both contribute to trend and the data have a limited number of observations. Using Monte Carlo simulation, we compared the performance of four Bayesian change-point models: (a) intercepts only (IO), (b) slopes but no autocorrelations (SI), (c) autocorrelations but no slopes (NS), and (d) both autocorrelations and slopes (SA). Weakly informative priors were used to remain agnostic about the parameters. Coverage rates showed that for the SA model, either the slope effect size or the autocorrelation credible interval almost always erroneously contained 0, and the type II errors were prohibitively large. Considering the 0-coverage and coverage rates of slope effect size, intercept effect size, mean relative bias, and second-phase intercept relative bias, the SI model outperformed all other models. Therefore, it is recommended that researchers favor the SI model over the other three. Research studies that develop slope effect sizes for SCEDs should consider the performance of the statistic by taking into account coverage and 0-coverage rates; these metrics helped uncover patterns that were not realized in other simulation studies. We underline the need for investigating the use of informative priors in SCEDs.
6
Bresson G, Chaturvedi A, Rahman MA, Shalabh. Seemingly unrelated regression with measurement error: estimation via Markov chain Monte Carlo and mean field variational Bayes approximation. Int J Biostat 2020;17:75-97. PMID: 32949454. DOI: 10.1515/ijb-2019-0120.
Abstract
Linear regression with measurement error in the covariates is a heavily studied topic; however, the statistics/econometrics literature is almost silent on estimating a multi-equation model with measurement error. This paper considers a seemingly unrelated regression model with measurement error in the covariates and introduces two novel estimation methods: a pure Bayesian algorithm (based on Markov chain Monte Carlo techniques) and its mean field variational Bayes (MFVB) approximation. The MFVB method has the added advantage of being computationally fast and can handle big data. An issue pertinent to measurement error models is parameter identification, which is resolved here by employing a prior distribution on the measurement error variance. The methods are shown to perform well in multiple simulation studies, where we analyze the impact on posterior estimates of different values of the reliability ratio, or variance of the true unobserved quantity used in the data generating process. The paper further implements the proposed algorithms in an application drawn from the health literature and shows that modeling measurement error in the data can improve model fitting.
Affiliation(s)
- Anoop Chaturvedi: Department of Statistics, University of Allahabad, Allahabad, India.
- Shalabh: Department of Mathematics and Statistics, Indian Institute of Technology, Kanpur, India.
7
de Oliveira Peres MV, Achcar JA, Martinez EZ. Bivariate lifetime models in presence of cure fraction: a comparative study with many different copula functions. Heliyon 2020;6:e03961. PMID: 32551374. DOI: 10.1016/j.heliyon.2020.e03961.
Abstract
In time-to-event studies, it is common to observe a fraction of individuals who are not expected to experience the event of interest; these individuals, who are immune to the event or cured of the disease during the study, are known as long-term survivors. In addition, many studies record two lifetimes for the same individual, and in some cases there is a dependence structure between them. In these situations, the usual lifetime distributions are not appropriate for data sets with long-term survivors and dependent bivariate lifetimes. In this study, we propose a bivariate model based on the standard Weibull distribution with a dependence structure based on fifteen different copula functions. We assumed the Weibull distribution because of its wide use in survival data analysis and its flexibility and simplicity, but the presented methods can be adapted to other continuous survival distributions. Three examples based on real data sets illustrate the proposed methodology. A Bayesian approach is used to obtain inferences for the model parameters, with the posterior summaries of interest computed using Markov chain Monte Carlo simulation methods and the OpenBUGS software. For the different real data sets considered, it was possible to find, among the fifteen copula models, models with satisfactory fit for the bivariate lifetimes in the presence of long-term survivors.
8
Abstract
Equating and scaling in the context of small-sample exams, such as credentialing exams for highly specialized professions, has received increased attention in recent research. Investigators have proposed a variety of both classical and Rasch-based approaches to the problem. This study attempts to extend past research by (1) directly comparing classical and Rasch techniques for equating exam scores when sample sizes are small (N ≤ 100 per exam form) and (2) attempting to pool multiple forms' worth of data to improve estimation in the Rasch framework. We simulated multiple years of a small-sample exam program by resampling from a larger certification exam program's real data. Results showed that combining multiple administrations' worth of data via the Rasch model can lead to more accurate equating than classical methods designed to work well in small samples. WINSTEPS-based Rasch methods that used multiple exam forms' data worked better than Bayesian Markov chain Monte Carlo methods, as the prior distribution used to estimate the item difficulty parameters biased predicted scores when there were difficulty differences between exam forms.
Affiliation(s)
- Ben Babcock: The American Registry of Radiologic Technologists, Saint Paul, MN, USA.
9
Mahsin MD, Deardon R, Brown P. Geographically dependent individual-level models for infectious diseases transmission. Biostatistics 2020;23:1-17. PMID: 32118253. DOI: 10.1093/biostatistics/kxaa009.
Abstract
Infectious disease models can be of great use for understanding the underlying mechanisms that influence the spread of diseases and for predicting future disease progression. Modeling has been increasingly used to evaluate the potential impact of different control measures and to guide public health policy decisions. In recent years, there has been rapid progress in spatio-temporal modeling of infectious diseases, one example being the discrete-time individual-level models (ILMs). These models are well developed and provide a common framework for modeling many disease systems; however, they assume that the probability of disease transmission between two individuals depends only on their spatial separation and not on their spatial locations. In cases where spatial location itself is important for understanding the spread of emerging infectious diseases and identifying their causes, it would be beneficial to incorporate the effect of spatial location in the model. In this study, we thus generalize the ILMs to a new class of geographically dependent ILMs that allow for evaluating the effect of spatially varying risk factors (e.g., education, social deprivation, environmental factors), as well as unobserved spatial structure, upon the transmission of infectious disease. Specifically, we consider a conditional autoregressive (CAR) model to capture the effects of unobserved spatially structured latent covariates or measurement error. This results in flexible infectious disease models that can be used for formulating etiological hypotheses and for identifying geographical regions of unusually high risk where preventive action can be targeted. The reliability of these models is investigated on a combination of simulated epidemic data and Alberta seasonal influenza outbreak data (2009). This new class of models is fitted to the data within a Bayesian statistical framework using Markov chain Monte Carlo methods.
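The CAR structure mentioned above rests on a precision matrix built from the neighborhood graph of the regions. A minimal sketch of the proper-CAR precision matrix Q = tau (D - rho W), with an illustrative four-region map and hypothetical tau and rho, might look like:

```python
def car_precision(neighbors, tau=1.0, rho=0.9):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model,
    where W is the 0/1 adjacency matrix and D = diag(#neighbors).
    `neighbors[i]` lists the indices adjacent to region i."""
    n = len(neighbors)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        Q[i][i] = tau * len(nbrs)       # D contribution
        for j in nbrs:
            Q[i][j] = -tau * rho        # -rho * W contribution
    return Q

# Toy map: four regions in a line, 0-1-2-3 (illustrative only).
Q = car_precision([[1], [0, 2], [1, 3], [2]])
```

Non-neighboring regions get a zero entry (conditional independence), while each shared border contributes a negative off-diagonal entry that induces positive association between neighbors.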
Affiliation(s)
- M D Mahsin: Department of Mathematics and Statistics and Faculty of Veterinary Medicine, University of Calgary, 2500 University Dr NW, Calgary AB T2N 1N4, Canada.
- Rob Deardon: Department of Mathematics and Statistics and Faculty of Veterinary Medicine, University of Calgary, 2500 University Dr NW, Calgary AB T2N 1N4, Canada.
- Patrick Brown: Department of Statistical Sciences, University of Toronto, Canada.
10
Chou WC, Lin Z. Bayesian evaluation of a physiologically based pharmacokinetic (PBPK) model for perfluorooctane sulfonate (PFOS) to characterize the interspecies uncertainty between mice, rats, monkeys, and humans: Development and performance verification. Environ Int 2019;129:408-422. PMID: 31152982. DOI: 10.1016/j.envint.2019.03.058.
Abstract
A challenge in the risk assessment of perfluorooctane sulfonate (PFOS) is the large interspecies differences in its toxicokinetics, which result in substantial uncertainty in dosimetry and toxicity extrapolation from animals to humans. To address this challenge, the objective of this study was to develop an open-source physiologically based pharmacokinetic (PBPK) model accounting for species-specific toxicokinetic parameters of PFOS. Based on available knowledge about the toxicokinetic properties of PFOS, a PBPK model for PFOS in mice, rats, monkeys, and humans after intravenous and oral administration was created. Available species-specific toxicokinetic data were used for model calibration and optimization, and independent datasets were used for model evaluation. Bayesian statistical analysis using Markov chain Monte Carlo (MCMC) simulation was performed to optimize the model and to characterize the uncertainty and interspecies variability of chemical-specific parameters. The model predictions correlated well with the majority of datasets for all four species, and the model was validated with independent data in rats, monkeys, and humans. The model was applied to predict human equivalent doses (HEDs) based on reported points of departure in selected critical toxicity studies in rats and monkeys, following U.S. EPA's guidelines. The lower bounds of the model-derived HEDs were overall lower than the HEDs estimated by U.S. EPA (e.g., 0.2 vs. 1.3 μg/kg/day based on the rat plasma data). This integrated and comparative analysis provides an important step towards improving interspecies extrapolation and quantitative risk assessment of PFOS, and this open-source model provides a foundation for developing models for other perfluoroalkyl substances.
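A full PBPK model couples many physiological compartments, but the underlying bookkeeping can be illustrated with a one-compartment toy model with first-order absorption and elimination; all parameter values below are hypothetical and unrelated to the paper's calibrated PFOS parameters:

```python
def one_compartment(dose_mg, k_abs, k_elim, vd_l, hours, dt=0.01):
    """Euler integration of a one-compartment model: first-order
    absorption from a gut depot into plasma, first-order elimination.
    Returns the plasma concentration (mg/L) at each time step."""
    gut, plasma = dose_mg, 0.0
    conc = []
    for _ in range(int(round(hours / dt))):
        absorbed = k_abs * gut * dt        # gut -> plasma
        eliminated = k_elim * plasma * dt  # plasma -> out
        gut -= absorbed
        plasma += absorbed - eliminated
        conc.append(plasma / vd_l)         # concentration = amount / volume
    return conc

# Illustrative oral dose: concentration rises, peaks, then decays.
curve = one_compartment(dose_mg=100.0, k_abs=1.0, k_elim=0.1,
                        vd_l=20.0, hours=24.0)
peak = max(curve)
```

A PBPK model replaces this single well-mixed compartment with organ-specific compartments linked by blood flows, but each compartment's update follows the same mass-balance pattern.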
Affiliation(s)
- Wei-Chun Chou: Institute of Computational Comparative Medicine (ICCM), Department of Anatomy and Physiology, College of Veterinary Medicine, Kansas State University, Manhattan, KS 66506, United States.
- Zhoumeng Lin: Institute of Computational Comparative Medicine (ICCM), Department of Anatomy and Physiology, College of Veterinary Medicine, Kansas State University, Manhattan, KS 66506, United States.
11
Abstract
Early infancy, from birth to 3 years, is critical for the cognitive, emotional, and social development of infants. During this period, infants' developmental tempo and outcomes are potentially impacted by in utero exposure to endocrine disrupting compounds (EDCs), such as bisphenol A (BPA) and phthalates. We investigate the effects of ten ubiquitous EDCs on the infant growth dynamics of body mass index (BMI) in a birth cohort study. Modeling growth acceleration is proposed to understand the "force of growth" through a class of semiparametric stochastic velocity models. The great flexibility of such a dynamic model enables us to capture subject-specific dynamics of growth trajectories and to assess effects of the EDCs on potential delay of growth. We adopted a Bayesian method with the Ornstein-Uhlenbeck process as the prior for the growth rate function, in which the World Health Organization global infant growth curves were integrated into our analysis. We found that BPA and most of the phthalates exposures during the first trimester of pregnancy were inversely associated with BMI growth acceleration, resulting in delayed achievement of the infant BMI peak. Such early growth deficiency has been reported to have a profound impact on health outcomes in puberty (e.g., timing of sexual maturation) and adulthood.
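The Ornstein-Uhlenbeck prior used here for the growth rate function describes a mean-reverting stochastic process. A minimal Euler-Maruyama simulation (with illustrative parameters, not those of the study) shows this mean-reverting behavior:

```python
import math
import random

def simulate_ou(theta, mu, sigma, x0, n_steps, dt, seed=42):
    """Euler-Maruyama simulation of the Ornstein-Uhlenbeck SDE
    dX = theta * (mu - X) dt + sigma dW (parameters illustrative)."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    sqrt_dt = math.sqrt(dt)
    for _ in range(n_steps):
        # drift pulls x back towards mu; diffusion adds Gaussian noise
        x += theta * (mu - x) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# A growth-rate-like path reverting towards mu = 0.5 from x0 = 2.0.
path = simulate_ou(theta=1.0, mu=0.5, sigma=0.1, x0=2.0,
                   n_steps=1000, dt=0.01)
```

The mean-reversion is what makes the OU process a natural prior for a rate that fluctuates around a smooth trend rather than wandering off like Brownian motion.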
Affiliation(s)
- Jonggyu Baek: University of Michigan, University of Massachusetts Medical School and National Institutes of Health.
- Bin Zhu: University of Michigan, University of Massachusetts Medical School and National Institutes of Health.
- Peter X K Song: University of Michigan, University of Massachusetts Medical School and National Institutes of Health.
12
Nagaraja K, Braga-Neto U. Bayesian Classification of Proteomics Biomarkers from Selected Reaction Monitoring Data using an Approximate Bayesian Computation-Markov Chain Monte Carlo Approach. Cancer Inform 2018;17:1176935118786927. PMID: 30083051. PMCID: PMC6071182. DOI: 10.1177/1176935118786927.
Abstract
Selected reaction monitoring (SRM) has become one of the main methods for low-mass-range targeted proteomics by mass spectrometry (MS). However, in most SRM-MS biomarker validation studies, the sample size is very small, and in particular smaller than the number of proteins measured in the experiment. Moreover, the data can be noisy due to a low number of ions detected per peptide by the instrument. In this article, those issues are addressed by a model-based Bayesian method for classification of SRM-MS data. The methodology is likelihood-free, using approximate Bayesian computation implemented via a Markov chain Monte Carlo procedure and a kernel-based Optimal Bayesian Classifier. Extensive experimental results demonstrate that the proposed method outperforms classical methods such as linear discriminant analysis and 3NN when sample size is small, dimensionality is large, the data are noisy, or a combination of these.
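The likelihood-free idea behind approximate Bayesian computation can be sketched with plain rejection ABC (the article embeds ABC in an MCMC procedure; the normal-mean toy problem, wide uniform prior, and tolerance below are all illustrative assumptions):

```python
import random
import statistics

def abc_rejection(observed, n_draws=2000, eps=0.5, seed=7):
    """Rejection ABC for the mean of a normal sample: draw mu from a
    wide uniform prior, simulate a dataset of the same size, and keep
    mu when the simulated mean lies within eps of the observed mean.
    No likelihood is ever evaluated. Numbers are illustrative."""
    rng = random.Random(seed)
    obs_mean = statistics.fmean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-10.0, 10.0)                 # prior draw
        sim = [rng.gauss(mu, 1.0) for _ in range(n)]  # simulate data
        if abs(statistics.fmean(sim) - obs_mean) < eps:
            accepted.append(mu)                       # approximate posterior sample
    return accepted

rng = random.Random(1)
data = [rng.gauss(3.0, 1.0) for _ in range(20)]  # "observed" data, true mean 3
post = abc_rejection(data)
post_mean = statistics.fmean(post)
```

Replacing the blind prior draws with an MCMC chain over mu, as in the ABC-MCMC approach the article uses, concentrates proposals in the accepted region and wastes far fewer simulations.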
Affiliation(s)
- Kashyap Nagaraja: Department of Electrical & Computer Engineering and Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA.
- Ulisses Braga-Neto: Department of Electrical & Computer Engineering and Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, USA.
13
Abstract
Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, no source systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic IRT model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be easily extended to their multidimensional and multilevel cases.
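Although the article's contribution is Stan code, the item response function underlying its first model, the three-parameter logistic (3PL), is easy to state directly; the item parameters below are illustrative:

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) probability that a person with
    ability theta answers an item correctly, given discrimination a,
    difficulty b, and pseudo-guessing lower asymptote c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the logistic part equals 0.5, so P = c + (1 - c) / 2.
p = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
```

Fixing c = 0 recovers the two-parameter logistic model, and additionally fixing a = 1 recovers the Rasch model, which is why the 3PL code extends naturally to the simpler cases.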
Affiliation(s)
- Yong Luo: National Center for Assessment in Higher Education, Riyadh, Saudi Arabia.
- Hong Jiao: University of Maryland, College Park, MD, USA.
14
Zhao H, Hobbs BP, Ma H, Jiang Q, Carlin BP. Combining Non-randomized and Randomized Data in Clinical Trials Using Commensurate Priors. Health Serv Outcomes Res Methodol 2016;16:154-171. PMID: 28458614. DOI: 10.1007/s10742-016-0155-7.
Abstract
Randomization eliminates selection bias, and attenuates imbalance among study arms with respect to prognostic factors, both known and unknown. Thus, information arising from randomized clinical trials (RCTs) is typically considered the gold standard for comparing therapeutic interventions in confirmatory studies. However, RCTs are limited in contexts wherein patients who are willing to accept a random treatment assignment represent only a subset of the patient population. By contrast, observational studies (OSs) often enroll patient cohorts that better reflect the broader patient population. However, OSs often suffer from selection bias, and may yield invalid treatment comparisons even after adjusting for known confounders. Therefore, combining information acquired from OSs with data from RCTs in research synthesis is often criticized due to the limitations of OSs. In this article, we combine randomized and non-randomized substudy data from FIRST, a recent HIV/AIDS drug trial. We develop hierarchical Bayesian approaches devised to combine data from all sources simultaneously while explicitly accounting for potential discrepancies in the sources' designs. Specifically, we describe a two-step approach combining propensity score matching and Bayesian hierarchical modeling to integrate information from non-randomized studies with data from RCTs, to an extent that depends on the estimated commensurability of the data sources. We investigate our procedure's operating characteristics via simulation. Our findings have implications for HIV/AIDS research, as well as elucidate the extent to which well-designed non-randomized studies can complement RCTs.
15
Li D, Wang X, Dey DK. A flexible cure rate model for spatially correlated survival data based on generalized extreme value distribution and Gaussian process priors. Biom J 2016;58:1178-1197. PMID: 27225466. DOI: 10.1002/bimj.201500040.
Abstract
We propose a new survival model in a Bayesian context to analyze right-censored survival data for populations with a surviving fraction, assuming that the log failure time follows a generalized extreme value distribution. Many applications require more flexible modeling of covariate information than a simple linear or parametric form for all covariate effects. It is also necessary to include spatial variation in the model, since it is sometimes unexplained by the covariates considered in the analysis. Therefore, nonlinear covariate effects and spatial effects are incorporated into the systematic component of our model. Gaussian processes (GPs) provide a natural framework for modeling potentially nonlinear relationships and have recently become extremely powerful tools in nonlinear regression. Our proposed model adopts a semiparametric Bayesian approach by imposing a GP prior on the nonlinear structure of the continuous covariates. With consideration of data availability and computational complexity, a conditionally autoregressive distribution is placed on the region-specific frailties to handle spatial correlation. The flexibility and gains of our proposed model are illustrated through analyses of simulated data examples as well as a dataset from a colon cancer clinical trial in the state of Iowa.
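A GP prior is specified through its covariance function. A common choice (assumed here for illustration, since the abstract does not name the paper's kernel) is the squared-exponential kernel, sketched below with illustrative hyperparameters:

```python
import math

def sq_exp_kernel(xs, variance=1.0, lengthscale=0.5):
    """Squared-exponential GP covariance matrix
    K[i][j] = variance * exp(-(x_i - x_j)^2 / (2 * lengthscale^2));
    nearby inputs get high covariance, distant inputs near zero."""
    return [[variance * math.exp(-((xi - xj) ** 2) / (2.0 * lengthscale ** 2))
             for xj in xs] for xi in xs]

# Four covariate values; covariance decays with distance between them.
xs = [0.0, 0.25, 0.5, 1.0]
K = sq_exp_kernel(xs)
```

The lengthscale controls how wiggly the nonlinear covariate effect is allowed to be, which is what makes the GP prior act as a smoothness regularizer.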
Affiliation(s)
- Dan Li, Department of Mathematical Sciences, University of Cincinnati, 2815 Commons Way, Cincinnati, Ohio 45221-0025, USA
- Xia Wang, Department of Mathematical Sciences, University of Cincinnati, 2815 Commons Way, Cincinnati, Ohio 45221-0025, USA
- Dipak K Dey, Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, Connecticut 06269-4098, USA
16
Abstract
We present a Bayesian model for area-level count data that uses Gaussian random effects with a novel type of G-Wishart prior on the inverse variance- covariance matrix. Specifically, we introduce a new distribution called the truncated G-Wishart distribution that has support over precision matrices that lead to positive associations between the random effects of neighboring regions while preserving conditional independence of non-neighboring regions. We describe Markov chain Monte Carlo sampling algorithms for the truncated G-Wishart prior in a disease mapping context and compare our results to Bayesian hierarchical models based on intrinsic autoregression priors. A simulation study illustrates that using the truncated G-Wishart prior improves over the intrinsic autoregressive priors when there are discontinuities in the disease risk surface. The new model is applied to an analysis of cancer incidence data in Washington State.
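For contrast with the truncated G-Wishart, the intrinsic autoregressive (ICAR) benchmark's precision matrix can be written down directly from the neighborhood structure. Its exact off-diagonal zeros encode conditional independence of non-neighbors, and its negative entries between neighbors correspond to the positive spatial associations the truncated G-Wishart also enforces. A minimal sketch:

```python
def icar_precision(adj):
    """Intrinsic CAR precision matrix from an adjacency list:
    Q[i][i] = number of neighbors of region i, Q[i][j] = -1 if i ~ j, else 0.
    Rows sum to zero, which is why the ICAR prior is improper (rank-deficient)."""
    n = len(adj)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        Q[i][i] = float(len(nbrs))
        for j in nbrs:
            Q[i][j] = -1.0
    return Q
```

The truncated G-Wishart prior in the paper replaces these fixed entries with random ones, restricted to the same sparsity pattern and sign constraints.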
Affiliation(s)
- Theresa R. Smith, Department of Statistics, University of Washington, Seattle, WA 98195
- Jon Wakefield, Departments of Statistics and Biostatistics, University of Washington, Seattle, WA 98195
- Adrian Dobra, Departments of Statistics, Biobehavioral Nursing, and Health Systems and the Center for Statistics and the Social Sciences, University of Washington, Seattle, WA 98195
17
Critchlow R, Plumptre AJ, Driciru M, Rwetsiba A, Stokes EJ, Tumwesigye C, Wanyama F, Beale CM. Spatiotemporal trends of illegal activities from ranger-collected data in a Ugandan national park. Conserv Biol 2015; 29:1458-1470. PMID: 25996571. DOI: 10.1111/cobi.12538.
Abstract
Within protected areas, biodiversity loss is often a consequence of illegal resource use. Understanding the patterns and extent of illegal activities is therefore essential for effective law enforcement and prevention of biodiversity declines. We used extensive data, commonly collected by ranger patrols in many protected areas, and Bayesian hierarchical models to identify drivers, trends, and distribution of multiple illegal activities within the Queen Elizabeth Conservation Area (QECA), Uganda. Encroachment (e.g., by pastoralists with cattle) and poaching of noncommercial animals (e.g., snaring bushmeat) were the most prevalent illegal activities within the QECA. Illegal activities occurred in different areas of the QECA. Poaching of noncommercial animals was most widely distributed within the national park. Overall, ecological covariates, although significant, were not useful predictors for occurrence of illegal activities. Instead, the location of illegal activities in previous years was more important. There were significant increases in encroachment and noncommercial plant harvesting (nontimber products) during the study period (1999-2012). We also found significant spatiotemporal variation in the occurrence of all activities. Our results show the need to explicitly model ranger patrol effort to reduce biases from existing uncorrected or catch-per-unit-effort analyses. Prioritization of ranger patrol strategies is needed to target illegal activities; these strategies are determined by protected area managers, and therefore changes at a site level can be implemented quickly. These strategies should also be informed by the location of past occurrences of illegal activity: the most useful predictor of future events. However, because spatial and temporal changes in illegal activities occurred, regular patrols throughout the protected area, even in areas of low occurrence, are also required.
Affiliation(s)
- R Critchlow, Department of Biology, University of York, Wentworth Way, YO10 5DD, United Kingdom
- A J Plumptre, Wildlife Conservation Society, Plot 802 Kiwafu Rd, Kansanga, P.O. Box 7487, Kampala, Uganda
- M Driciru, Uganda Wildlife Authority, P.O. Box 3530, Kampala, Uganda
- A Rwetsiba, Uganda Wildlife Authority, P.O. Box 3530, Kampala, Uganda
- E J Stokes, Wildlife Conservation Society, Global Conservation, 2300 Southern Boulevard, Bronx, NY 10460, U.S.A.
- C Tumwesigye, Uganda Wildlife Authority, P.O. Box 3530, Kampala, Uganda
- F Wanyama, Uganda Wildlife Authority, P.O. Box 3530, Kampala, Uganda
- C M Beale, Department of Biology, University of York, Wentworth Way, YO10 5DD, United Kingdom
18
Abstract
As machine learning techniques mature and are used to tackle complex scientific problems, challenges arise such as the imbalanced class distribution problem, where one of the target class labels is under-represented in comparison with other classes. Existing oversampling approaches for addressing this problem typically do not consider the probability distribution of the minority class while synthetically generating new samples. As a result, the minority class is not well represented, which leads to high misclassification error. We introduce two Gibbs sampling-based oversampling approaches, namely RACOG and wRACOG, for synthetically generating and strategically selecting new minority class samples. The Gibbs sampler uses the joint probability distribution of the data's attributes to generate new minority class samples in the form of a Markov chain. While RACOG selects samples from the Markov chain based on a predefined lag, wRACOG selects those samples that have the highest probability of being misclassified by the existing learning model. We validate our approach using five UCI datasets that were carefully modified to exhibit class imbalance and one new application domain dataset with inherent extreme class imbalance. In addition, we compare the classification performance of the proposed methods with three other existing resampling techniques.
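A toy version of the predefined-lag (RACOG-style) sampler for binary attributes is sketched below. It replaces the paper's dependence-tree approximation of the joint distribution with a simple empirical conditional that backs off to the attribute marginal, so it illustrates only the chain-and-lag mechanics, not the actual RACOG algorithm.

```python
import random

def gibbs_oversample(minority, n_new, lag=5, seed=0):
    """Generate synthetic minority-class rows (binary attributes) by Gibbs
    sampling, keeping every `lag`-th state of the chain, as in RACOG's
    predefined-lag selection. Conditionals P(x_j | x_-j) are empirical,
    backing off to the attribute marginal when no minority row matches."""
    rng = random.Random(seed)
    d = len(minority[0])
    marg = [sum(r[j] for r in minority) / len(minority) for j in range(d)]
    state = list(rng.choice(minority))       # start the chain at a real minority row
    out, sweeps = [], 0
    while len(out) < n_new:
        for j in range(d):                   # one Gibbs sweep over the attributes
            rest = [r for r in minority
                    if all(r[k] == state[k] for k in range(d) if k != j)]
            p = (sum(r[j] for r in rest) / len(rest)) if rest else marg[j]
            state[j] = 1 if rng.random() < p else 0
        sweeps += 1
        if sweeps % lag == 0:                # keep every lag-th chain state
            out.append(list(state))
    return out
```

wRACOG would instead keep the chain states the current classifier is most likely to misclassify, rather than sampling at a fixed lag.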
19
Xu C, Wang W, Liu P, Zhang F. Development of a real-time crash risk prediction model incorporating the various crash mechanisms across different traffic states. Traffic Inj Prev 2014; 16:28-35. PMID: 24697528. DOI: 10.1080/15389588.2014.909036.
Abstract
OBJECTIVE: This study aimed to identify the traffic flow variables contributing to crash risks under different traffic states and to develop a real-time crash risk model incorporating the varying crash mechanisms across different traffic states.
METHODS: The crash, traffic, and geometric data were collected on the I-880N freeway in California in 2008 and 2009. This study considered 4 different traffic states in Wu's 4-phase traffic theory. They are free fluid traffic, bunched fluid traffic, bunched congested traffic, and standing congested traffic. Several different statistical methods were used to accomplish the research objective.
RESULTS: The preliminary analysis showed that traffic states significantly affected crash likelihood, collision type, and injury severity. Nonlinear canonical correlation analysis (NLCCA) was conducted to identify the underlying phenomena that made certain traffic states more hazardous than others. The results suggested that different traffic states were associated with various collision types and injury severities. The matching of traffic flow characteristics and crash characteristics in NLCCA revealed how traffic states affected traffic safety. The logistic regression analyses showed that the factors contributing to crash risks were quite different across various traffic states. To incorporate the varying crash mechanisms across different traffic states, random parameters logistic regression was used to develop a real-time crash risk model. Bayesian inference based on Markov chain Monte Carlo simulations was used for model estimation. The parameters of traffic flow variables in the model were allowed to vary across different traffic states. Compared with the standard logistic regression model, the proposed model significantly improved the goodness-of-fit and predictive performance.
CONCLUSIONS: These results can promote a better understanding of the relationship between traffic flow characteristics and crash risks, which is valuable knowledge in the pursuit of improving traffic safety on freeways through the use of dynamic safety management systems.
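The random-parameters idea, coefficients that vary by traffic state, can be illustrated with a state-indexed logistic model. The coefficient values in the usage below are placeholders, not estimates from the paper; in the actual model they would come from the Bayesian MCMC fit.

```python
import math

def crash_risk(features, state, coefs):
    """Logistic crash-risk prediction where the coefficient vector is indexed
    by traffic state, so the same traffic-flow variable (e.g., speed variance)
    can have a different effect in free-flow vs. congested conditions.
    coefs[state] = [intercept, b1, b2, ...]."""
    b = coefs[state]
    z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], features))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, with a hypothetical single feature (standardized speed variance) whose effect is larger when congested:

```python
coefs = {"free": [-3.0, 0.2], "congested": [-3.0, 0.8]}   # placeholder values
crash_risk([2.0], "congested", coefs)   # higher risk than the same flow in "free"
```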
Affiliation(s)
- Chengcheng Xu, Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing, China
20
McCauley P, Kalachev LV, Mollicone DJ, Banks S, Dinges DF, Van Dongen HPA. Dynamic circadian modulation in a biomathematical model for the effects of sleep and sleep loss on waking neurobehavioral performance. Sleep 2013; 36:1987-97. PMID: 24293775. DOI: 10.5665/sleep.3246.
Abstract
Recent experimental observations and theoretical advances have indicated that the homeostatic equilibrium for sleep/wake regulation--and thereby sensitivity to neurobehavioral impairment from sleep loss--is modulated by prior sleep/wake history. This phenomenon was predicted by a biomathematical model developed to explain changes in neurobehavioral performance across days in laboratory studies of total sleep deprivation and sustained sleep restriction. The present paper focuses on the dynamics of neurobehavioral performance within days in this biomathematical model of fatigue. Without increasing the number of model parameters, the model was updated by incorporating time-dependence in the amplitude of the circadian modulation of performance. The updated model was calibrated using a large dataset from three laboratory experiments on psychomotor vigilance test (PVT) performance, under conditions of sleep loss and circadian misalignment; and validated using another large dataset from three different laboratory experiments. The time-dependence of circadian amplitude resulted in improved goodness-of-fit in night shift schedules, nap sleep scenarios, and recovery from prior sleep loss. The updated model predicts that the homeostatic equilibrium for sleep/wake regulation--and thus sensitivity to sleep loss--depends not only on the duration but also on the circadian timing of prior sleep. This novel theoretical insight has important implications for predicting operator alertness during work schedules involving circadian misalignment such as night shift work.
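The paper's key change, a circadian amplitude that depends on prior wake, can be caricatured in a few lines. The functional forms and parameter values below are invented for illustration and are not the calibrated model from the paper.

```python
import math

def performance_deficit(t_awake, clock_hours, kappa=0.5, phi=18.0, a0=1.0, a1=0.05):
    """Toy fatigue model: linear homeostatic build-up during wake plus a
    circadian modulation whose amplitude grows with time awake, illustrating
    the time-dependent circadian amplitude idea. Higher value = more impaired.
    All parameters (kappa, phi, a0, a1) are hypothetical."""
    homeostatic = kappa * t_awake
    amplitude = a0 + a1 * t_awake          # amplitude depends on prior wake
    circadian = amplitude * math.cos(2 * math.pi * (clock_hours - phi) / 24.0)
    return homeostatic + circadian
```

One consequence mirrors the paper's finding: the swing between the circadian peak and trough of impairment widens as time awake accumulates, which matters most in schedules with circadian misalignment such as night shifts.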
Affiliation(s)
- Peter McCauley, Sleep and Performance Research Center, Washington State University, Spokane, WA; Department of Mathematical Sciences, University of Montana, Missoula, MT
21
Abstract
Transmission events are the fundamental building blocks of the dynamics of any infectious disease. Much about the epidemiology of a disease can be learned when these individual transmission events are known or can be estimated. Such estimations are difficult and generally feasible only when detailed epidemiological data are available. The genealogy estimated from genetic sequences of sampled pathogens is another rich source of information on transmission history. Optimal inference of transmission events calls for the combination of genetic data and epidemiological data into one joint analysis. A key difficulty is that the transmission tree, which describes the transmission events between infected hosts, differs from the phylogenetic tree, which describes the ancestral relationships between pathogens sampled from these hosts. The trees differ both in timing of the internal nodes and in topology. These differences become more pronounced when a higher fraction of infected hosts is sampled. We show how the phylogenetic tree of sampled pathogens is related to the transmission tree of an outbreak of an infectious disease, by the within-host dynamics of pathogens. We provide a statistical framework to infer key epidemiological and mutational parameters by simultaneously estimating the phylogenetic tree and the transmission tree. We test the approach using simulations and illustrate its use on an outbreak of foot-and-mouth disease. The approach unifies existing methods in the emerging field of phylodynamics with transmission tree reconstruction methods that are used in infectious disease epidemiology.
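The timing difference the abstract describes, that a phylogenetic node predates the transmission event it corresponds to by the within-host coalescent interval, can be made concrete with a toy calculation. The hosts, times, and offsets below are hypothetical illustration values, not an inference method.

```python
def phylo_node_times(transmission_times, coalescent_offsets):
    """For each infected host, the phylogenetic node pairing its pathogen
    lineage with the infector's lineage coalesces *inside the infector*,
    at or before the transmission time. Both inputs map host -> time
    (transmission time) and host -> within-host offset, respectively."""
    return {host: t - coalescent_offsets[host]
            for host, t in transmission_times.items()}
```

With more hosts sampled, these within-host intervals accumulate, which is why the transmission tree and the phylogenetic tree diverge in both node timing and topology.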
22
Elderd BD, Dwyer G, Dukic V. Population-level differences in disease transmission: a Bayesian analysis of multiple smallpox epidemics. Epidemics 2013; 5:146-56. PMID: 24021521. PMCID: PMC3869526. DOI: 10.1016/j.epidem.2013.07.001.
Abstract
Estimates of a disease's basic reproductive rate R0 play a central role in understanding outbreaks and planning intervention strategies. In many calculations of R0, a simplifying assumption is that different host populations have effectively identical transmission rates. This assumption can lead to an underestimate of the overall uncertainty associated with R0, which, due to the non-linearity of epidemic processes, may result in a mis-estimate of epidemic intensity and miscalculated expenditures associated with public-health interventions. In this paper, we utilize a Bayesian method for quantifying the overall uncertainty arising from differences in population-specific basic reproductive rates. Using this method, we fit spatial and non-spatial susceptible-exposed-infected-recovered (SEIR) models to a series of 13 smallpox outbreaks. Five outbreaks occurred in populations that had been previously exposed to smallpox, while the remaining eight occurred in Native-American populations that were naïve to the disease at the time. The Native-American outbreaks were close in a spatial and temporal sense. Using Bayesian Information Criterion (BIC), we show that the best model includes population-specific R0 values. These differences in R0 values may, in part, be due to differences in genetic background, social structure, or food and water availability. As a result of these inter-population differences, the overall uncertainty associated with the "population average" value of smallpox R0 is larger, a finding that can have important consequences for controlling epidemics. In general, Bayesian hierarchical models are able to properly account for the uncertainty associated with multiple epidemics, provide a clearer understanding of variability in epidemic dynamics, and yield a better assessment of the range of potential risks and consequences that decision makers face.
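A deterministic SEIR integration makes the R0 sensitivity concrete: the final epidemic size grows nonlinearly with R0, which is why averaging over population-specific R0 values, rather than fitting one shared rate, changes the assessed range of outcomes. The rate constants below are illustrative, not the paper's smallpox estimates.

```python
def seir_final_size(r0, gamma=1 / 14, sigma=1 / 12, n=1000, i0=1, days=600):
    """Discrete-time (Euler, dt = 1 day) SEIR epidemic; returns the final
    number recovered (the epidemic size). beta = r0 * gamma by the standard
    SEIR relation. gamma, sigma, n, i0 are hypothetical placeholder values."""
    beta = r0 * gamma
    s, e, i, r = n - i0, 0.0, float(i0), 0.0
    for _ in range(days):
        new_e = beta * s * i / n    # S -> E: new exposures
        new_i = sigma * e           # E -> I: end of latent period
        new_r = gamma * i           # I -> R: recovery
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
    return r
```

Because final size is a convex, saturating function of R0, the "population average" R0 understates the spread of plausible epidemic sizes across populations with heterogeneous R0 values, which is the uncertainty the hierarchical model is designed to capture.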
Affiliation(s)
- Bret D Elderd, Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
23
Abstract
Most modern population genetics inference methods are based on the coalescence framework. Methods that allow estimating parameters of structured populations commonly insert migration events into the genealogies. For these methods the calculation of the coalescence probability density of a genealogy requires a product over all time periods between events. Data sets that contain populations with high rates of gene flow among them require an enormous number of calculations. A new method, transition probability-structured coalescence (TPSC), replaces the discrete migration events with probability statements. Because the speed of calculation is independent of the amount of gene flow, this method allows calculating the coalescence densities efficiently. The current implementation of TPSC uses an approximation simplifying the interaction among lineages. Simulations and coverage comparisons of TPSC vs. MIGRATE show that TPSC allows more precise estimation of high migration rates, but because of the approximation the estimation of low migration rates is biased. The implementation of TPSC into programs that calculate quantities on phylogenetic tree structures is straightforward, so the TPSC approach will facilitate more general inferences in many computer programs.