1
Xue M, Chen Y. A Stan tutorial on Bayesian IRTree models: Conventional models and explanatory extension. Behav Res Methods 2024; 56:1817-1837. [PMID: 37095325 PMCID: PMC10124709 DOI: 10.3758/s13428-023-02121-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/30/2023] [Indexed: 04/26/2023]
Abstract
IRTree models have been receiving increasing attention. However, to date, there are limited sources that provide a systematic introduction to Bayesian modeling techniques using modern probabilistic programming frameworks for the implementation of IRTree models. To facilitate the research and application of IRTree models, this paper introduces how to perform two families of Bayesian IRTree models (i.e., response tree models and latent tree models) in Stan and how to extend them in an explanatory way. Some suggestions on executing Stan codes and checking convergence are also provided. An empirical study based on the Oxford Achieving Resilience during COVID-19 data was conducted as an example to further illustrate how to apply Bayesian IRTree models to address research questions. Finally, strengths and future directions are discussed.
Affiliation(s)
- Mingfeng Xue
  - Berkeley School of Education, University of California Berkeley, Berkeley, CA, USA
- Yi Chen
  - Teachers College, Columbia University, New York, NY, USA
2
Andrade J, Duggan J. Anchoring the mean generation time in the SEIR to mitigate biases in ℜ₀ estimates due to uncertainty in the distribution of the epidemiological delays. R Soc Open Sci 2023; 10:230515. [PMID: 37538746 PMCID: PMC10394422 DOI: 10.1098/rsos.230515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 07/13/2023] [Indexed: 08/05/2023]
Abstract
The basic reproduction number, ℜ₀, is of paramount importance in the study of infectious disease dynamics. Primarily, ℜ₀ serves as an indicator of the transmission potential of an emerging infectious disease and the effort required to control the invading pathogen. However, its estimates from compartmental models are strongly conditioned by assumptions in the model structure, such as the distributions of the latent and infectious periods (epidemiological delays). To further complicate matters, models with dissimilar delay structures produce equivalent incidence dynamics. Following a simulation study, we reveal that the nature of such equivalency stems from a linear relationship between ℜ₀ and the mean generation time, along with adjustments to other parameters in the model. Leveraging this knowledge, we propose and successfully test an alternative parametrization of the SEIR model that produces accurate ℜ₀ estimates regardless of the distribution of the epidemiological delays, at the expense of biases in other quantities deemed of lesser importance. We further explore this approach's robustness by testing various transmissibility levels, generation times and data fidelity (overdispersion). Finally, we apply the proposed approach to data from the 1918 influenza pandemic. We anticipate that this work will mitigate biases in estimating ℜ₀.
Affiliation(s)
- Jair Andrade
  - Data Science Institute and School of Computer Science, University of Galway, Galway, Republic of Ireland
- Jim Duggan
  - Insight Centre for Data Analytics and School of Computer Science, University of Galway, Galway, Republic of Ireland
3
Dorie V, Perrett G, Hill JL, Goodrich B. Stan and BART for Causal Inference: Estimating Heterogeneous Treatment Effects Using the Power of Stan and the Flexibility of Machine Learning. Entropy (Basel) 2022; 24:1782. [PMID: 36554187 PMCID: PMC9778579 DOI: 10.3390/e24121782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 10/22/2022] [Accepted: 11/06/2022] [Indexed: 06/17/2023]
Abstract
A wide range of machine-learning-based approaches have been developed in the past decade, increasing our ability to accurately model nonlinear and nonadditive response surfaces. This has improved performance for inferential tasks such as estimating average treatment effects in situations where standard parametric models may not fit the data well. These methods have also shown promise for the related task of identifying heterogeneous treatment effects. However, the estimation of both overall and heterogeneous treatment effects can be hampered when data are structured within groups if we fail to correctly model the dependence between observations. Most machine learning methods do not readily accommodate such structure. This paper introduces a new algorithm, stan4bart, that combines the flexibility of Bayesian Additive Regression Trees (BART) for fitting nonlinear response surfaces with the computational and statistical efficiencies of using Stan for the parametric components of the model. We demonstrate how stan4bart can be used to estimate average, subgroup, and individual-level treatment effects with stronger performance than other flexible approaches that ignore the multilevel structure of the data as well as multilevel approaches that have strict parametric forms.
Affiliation(s)
- George Perrett
  - Department of Applied Statistics, Social Science, and the Humanities, New York University, New York, NY 10003, USA
- Jennifer L. Hill
  - Department of Applied Statistics, Social Science, and the Humanities, New York University, New York, NY 10003, USA
- Benjamin Goodrich
  - Department of Political Science, Columbia University, New York, NY 10025, USA
4
Mingione M, Alaimo Di Loro P, Farcomeni A, Divino F, Lovison G, Maruotti A, Lasinio GJ. Spatio-temporal modelling of COVID-19 incident cases using Richards' curve: An application to the Italian regions. Spat Stat 2022; 49:100544. [PMID: 36407655 PMCID: PMC9643104 DOI: 10.1016/j.spasta.2021.100544] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/15/2021] [Revised: 08/06/2021] [Accepted: 09/23/2021] [Indexed: 06/14/2023]
Abstract
We introduce an extended generalised logistic growth model for discrete outcomes, in which spatial and temporal dependence are dealt with through the specification of a network structure within an Auto-Regressive approach. A major challenge concerns the specification of the network structure, crucial to consistently estimate the canonical parameters of the generalised logistic curve, e.g. peak time and height. We compared a network based on geographic proximity and one built on historical data of transport exchanges between regions. Parameters are estimated under the Bayesian framework, using the Stan probabilistic programming language. The proposed approach is motivated by the analysis of both the first and the second wave of COVID-19 in Italy, i.e. from February 2020 to July 2020 and from July 2020 to December 2020, respectively. We analyse data at the regional level and, interestingly enough, prove that substantial spatial and temporal dependence occurred in both waves, although strong restrictive measures were implemented during the first wave. Accurate predictions are obtained, improving on those of the model where independence across regions is assumed.
Affiliation(s)
- Marco Mingione
  - University of Rome "La Sapienza", Dpt. of Statistical Sciences, Rome, Italy
  - Institute of Applied Computing "M. Picone" (IAC - CNR), Italy
- Alessio Farcomeni
  - University of Rome "Tor Vergata", Dpt. of Economics and Finance, Italy
- Fabio Divino
  - University of Molise, Dpt. of Bio-Sciences, Italy
- Gianfranco Lovison
  - University of Palermo, Dpt. of Economics, Management and Statistics, Italy
  - Swiss TPH, Dpt. of Epidemiology and Public Health, Switzerland
- Antonello Maruotti
  - Libera Università Maria Ss Assunta, Dpt. GEPLI, Italy
  - University of Bergen, Dpt. of Mathematics, Norway
5
Bollen M, Neyens T, Fajgenblat M, De Waele V, Licoppe A, Manet B, Casaer J, Beenaerts N. Managing African Swine Fever: Assessing the Potential of Camera Traps in Monitoring Wild Boar Occupancy Trends in Infected and Non-infected Zones, Using Spatio-Temporal Statistical Models. Front Vet Sci 2021; 8:726117. [PMID: 34712721 PMCID: PMC8546189 DOI: 10.3389/fvets.2021.726117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2021] [Accepted: 09/13/2021] [Indexed: 11/13/2022]
Abstract
The recent spreading of African swine fever (ASF) over the Eurasian continent has been acknowledged as a serious economic threat for the pork industry. Consequently, an extensive body of research focuses on the epidemiology and control of ASF. Nevertheless, little information is available on the combined effect of ASF and ASF-related control measures on wild boar (Sus scrofa) population abundances. This is crucial information given the role of the remaining wild boar that act as an important reservoir of the disease. Given the high potential of camera traps as a non-invasive method for ungulate trend estimation, we assess the effectiveness of ASF control measures using a camera trap network. In this study, we focus on a major ASF outbreak in 2018-2020 in the South of Belgium. This outbreak elicited a strong management response, both in terms of fencing off a large infected zone as well as an intensive culling regime. We apply a Bayesian multi-season site-occupancy model to wild boar detection/non-detection data. Our results show that (1) occupancy rates at the onset of our monitoring period reflect the ASF infection status; (2) ASF-induced mortality and culling efforts jointly lead to decreased occupancy over time; and (3) the estimated mean total extinction rate ranges between 22.44 and 91.35%, depending on the ASF infection status. Together, these results confirm the effectiveness of ASF control measures implemented in Wallonia (Belgium), which has regained its disease-free status in December 2020, as well as the usefulness of a camera trap network to monitor these effects.
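To make the link between extinction rates and declining occupancy concrete, the sketch below uses the standard textbook recursion for multi-season occupancy, in which a site stays occupied with probability 1 minus the extinction rate and an empty site is colonised with some probability. This is an illustration only: the function name, parameter names, and the numbers are hypothetical, and the abstract does not state that the authors parameterised their model exactly this way.

```python
def occupancy_trajectory(psi0, colonisation, extinction, n_seasons):
    """Expected occupancy probability per season (illustrative sketch).

    psi0         -- occupancy probability in the first season (assumed value)
    colonisation -- probability an empty site becomes occupied between seasons
    extinction   -- probability an occupied site becomes empty between seasons
    """
    for name, p in (("psi0", psi0), ("colonisation", colonisation),
                    ("extinction", extinction)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    psi = [psi0]
    for _ in range(n_seasons - 1):
        prev = psi[-1]
        # occupied sites that persist + empty sites that are colonised
        psi.append(prev * (1.0 - extinction) + (1.0 - prev) * colonisation)
    return psi


# Hypothetical values, not taken from the study.
trajectory = occupancy_trajectory(psi0=0.8, colonisation=0.0,
                                  extinction=0.5, n_seasons=3)
print(trajectory)
```

With no recolonisation, occupancy simply shrinks by the extinction rate each season, which is the qualitative pattern the abstract describes for the infected zone.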
Affiliation(s)
- Martijn Bollen
  - Centre for Environmental Sciences, UHasselt – Hasselt University, Hasselt, Belgium
  - Data Science Institute, UHasselt – Hasselt University, Hasselt, Belgium
  - Research Institute Nature and Forest, Brussels, Belgium
- Thomas Neyens
  - Data Science Institute, UHasselt – Hasselt University, Hasselt, Belgium
- Maxime Fajgenblat
  - Data Science Institute, UHasselt – Hasselt University, Hasselt, Belgium
  - Laboratory of Aquatic Ecology, Evolution and Conservation, KU Leuven – Leuven University, Leuven, Belgium
- Valérie De Waele
  - Department of Natural and Agricultural Environment Studies, Public Service of Wallonia, Gembloux, Belgium
- Alain Licoppe
  - Department of Natural and Agricultural Environment Studies, Public Service of Wallonia, Gembloux, Belgium
- Benoît Manet
  - Department of Natural and Agricultural Environment Studies, Public Service of Wallonia, Gembloux, Belgium
- Jim Casaer
  - Research Institute Nature and Forest, Brussels, Belgium
- Natalie Beenaerts
  - Centre for Environmental Sciences, UHasselt – Hasselt University, Hasselt, Belgium
6
Abstract
A central theme in the field of survey statistics is estimating population-level quantities through data coming from potentially non-representative samples of the population. Multilevel regression and poststratification (MRP), a model-based approach, is gaining traction against the traditional weighted approach for survey estimates. MRP estimates are susceptible to bias if there is an underlying structure that the methodology does not capture. This work aims to provide a new framework for specifying structured prior distributions that lead to bias reduction in MRP estimates. We use simulation studies to explore the benefit of these prior distributions and demonstrate their efficacy on non-representative US survey data. We show that structured prior distributions offer absolute bias reduction and variance reduction for posterior MRP estimates in a large variety of data regimes.
Affiliation(s)
- Yuxiang Gao
  - Department of Statistical Sciences, University of Toronto, Canada
- Lauren Kennedy
  - Columbia Population Research Center and Department of Statistics, Columbia University, New York, NY
- Daniel Simpson
  - Department of Statistical Sciences, University of Toronto, Canada
- Andrew Gelman
  - Department of Statistics and Department of Political Science, Columbia University, New York, NY
7
Leiva-Yamaguchi V, Alvares D. A Two-Stage Approach for Bayesian Joint Models of Longitudinal and Survival Data: Correcting Bias with Informative Prior. Entropy (Basel) 2020; 23:e23010050. [PMID: 33396212 PMCID: PMC7824570 DOI: 10.3390/e23010050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2020] [Revised: 12/21/2020] [Accepted: 12/27/2020] [Indexed: 11/28/2022]
Abstract
Joint models of longitudinal and survival outcomes have gained much popularity in recent years, both in applications and in methodological development. This type of modelling is usually characterised by two submodels, one longitudinal (e.g., mixed-effects model) and one survival (e.g., Cox model), which are connected by some common term. Naturally, sharing information makes the inferential process highly time-consuming. In particular, the Bayesian framework requires even more time for Markov chains to reach stationarity. Hence, in order to reduce the modelling complexity while maintaining the accuracy of the estimates, we propose a two-stage strategy that first fits the longitudinal submodel and then plugs the shared information into the survival submodel. Unlike a standard two-stage approach, we apply a correction by incorporating an individual and multiplicative fixed-effect with informative prior into the survival submodel. Based on simulation studies and sensitivity analyses, we empirically compare our proposal with joint specification and standard two-stage approaches. The results show that our methodology is very promising, since it reduces the estimation bias compared to the other two-stage method and requires less processing time than the joint specification approach.
8
Suzuki Y, Tanaka N, Akiyama H. Attempt of Bayesian Estimation from Left-censored Data Using the Markov Chain Monte Carlo Method: Exploring Cr(VI) Concentrations in Mineral Water Products. Food Saf (Tokyo) 2020; 8:67-89. [PMID: 33409115 PMCID: PMC7765759 DOI: 10.14252/foodsafetyfscj.d-20-00007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/19/2020] [Accepted: 10/14/2020] [Indexed: 11/26/2022]
Abstract
Hexavalent chromium (Cr(VI)) is a toxic, carcinogenic, and mutagenic substance. Oral exposure to Cr(VI) is thought to be primarily from drinking water. However, under a certain reporting limit (~0.1 µg/L), the percentage of Cr(VI) concentrations in mineral water products below the reporting limit was estimated to be higher than 50%. Data whose values are below certain limits and thus cannot be accurately determined are known as left-censored. The high censored percentage makes the estimation of Cr(VI) exposure uncertain. It is well known that the conventional substitution methods often used in food analytical science cause severe bias. To estimate appropriate summary statistics on Cr(VI) concentration in mineral water products, parameter estimation using the Markov chain Monte Carlo (MCMC) method under the assumption of a lognormal distribution was performed. Stan, a probabilistic programming language, was used for MCMC. We evaluated the accuracy, coverage probability, and reliability of estimates with MCMC by comparison with other estimation methods (discarding nondetects, substituting half of the reporting limit, Kaplan-Meier, regression on order statistics, and maximum likelihood estimation) using 1000 randomly generated data subsets (n = 150) with the obtained parameters. The evaluation shows that MCMC is the best estimation method in this context, with greater accuracy, coverage probability, and reliability over a censored percentage of 10-90%. The mean concentration, which was estimated with MCMC, was 0.289×10⁻³ mg/L, and this value was sufficiently lower than the regulated value of 0.05 mg/L stipulated by the Food Sanitation Act.
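The key idea behind handling left-censored data under a lognormal assumption can be sketched as a likelihood: detected values contribute the lognormal density, while each nondetect contributes the probability of falling below the reporting limit. The Python sketch below is a minimal illustration of that likelihood only, not the authors' Stan program; the function names, parameter names, and example numbers are all hypothetical, and a full Bayesian analysis would add priors and sample the posterior with MCMC.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_lognormal_loglik(mu, sigma, detected, n_censored, limit):
    """Log-likelihood of a lognormal(mu, sigma) model with left-censoring.

    detected   -- measured concentrations at or above the reporting limit
    n_censored -- number of samples reported only as "below the limit"
    limit      -- the reporting limit (same units as the detected values)
    """
    loglik = 0.0
    for x in detected:
        z = (math.log(x) - mu) / sigma
        # lognormal density: normal density of log(x) divided by x * sigma
        loglik += normal_logpdf(z) - math.log(x * sigma)
    if n_censored > 0:
        # each nondetect contributes P(X < limit)
        z_limit = (math.log(limit) - mu) / sigma
        loglik += n_censored * math.log(normal_cdf(z_limit))
    return loglik


# Hypothetical data: three detected values and four nondetects.
print(censored_lognormal_loglik(mu=-2.0, sigma=1.0,
                                detected=[0.15, 0.3, 0.6],
                                n_censored=4, limit=0.1))
```

Unlike substituting half the reporting limit, this formulation never pretends to know the value of a nondetect; it only uses the fact that the value lies below the limit.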
Affiliation(s)
- Yoshinari Suzuki
  - Division of Foods, National Institute of Health Science, Tonomachi 3-25-26, Kawasaki-ku, Kawasaki, Kanagawa 210-9501, Japan
- Noriko Tanaka
  - Department of Health Data Science Research, Healthy Aging Innovation Center, Tokyo Metropolitan Geriatric Medical Center, Sakae-cho 35-2, Itabashi-ku, Tokyo 173-0015, Japan
- Hiroshi Akiyama
  - Division of Foods, National Institute of Health Science, Tonomachi 3-25-26, Kawasaki-ku, Kawasaki, Kanagawa 210-9501, Japan
9
Andrade J, Duggan J. An evaluation of Hamiltonian Monte Carlo performance to calibrate age-structured compartmental SEIR models to incidence data. Epidemics 2020; 33:100415. [PMID: 33212347 DOI: 10.1016/j.epidem.2020.100415] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/04/2020] [Revised: 10/28/2020] [Accepted: 10/28/2020] [Indexed: 11/20/2022]
Abstract
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo method to estimate unknown quantities through sample generation from a target distribution for which an analytical solution is difficult. The strength of this method lies in its geometrical foundations, which render it efficient for traversing high-dimensional spaces. First, this paper analyses the performance of HMC in calibrating five variants of inputs to an age-structured SEIR model. Four of these variants are related to restriction assumptions that modellers devise to handle high-dimensional parameter spaces. The other one corresponds to the unrestricted symmetric variant. To provide a robust analysis, we compare HMC's performance to that of the Nelder-Mead algorithm (NMS), a common method for non-linear optimisation. Furthermore, the calibration is performed on synthetic data in order to avoid confounding effects from errors in model selection. Then, we explore the variation in the method's performance due to changes in the scale of the problem. Finally, we fit an SEIR model to real data. In all the experiments, the results show that HMC approximates both the synthetic and real data accurately, and provides reliable estimates for the basic reproduction number and the age-dependent transmission rates. HMC's performance is robust in the presence of underreported incidences and high-dimensional complexity. This study suggests that stringent assumptions on age-dependent transmission rates can be lifted in favour of more realistic representations. The supplementary section presents the full set of results.
10
Bürkner PC, Charpentier E. Modelling monotonic effects of ordinal predictors in Bayesian regression models. Br J Math Stat Psychol 2020; 73:420-451. [PMID: 31943157 DOI: 10.1111/bmsp.12195] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/17/2019] [Revised: 10/02/2019] [Indexed: 05/07/2023]
Abstract
Ordinal predictors are commonly used in regression models. They are often incorrectly treated as either nominal or metric, thus under- or overestimating the information contained. Such practices may lead to worse inference and predictions compared to methods which are specifically designed for this purpose. We propose a new method for modelling ordinal predictors that applies in situations in which it is reasonable to assume their effects to be monotonic. The parameterization of such monotonic effects is realized in terms of a scale parameter b representing the direction and size of the effect and a simplex parameter ς modelling the normalized differences between categories. This ensures that predictions increase or decrease monotonically, while changes between adjacent categories may vary across categories. This formulation generalizes to interaction terms as well as multilevel structures. Monotonic effects may be applied not only to ordinal predictors, but also to other discrete variables for which a monotonic relationship is plausible. In simulation studies we show that the model is well calibrated and, if there is monotonicity present, exhibits predictive performance similar to or even better than other approaches designed to handle ordinal predictors. Using Stan, we developed a Bayesian estimation method for monotonic effects which allows us to incorporate prior information and to check the assumption of monotonicity. We have implemented this method in the R package brms, so that fitting monotonic effects in a fully Bayesian framework is now straightforward.
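The parameterization described above (a scale parameter for direction and size, plus a simplex for the normalized differences between adjacent categories) can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not the brms implementation: the function name and the variable name `zeta` are hypothetical, and here `b` is taken to be the total change from the lowest to the highest category, whereas other scaling conventions (for example, the average change between adjacent categories) are equally possible.

```python
def monotonic_effect(x, b, zeta):
    """Contribution of ordinal category x to the linear predictor.

    x    -- category index, 0 (lowest) to len(zeta) (highest)
    b    -- scale: assumed here to be the total effect from lowest to highest
    zeta -- simplex of normalized differences between adjacent categories
    """
    if any(z < 0 for z in zeta) or abs(sum(zeta) - 1.0) > 1e-9:
        raise ValueError("zeta must be non-negative and sum to 1")
    if not 0 <= x <= len(zeta):
        raise ValueError("x must be between 0 and len(zeta)")
    # cumulative share of the total effect reached at category x
    return b * sum(zeta[:x])


# Hypothetical values: four categories, uneven steps between them.
zeta = [0.5, 0.3, 0.2]
effects = [monotonic_effect(x, b=2.0, zeta=zeta) for x in range(4)]
print(effects)
```

Because every element of the simplex is non-negative, the effect can only move in one direction as the category increases, while the step sizes between adjacent categories remain free to differ.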
11
Abstract
We propose a dyadic Item Response Theory (dIRT) model for measuring interactions of pairs of individuals when the responses to items represent the actions (or behaviors, perceptions, etc.) of each individual (actor) made within the context of a dyad formed with another individual (partner). Examples of its use include the assessment of collaborative problem solving or the evaluation of intra-team dynamics. The dIRT model generalizes both Item Response Theory models for measurement and the Social Relations Model for dyadic data. The responses of an actor when paired with a partner are modeled as a function of not only the actor's inclination to act and the partner's tendency to elicit that action, but also the unique relationship of the pair, represented by two directional, possibly correlated, interaction latent variables. Generalizations are discussed, such as accommodating triads or larger groups. Estimation is performed using Markov-chain Monte Carlo implemented in Stan, making it straightforward to extend the dIRT model in various ways. Specifically, we show how the basic dIRT model can be extended to accommodate latent regressions, multilevel settings with cluster-level random effects, as well as joint modeling of dyadic data and a distal outcome. A simulation study demonstrates that estimation performs well. We apply our proposed approach to speed-dating data and find new evidence of pairwise interactions between participants, describing a mutual attraction that is inadequately characterized by individual properties alone.
Affiliation(s)
- Brian Gin
  - University of California, San Francisco, San Francisco, USA
- Nicholas Sim
  - University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA, 94720, USA
- Anders Skrondal
  - Norwegian Institute of Public Health, Oslo, Norway
  - University of Oslo, Oslo, Norway
  - University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA, 94720, USA
- Sophia Rabe-Hesketh
  - University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA, 94720, USA
12
Liu Y, Chen Q. Bayesian Inference of Finite Population Quantiles for Skewed Survey Data Using Skew-Normal Penalized Spline Regression. J Surv Stat Methodol 2020; 8:792-816. [PMID: 32923492 PMCID: PMC7473425 DOI: 10.1093/jssam/smz016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Skewed data are common in sample surveys. In probability proportional to size sampling, we propose two Bayesian model-based predictive methods for estimating finite population quantiles with skewed sample survey data. We assume the survey outcome to follow a skew-normal distribution given the probability of selection and model the location and scale parameters of the skew-normal distribution as functions of the probability of selection. To allow a flexible association between the survey outcome and the probability of selection, the first method models the location parameter with a penalized spline and the scale parameter with a polynomial function, while the second method models both the location and scale parameters with penalized splines. Using a fully Bayesian approach, we obtain the posterior predictive distributions of the nonsampled units in the population and thus the posterior distributions of the finite population quantiles. We show through simulations that our proposed methods are more efficient and yield shorter credible intervals with better coverage rates than the conventional weighted method in estimating finite population quantiles. We demonstrate the application of our proposed methods using data from the 2013 National Drug Abuse Treatment System Survey.
Affiliation(s)
- Yutao Liu
  - Yutao Liu is a PhD candidate and Qixuan Chen is Associate Professor of Biostatistics in the Department of Biostatistics, Mailman School of Public Health, Columbia University, New York City, NY, USA
- Qixuan Chen
  - Yutao Liu is a PhD candidate and Qixuan Chen is Associate Professor of Biostatistics in the Department of Biostatistics, Mailman School of Public Health, Columbia University, New York City, NY, USA
13
Günhan BK, Weber S, Friede T. A Bayesian time-to-event pharmacokinetic model for phase I dose-escalation trials with multiple schedules. Stat Med 2020; 39:3986-4000. [PMID: 32797729 DOI: 10.1002/sim.8703] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/23/2018] [Revised: 04/21/2020] [Accepted: 06/30/2020] [Indexed: 11/07/2022]
Abstract
Phase I dose-escalation trials must be guided by a safety model in order to avoid exposing patients to unacceptably high risk of toxicities. Traditionally, these trials are based on one type of schedule. In more recent practice, however, there is often a need to consider more than one schedule, which means that in addition to the dose itself, the schedule needs to be varied in the trial. Hence, the aim is finding an acceptable dose-schedule combination. However, most established methods for dose-escalation trials are designed to escalate the dose only, and ad hoc choices must be made to adapt these to the more complicated setting of finding an acceptable dose-schedule combination. In this article, we introduce a Bayesian time-to-event model which explicitly takes the dose amount and schedule into account through the use of pharmacokinetic principles. The model uses a time-varying exposure measure to account for the risk of a dose-limiting toxicity over time. The dose-schedule decisions are informed by an escalation with overdose control criterion. The model is formulated using interpretable parameters, which facilitates the specification of priors. In a simulation study, we compared the proposed method with an existing method. The simulation study demonstrates that the proposed method yields similar or better results compared with an existing method in terms of recommending acceptable dose-schedule combinations, yet reduces the number of patients enrolled in most scenarios. The R and Stan code to implement the proposed method is publicly available from GitHub (https://github.com/gunhanb/TITEPK_code).
Affiliation(s)
- Burak Kürsad Günhan
  - Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- Sebastian Weber
  - Advanced Exploratory Analytics, Novartis Pharma AG, Basel, Switzerland
- Tim Friede
  - Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
14
Teng KTY, Martinez Avilés M, Ugarte-Ruiz M, Barcena C, de la Torre A, Lopez G, Moreno MA, Dominguez L, Alvarez J. Spatial Trends in Salmonella Infection in Pigs in Spain. Front Vet Sci 2020; 7:345. [PMID: 32656254 PMCID: PMC7325609 DOI: 10.3389/fvets.2020.00345] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/28/2020] [Accepted: 05/18/2020] [Indexed: 12/13/2022]
Abstract
Salmonella is one of the most important foodborne pathogens worldwide. Its main reservoirs are poultry and pigs, in which infection is endemic in many countries. Spain has one of the largest pig populations in the world. Even though Salmonella infection is commonly detected in pig farms, its spatial distribution at the national level is poorly understood. Here we aimed to report the spatial distribution of Salmonella-positive pig farms in Spain and investigate the presence of potential spatial trends over a 17-year period. For this, data on samples from pigs tested for Salmonella in 2002-2013, 2015, 2017, and 2019 as part of the Spanish Veterinary Antimicrobial Resistance Surveillance program, representing 3,730 farms were analyzed. The spatial distribution and clustering of Salmonella-positive pig farms at the province level were explored using spatial empirical Bayesian smoothing and global Moran's I, local Moran's I, and the Poisson model of the spatial scan statistics. Bayesian spatial regression using a reparameterized Besag-York-Mollié Poisson model (BYM2 model) was then performed to quantify the presence of spatially structured and unstructured effects while accounting for the effect of potential risk factors for Salmonella infection at the province level. The overall proportion of Salmonella-positive farms was 37.8% (95% confidence interval: 36.2-39.4). Clusters of positive farms were detected in the East and Northeast of Spain. The Bayesian spatial regression revealed a West-to-East increase in the risk of Salmonella infection at the province level, with 65.2% (50% highest density interval: 70-100.0%) of this spatial pattern being explained by the spatially structured component. Our results demonstrate the existence of a spatial variation in the risk of Salmonella infection in pig farms at the province level in Spain. 
This information can help to optimize risk-based Salmonella surveillance programs in Spain, although further research to identify farm-level factors explaining this pattern is needed.
Affiliation(s)
- Kendy Tzu-yun Teng
  - VISAVET Health Surveillance Center, Universidad Complutense, Madrid, Spain
- Marta Martinez Avilés
  - Center for Animal Health Research, National Institute of Agricultural and Food Research and Technology, Madrid, Spain
- Maria Ugarte-Ruiz
  - VISAVET Health Surveillance Center, Universidad Complutense, Madrid, Spain
- Carmen Barcena
  - VISAVET Health Surveillance Center, Universidad Complutense, Madrid, Spain
- Ana de la Torre
  - Center for Animal Health Research, National Institute of Agricultural and Food Research and Technology, Madrid, Spain
- Gema Lopez
  - Ministerio de Agricultura, Alimentación y Medio Ambiente (Spain), Madrid, Spain
- Miguel A. Moreno
  - VISAVET Health Surveillance Center, Universidad Complutense, Madrid, Spain
  - Department of Animal Health, Faculty of Veterinary Medicine, Universidad Complutense, Madrid, Spain
- Lucas Dominguez
  - VISAVET Health Surveillance Center, Universidad Complutense, Madrid, Spain
  - Department of Animal Health, Faculty of Veterinary Medicine, Universidad Complutense, Madrid, Spain
- Julio Alvarez
  - VISAVET Health Surveillance Center, Universidad Complutense, Madrid, Spain
  - Department of Animal Health, Faculty of Veterinary Medicine, Universidad Complutense, Madrid, Spain
15
|
Wang C, Colantuoni E, Leroux A, Scharfstein DO. idem: An R Package for Inferences in Clinical Trials with Death and Missingness. J Stat Softw 2020; 93:12. [PMID: 33273895 PMCID: PMC7710152 DOI: 10.18637/jss.v093.i12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
In randomized controlled trials of seriously ill patients, death is common and often defined as the primary endpoint. Increasingly, non-mortality outcomes such as functional outcomes are co-primary or secondary endpoints. Functional outcomes are not defined for patients who die, referred to as "truncation due to death", and among survivors, functional outcomes are often unobserved due to missed clinic visits or loss to follow-up. It is well known that if the functional outcomes "truncated due to death" or missing are handled inappropriately, treatment effect estimation can be biased. In this paper, we describe the package idem that implements a procedure for comparing treatments that is based on a composite endpoint of mortality and the functional outcome among survivors. Among survivors, the procedure incorporates a missing data imputation procedure with a sensitivity analysis strategy. A web-based graphical user interface is provided in the idem package to facilitate users conducting the proposed analysis in an interactive and user-friendly manner. We demonstrate idem using data from a recent trial of sedation interruption among mechanically ventilated patients.
Affiliation(s)
- Chenguang Wang
  - Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, 550 N. Broadway Suite 1103, Baltimore MD, 21205, United States of America
- Elizabeth Colantuoni
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore MD, 21205, United States of America
- Andrew Leroux
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore MD, 21205, United States of America
- Daniel O Scharfstein
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore MD, 21205, United States of America
|
16
|
Mutshinda CM, Irwin AJ, Sillanpää MJ. A Bayesian Framework for Robust Quantitative Trait Locus Mapping and Outlier Detection. Int J Biostat 2020; 16:/j/ijb.ahead-of-print/ijb-2019-0038/ijb-2019-0038.xml. [PMID: 32061165 DOI: 10.1515/ijb-2019-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2019] [Accepted: 02/04/2020] [Indexed: 02/28/2024]
Abstract
We introduce a Bayesian framework for simultaneous feature selection and outlier detection in sparse high-dimensional regression models, with a focus on quantitative trait locus (QTL) mapping in experimental crosses. More specifically, we incorporate the robust mean shift outlier handling mechanism into the multiple QTL mapping regression model and apply LASSO regularization concurrently to the genetic effects and the mean-shift terms through the flexible extended Bayesian LASSO (EBL) prior structure, thereby combining QTL mapping and outlier detection into a single sparse model representation problem. The EBL priors on the mean-shift terms prevent outlying phenotypic values from distorting the genotype-phenotype association and allow their detection as cases with outstanding mean shift values following the LASSO shrinkage. Simulation results demonstrate the effectiveness of our new methodology at mapping QTLs in the presence of outlying phenotypic values and simultaneously identifying the potential outliers, while maintaining a comparable performance to the standard EBL on outlier-free data.
Affiliation(s)
- Crispin M Mutshinda
  - Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Road, Halifax, Nova Scotia B3H 4R2, Canada
- Andrew J Irwin
  - Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Road, Halifax, Nova Scotia B3H 4R2, Canada
- Mikko J Sillanpää
  - Department of Mathematical Sciences, University of Oulu, Oulu, Finland
|
17
|
Abstract
Raven's Standard Progressive Matrices (SPM) test and related matrix-based tests are widely applied measures of cognitive ability. Using Bayesian Item Response Theory (IRT) models, I reanalyzed data of an SPM short form proposed by Myszkowski and Storme (2018) and, at the same time, illustrated the application of these models. Results indicate that a three-parameter logistic (3PL) model is sufficient to describe participants' dichotomous responses (correct vs. incorrect) while persons' ability parameters are quite robust across IRT models of varying complexity. These conclusions are in line with the original results of Myszkowski and Storme (2018). Using Bayesian as opposed to frequentist IRT models offered advantages in the estimation of more complex (i.e., 3-4PL) IRT models and provided more sensible and robust uncertainty estimates.
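As a brief illustration of the three-parameter logistic (3PL) model named in the abstract above, the sketch below evaluates the standard 3PL item response function, P(correct) = c + (1 - c) / (1 + exp(-a(theta - b))). It is not taken from the article: the function name `p_correct_3pl` and the parameter values are hypothetical, with `a` the discrimination, `b` the difficulty, `c` the guessing parameter, and `theta` the person's ability.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))  # 2PL part
    return c + (1.0 - c) * logistic                      # add guessing floor


# Hypothetical item: moderate discrimination, average difficulty,
# and a guessing floor of 0.125 (e.g., one of eight response options).
for ability in (-2.0, 0.0, 2.0):
    print(ability, round(p_correct_3pl(ability, a=1.5, b=0.0, c=0.125), 3))
```

Setting `c = 0` reduces this to the 2PL model, which is one way to see how the models of varying complexity mentioned in the abstract nest within each other.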
|
18
|
Abstract
The discovery of genomic polymorphisms influencing gene expression (also known as expression quantitative trait loci or eQTLs) can be formulated as a sparse Bayesian multivariate/multiple regression problem. An important aspect in the development of such models is the implementation of bespoke inference methodologies, a process which can become quite laborious, when multiple candidate models are being considered. We describe automatic, black-box inference in such models using Stan, a popular probabilistic programming language. The utilization of systems like Stan can facilitate model prototyping and testing, thus accelerating the data modeling process. The code described in this chapter can be found at https://github.com/dvav/eQTLBookChapter .
Affiliation(s)
- Dimitrios V Vavoulis
  - Department of Oncology, University of Oxford, Oxford, UK.
  - The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
  - NHS Translational Molecular Diagnostics Centre, Oxford University Hospitals, Oxford, UK.
  - NIHR Oxford Biomedical Research Centre, Oxford, UK.
|
19
|
Fourment M, Darling AE. Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics. PeerJ 2019; 7:e8272. [PMID: 31976168 PMCID: PMC6966998 DOI: 10.7717/peerj.8272] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/05/2019] [Accepted: 11/22/2019] [Indexed: 12/21/2022]
Abstract
Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes-Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.
Affiliation(s)
- Mathieu Fourment
  - ithree Institute, University of Technology Sydney, Sydney, NSW, Australia
- Aaron E. Darling
  - ithree Institute, University of Technology Sydney, Sydney, NSW, Australia
|
20
|
Hance DJ, Perry RW, Plumb JM, Pope AC. A temporally stratified extension of space-for-time Cormack-Jolly-Seber for migratory animals. Biometrics 2019; 76:900-912. [PMID: 31729008 DOI: 10.1111/biom.13171] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/04/2019] [Revised: 10/15/2019] [Accepted: 10/21/2019] [Indexed: 11/27/2022]
Abstract
Understanding drivers of temporal variation in demographic parameters is a central goal of mark-recapture analysis. To estimate the survival of migrating animal populations in migration corridors, space-for-time mark-recapture models employ discrete sampling locations in space to monitor marked populations as they move past monitoring sites, rather than the standard practice of using fixed sampling points in time. Because these models focus on estimating survival over discrete spatial segments, model parameters are implicitly integrated over the temporal dimension. Furthermore, modeling the effect of time-varying covariates on model parameters is complicated by unknown passage times for individuals that are not detected at monitoring sites. To overcome these limitations, we extended the Cormack-Jolly-Seber (CJS) framework to estimate temporally stratified survival and capture probabilities by including a discretized arrival time process in a Bayesian framework. We allow for flexibility in the model form by including temporally stratified covariates and hierarchical structures. In addition, we provide tools for assessing model fit and comparing among alternative structural models for the parameters. We demonstrate our framework by fitting three competing models to estimate daily survival, capture, and arrival probabilities at four hydroelectric dams for over 200 000 individually tagged migratory juvenile salmon released into the Snake River, USA.
Affiliation(s)
- Dalton J Hance
  - U.S. Geological Survey, Western Fisheries Research Center, 5501-A Cook-Underwood Road, Cook, Washington, USA
- Russell W Perry
  - U.S. Geological Survey, Western Fisheries Research Center, 5501-A Cook-Underwood Road, Cook, Washington, USA
- John M Plumb
  - U.S. Geological Survey, Western Fisheries Research Center, 5501-A Cook-Underwood Road, Cook, Washington, USA
- Adam C Pope
  - U.S. Geological Survey, Western Fisheries Research Center, 5501-A Cook-Underwood Road, Cook, Washington, USA
|
21
|
Chatzilena A, van Leeuwen E, Ratmann O, Baguelin M, Demiris N. Contemporary statistical inference for infectious disease models using Stan. Epidemics 2019; 29:100367. [PMID: 31591003 DOI: 10.1016/j.epidem.2019.100367] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 02/22/2019] [Revised: 08/20/2019] [Accepted: 08/30/2019] [Indexed: 11/22/2022]
Abstract
This paper is concerned with the application of recent statistical advances to inference of infectious disease dynamics. We describe the fitting of a class of epidemic models using Hamiltonian Monte Carlo and variational inference as implemented in the freely available Stan software. We apply the two methods to real data from outbreaks as well as routinely collected observations. Our results suggest that both inference methods are computationally feasible in this context, and show a trade-off between statistical efficiency versus computational speed. The latter appears particularly relevant for real-time applications.
|
22
|
Abstract
Forced-choice questionnaires have been proposed to avoid common response biases typically associated with rating scale questionnaires. To overcome ipsativity issues of trait scores obtained from classical scoring approaches of forced-choice items, advanced methods from item response theory (IRT) such as the Thurstonian IRT model have been proposed. For convenient model specification, we introduce the thurstonianIRT R package, which uses Mplus, lavaan, and Stan for model estimation. Based on practical considerations, we establish that items within one block need to be equally keyed to achieve similar social desirability, which is essential for creating forced-choice questionnaires that have the potential to resist faking intentions. According to extensive simulations, measuring up to five traits using blocks of only equally keyed items does not yield sufficiently accurate trait scores and inter-trait correlation estimates, neither for frequentist nor for Bayesian estimation methods. As a result, persons' trait scores remain partially ipsative and, thus, do not allow for valid comparisons between persons. However, we demonstrate that trait scores based on only equally keyed blocks can be improved substantially by measuring a sizable number of traits. More specifically, in our simulations of 30 traits, scores based on only equally keyed blocks were non-ipsative and highly accurate. We conclude that in high-stakes situations where persons are motivated to give fake answers, Thurstonian IRT models should only be applied to tests measuring a sizable number of traits.
|
23
|
Chen G, Xiao Y, Taylor PA, Rajendra JK, Riggins T, Geng F, Redcay E, Cox RW. Handling Multiplicity in Neuroimaging Through Bayesian Lenses with Multilevel Modeling. Neuroinformatics 2019; 17:515-545. [PMID: 30649677 PMCID: PMC6635105 DOI: 10.1007/s12021-018-9409-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 12/22/2022]
Abstract
Here we address the current issues of inefficiency and over-penalization in the massively univariate approach followed by the correction for multiple testing, and propose a more efficient model that pools and shares information among brain regions. Using Bayesian multilevel (BML) modeling, we control two types of error that are more relevant than the conventional false positive rate (FPR): incorrect sign (type S) and incorrect magnitude (type M). BML also aims to achieve two goals: 1) improving modeling efficiency by having one integrative model and thereby dissolving the multiple testing issue, and 2) turning the focus of conventional null hypothesis significance testing (NHST) on FPR into quality control by calibrating type S errors while maintaining a reasonable level of inference efficiency. The performance and validity of this approach are demonstrated through an application at the region of interest (ROI) level, with all the regions on an equal footing: unlike the current approaches under NHST, small regions are not disadvantaged simply because of their physical size. In addition, compared to the massively univariate approach, BML may simultaneously achieve increased spatial specificity and inference efficiency, and promote results reporting in totality and transparency. The benefits of BML are illustrated in performance and quality checking using an experimental dataset. The methodology also avoids the current practice of sharp and arbitrary thresholding in the p-value funnel to which the multidimensional data are reduced. The BML approach with its auxiliary tools is available as part of the AFNI suite for general use.
Affiliation(s)
- Gang Chen
  - Scientific and Statistical Computing Core, National Institute of Mental Health, Bethesda, MD, USA.
- Yaqiong Xiao
  - Department of Psychology, University of Maryland, College Park, MD, 20742, USA
- Paul A Taylor
  - Scientific and Statistical Computing Core, National Institute of Mental Health, Bethesda, MD, USA
- Justin K Rajendra
  - Scientific and Statistical Computing Core, National Institute of Mental Health, Bethesda, MD, USA
- Tracy Riggins
  - Department of Psychology, University of Maryland, College Park, MD, 20742, USA
- Fengji Geng
  - Department of Psychology, University of Maryland, College Park, MD, 20742, USA
- Elizabeth Redcay
  - Department of Psychology, University of Maryland, College Park, MD, 20742, USA
- Robert W Cox
  - Scientific and Statistical Computing Core, National Institute of Mental Health, Bethesda, MD, USA
|
24
|
Morris M, Wheeler-Martin K, Simpson D, Mooney SJ, Gelman A, DiMaggio C. Bayesian hierarchical spatial models: Implementing the Besag York Mollié model in stan. Spat Spatiotemporal Epidemiol 2019; 31:100301. [PMID: 31677766 DOI: 10.1016/j.sste.2019.100301] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 12/03/2018] [Revised: 08/05/2019] [Accepted: 08/06/2019] [Indexed: 10/26/2022]
Abstract
This report presents a new implementation of the Besag-York-Mollié (BYM) model in Stan, a probabilistic programming platform which does full Bayesian inference using Hamiltonian Monte Carlo (HMC). We review the spatial auto-correlation models used for areal data and disease risk mapping, and describe the corresponding Stan implementations. We also present a case study using Stan to fit a BYM model for motor vehicle crashes injuring school-age pedestrians in New York City from 2005 to 2014 localized to census tracts. Stan efficiently fit our multivariable BYM model having a large number of observations (n=2095 census tracts) with small outcome counts < 10 in the majority of tracts. Our findings reinforced that neighborhood income and social fragmentation are significant correlates of school-age pedestrian injuries. We also observed that nationally-available census tract estimates of commuting methods may serve as a useful indicator of underlying pedestrian densities.
Affiliation(s)
- Mitzi Morris
  - Institute for Social and Economic Research and Policy, Columbia University, New York, NY, United States
- Dan Simpson
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Stephen J Mooney
  - Department of Epidemiology, University of Washington, Seattle, WA, United States
- Andrew Gelman
  - Department of Statistics, Columbia University, New York, NY, United States
- Charles DiMaggio
  - Department of Surgery, New York University School of Medicine, New York, NY, United States
|
25
|
Tsiros P, Bois FY, Dokoumetzidis A, Tsiliki G, Sarimveis H. Population pharmacokinetic reanalysis of a Diazepam PBPK model: a comparison of Stan and GNU MCSim. J Pharmacokinet Pharmacodyn 2019; 46:173-192. [PMID: 30949914 DOI: 10.1007/s10928-019-09630-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/02/2018] [Accepted: 03/25/2019] [Indexed: 11/29/2022]
Abstract
The aim of this study is to benchmark two Bayesian software tools, namely Stan and GNU MCSim, that use different Markov chain Monte Carlo (MCMC) methods for the estimation of physiologically based pharmacokinetic (PBPK) model parameters. The software tools were applied and compared on the problem of updating the parameters of a Diazepam PBPK model, using time-concentration human data. Both tools produced very good fits at the individual and population levels, despite the fact that GNU MCSim is not able to consider multivariate distributions. Stan outperformed GNU MCSim in sampling efficiency, due to its almost uncorrelated sampling. However, GNU MCSim exhibited much faster convergence and performed better in terms of effective samples produced per unit of time.
Affiliation(s)
- Periklis Tsiros
  - School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechneiou Street, Zografou Campus, 15780, Athens, Greece
- Frederic Y Bois
  - Unité Modèles pour l'Ecotoxicologie et la Toxicologie (METO), Institut National de l'Environnement Industriel et des Risques (INERIS), Parc ALATA BP2, 60550, Verneuil en Halatte, France
- Aristides Dokoumetzidis
  - Department of Pharmacy, University of Athens, Panepistimiopolis Zografou, 15784, Athens, Greece
- Georgia Tsiliki
  - ATHENA Research and Innovation Centre, Artemidos 6 & Epidavrou, Marousi, Athens, 15125, Greece
- Haralambos Sarimveis
  - School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechneiou Street, Zografou Campus, 15780, Athens, Greece.
|
26
|
Liu Z, Diana A, Slater C, Preston T, Gibson RS, Houghton L, Duffull SB. Development of a nonlinear hierarchical model to describe the disposition of deuterium in mother-infant pairs to assess exclusive breastfeeding practice. J Pharmacokinet Pharmacodyn 2019; 46:1-13. [PMID: 30430351 PMCID: PMC6394541 DOI: 10.1007/s10928-018-9613-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/30/2018] [Accepted: 11/08/2018] [Indexed: 01/24/2023]
Abstract
The World Health Organization recommends exclusive breastfeeding (EBF) for the first 6 months after birth. The deuterium oxide dose-to-the-mother (DTM) technique is used to distinguish EBF based on a cut-off (< 25 g/day) of water intake from sources other than breastmilk. This value is based on a theoretical threshold and has not been verified in field studies. The aim of this study was to estimate the water intake cut-off value that can be used to define EBF practice. One hundred and twenty-one healthy infants, aged 2.5-5.5 months who were deemed to be EBF were recruited. After administration of deuterium to the mothers, saliva was sampled from mother and infant pairs over a 14-day period. Validation of infant feeding practices was conducted via home observation over six non-consecutive days with caregiver recall. A fully Bayesian framework using a gradient-based Markov chain Monte Carlo approach implemented in Stan was used to estimate the cut-off of non-milk water intake of EBF infants. From the original data set, 113 infants were determined to be EBF and provided 1500 paired mother-infant observations. The deuterium saliva concentrations were best described by two linked 1-compartment models (mother and infant), with body weight as a covariate on the mother's volume of distribution and infant's body weight on infant's water clearance rate. The cut-off value was based on the 90th percentile of the posterior distribution of non-milk water intake and was 86.6 g/day. This cut-off value can be used in future field studies in other geographic regions to determine exclusivity of breast feeding practices in order to determine their potential public health needs.
Affiliation(s)
- Zheng Liu
  - School of Pharmacy, University of Otago, Dunedin, New Zealand.
  - School of Medicine and Public Health, Hunter Medical Research Institute, University of Newcastle, Kookaburra Circuit, Newcastle, NSW, 2305, Australia.
- Aly Diana
  - Department of Human Nutrition, University of Otago, Dunedin, New Zealand
  - Division of Medical Nutrition, Faculty of Medicine, Universitas Padjadjaran, Bandung, Indonesia
- Thomas Preston
  - Scottish Universities Environmental Research Centre, University of Glasgow, Glasgow, UK
- Rosalind S Gibson
  - Department of Human Nutrition, University of Otago, Dunedin, New Zealand
- Lisa Houghton
  - Department of Human Nutrition, University of Otago, Dunedin, New Zealand
|
27
|
Anderson SC, Ward EJ. Black swans in space: modeling spatiotemporal processes with extremes. Ecology 2018; 100:e02403. [PMID: 29901233 DOI: 10.1002/ecy.2403] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/12/2017] [Revised: 01/26/2018] [Accepted: 03/29/2018] [Indexed: 11/11/2022]
Abstract
In ecological systems, extremes can happen in time, such as population crashes, or in space, such as rapid range contractions. However, current methods for joint inference about temporal and spatial dynamics (e.g., spatiotemporal modeling with Gaussian random fields) may perform poorly when underlying processes include extreme events. Here we introduce a model that allows for extremes to occur simultaneously in time and space. Our model is a Bayesian predictive-process GLMM (generalized linear mixed-effects model) that uses a multivariate-t distribution to describe spatial random effects. The approach is easily implemented with our flexible R package glmmfields. First, using simulated data, we demonstrate the ability to recapture spatiotemporal extremes, and explore the consequences of fitting models that ignore such extremes. Second, we predict tree mortality from mountain pine beetle (Dendroctonus ponderosae) outbreaks in the U.S. Pacific Northwest over the last 16 yr. We show that our approach provides more accurate and precise predictions compared to traditional spatiotemporal models when extremes are present. Our R package makes these models accessible to a wide range of ecologists and scientists in other disciplines interested in fitting spatiotemporal GLMMs, with and without extremes.
Affiliation(s)
- Sean C Anderson
  - School of Aquatic and Fishery Sciences, University of Washington, Box 355020, Seattle, Washington, 98195, USA
  - Pacific Biological Station, Fisheries and Oceans Canada, 3190 Hammond Bay Road, Nanaimo, British Columbia, V6T 6N7, Canada
- Eric J Ward
  - Conservation Biology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanographic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle, Washington, 98112, USA
|
28
|
Makela S, Si Y, Gelman A. Bayesian inference under cluster sampling with probability proportional to size. Stat Med 2018; 37:3849-3868. [PMID: 29974495 DOI: 10.1002/sim.7892] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/04/2017] [Revised: 05/29/2018] [Accepted: 06/08/2018] [Indexed: 11/07/2022]
Abstract
Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two-stage cluster sampling design where the clusters are first selected with probability proportional to cluster size, and then units are randomly sampled inside selected clusters. Challenges arise when the sizes of the nonsampled clusters are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes, with this inference performed simultaneously with the model for survey outcome, with computation performed in the open-source Bayesian inference engine Stan. Simulation studies show that the integrated Bayesian approach outperforms classical methods with efficiency gains, especially under informative cluster sampling design with a small number of selected clusters. We apply the method to the Fragile Families and Child Wellbeing study as an illustration of inference for complex health surveys.
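The two-stage design described in the abstract above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' procedure: clusters are drawn with replacement (actual probability-proportional-to-size designs often sample without replacement), and the function name `pps_two_stage`, the cluster labels, and the sizes are hypothetical.

```python
import random


def pps_two_stage(cluster_sizes, n_clusters, n_units, seed=0):
    """Stage 1: draw clusters with probability proportional to size.
    Stage 2: draw a simple random sample of units inside each cluster."""
    rng = random.Random(seed)
    ids = list(cluster_sizes)
    weights = [cluster_sizes[c] for c in ids]
    # Assumption: sampling with replacement, so a cluster can repeat.
    chosen = rng.choices(ids, weights=weights, k=n_clusters)
    sample = []
    for c in chosen:
        k = min(n_units, cluster_sizes[c])          # cap at cluster size
        units = rng.sample(range(cluster_sizes[c]), k)
        sample.append((c, units))
    return sample


# Hypothetical population: three clusters with very different sizes.
sizes = {"A": 50, "B": 10, "C": 200}
for cluster, units in pps_two_stage(sizes, n_clusters=2, n_units=5, seed=1):
    print(cluster, sorted(units))
```

Note that stage 1 needs the size of every cluster in the population, sampled or not, which is why unknown sizes for nonsampled clusters pose the challenge the abstract mentions.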
Affiliation(s)
- Susanna Makela
  - Department of Statistics, Columbia University, New York, New York
- Yajuan Si
  - Survey Research Center, University of Michigan, Ann Arbor, Michigan
- Andrew Gelman
  - Departments of Statistics and Political Science, Columbia University, New York, New York
|
29
|
Mahar RK, Carlin JB, Ranganathan S, Ponsonby AL, Vuillermin P, Vukcevic D. Bayesian modelling of lung function data from multiple-breath washout tests. Stat Med 2018; 37:2016-2033. [PMID: 29582453 DOI: 10.1002/sim.7650] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/15/2016] [Revised: 10/30/2017] [Accepted: 02/09/2018] [Indexed: 11/10/2022]
Abstract
Paediatric respiratory researchers have widely adopted the multiple-breath washout (MBW) test because it allows assessment of lung function in unsedated infants and is well suited to longitudinal studies of lung development and disease. However, a substantial proportion of MBW tests in infants fail current acceptability criteria. We hypothesised that a model-based approach to analysing the data, in place of traditional simple empirical summaries, would enable more efficient use of these tests. We therefore developed a novel statistical model for infant MBW data and applied it to 1197 tests from 432 individuals from a large birth cohort study. We focus on Bayesian estimation of the lung clearance index, the most commonly used summary of lung function from MBW tests. Our results show that the model provides an excellent fit to the data and shed further light on statistical properties of the standard empirical approach. Furthermore, the modelling approach enables the lung clearance index to be estimated by using tests with different degrees of completeness, something not possible with the standard approach. Our model therefore allows previously unused data to be used rather than discarded, as well as routine use of shorter tests without significant loss of precision. Beyond our specific application, our work illustrates a number of important aspects of Bayesian modelling in practice, such as the importance of hierarchical specifications to account for repeated measurements and the value of model checking via posterior predictive distributions.
Affiliation(s)
- Robert K Mahar
- Data Science, Murdoch Children's Research Institute, Parkville, Victoria, Australia; Department of Paediatrics, Faculty of Medicine, Dentistry and Health Services, University of Melbourne, Parkville, Victoria, Australia
- John B Carlin
- Data Science, Murdoch Children's Research Institute, Parkville, Victoria, Australia; Department of Paediatrics, Faculty of Medicine, Dentistry and Health Services, University of Melbourne, Parkville, Victoria, Australia; Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Services, University of Melbourne, Parkville, Victoria, Australia
- Sarath Ranganathan
- Department of Paediatrics, Faculty of Medicine, Dentistry and Health Services, University of Melbourne, Parkville, Victoria, Australia; Infection and Immunity, Murdoch Children's Research Institute, Parkville, Victoria, Australia; Department of Respiratory and Sleep Medicine, Royal Children's Hospital, Parkville, Victoria, Australia
- Anne-Louise Ponsonby
- Department of Paediatrics, Faculty of Medicine, Dentistry and Health Services, University of Melbourne, Parkville, Victoria, Australia; Population Health, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Peter Vuillermin
- Population Health, Murdoch Children's Research Institute, Parkville, Victoria, Australia; School of Medicine, Faculty of Health, Deakin University, Geelong, Victoria, Australia; Department of Paediatrics, Barwon Health, Geelong, Victoria, Australia
- Damjan Vukcevic
- Data Science, Murdoch Children's Research Institute, Parkville, Victoria, Australia; School of Mathematics and Statistics, Faculty of Science, University of Melbourne, Parkville, Victoria, Australia
|
30
|
Abstract
When evaluating cognitive models based on fits to observed data (or, really, any model that has free parameters), parameter estimation is critically important. Traditional techniques like hill climbing by minimizing or maximizing a fit statistic often result in point estimates. Bayesian approaches instead estimate parameters as posterior probability distributions, and thus naturally account for the uncertainty associated with parameter estimation; Bayesian approaches also offer powerful and principled methods for model comparison. Although software applications such as WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, Statistics and Computing, 10, 325-337, 2000) and JAGS (Plummer, 2003) provide "turnkey"-style packages for Bayesian inference, they can be inefficient when dealing with models whose parameters are correlated, which is often the case for cognitive models, and they can impose significant technical barriers to adding custom distributions, which is often necessary when implementing cognitive models within a Bayesian framework. A recently developed software package called Stan (Stan Development Team, 2015) can solve both problems, as well as provide a turnkey solution to Bayesian inference. We present a tutorial on how to use Stan and how to add custom distributions to it, with an example using the linear ballistic accumulator model (Brown & Heathcote, Cognitive Psychology, 57, 153-178, doi: 10.1016/j.cogpsych.2007.12.002, 2008).
|
31
|
Feist BE, Buhle ER, Baldwin DH, Spromberg JA, Damm SE, Davis JW, Scholz NL. Roads to ruin: conservation threats to a sentinel species across an urban gradient. Ecol Appl 2017; 27:2382-2396. [PMID: 29044812 PMCID: PMC6084292 DOI: 10.1002/eap.1615] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/14/2017] [Revised: 08/01/2017] [Accepted: 08/03/2017] [Indexed: 05/02/2023]
Abstract
Urbanization poses a global challenge to species conservation. This is primarily understood in terms of physical habitat loss, as agricultural and forested lands are replaced with urban infrastructure. However, aquatic habitats are also chemically degraded by urban development, often in the form of toxic stormwater runoff. Here we assess threats of urbanization to coho salmon throughout developed areas of the Puget Sound Basin in Washington, USA. Puget Sound coho are a sentinel species for freshwater communities and also a species of concern under the U.S. Endangered Species Act. Previous studies have demonstrated that stormwater runoff is unusually lethal to adult coho that return to spawn each year in urban watersheds. To further explore the relationship between land use and recurrent coho die-offs, we measured mortality rates in field surveys of 51 spawning sites across an urban gradient. We then used spatial analyses to measure landscape attributes (land use and land cover, human population density, roadways, traffic intensity, etc.) and climatic variables (annual summer and fall precipitation) associated with each site. Structural equation modeling revealed a latent urbanization gradient that was associated with road density and traffic intensity, among other variables, and positively related to coho mortality. Across years within sites, mortality increased with summer and fall precipitation, but the effect of rainfall was strongest in the least developed areas and was essentially neutral in the most urbanized streams. We used the best-supported structural equation model to generate a predictive mortality risk map for the entire Puget Sound Basin. This map indicates an ongoing and widespread loss of spawners across much of the Puget Sound population segment, particularly within the major regional north-south corridor for transportation and development. 
Our findings identify current and future urbanization-related threats to wild coho, and show where green infrastructure and similar clean water strategies could prove most useful for promoting species conservation and recovery.
Affiliation(s)
- Blake E. Feist
- Conservation Biology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, 2725 Montlake Boulevard East, Seattle, Washington 98112, USA
- Eric R. Buhle
- Quantitative Consultants, Inc. Under contract to Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, 2725 Montlake Boulevard East, Seattle, Washington 98112, USA
- David H. Baldwin
- Environmental and Fisheries Sciences Division, Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, 2725 Montlake Boulevard East, Seattle, Washington 98112, USA
- Julann A. Spromberg
- Environmental and Fisheries Sciences Division, Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, 2725 Montlake Boulevard East, Seattle, Washington 98112, USA
- Steven E. Damm
- Washington Fish and Wildlife Office, United States Fish and Wildlife Service, 510 Desmond Drive SE, Lacey, Washington 98392, USA
- Jay W. Davis
- Washington Fish and Wildlife Office, United States Fish and Wildlife Service, 510 Desmond Drive SE, Lacey, Washington 98392, USA
- Nathaniel L. Scholz
- Environmental and Fisheries Sciences Division, Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, 2725 Montlake Boulevard East, Seattle, Washington 98112, USA
|
32
|
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker MA, Guo J, Li P, Riddell A. Stan: A Probabilistic Programming Language. J Stat Softw 2017; 76:1. [PMID: 36568334 PMCID: PMC9788645 DOI: 10.18637/jss.v076.i01] [Citation(s) in RCA: 2153] [Impact Index Per Article: 307.6] [Indexed: 12/27/2022]
Abstract
Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
|
33
|
Li C, Yun X, Hu X, Zhang Y, Sang M, Liu X, Wu W, Li B. Identification of G protein-coupled receptors in the pea aphid, Acyrthosiphon pisum. Genomics 2013; 102:345-54. [PMID: 23792713 DOI: 10.1016/j.ygeno.2013.06.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/09/2013] [Revised: 06/05/2013] [Accepted: 06/13/2013] [Indexed: 11/29/2022]
Abstract
GPCRs play crucial roles in the growth, development and reproduction of organisms. In insects, a large number of GPCRs have been reported for Holometabola but not Hemimetabola. The recently sequenced pea aphid genome provides us with the opportunity to analyze the evolution and potential functions of GPCRs in Hemimetabola. 82 GPCRs were identified from the representative model hemimetabolous insect Acyrthosiphon pisum, 37 of which have ESTs evidence, and 73 are annotated for the first time. A striking difference between A. pisum, Drosophila melanogaster and Tribolium castaneum is the duplication of the kinin and SIFamide receptors in A. pisum. Another divergence is the loss of the sulfakinin receptor in A. pisum. These duplications/losses are likely involved in the osmoregulation, reproduction and energy metabolism of A. pisum. Moreover, this work will promote functional analyses of GPCRs in A. pisum and may advance new drug target discovery for biological control of the aphid.
Affiliation(s)
- Chengjun Li
- Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing 210023, China.
|