1
Ghatari AH, Aminghafari M. A new type of generalized information criterion for regularization parameter selection in penalized regression with application to treatment process data. J Biopharm Stat 2024; 34:488-512. [PMID: 37455635 DOI: 10.1080/10543406.2023.2228399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 06/13/2023] [Indexed: 07/18/2023]
Abstract
We propose a new approach to selecting the regularization parameter in penalized regression using a new version of the generalized information criterion (GIC). We prove the identifiability of the bridge regression model as a prerequisite of statistical modeling. Then, we propose an asymptotically efficient generalized information criterion (AGIC) and prove that it has asymptotic loss efficiency. We also verify that AGIC performs better than older versions of GIC. Furthermore, based on numerical studies, we propose MSE search paths to order the features selected by lasso regression. The MSE search paths provide a way to compensate for the lack of feature ordering in the lasso regression model. The performance of AGIC is compared with that of other types of GIC using MSE and model utility in a simulation study. We apply AGIC and other criteria to analyze breast cancer, prostate cancer, and Parkinson disease datasets. The results confirm the superiority of AGIC in almost all situations.
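As a rough illustration of criterion-based selection of a regularization parameter, the sketch below scores each candidate along a penalized-regression path with a generic GIC of the form n·log(RSS/n) + a_n·df and picks the minimizer. The Gaussian form of the criterion, the function names, and the toy path values are all assumptions made for illustration; the AGIC penalty itself is not stated in the abstract and is not reproduced here.

```python
import math

def gic(rss, df, n, a_n):
    """Generic GIC for a Gaussian linear model: n*log(RSS/n) + a_n*df (assumed form)."""
    return n * math.log(rss / n) + a_n * df

def select_by_gic(path, n, a_n):
    """Return the index of the path entry with the smallest GIC.

    `path` is a list of (rss, df) pairs, one per candidate regularization value,
    where df is the number of nonzero coefficients at that value.
    """
    scores = [gic(rss, df, n, a_n) for rss, df in path]
    return min(range(len(path)), key=scores.__getitem__)

# Hypothetical path values (not taken from the paper)
TOY_PATH = [(250.0, 0), (140.0, 1), (100.0, 2), (98.0, 3), (97.5, 4)]
N_OBS = 100

if __name__ == "__main__":
    print("AIC-type penalty (a_n = 2):", select_by_gic(TOY_PATH, N_OBS, 2.0))
    print("BIC-type penalty (a_n = log n):", select_by_gic(TOY_PATH, N_OBS, math.log(N_OBS)))
```

On this toy path the heavier BIC-type penalty selects the two-feature fit, while the AIC-type penalty keeps a third feature; the choice of a_n is exactly what distinguishes one member of the GIC family from another.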
Affiliation(s)
- Amir Hossein Ghatari
  - Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Mina Aminghafari
  - Department of Statistics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
2
Brunello GHV, Nakano EY. A Bayesian Measure of Model Accuracy. Entropy (Basel, Switzerland) 2024; 26:510. [PMID: 38920519 PMCID: PMC11202794 DOI: 10.3390/e26060510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 06/03/2024] [Accepted: 06/10/2024] [Indexed: 06/27/2024]
Abstract
Ensuring that the proposed probabilistic model accurately represents the problem is a critical step in statistical modeling, as choosing a poorly fitting model can have significant repercussions on the decision-making process. The primary objective of statistical modeling often revolves around predicting new observations, highlighting the importance of assessing the model's accuracy. However, current methods for evaluating predictive ability typically involve model comparison, which may not guarantee a good model selection. This work presents an accuracy measure designed for evaluating a model's predictive capability. This measure, which is straightforward and easy to understand, includes a decision criterion for model rejection. The development of this proposal adopts a Bayesian perspective of inference, elucidating the underlying concepts and outlining the necessary procedures for application. To illustrate its utility, the proposed methodology was applied to real-world data, facilitating an assessment of its practicality in real-world scenarios.
3
Welbanks L, Bell TJ, Beatty TG, Line MR, Ohno K, Fortney JJ, Schlawin E, Greene TP, Rauscher E, McGill P, Murphy M, Parmentier V, Tang Y, Edelman I, Mukherjee S, Wiser LS, Lagage PO, Dyrek A, Arnold KE. A high internal heat flux and large core in a warm Neptune exoplanet. Nature 2024; 630:836-840. [PMID: 38768634 DOI: 10.1038/s41586-024-07514-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024]
Abstract
Interactions between exoplanetary atmospheres and internal properties have long been proposed to be drivers of the inflation mechanisms of gaseous planets and apparent atmospheric chemical disequilibrium conditions (ref. 1). However, transmission spectra of exoplanets have been limited in their ability to observationally confirm these theories owing to the limited wavelength coverage of the Hubble Space Telescope (HST) and inferences of single molecules, mostly H₂O (ref. 2). In this work, we present the panchromatic transmission spectrum of the approximately 750 K, low-density, Neptune-sized exoplanet WASP-107b using a combination of HST Wide Field Camera 3 (WFC3) and JWST Near-Infrared Camera (NIRCam) and Mid-Infrared Instrument (MIRI). From this spectrum, we detect spectroscopic features resulting from H₂O (21σ), CH₄ (5σ), CO (7σ), CO₂ (29σ), SO₂ (9σ) and NH₃ (6σ). The presence of these molecules enables constraints on the atmospheric metal enrichment (M/H is 10-18× solar; ref. 3), vertical mixing strength (log₁₀ K_zz = 8.4-9.0 cm² s⁻¹) and internal temperature (>345 K). The high internal temperature is suggestive of tidally driven inflation (ref. 4) acting on a Neptune-like internal structure, which can naturally explain the large radius and low density of the planet. These findings suggest that eccentricity-driven tidal heating is a critical process governing atmospheric chemistry and interior-structure inferences for most of the cool (<1,000 K) super-Earth-to-Saturn-mass exoplanet population.
Affiliation(s)
- Luis Welbanks
  - School of Earth and Space Exploration, Arizona State University, Tempe, AZ, USA
- Taylor J Bell
  - Bay Area Environmental Research Institute, NASA's Ames Research Center, Moffett Field, CA, USA
  - Space Science and Astrobiology Division, NASA's Ames Research Center, Moffett Field, CA, USA
- Thomas G Beatty
  - Department of Astronomy, University of Wisconsin-Madison, Madison, WI, USA
- Michael R Line
  - School of Earth and Space Exploration, Arizona State University, Tempe, AZ, USA
- Kazumasa Ohno
  - Department of Astronomy and Astrophysics, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Division of Science, National Astronomical Observatory of Japan (NAOJ), Tokyo, Japan
- Jonathan J Fortney
  - Department of Astronomy and Astrophysics, University of California, Santa Cruz, Santa Cruz, CA, USA
- Thomas P Greene
  - Space Science and Astrobiology Division, NASA's Ames Research Center, Moffett Field, CA, USA
- Emily Rauscher
  - Department of Astronomy, University of Michigan, Ann Arbor, MI, USA
- Peter McGill
  - Space Science Institute, Lawrence Livermore National Laboratory, Livermore, CA, USA
- Matthew Murphy
  - Steward Observatory, University of Arizona, Tucson, AZ, USA
- Vivien Parmentier
  - Laboratoire Lagrange, Observatoire de la Côte d'Azur, Université Côte d'Azur, Nice, France
- Yao Tang
  - Department of Astronomy and Astrophysics, University of California, Santa Cruz, Santa Cruz, CA, USA
- Isaac Edelman
  - Bay Area Environmental Research Institute, NASA's Ames Research Center, Moffett Field, CA, USA
- Sagnick Mukherjee
  - Department of Astronomy and Astrophysics, University of California, Santa Cruz, Santa Cruz, CA, USA
- Lindsey S Wiser
  - School of Earth and Space Exploration, Arizona State University, Tempe, AZ, USA
- Pierre-Olivier Lagage
  - Université Paris-Saclay, Université Paris Cité, CEA, CNRS, AIM, Gif-sur-Yvette, France
- Achrène Dyrek
  - Université Paris-Saclay, Université Paris Cité, CEA, CNRS, AIM, Gif-sur-Yvette, France
- Kenneth E Arnold
  - Department of Astronomy, University of Wisconsin-Madison, Madison, WI, USA
4
Jenniches L, Michaux C, Popella L, Reichardt S, Vogel J, Westermann AJ, Barquist L. Improved RNA stability estimation through Bayesian modeling reveals most Salmonella transcripts have subminute half-lives. Proc Natl Acad Sci U S A 2024; 121:e2308814121. [PMID: 38527194 PMCID: PMC10998600 DOI: 10.1073/pnas.2308814121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 02/16/2024] [Indexed: 03/27/2024] [Open Access]
Abstract
RNA decay is a crucial mechanism for regulating gene expression in response to environmental stresses. In bacteria, RNA-binding proteins (RBPs) are known to be involved in posttranscriptional regulation, but their global impact on RNA half-lives has not been extensively studied. To shed light on the role of the major RBPs ProQ and CspC/E in maintaining RNA stability, we performed RNA sequencing of Salmonella enterica over a time course following treatment with the transcription initiation inhibitor rifampicin (RIF-seq) in the presence and absence of these RBPs. We developed a hierarchical Bayesian model that corrects for confounding factors in rifampicin RNA stability assays and enables us to identify differentially decaying transcripts transcriptome-wide. Our analysis revealed that the median RNA half-life in Salmonella in early stationary phase is less than 1 min, a third of previous estimates. We found that over half of the 500 most long-lived transcripts are bound by at least one major RBP, suggesting a general role for RBPs in shaping the transcriptome. Integrating differential stability estimates with cross-linking and immunoprecipitation followed by RNA sequencing (CLIP-seq) revealed that approximately 30% of transcripts with ProQ binding sites and more than 40% with CspC/E binding sites in coding or 3' untranslated regions decay differentially in the absence of the respective RBP. Analysis of differentially destabilized transcripts identified a role for ProQ in the oxidative stress response. Our findings provide insights into posttranscriptional regulation by ProQ and CspC/E, and the importance of RBPs in regulating gene expression.
Affiliation(s)
- Laura Jenniches
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg 97080, Germany
- Charlotte Michaux
  - Institute of Molecular Infection Biology, University of Würzburg, Würzburg 97080, Germany
- Linda Popella
  - Institute of Molecular Infection Biology, University of Würzburg, Würzburg 97080, Germany
- Sarah Reichardt
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg 97080, Germany
- Jörg Vogel
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg 97080, Germany
  - Institute of Molecular Infection Biology, University of Würzburg, Würzburg 97080, Germany
  - Faculty of Medicine, University of Würzburg, Würzburg 97080, Germany
- Alexander J. Westermann
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg 97080, Germany
  - Institute of Molecular Infection Biology, University of Würzburg, Würzburg 97080, Germany
- Lars Barquist
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg 97080, Germany
  - Faculty of Medicine, University of Würzburg, Würzburg 97080, Germany
  - Department of Biology, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada
5
Davoudabadi MJ, Pagendam D, Drovandi C, Baldock J, White G. Innovative approaches in soil carbon sequestration modelling for better prediction with limited data. Sci Rep 2024; 14:3191. [PMID: 38326402 PMCID: PMC10850547 DOI: 10.1038/s41598-024-53516-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Accepted: 02/01/2024] [Indexed: 02/09/2024] [Open Access]
Abstract
Soil carbon accounting and prediction play a key role in building decision support systems for land managers selling carbon credits, in the spirit of the Paris and Kyoto protocol agreements. Land managers typically rely on computationally complex models fit using sparse datasets to make these accounts and predictions. The model complexity and sparsity of the data can lead to over-fitting, leading to inaccurate results when making predictions with new data. Modellers address over-fitting by simplifying their models and reducing the number of parameters, and in the current context this could involve neglecting some soil organic carbon (SOC) components. In this study, we introduce two novel SOC models and a new RothC-like model and investigate how the SOC components and complexity of the SOC models affect the SOC prediction in the presence of small and sparse time series data. We develop model selection methods that can identify the soil carbon model with the best predictive performance, in light of the available data. Through this analysis we reveal that commonly used complex soil carbon models can over-fit in the presence of sparse time series data, and our simpler models can produce more accurate predictions.
Affiliation(s)
- Mohammad Javad Davoudabadi
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - Australian Research Council Centre of Excellence for Mathematical & Statistical Frontiers (ACEMS), Victoria, Australia
  - QUT Centre for Data Science, Queensland University of Technology, Brisbane, Australia
  - CSIRO Data61, GPO Box 2583, Brisbane, QLD, 4001, Australia
- Christopher Drovandi
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - Australian Research Council Centre of Excellence for Mathematical & Statistical Frontiers (ACEMS), Victoria, Australia
  - QUT Centre for Data Science, Queensland University of Technology, Brisbane, Australia
- Jeff Baldock
  - CSIRO Agriculture and Food, Glen Osmond, SA, Australia
- Gentry White
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
  - Australian Research Council Centre of Excellence for Mathematical & Statistical Frontiers (ACEMS), Victoria, Australia
  - QUT Centre for Data Science, Queensland University of Technology, Brisbane, Australia
6
Wang Z, Murray TA, Xiao M, Lin L, Alemayehu D, Chu H. Bayesian hierarchical models incorporating study-level covariates for multivariate meta-analysis of diagnostic tests without a gold standard with application to COVID-19. Stat Med 2023; 42:5085-5099. [PMID: 37724773 DOI: 10.1002/sim.9902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 05/25/2023] [Accepted: 09/01/2023] [Indexed: 09/21/2023]
Abstract
When evaluating a diagnostic test, it is common that a gold standard may not be available. One example is the diagnosis of SARS-CoV-2 infection using saliva sampling or nasopharyngeal swabs. Without a gold standard, a pragmatic approach is to postulate a "reference standard," defined as positive if either test is positive, or negative if both are negative. However, this pragmatic approach may overestimate sensitivities because subjects infected with SARS-CoV-2 may still have double-negative test results even when both tests exhibit perfect specificity. To address this limitation, we propose a Bayesian hierarchical model for simultaneously estimating sensitivity, specificity, and disease prevalence in the absence of a gold standard. The proposed model allows adjusting for study-level covariates. We evaluate the model performance using an example based on a recently published meta-analysis on the diagnosis of SARS-CoV-2 infection and extensive simulations. Compared with the pragmatic reference standard approach, we demonstrate that the proposed Bayesian method provides a more accurate evaluation of prevalence, specificity, and sensitivity in a meta-analytic framework.
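The overestimation described in the abstract can be made concrete with a small calculation. The sketch below assumes two tests with perfect specificity whose results are conditionally independent given infection; these assumptions, the function names, and the numeric inputs are illustrative and are not taken from the paper. Under them, infected subjects who test double-negative are counted as reference-negative, which inflates each test's apparent sensitivity and deflates the apparent prevalence.

```python
def apparent_sensitivity(se_index, se_other):
    """Sensitivity of the index test measured against the composite reference.

    Assumes perfect specificity for both tests and conditional independence
    given infection (illustrative assumptions).
    """
    p_missed_by_both = (1.0 - se_index) * (1.0 - se_other)
    # Double-negative infected subjects drop out of the reference-positive group,
    # so the denominator shrinks from 1 to 1 - p_missed_by_both.
    return se_index / (1.0 - p_missed_by_both)

def apparent_prevalence(prevalence, se_a, se_b):
    """Share of subjects labelled positive by the composite reference."""
    return prevalence * (1.0 - (1.0 - se_a) * (1.0 - se_b))

if __name__ == "__main__":
    # Hypothetical true values: sensitivities 0.80 and 0.70, prevalence 0.10
    print(round(apparent_sensitivity(0.80, 0.70), 4))       # about 0.851, above the true 0.80
    print(round(apparent_prevalence(0.10, 0.80, 0.70), 4))  # 0.094, below the true 0.10
```

The bias disappears only when at least one test has perfect sensitivity, which is why a model that treats true infection status as latent is attractive in this setting.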
Affiliation(s)
- Zheng Wang
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
- Thomas A Murray
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
- Mengli Xiao
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Lifeng Lin
  - Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona, USA
- Demissie Alemayehu
  - Global Biometrics and Data Management, Pfizer Inc., New York, New York, USA
- Haitao Chu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
  - Global Biometrics and Data Management, Pfizer Inc., New York, New York, USA
7
Zhang WH, Dai L, Chen W, Sun A, Zhu WL, Ju BF. Novel Information-Driven Smoothing Spline Linearization Method for High-Precision Displacement Sensors Based on Information Criterions. Sensors (Basel, Switzerland) 2023; 23:9268. [PMID: 38005654 PMCID: PMC10674328 DOI: 10.3390/s23229268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/10/2023] [Accepted: 11/15/2023] [Indexed: 11/26/2023]
Abstract
A noise-resistant linearization model that reveals the true nonlinearity of the sensor is essential for retrieving accurate physical displacement from the signals captured by sensing electronics. In this paper, we propose a novel information-driven smoothing spline linearization method, which integrates one new and three standard information criteria into a smoothing spline for the linearization of high-precision displacement sensors. Using theoretical analysis and Monte Carlo simulation, the proposed linearization method is demonstrated to outperform traditional polynomial and spline linearization methods for high-precision displacement sensors with a low noise-to-range ratio at the 10⁻⁵ level. Validation experiments were carried out on two different types of displacement sensors to benchmark the performance of the proposed method against the polynomial models and the non-smoothing cubic spline. The results show that the proposed method with the new modified Akaike Information Criterion stands out compared to the other linearization methods and can improve the residual nonlinearity by over 50% compared to the standard polynomial model. After being linearized via the proposed method, the residual nonlinearities reach as low as ±0.0311% F.S. (Full Scale of Range) for the 1.5 mm range chromatic confocal displacement sensor and ±0.0047% F.S. for the 100 mm range laser triangulation displacement sensor.
Affiliation(s)
- Wu-Le Zhu
  - State Key Laboratory of Fluid Power & Mechatronic Systems, Zhejiang University, Hangzhou 310058, China; (W.-H.Z.); (L.D.); (W.C.); (A.S.); (B.-F.J.)
8
Hopkins WA, Case BF, Groffen J, Brooks GC, Bodinof Jachowski CM, Button ST, Hallagan JJ, O'Brien RSM, Kindsvater HK. Filial Cannibalism Leads to Chronic Nest Failure of Eastern Hellbender Salamanders (Cryptobranchus alleganiensis). Am Nat 2023; 202:92-106. [PMID: 37384763 DOI: 10.1086/724819] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 11/20/2023]
Abstract
In species that provide parental care, parents will sometimes cannibalize their own young (i.e., filial cannibalism). Here, we quantified the frequency of whole-clutch filial cannibalism in a species of giant salamander (eastern hellbender; Cryptobranchus alleganiensis) that has experienced precipitous population declines with unknown causes. We used underwater artificial nesting shelters deployed across a gradient of upstream forest cover to assess the fates of 182 nests at 10 sites over 8 years. We found strong evidence that nest failure rates increased at sites with low riparian forest cover in the upstream catchment. At several sites, reproductive failure was 100%, mainly due to cannibalism by the caring male. The high incidence of filial cannibalism at degraded sites was not explained by evolutionary hypotheses for filial cannibalism based on poor adult body condition or low reproductive value of small clutches. Instead, larger clutches at degraded sites were most vulnerable to cannibalism. We hypothesize that high frequencies of filial cannibalism of large clutches in areas with low forest cover could be related to changes in water chemistry or siltation that influence parental physiology or that reduce the viability of eggs. Importantly, our results identify chronic nest failure as a possible mechanism contributing to population declines and observed geriatric age structure in this imperiled species.
9
Järvenpää M, Corander J. On predictive inference for intractable models via approximate Bayesian computation. Statistics and Computing 2023; 33:42. [PMID: 36785730 PMCID: PMC9911513 DOI: 10.1007/s11222-022-10163-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/02/2022] [Indexed: 06/18/2023]
Abstract
Approximate Bayesian computation (ABC) is commonly used for parameter estimation and model comparison for intractable simulator-based statistical models whose likelihood function cannot be evaluated. In this paper we instead investigate the feasibility of ABC as a generic approximate method for predictive inference, in particular, for computing the posterior predictive distribution of future observations or missing data of interest. We consider three complementary ABC approaches for this goal, each based on different assumptions regarding which predictive density of the intractable model can be sampled from. The case where only simulation from the joint density of the observed and future data given the model parameters can be used for inference is given particular attention and it is shown that the ideal summary statistic in this setting is minimal predictive sufficient instead of merely minimal sufficient (in the ordinary sense). An ABC prediction approach that takes advantage of a certain latent variable representation is also investigated. We additionally show how common ABC sampling algorithms can be used in the predictive settings considered. Our main results are first illustrated by using simple time-series models that facilitate analytical treatment, and later by using two common intractable dynamic models. Supplementary information: The online version contains supplementary material available at 10.1007/s11222-022-10163-6.
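The joint-simulation setting mentioned in the abstract can be sketched with plain rejection ABC. The example below is a minimal illustration under stated assumptions, not the authors' algorithm: the toy model (i.i.d. Normal(theta, 1) data with a Uniform(-5, 5) prior), the sample mean as summary statistic, the tolerance, and all names are hypothetical. Each simulation draws pseudo-observed data and one future value jointly given theta, and the future value is kept whenever the simulated summary falls close to the observed one.

```python
import random
import statistics

# Hypothetical observed data (sample mean 2.075)
OBSERVED = [1.8, 2.4, 2.1, 1.6, 2.3, 2.0, 1.9, 2.5]

def abc_predictive(observed, n_sims=20000, eps=0.2, seed=0):
    """Rejection ABC draws from the posterior predictive of one future value.

    Toy model (an assumption): y_i ~ Normal(theta, 1), theta ~ Uniform(-5, 5).
    """
    rng = random.Random(seed)
    s_obs = statistics.fmean(observed)
    n = len(observed)
    kept = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)                   # draw from the prior
        sim = [rng.gauss(theta, 1.0) for _ in range(n)]  # pseudo-observed data
        future = rng.gauss(theta, 1.0)                   # future value, same theta
        if abs(statistics.fmean(sim) - s_obs) < eps:     # compare summaries
            kept.append(future)
    return kept

if __name__ == "__main__":
    draws = abc_predictive(OBSERVED)
    print(len(draws), "accepted predictive draws")
    print("predictive mean:", statistics.fmean(draws))
```

For this i.i.d. toy model the sample mean is an adequate summary because the future value depends on the data only through theta. In the dependent time-series models the abstract refers to, a summary that is sufficient for the parameters alone may discard information relevant to prediction, which is the motivation for the predictive-sufficiency result.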
Affiliation(s)
- Marko Järvenpää
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), University of Helsinki, Helsinki, Finland
  - Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
10
Bilancia M, Vitale D, Manca F, Perchinunno P, Santacroce L. A dynamic causal modeling of the second outbreak of COVID-19 in Italy. Advances in Statistical Analysis: AStA: A Journal of the German Statistical Society 2023:1-30. [PMID: 36776481 PMCID: PMC9904269 DOI: 10.1007/s10182-023-00469-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 01/04/2023] [Indexed: 02/10/2023]
Abstract
While the vaccination campaign against COVID-19 is having its positive impact, we retrospectively analyze the causal impact of some decisions made by the Italian government on the second outbreak of the SARS-CoV-2 pandemic in Italy, when no vaccine was available. First, we analyze the causal impact of reopenings after the first lockdown in 2020. In addition, we also analyze the impact of reopening schools in September 2020. Our results provide an unprecedented opportunity to evaluate the causal relationship between the relaxation of restrictions and the transmission in the community of a highly contagious respiratory virus that causes severe illness in the absence of prophylactic vaccination programs. We present a purely data-analytic approach based on a Bayesian methodology and discuss possible interpretations of the results obtained and implications for policy makers.
Affiliation(s)
- Massimo Bilancia
  - Department of Precision and Regenerative Medicine and Ionian Area (DiMePRe-J), University of Bari Aldo Moro, Policlinic University Hospital – Piazza G. Cesare 11, 70124 Bari, Italy
- Domenico Vitale
  - MEMOTEF Department, University of Roma La Sapienza, Via del Castro Laurenziano 9, 00161 Rome, Italy
- Fabio Manca
  - Department of Education, Psychology, Communication (ForPsiCom), University of Bari Aldo Moro, Palazzo Chiaia Napolitano – Via S. Crisanzio 42, 70122 Bari, Italy
- Paola Perchinunno
  - Department of Business and Law Studies (DEMDI), University of Bari Aldo Moro, Largo Abbazia di Santa Scolastica 53, 70124 Bari, Italy
- Luigi Santacroce
  - Department of Interdisciplinary Medicine (DIM) and Microbiology and Virology Unit, University of Bari Aldo Moro, Policlinic University Hospital – Piazza G. Cesare 11, 70124 Bari, Italy
11
Acharyya S, Pati D, Sun S, Bandyopadhyay D. A monotone single index model for missing-at-random longitudinal proportion data. J Appl Stat 2023; 51:1023-1040. [PMID: 38628451 PMCID: PMC11018042 DOI: 10.1080/02664763.2023.2173156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 01/21/2023] [Indexed: 02/10/2023]
Abstract
Beta distributions are commonly used to model proportion-valued response variables, often encountered in longitudinal studies. In this article, we develop semi-parametric Beta regression models for proportion-valued responses, where the aggregate covariate effect is summarized and flexibly modeled using an interpretable monotone time-varying single index transform of a linear combination of the potential covariates. We utilize the potential of single index models, which are effective dimension reduction tools and accommodate link function misspecification in generalized linear mixed models. Our Bayesian methodology incorporates the missing-at-random feature of the proportion response and utilizes Hamiltonian Monte Carlo sampling to conduct inference. We explore finite-sample frequentist properties of our estimates and assess the robustness via detailed simulation studies. Finally, we illustrate our methodology via application to a motivating longitudinal dataset on obesity research recording proportion body fat.
Affiliation(s)
- Satwik Acharyya
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Debdeep Pati
  - Department of Statistics, Texas A&M University, College Station, TX, USA
- Shumei Sun
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
12
Kelter R. How to Choose between Different Bayesian Posterior Indices for Hypothesis Testing in Practice. Multivariate Behavioral Research 2023; 58:160-188. [PMID: 34582284 DOI: 10.1080/00273171.2021.1967716] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/13/2023]
Abstract
Hypothesis testing is an essential statistical method in experimental psychology and the cognitive sciences. The problems of traditional null hypothesis significance testing (NHST) have been discussed widely, and among the proposed solutions to the replication problems caused by the inappropriate use of significance tests and p-values is a shift toward Bayesian data analysis. However, Bayesian hypothesis testing is concerned with various posterior indices for significance and the size of an effect. This complicates Bayesian hypothesis testing in practice, as the availability of multiple Bayesian alternatives to the traditional p-value causes confusion about which one to select and why. In this paper, various Bayesian posterior indices that have been proposed in the literature are compared, and their benefits and limitations are discussed. The comparison shows that conceptually not all proposed Bayesian alternatives to NHST and p-values are beneficial, and the usefulness of some indices strongly depends on the study design and research goal. However, the comparison also reveals that there exist at least two candidates among the available Bayesian posterior indices which have appealing theoretical properties and are widely underused in the cognitive sciences.
Affiliation(s)
- Riko Kelter
  - Department of Mathematics, University of Siegen
13
Analysis of social interactions in group-housed animals using dyadic linear models. Appl Anim Behav Sci 2022. [DOI: 10.1016/j.applanim.2022.105747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
14
Lee S, Le T, Tran P, Li C. Investigating the association of a sensitive attribute with a random variable using the Christofides generalised randomised response design and Bayesian methods. J R Stat Soc Ser C Appl Stat 2022. [DOI: 10.1111/rssc.12585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Shen-Ming Lee
  - Department of Statistics, Feng Chia University, Taichung, Taiwan, ROC
- Truong-Nhat Le
  - Department of Statistics, Feng Chia University, Taichung, Taiwan, ROC
  - Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh, Vietnam
- Phuoc-Loc Tran
  - Department of Mathematics, College of Natural Science, Can Tho University, Can Tho, Vietnam
- Chin-Shang Li
  - School of Nursing, The State University of New York, University at Buffalo, Buffalo, New York, USA
15
Using reference models in variable selection. Comput Stat 2022. [DOI: 10.1007/s00180-022-01231-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Variable selection, or more generally, model reduction is an important aspect of the statistical workflow aiming to provide insights from data. In this paper, we discuss and demonstrate the benefits of using a reference model in variable selection. A reference model acts as a noise-filter on the target variable by modeling its data generating mechanism. As a result, using the reference model predictions in the model selection procedure reduces the variability and improves stability, leading to improved model selection performance. Assuming that a Bayesian reference model describes the true distribution of future data well, the theoretically preferred usage of the reference model is to project its predictive distribution to a reduced model, leading to the projection predictive variable selection approach. We analyse how much of the strong performance of projection predictive variable selection is due to the use of a reference model and show that other variable selection methods can also be greatly improved by using the reference model as the target instead of the original data. In several numerical experiments, we investigate the performance of the projection predictive approach as well as alternative variable selection methods with and without reference models. Our results indicate that the use of reference models generally translates into better and more stable variable selection.
|
16
|
Jamhiri B, Xu Y, Jalal FE. Cracking propagation in expansive soils under desiccation and stabilization planning using Bayesian inference and Markov decision chains. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:36740-36762. [PMID: 35064516 DOI: 10.1007/s11356-022-18690-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Accepted: 01/12/2022] [Indexed: 06/14/2023]
Abstract
Desiccation cracking endangers the stability of expansive soils subjected to cyclic moisture variations. In the current research, prominent cracking prediction models including linear, linear elastic, linear elastoplastic, and linear elastic fracture were studied. Then, Monte Carlo limit state functions were generated based on predictions. Results indicate that there is a less than 5% chance of cracking for depths beyond 0.5, 6, 8, and 9 m as predicted by the linear elastoplastic, linear elastic, linear, and linear elastic fracture models, respectively. Moreover, a series of sensitivity analyses was performed to evaluate model and parameter uncertainties. Comparatively, it was found that the linear model exhibits the highest uncertainty while the linear elastoplastic model possesses the least uncertainty, thus yielding a reasonable prediction. Additionally, soil parameters including matric suction followed by dry density were identified to govern the overall cracking. Using Bayesian inference, numerous conditional probabilities of variation of soil properties were investigated. Then, several cracking probabilities under history of low to high matric suction and dry density were obtained. Accordingly, Monte Carlo Markov decision chains were established based on several ecofriendly and feasible stabilization policies and their performance was also evaluated. The obtained safety factors (SF) suggest that stabilization plans resulting in high moisture and dry density have the least likelihood of cracking with a SF equal to 5.1. However, stabilization policies having low dry density and moisture yield have the least SF of 0.39. Findings of this study can improve the decision-making processes for expansive soil stabilization by considering a variety of environmental conditional probabilities.
Affiliation(s)
- Babak Jamhiri
- Department of Civil Engineering, State Key Laboratory of Ocean Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yongfu Xu
- Department of Civil Engineering, State Key Laboratory of Ocean Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China.
| | - Fazal E Jalal
- Department of Civil Engineering, State Key Laboratory of Ocean Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| |
|
17
|
Maradesa A, Py B, Quattrocchi E, Ciucci F. The probabilistic deconvolution of the distribution of relaxation times with finite Gaussian processes. Electrochim Acta 2022. [DOI: 10.1016/j.electacta.2022.140119] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
18
|
Patterson C, Schumacher PB, Nicenboim B, Hagen J, Kehler A. A Bayesian Approach to German Personal and Demonstrative Pronouns. Front Psychol 2022; 12:672927. [PMID: 35308073 PMCID: PMC8927811 DOI: 10.3389/fpsyg.2021.672927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Accepted: 12/10/2021] [Indexed: 11/13/2022] Open
Abstract
When faced with an ambiguous pronoun, an addressee must interpret it by identifying a suitable referent. It has been proposed that the interpretation of pronouns can be captured using Bayes' Rule: P(referent|pronoun) ∝ P(pronoun|referent)P(referent). This approach has been successful in English and Mandarin Chinese. In this study, we further the cross-linguistic evidence for the Bayesian model by applying it to German personal and demonstrative pronouns, and provide novel quantitative support for the model by assessing model performance in a Bayesian statistical framework that allows implementation of a fully hierarchical structure, providing the most conservative estimates of uncertainty. Data from two story-continuation experiments showed that the Bayesian model overall made more accurate predictions for pronoun interpretation than production and next-mention biases separately. Furthermore, the model accounts for the demonstrative pronoun dieser as well as the personal pronoun, despite the demonstrative having different, and more rigid, resolution preferences.
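The proportionality stated in this abstract can be illustrated with a minimal sketch. The referent labels and all probabilities below are hypothetical values chosen for illustration; they are not taken from the study, and the function name is an assumption.

```python
def pronoun_posterior(prior, likelihood):
    """Apply Bayes' Rule: P(referent | pronoun) is proportional to
    P(pronoun | referent) * P(referent), then normalize over referents."""
    unnormalized = {r: likelihood[r] * prior[r] for r in prior}
    total = sum(unnormalized.values())
    return {r: value / total for r, value in unnormalized.items()}


# Hypothetical next-mention bias, P(referent)
prior = {"subject": 0.6, "object": 0.4}
# Hypothetical production bias, P(pronoun | referent)
likelihood = {"subject": 0.8, "object": 0.3}

posterior = pronoun_posterior(prior, likelihood)
```

With these illustrative inputs, the unnormalized products are 0.48 and 0.12, so the posterior favours the subject referent more strongly than either bias does on its own.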
Affiliation(s)
- Clare Patterson
- Department of German Language and Literature I, Linguistics, University of Cologne, Cologne, Germany
| | - Petra B. Schumacher
- Department of German Language and Literature I, Linguistics, University of Cologne, Cologne, Germany
| | - Bruno Nicenboim
- Department of Cognitive Science and Artificial Intelligence, Tilburg School of Humanities and Digital Sciences, Tilburg University, Tilburg, Netherlands
| | - Johannes Hagen
- Department of German Language and Literature I, Linguistics, University of Cologne, Cologne, Germany
| | - Andrew Kehler
- Department of Linguistics, University of California San Diego, La Jolla, CA, United States
| |
|
19
|
Dora J, Schultz ME, Shoda Y, Lee CM, King KM. No evidence for trait- and state-level urgency moderating the daily association between negative affect and subsequent alcohol use in two college samples. Brain Neurosci Adv 2022; 6:23982128221079556. [PMID: 35237726 PMCID: PMC8883372 DOI: 10.1177/23982128221079556] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Accepted: 01/22/2022] [Indexed: 11/16/2022] Open
Abstract
It remains unclear whether the negative reinforcement pathway to problematic drinking exists, and if so, for whom. One idea that has received some support recently is that people who tend to act impulsively in response to negative emotions (i.e. people high in negative urgency) may specifically respond to negative affect with increased alcohol consumption. We tested this idea in a preregistered secondary data analysis of two ecological momentary assessment studies using college samples. Participants (N = 226) reported on their current affective state multiple times per day and also the following morning reported alcohol use of the previous night. We assessed urgency both at baseline and during the momentary affect assessments. Results from our Bayesian model comparison procedure, which penalises increasing model complexity, indicate that no combination of the variables of interest (negative affect, urgency, and the respective interactions) outperformed a baseline model that included two known demographic predictors of alcohol use. A non-preregistered exploratory analysis provided some evidence for the effect of daily positive affect, positive urgency, as well as their interaction on subsequent alcohol use. Taken together, our results suggest that college students' drinking may be better described by a positive rather than negative reinforcement cycle.
Affiliation(s)
- Jonas Dora
- Jonas Dora, Department of Psychology, University of Washington, Box 351525, Seattle, WA 98195-1525, USA.
| | | | | | | | | |
|
20
|
Sivula T, Magnusson M, Vehtari A. Unbiased estimator for the variance of the leave-one-out cross-validation estimator for a Bayesian normal model with fixed variance. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2021.2021240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]
Affiliation(s)
- Tuomas Sivula
- Department of Computer Science, Aalto University, Espoo, Finland
| | - Måns Magnusson
- Department of Statistics, Uppsala University, Uppsala, Sweden
| | - Aki Vehtari
- Department of Computer Science, Aalto University, Espoo, Finland
| |
|
21
|
Revuelta J, Hidalgo B, Alcazar-Córcoles MÁ. Bayesian Estimation and Testing of a Beta Factor Model for Bounded Continuous Variables. MULTIVARIATE BEHAVIORAL RESEARCH 2022; 57:57-78. [PMID: 32804553 DOI: 10.1080/00273171.2020.1805582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The particularities of bounded data are often overlooked. This type of data is likely to display a pattern of skewness because of the existence of an upper and lower limit that cannot be exceeded. In the context of factor analysis, when variables are skewed in opposite directions, using normal-theory factor analysis might lead to over-factoring. We propose a Bayesian beta factor model to analyze doubly bounded data. A simulation study was conducted to evaluate the performance of the normal and beta factor models in the presence of skewed variables. Two Bayesian approaches to model evaluation methods are considered, posterior predictive checking and three information criterion measures (DIC, WAIC, and LOO). The number of estimated factors based on the Bayesian methods is compared for the normal and beta factor models. An application of the model using real data is also presented. We found that the beta factor model constitutes a suitable alternative to analyze data with a pattern of mixed skewness. Posterior predictive checking appears to be a viable option to select the optimal number of factors in Bayesian factor analysis.
|
22
|
|
23
|
Vana L, Hornik K. Dynamic modelling of corporate credit ratings and defaults. STAT MODEL 2021. [DOI: 10.1177/1471082x211057610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
In this article, we propose a longitudinal multivariate model for binary and ordinal outcomes to describe the dynamic relationship among firm defaults and credit ratings from various raters. The latent probability of default is modelled as a dynamic process which contains additive firm-specific effects, a latent systematic factor representing the business cycle and idiosyncratic observed and unobserved factors. The joint set-up also facilitates the estimation of a bias for each rater which captures changes in the rating standards of the rating agencies. Bayesian estimation techniques are employed to estimate the parameters of interest. Several models are compared based on their out-of-sample prediction ability and we find that the proposed model outperforms simpler specifications. The joint framework is illustrated on a sample of publicly traded US corporates which are rated by at least one of the credit rating agencies S&P, Moody's and Fitch during the period 1995–2014.
Affiliation(s)
- Laura Vana
- Department of Finance, Accounting and Statistics, Institute for Statistics and Mathematics, Vienna University of Economics and Business, Austria
| | - Kurt Hornik
- Department of Finance, Accounting and Statistics, Institute for Statistics and Mathematics, Vienna University of Economics and Business, Austria
| |
|
24
|
Yates LA, Brook BW, Buettel JC. Spatial pattern analysis of line-segment data in ecology. Ecology 2021; 103:e03597. [PMID: 34816432 DOI: 10.1002/ecy.3597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Revised: 08/10/2021] [Accepted: 09/10/2021] [Indexed: 11/09/2022]
Abstract
The spatial analysis of linear features (lines and curves) is a challenging and rarely attempted problem in ecology. Existing methods are typically expressed in abstract mathematical formalism, making it difficult to assess their relevance and transferability into an ecological setting. We introduce a set of concrete and accessible methods to analyze the spatial patterning of line-segment data. The methods include Monte Carlo techniques based on a new generalization of Ripley's K-function and a class of line-segment processes that can be used to specify parametric models: parameters are estimated using maximum likelihood and models compared using information-theoretic principles. We apply the new methods to fallen tree (dead log) data collected from two 1-ha Australian tall eucalypt forest plots. Our results show that the spatial pattern of the fallen logs is best explained by plot-level spatial heterogeneity in combination with a slope-dependent nonuniform distribution of fallen-log orientations. These methods are of a general nature and are applicable to any line-segment data. In the context of forest ecology, the integration of fallen logs as linear structural features in a landscape with the point locations of living trees, and a quantification of their interactions, can yield new insights into the functional and structural role of tree fall in forest communities and their enduring post-mortem ecological legacy as spatially distributed decomposing logs.
Affiliation(s)
- Luke A Yates
- School of Natural Sciences, University of Tasmania, Hobart, Tasmania, 7005, Australia
| | - Barry W Brook
- School of Natural Sciences, University of Tasmania, Hobart, Tasmania, 7005, Australia
| | - Jessie C Buettel
- School of Natural Sciences, University of Tasmania, Hobart, Tasmania, 7005, Australia
| |
|
25
|
Radcliffe AJ, Reklaitis GV. Bayesian hierarchical modeling for online process monitoring and quality control, with application to real time image data. Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107446] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
26
|
Kowal DR, Bravo M, Leong H, Bui A, Griffin RJ, Ensor KB, Miranda ML. Bayesian variable selection for understanding mixtures in environmental exposures. Stat Med 2021; 40:4850-4871. [PMID: 34132416 PMCID: PMC8440371 DOI: 10.1002/sim.9099] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2020] [Revised: 05/26/2021] [Accepted: 05/26/2021] [Indexed: 11/10/2022]
Abstract
Social and environmental stressors are crucial factors in child development. However, there exists a multitude of measurable social and environmental factors, the effects of which may be cumulative, interactive, or null. Using a comprehensive cohort of children in North Carolina, we study the impact of social and environmental variables on 4th end-of-grade exam scores in reading and mathematics. To identify the essential factors that predict these educational outcomes, we design new tools for Bayesian linear variable selection using decision analysis. We extract a predictive optimal subset of explanatory variables by coupling a loss function with a novel model-based penalization scheme, which leads to coherent Bayesian decision analysis and empirically improves variable selection, estimation, and prediction on simulated data. The Bayesian linear model propagates uncertainty quantification to all predictive evaluations, which is important for interpretable and robust model comparisons. These predictive comparisons are conducted out-of-sample with a customized approximation algorithm that avoids computationally intensive model refitting. We apply our variable selection techniques to identify the joint collection of social and environmental stressors, and their interactions, that offer clear and quantifiable improvements in prediction of reading and mathematics exam scores.
Affiliation(s)
| | - Mercedes Bravo
- Biostatistics and Epidemiology Division, RTI International, North Carolina, U.S.A
- Children’s Environmental Health Initiative, University of Notre Dame, Indiana, U.S.A
| | - Henry Leong
- Children’s Environmental Health Initiative, University of Notre Dame, Indiana, U.S.A
| | - Alexander Bui
- Department of Civil and Environmental Engineering, Rice University, Texas, U.S.A
| | - Robert J. Griffin
- Department of Civil and Environmental Engineering, Rice University, Texas, U.S.A
| | | | - Marie Lynn Miranda
- Children’s Environmental Health Initiative, University of Notre Dame, Indiana, U.S.A
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Indiana, U.S.A
| |
|
27
|
Carota C, Filippone M, Polettini S. Assessing Bayesian Semi‐Parametric Log‐Linear Models: An Application to Disclosure Risk Estimation. Int Stat Rev 2021. [DOI: 10.1111/insr.12471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Cinzia Carota
- Dipartimento di Economia e Statistica “Cognetti de Martiis” Università di Torino Lungo Dora Siena 100 A Turin 10153 Italy
| | - Maurizio Filippone
- Department of Data Science, EURECOM Campus SophiaTech, 450 Route des Chappes Biot 06410 France
| | - Silvia Polettini
- Dipartimento di Scienze Sociali ed Economiche Sapienza Università di Roma P.le Aldo Moro 5 Rome 00185 Italy
| |
|
28
|
Paape D, Avetisyan S, Lago S, Vasishth S. Modeling Misretrieval and Feature Substitution in Agreement Attraction: A Computational Evaluation. Cogn Sci 2021; 45:e13019. [PMID: 34379348 DOI: 10.1111/cogs.13019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Revised: 05/30/2021] [Accepted: 06/25/2021] [Indexed: 12/12/2022]
Abstract
We present computational modeling results based on a self-paced reading study investigating number attraction effects in Eastern Armenian. We implement three novel computational models of agreement attraction in a Bayesian framework and compare their predictive fit to the data using k-fold cross-validation. We find that our data are better accounted for by an encoding-based model of agreement attraction, compared to a retrieval-based model. A novel methodological contribution of our study is the use of comprehension questions with open-ended responses, so that both misinterpretation of the number feature of the subject phrase and misassignment of the thematic subject role of the verb can be investigated at the same time. We find evidence for both types of misinterpretation in our study, sometimes in the same trial. However, the specific error patterns in our data are not fully consistent with any previously proposed model.
Affiliation(s)
- Dario Paape
- Department of Linguistics, University of Potsdam
| | | | - Sol Lago
- Institute for Romance Languages and Literatures, Goethe University Frankfurt
| | | |
|
29
|
Abstract
In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps: First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
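The final step described in this abstract, computing the long-term equilibrium of a Markov process from estimated transition rates, can be sketched for the simplest case. The two-state setting, the function name, and the rate values are assumptions made for illustration; the abstract does not specify the state space or any rates.

```python
def two_state_equilibrium(rate_0_to_1, rate_1_to_0):
    """Equilibrium distribution of a two-state continuous-time Markov process.

    In equilibrium the probability flow 0 -> 1 equals the flow 1 -> 0:
    pi_0 * rate_0_to_1 == pi_1 * rate_1_to_0, with pi_0 + pi_1 == 1.
    """
    total = rate_0_to_1 + rate_1_to_0
    return (rate_1_to_0 / total, rate_0_to_1 / total)


# Hypothetical transition rates between two states of a linguistic variable
pi_0, pi_1 = two_state_equilibrium(rate_0_to_1=0.2, rate_1_to_0=0.6)
```

Under these illustrative rates, state 0 is left slowly and re-entered quickly, so it holds most of the equilibrium probability.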
Affiliation(s)
- Gerhard Jäger
- Department of Linguistics, University of Tübingen, Tübingen, Germany
| | - Johannes Wahle
- Department of Linguistics, University of Tübingen, Tübingen, Germany
| |
|
30
|
Yates LA, Richards SA, Brook BW. Parsimonious model selection using information theory: a modified selection rule. Ecology 2021; 102:e03475. [PMID: 34272730 DOI: 10.1002/ecy.3475] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Revised: 02/16/2021] [Accepted: 05/13/2021] [Indexed: 11/08/2022]
Abstract
Information-theoretic approaches to model selection, such as Akaike's information criterion (AIC) and cross validation, provide a rigorous framework to select among candidate hypotheses in ecology, yet the persistent concern of overfitting undermines the interpretation of inferred processes. A common misconception is that overfitting is due to the choice of criterion or model score, despite research demonstrating that selection uncertainty associated with score estimation is the predominant influence. Here we introduce a novel selection rule that identifies a parsimonious model by directly accounting for estimation uncertainty, while still retaining an information-theoretic interpretation. The new rule, which is a modification of the existing one-standard-error rule, mitigates overfitting and reduces the likelihood that spurious effects will be included in the selected model, thereby improving its inferential properties. We present the rule and illustrative examples in the context of maximum-likelihood estimation and Kullback-Leibler discrepancy, although the rule is applicable in a more general setting, including Bayesian model selection and other types of discrepancy.
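For orientation, the existing one-standard-error rule that this abstract takes as its starting point can be sketched as follows. This is the conventional rule only, not the authors' modified rule, whose details the abstract does not give; the tuple layout, model names, scores, and standard errors are hypothetical.

```python
def one_standard_error_rule(models):
    """Pick the simplest model scoring within one standard error of the best.

    `models` is a list of (name, n_params, score, score_se) tuples, where a
    lower score (for example an estimated discrepancy) is better.
    """
    best = min(models, key=lambda m: m[2])
    threshold = best[2] + best[3]  # best score plus its standard error
    eligible = [m for m in models if m[2] <= threshold]
    # Among eligible models, prefer the one with the fewest parameters.
    return min(eligible, key=lambda m: m[1])[0]


# Hypothetical candidates: (name, number of parameters, score, standard error)
candidates = [
    ("M1", 1, 105.0, 2.0),
    ("M2", 2, 101.0, 1.5),
    ("M3", 4, 100.0, 1.8),
]
selected = one_standard_error_rule(candidates)
```

Here M3 has the best score, but M2 falls within one standard error of it and has fewer parameters, so the rule returns M2.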
Affiliation(s)
- Luke A Yates
- School of Natural Sciences, University of Tasmania, Hobart, Tasmania, 7005, Australia
| | - Shane A Richards
- School of Natural Sciences, University of Tasmania, Hobart, Tasmania, 7005, Australia
| | - Barry W Brook
- School of Natural Sciences, University of Tasmania, Hobart, Tasmania, 7005, Australia
| |
|
31
|
Kowal DR. Fast, Optimal, and Targeted Predictions using Parametrized Decision Analysis. J Am Stat Assoc 2021; 117:1875-1886. [PMID: 36855685 PMCID: PMC9970289 DOI: 10.1080/01621459.2021.1891926] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2020] [Revised: 01/23/2021] [Accepted: 02/11/2021] [Indexed: 10/22/2022]
Abstract
Prediction is critical for decision-making under uncertainty and lends validity to statistical inference. With targeted prediction, the goal is to optimize predictions for specific decision tasks of interest, which we represent via functionals. Although classical decision analysis extracts predictions from a Bayesian model, these predictions are often difficult to interpret and slow to compute. Instead, we design a class of parametrized actions for Bayesian decision analysis that produce optimal, scalable, and simple targeted predictions. For a wide variety of action parametrizations and loss functions, including linear actions with sparsity constraints for targeted variable selection, we derive a convenient representation of the optimal targeted prediction that yields efficient and interpretable solutions. Customized out-of-sample predictive metrics are developed to evaluate and compare among targeted predictors. Through careful use of the posterior predictive distribution, we introduce a procedure that identifies a set of near-optimal, or acceptable targeted predictors, which provide unique insights into the features and level of complexity needed for accurate targeted prediction. Simulations demonstrate excellent prediction, estimation, and variable selection capabilities. Targeted predictions are constructed for physical activity data from the National Health and Nutrition Examination Survey (NHANES) to better predict and understand the characteristics of intraday physical activity.
|
32
|
Slanzi D, Mameli V, Brown PJ. A comparative study on high-dimensional bayesian regression with binary predictors. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2021.1894337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Debora Slanzi
- Department of Management, Ca’ Foscari University of Venice, Venice, Italy
- European Centre for Living Technology, Venice, Italy
| | - Valentina Mameli
- Department of Economics and Statistics, University of Udine, Udine, Italy
| | - Philip J. Brown
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, United Kingdom
| |
|
33
|
Kaplan D. On the Quantification of Model Uncertainty: A Bayesian Perspective. PSYCHOMETRIKA 2021; 86:215-238. [PMID: 33721184 PMCID: PMC7958145 DOI: 10.1007/s11336-021-09754-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/11/2020] [Revised: 02/15/2021] [Indexed: 06/09/2023]
Abstract
Issues of model selection have dominated the theoretical and applied statistical literature for decades. Model selection methods such as ridge regression, the lasso, and the elastic net have replaced ad hoc methods such as stepwise regression as a means of model selection. In the end, however, these methods lead to a single final model that is often taken to be the model considered ahead of time, thus ignoring the uncertainty inherent in the search for a final model. One method that has enjoyed a long history of theoretical developments and substantive applications, and that accounts directly for uncertainty in model selection, is Bayesian model averaging (BMA). BMA addresses the problem of model selection by not selecting a final model, but rather by averaging over a space of possible models that could have generated the data. The purpose of this paper is to provide a detailed and up-to-date review of BMA with a focus on its foundations in Bayesian decision theory and Bayesian predictive modeling. We consider the selection of parameter and model priors as well as methods for evaluating predictions based on BMA. We also consider important assumptions regarding BMA and extensions of model averaging methods to address these assumptions, particularly the method of Bayesian stacking. Simple empirical examples are provided and directions for future research relevant to psychometrics are discussed.
|
34
|
|
35
|
Pan I, Bester D. Marginal Likelihood Based Model Comparison in Fuzzy Bayesian Learning. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2020. [DOI: 10.1109/tetci.2018.2868253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
36
|
Abstract
Cross-validation can be used to measure a model’s predictive accuracy for the purpose of model comparison, averaging, or selection. Standard leave-one-out cross-validation (LOO-CV) requires that the observation model can be factorized into simple terms, but a lot of important models in temporal and spatial statistics do not have this property or are inefficient or unstable when forced into a factorized form. We derive how to efficiently compute and validate both exact and approximate LOO-CV for any Bayesian non-factorized model with a multivariate normal or Student-t distribution on the outcome values. We demonstrate the method using lagged simultaneously autoregressive (SAR) models as a case study.
|
37
|
Afrabandpey H, Peltola T, Piironen J, Vehtari A, Kaski S. A decision-theoretic approach for model interpretability in Bayesian framework. Mach Learn 2020. [DOI: 10.1007/s10994-020-05901-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
A salient approach to interpretable machine learning is to restrict modeling to simple models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users’ preferences, not the data generation mechanism; it is more natural to formulate interpretability as a utility function. In this work, we propose an interpretability utility, which explicates the trade-off between explanation fidelity and interpretability in the Bayesian framework. The method consists of two steps. First, a reference model, possibly a black-box Bayesian predictive model which does not compromise accuracy, is fitted to the training data. Second, a proxy model from an interpretable model family that best mimics the predictive behaviour of the reference model is found by optimizing the interpretability utility function. The approach is model agnostic (neither the interpretable model nor the reference model is restricted to a certain class of models), and the optimization problem can be solved using standard tools. Through experiments on real-world data sets, using decision trees as interpretable models and Bayesian additive regression models as reference models, we show that for the same level of interpretability, our approach generates more accurate models than the alternative of restricting the prior. We also propose a systematic way to measure stability of interpretable models constructed by different interpretability approaches and show that our proposed approach generates more stable models.
|
38
|
Bürkner PC, Gabry J, Vehtari A. Approximate leave-future-out cross-validation for Bayesian time series models. J STAT COMPUT SIM 2020. [DOI: 10.1080/00949655.2020.1783262] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Affiliation(s)
| | - Jonah Gabry
- Applied Statistics Center and ISERP, Columbia University, New York, NY, USA
| | - Aki Vehtari
- Department of Computer Science, Aalto University, Espoo, Finland
| |
|
39
|
Hinne M, Gronau QF, van den Bergh D, Wagenmakers EJ. A Conceptual Introduction to Bayesian Model Averaging. ADVANCES IN METHODS AND PRACTICES IN PSYCHOLOGICAL SCIENCE 2020. [DOI: 10.1177/2515245919898657] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Many statistical scenarios initially involve several candidate models that describe the data-generating process. Analysis often proceeds by first selecting the best model according to some criterion and then learning about the parameters of this selected model. Crucially, however, in this approach the parameter estimates are conditioned on the selected model, and any uncertainty about the model-selection process is ignored. An alternative is to learn the parameters for all candidate models and then combine the estimates according to the posterior probabilities of the associated models. This approach is known as Bayesian model averaging (BMA). BMA has several important advantages over all-or-none selection methods, but has been used only sparingly in the social sciences. In this conceptual introduction, we explain the principles of BMA, describe its advantages over all-or-none model selection, and showcase its utility in three examples: analysis of covariance, meta-analysis, and network analysis.
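The combination step described in this abstract, weighting each model's estimate by its posterior model probability, can be sketched as below. The log marginal likelihoods, prior model probabilities, and per-model estimates are hypothetical inputs chosen for illustration, and the function names are assumptions rather than anything from the paper.

```python
import math


def posterior_model_probs(log_marginal_likelihoods, prior_probs):
    """Posterior model probabilities, proportional to marginal likelihood * prior."""
    log_weights = [lml + math.log(p)
                   for lml, p in zip(log_marginal_likelihoods, prior_probs)]
    largest = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - largest) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def bma_estimate(estimates, model_probs):
    """Model-averaged estimate: per-model estimates weighted by model probability."""
    return sum(e * p for e, p in zip(estimates, model_probs))


# Hypothetical inputs: the second model has three times the marginal likelihood
log_mls = [-10.0, -10.0 + math.log(3.0)]
priors = [0.5, 0.5]
probs = posterior_model_probs(log_mls, priors)
averaged = bma_estimate([1.0, 2.0], probs)
```

With equal prior model probabilities, the posterior probabilities here are 0.25 and 0.75, so the averaged estimate sits three quarters of the way toward the second model's estimate instead of adopting either model outright.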
Affiliation(s)
- Max Hinne
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen
|
40
|
Zheng H, Hampson LV. A Bayesian decision-theoretic approach to incorporate preclinical information into phase I oncology trials. Biom J 2020; 62:1408-1427. [PMID: 32285511 DOI: 10.1002/bimj.201900161] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/28/2019] [Revised: 12/05/2019] [Accepted: 01/31/2020] [Indexed: 11/10/2022]
Abstract
Leveraging preclinical animal data for a phase I oncology trial is appealing yet challenging. In this paper, we use animal data to improve decision-making in a model-based dose-escalation procedure. We make a proposal for how to measure and address a prior-data conflict in a sequential study with a small sample size. Animal data are incorporated via a robust two-component mixture prior for the parameters of the human dose-toxicity relationship. The weights placed on each component of the prior are chosen empirically and updated dynamically as the trial progresses and more data accrue. After completion of each cohort, we use a Bayesian decision-theoretic approach to evaluate the predictive utility of the animal data for the observed human toxicity outcomes, reflecting the degree of agreement between dose-toxicity relationships in animals and humans. The proposed methodology is illustrated through several data examples and an extensive simulation study.
Affiliation(s)
- Haiyan Zheng
- Biostatistics Research Group, Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK; Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Lisa V Hampson
- Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland
|
41
|
Samorodnitsky S, Hoadley KA, Lock EF. A Pan-Cancer and Polygenic Bayesian Hierarchical Model for the Effect of Somatic Mutations on Survival. Cancer Inform 2020; 19:1176935120907399. [PMID: 32116467 PMCID: PMC7029540 DOI: 10.1177/1176935120907399] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/19/2020] [Accepted: 01/26/2020] [Indexed: 11/17/2022] Open
Abstract
We built a novel Bayesian hierarchical survival model based on the somatic mutation profile of patients across 50 genes and 27 cancer types. The pan-cancer quality allows for the model to “borrow” information across cancer types, motivated by the assumption that similar mutation profiles may have similar (but not necessarily identical) effects on survival across different tissues of origin or tumor types. The effect of a mutation at each gene was allowed to vary by cancer type, whereas the mean effect of each gene was shared across cancers. Within this framework, we considered 4 parametric survival models (normal, log-normal, exponential, and Weibull), and we compared their performance via a cross-validation approach in which we fit each model on training data and estimate the log-posterior predictive likelihood on test data. The log-normal model gave the best fit, and we investigated the partial effect of each gene on survival via a forward selection procedure. Through this we determined that mutations at TP53 and FAT4 were together the most useful for predicting patient survival. We validated the model via simulation to ensure that our algorithm for posterior computation gave nominal coverage rates. The code used for this analysis can be found at https://github.com/sarahsamorodnitsky/Pan-Cancer-Survival-Modeling.git, and the results are summarized at http://ericfrazerlock.com/surv_figs/SurvivalDisplay.html.
Affiliation(s)
- Sarah Samorodnitsky
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Katherine A Hoadley
- Department of Genetics, Computational Medicine Program, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Eric F Lock
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
|
42
|
Thomson W, Jabbari S, Taylor AE, Arlt W, Smith DJ. Simultaneous parameter estimation and variable selection via the logit-normal continuous analogue of the spike-and-slab prior. J R Soc Interface 2020; 16:20180572. [PMID: 30958174 PMCID: PMC6364637 DOI: 10.1098/rsif.2018.0572] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
We introduce a Bayesian prior distribution, the logit-normal continuous analogue of the spike-and-slab, which enables flexible parameter estimation and variable/model selection in a variety of settings. We demonstrate its use and efficacy in three case studies—a simulation study and two studies on real biological data from the fields of metabolomics and genomics. The prior allows the use of classical statistical models, which are easily interpretable and well known to applied scientists, but performs comparably to common machine learning methods in terms of generalizability to previously unseen data.
Affiliation(s)
- W Thomson
- School of Mathematics, University of Birmingham, Birmingham, UK
- S Jabbari
- School of Mathematics, University of Birmingham, Birmingham, UK; Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
- A E Taylor
- Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, UK; Centre for Endocrinology, Diabetes and Metabolism, Birmingham Health Partners, Birmingham B15 2TT, UK
- W Arlt
- Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, UK; Centre for Endocrinology, Diabetes and Metabolism, Birmingham Health Partners, Birmingham B15 2TT, UK
- D J Smith
- School of Mathematics, University of Birmingham, Birmingham, UK; Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, UK; Centre for Endocrinology, Diabetes and Metabolism, Birmingham Health Partners, Birmingham B15 2TT, UK
|
43
|
Abstract
Summary
In Bayesian statistics, the marginal likelihood, also known as the evidence, is used to evaluate model fit as it quantifies the joint probability of the data under the prior. In contrast, non-Bayesian models are typically compared using cross-validation on held-out data, either through $k$-fold partitioning or leave-$p$-out subsampling. We show that the marginal likelihood is formally equivalent to exhaustive leave-$p$-out cross-validation averaged over all values of $p$ and all held-out test sets when using the log posterior predictive probability as the scoring rule. Moreover, the log posterior predictive score is the only coherent scoring rule under data exchangeability. This offers new insight into the marginal likelihood and cross-validation, and highlights the potential sensitivity of the marginal likelihood to the choice of the prior. We suggest an alternative approach using cumulative cross-validation following a preparatory training phase. Our work has connections to prequential analysis and intrinsic Bayes factors, but is motivated in a different way.
Affiliation(s)
- E Fong
- Department of Statistics, University of Oxford, 24–29 St Giles’, Oxford OX1 3LB, UK
- C C Holmes
- Department of Statistics, University of Oxford, 24–29 St Giles’, Oxford OX1 3LB, UK
|
44
|
Piironen J, Paasiniemi M, Vehtari A. Projective inference in high-dimensional problems: Prediction and feature selection. Electron J Stat 2020. [DOI: 10.1214/20-ejs1711] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/19/2022]
|
45
|
|
46
|
Vasishth S, Nicenboim B, Engelmann F, Burchert F. Computational Models of Retrieval Processes in Sentence Processing. Trends Cogn Sci 2019; 23:968-982. [DOI: 10.1016/j.tics.2019.09.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/13/2019] [Revised: 08/31/2019] [Accepted: 09/03/2019] [Indexed: 11/28/2022]
|
47
|
Dey S, Delampady M, Gopalaswamy AM. Bayesian model selection for spatial capture-recapture models. Ecol Evol 2019; 9:11569-11583. [PMID: 31695869 PMCID: PMC6822056 DOI: 10.1002/ece3.5551] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/01/2019] [Revised: 07/12/2019] [Accepted: 07/22/2019] [Indexed: 11/09/2022] Open
Abstract
A vast amount of ecological knowledge generated over the past two decades has hinged upon the ability of model selection methods to discriminate among various ecological hypotheses. The last decade has seen the rise of Bayesian hierarchical models in ecology. Consequently, commonly used tools, such as the AIC, become largely inapplicable and there appears to be no consensus about a particular model selection tool that can be universally applied. We focus on a specific class of competing Bayesian spatial capture-recapture (SCR) models and apply and evaluate some of the recommended Bayesian model selection tools: (1) Bayes Factor-using (a) Gelfand-Dey and (b) harmonic mean methods, (2) Deviance Information Criterion (DIC), (3) Watanabe-Akaike's Information Criterion (WAIC) and (4) posterior predictive loss criterion. In all, we evaluate 25 variants of model selection tools in our study. We evaluate these model selection tools from the standpoint of selecting the "true" model and parameter estimation. In all, we generate 120 simulated data sets using the true model and assess the frequency with which the true model is selected and how well the tool estimates N (population size), a parameter of much importance to ecologists. We find that when information content is low in the data, no particular model selection tool can be recommended to help realize, simultaneously, both the goals of model selection and parameter estimation. But, in general (when we consider both the objectives together), we recommend the use of our application of the Bayes Factor (Gelfand-Dey with MAP approximation) for Bayesian SCR models. Our study highlights the point that although new model selection tools are emerging (e.g., WAIC) in the applied statistics literature, those tools based on sound theory even under approximation may still perform much better.
Affiliation(s)
- Soumen Dey
- Statistics and Mathematics Unit, Indian Statistical Institute, Bangalore, India
- Mohan Delampady
- Statistics and Mathematics Unit, Indian Statistical Institute, Bangalore, India
|
48
|
LaMont CH, Wiggins PA. Correspondence between thermodynamics and inference. Phys Rev E 2019; 99:052140. [PMID: 31212576 DOI: 10.1103/physreve.99.052140] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/21/2019] [Indexed: 11/07/2022]
Abstract
We expand upon a natural analogy between Bayesian statistics and statistical physics in which sample size corresponds to inverse temperature. This analogy motivates the definition of two statistical quantities: a learning capacity and a Gibbs entropy. The analysis of the learning capacity, corresponding to the heat capacity in thermal physics, leads to insight into the mechanism of learning and explains why some models have anomalously high learning performance. We explore the properties of the learning capacity in a number of examples, including a sloppy model. Next, we propose that the Gibbs entropy provides a natural device for counting distinguishable distributions in the context of Bayesian inference. We use this device to define a generalized principle of indifference in which every distinguishable model is assigned equal a priori probability. This principle results in a solution to a long-standing problem in Bayesian inference: the definition of an objective or uninformative prior. A key characteristic of this approach is that it can be applied to analyses where the model dimension is unknown and circumvents the automatic rejection of higher-dimensional models in Bayesian inference.
Affiliation(s)
- Colin H LaMont
- Department of Bioengineering, University of Washington, 3910 15th Avenue Northeast, Box 351560, Seattle, Washington 98195, USA; Department of Microbiology, University of Washington, 3910 15th Avenue Northeast, Box 351560, Seattle, Washington 98195, USA
- Paul A Wiggins
- Department of Bioengineering, University of Washington, 3910 15th Avenue Northeast, Box 351560, Seattle, Washington 98195, USA; Department of Microbiology, University of Washington, 3910 15th Avenue Northeast, Box 351560, Seattle, Washington 98195, USA
|
49
|
Stevens BS, Conway CJ. Predicting species distributions: unifying model selection and scale optimization for multi-scale occupancy models. Ecosphere 2019. [DOI: 10.1002/ecs2.2748] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022] Open
Affiliation(s)
- Bryan S. Stevens
- Idaho Cooperative Fish and Wildlife Research Unit, Department of Fish and Wildlife Sciences, University of Idaho, Moscow, Idaho 83844, USA
- Courtney J. Conway
- U.S. Geological Survey, Idaho Cooperative Fish and Wildlife Research Unit, University of Idaho, Moscow, Idaho 83844, USA
|
50
|
Abstract
We introduce the fundamental tenets of Bayesian inference, which derive from two basic laws of probability theory. We cover the interpretation of probabilities, discrete and continuous versions of Bayes' rule, parameter estimation, and model comparison. Using seven worked examples, we illustrate these principles and set up some of the technical background for the rest of this special issue of Psychonomic Bulletin & Review. Supplemental material is available via https://osf.io/wskex/.
|