1
Arima S, Polettini S, Pasculli G, Gesualdo L, Pesce F, Procaccini DA. A Bayesian nonparametric approach to correct for underreporting in count data. Biostatistics 2024; 25:904-918. [PMID: 37811675 DOI: 10.1093/biostatistics/kxad027] [Received: 06/16/2022] [Revised: 06/06/2023] [Accepted: 08/21/2023] [Indexed: 10/10/2023]
Abstract
We propose a nonparametric compound Poisson model for underreported count data that introduces a latent clustering structure for the reporting probabilities. The latter are estimated together with the model's parameters, based on experts' opinion and on a proxy for the reporting process. The proposed model is used to estimate the prevalence of chronic kidney disease in Apulia, Italy, based on a unique statistical database covering m = 258 municipalities, obtained by integrating information from multiple registers. Accurate prevalence estimates are needed for monitoring, surveillance, and management purposes; yet counts are deemed to be considerably underreported, especially in some areas of Apulia, one of the most deprived and heterogeneous regions in Italy. Our results agree with previous findings and highlight interesting geographical patterns of the disease. We compare our model with existing approaches in the literature using simulated data as well as real data on early neonatal mortality risk in Brazil, described in previous research: the proposed approach proves accurate and particularly suitable when partial information about data quality is available.
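The generic mechanism behind underreported counts can be illustrated by binomial thinning: if the true count N is Poisson with mean λ and each case is reported independently with probability p, the reported count is Poisson with mean λp. The sketch below simulates this with the standard library only; the function names and parameter values are hypothetical, and it shows the thinning idea, not the authors' nonparametric compound Poisson model or its latent clustering of reporting probabilities.

```python
# Illustrative sketch; all names and parameter values are hypothetical.
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def reported_count(true_count: int, p: float, rng: random.Random) -> int:
    """Binomial thinning: each true case is reported independently with probability p."""
    return sum(1 for _ in range(true_count) if rng.random() < p)


def simulate(lam: float, p: float, n_areas: int, seed: int = 0):
    """Return (true, reported) counts for n_areas hypothetical areas."""
    rng = random.Random(seed)
    true_counts = [poisson_draw(lam, rng) for _ in range(n_areas)]
    reported = [reported_count(n, p, rng) for n in true_counts]
    return true_counts, reported


true_counts, reported = simulate(lam=20.0, p=0.6, n_areas=5000)
mean_true = sum(true_counts) / len(true_counts)
mean_reported = sum(reported) / len(reported)
# Under thinning E[reported] = lam * p, so the means should be near 20 and 12.
print(round(mean_true, 2), round(mean_reported, 2))
```

If p were known, dividing the observed mean by p would recover λ; the difficulty the paper addresses is that p is unknown and varies across areas.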
Affiliation(s)
- Serena Arima
  - Department of Human and Social Sciences, University of Salento, Via di Valesio, 73100 Lecce, Italy
- Silvia Polettini
  - Department of Social and Economic Sciences, Sapienza University, P.le Aldo Moro, 5, 00185 Rome, Italy
- Giuseppe Pasculli
  - Department of Computer, Control, and Management Engineering "Antonio Ruberti", Sapienza University, Via Ariosto, 25, 00185 Rome, Italy
- Loreto Gesualdo
  - Section of Nephrology, Department of Precision and Regenerative Medicine and Ionian Area (DiMePre-J), Azienda Ospedaliero Universitaria Consorziale Policlinico di Bari, Piazza Giulio Cesare, 11, 70124 Bari, Italy
- Francesco Pesce
  - Division of Renal Medicine, "Fatebenefratelli Isola Tiberina-Gemelli Isola", 00186 Rome, Italy
- Deni-Aldo Procaccini
  - Section of Nephrology, Department of Precision and Regenerative Medicine and Ionian Area (DiMePre-J), Azienda Ospedaliero Universitaria Consorziale Policlinico di Bari, Piazza Giulio Cesare, 11, 70124 Bari, Italy
2
Areed WD, Price A, Thompson H, Malseed R, Mengersen K. Spatial non-parametric Bayesian clustered coefficients. Sci Rep 2024; 14:9677. [PMID: 38678077 PMCID: PMC11055928 DOI: 10.1038/s41598-024-59973-w] [Received: 11/21/2023] [Accepted: 04/17/2024] [Indexed: 04/29/2024]
Abstract
In the field of population health research, understanding the similarities between geographical areas and quantifying their shared effects on health outcomes is crucial. In this paper, we synthesise a number of existing methods to create a new approach that specifically addresses this goal. The approach is called a Bayesian spatial Dirichlet process clustered heterogeneous regression model. This non-parametric framework allows for inference on the number of clusters and the clustering configurations, while simultaneously estimating the parameters for each cluster. We demonstrate the efficacy of the proposed algorithm using simulated data and further apply it to analyse influential factors affecting children's health development domains in Queensland. The study provides valuable insights into the contributions of regional similarities in education and demographics to health outcomes, aiding targeted interventions and policy design.
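Inference on the number of clusters and the clustering configuration in Dirichlet process models rests on the prior the process induces over partitions, usually described as the Chinese restaurant process (CRP). A minimal sketch of drawing cluster labels from that prior, with a hypothetical concentration parameter `alpha` and hypothetical function names; it omits the spatial structure and the cluster-specific regression coefficients of the proposed model.

```python
# Illustrative sketch; all names and parameter values are hypothetical.
import random
from collections import Counter


def crp_labels(n: int, alpha: float, seed: int = 0) -> list[int]:
    """Sample cluster labels for n items from a Chinese restaurant process prior."""
    rng = random.Random(seed)
    labels: list[int] = []
    sizes: list[int] = []  # sizes[k] = number of items already in cluster k
    for i in range(n):
        # Item i joins existing cluster k with prob sizes[k] / (i + alpha)
        # and opens a new cluster with prob alpha / (i + alpha).
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


labels = crp_labels(n=100, alpha=1.0, seed=42)
print(len(set(labels)), Counter(labels).most_common(3))
```

Because the number of occupied clusters is random under this prior, a posterior sampler can move between configurations with different numbers of clusters rather than fixing that number in advance.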
Affiliation(s)
- Wala Draidi Areed
  - School of Mathematical Science, Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia
- Aiden Price
  - School of Mathematical Science, Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia
- Helen Thompson
  - School of Mathematical Science, Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia
- Reid Malseed
  - Children's Health Queensland, Brisbane, QLD, Australia
- Kerrie Mengersen
  - School of Mathematical Science, Centre for Data Science, Queensland University of Technology, Brisbane, QLD, Australia
3
Müller P, Flores B. Discussion on "Bayesian meta-analysis of penetrance for cancer risk" by Thanthirige Lakshika M. Ruberu, Danielle Braun, Giovanni Parmigiani, and Swati Biswas. Biometrics 2024; 80:ujae042. [PMID: 38819313 DOI: 10.1093/biomtc/ujae042] [Received: 11/15/2023] [Revised: 12/25/2023] [Accepted: 05/11/2024] [Indexed: 06/01/2024]
Abstract
Ruberu et al. (2023) introduce an elegant approach to fit a complicated meta-analysis problem with diverse reporting modalities into the framework of hierarchical Bayesian inference. We discuss issues related to some of the involved parametric model assumptions.
Affiliation(s)
- Peter Müller
  - Department of Statistics and Data Science, University of Texas, Austin, TX 78712, United States
- Bernardo Flores
  - Department of Statistics and Data Science, University of Texas, Austin, TX 78712, United States
4
Zorzetto D, Bargagli-Stoffi FJ, Canale A, Dominici F. Confounder-dependent Bayesian mixture model: Characterizing heterogeneity of causal effects in air pollution epidemiology. Biometrics 2024; 80:ujae025. [PMID: 38640436 PMCID: PMC11028589 DOI: 10.1093/biomtc/ujae025] [Received: 02/27/2023] [Revised: 01/24/2024] [Accepted: 03/14/2024] [Indexed: 04/21/2024]
Abstract
Several epidemiological studies have provided evidence that long-term exposure to fine particulate matter (PM2.5) increases the mortality rate. Furthermore, some population characteristics (e.g., age, race, and socioeconomic status) might play a crucial role in understanding vulnerability to air pollution. To inform policy, it is necessary to identify groups of the population that are more or less vulnerable to air pollution. In the causal inference literature, the group average treatment effect (GATE) is a distinctive facet of the conditional average treatment effect. This widely employed metric serves to characterize the heterogeneity of a treatment effect based on some population characteristics. In this paper, we introduce a novel Confounder-Dependent Bayesian Mixture Model (CDBMM) to characterize causal effect heterogeneity. More specifically, our method leverages the flexibility of the dependent Dirichlet process to model the distribution of the potential outcomes conditionally on the covariates and the treatment levels, thus enabling us to: (i) identify heterogeneous and mutually exclusive population groups defined by similar GATEs in a data-driven way, and (ii) estimate and characterize the causal effects within each of the identified groups. Through simulations, we demonstrate the effectiveness of our method in uncovering key insights about treatment effect heterogeneity. We apply our method to claims data from Medicare enrollees in Texas. We found six mutually exclusive groups across which the causal effects of PM2.5 on the mortality rate are heterogeneous.
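The group average treatment effect (GATE) is the average of the individual effects Y(1) − Y(0) within a population group. A toy sketch computing GATEs from simulated units whose two potential outcomes are both known; the group labels, effect sizes, and function names are hypothetical, and the sketch does not implement the CDBMM, which must model the unobserved potential outcome and learn the groups from data.

```python
# Illustrative sketch; all names, groups, and effect sizes are hypothetical.
import random
from collections import defaultdict


def simulate_units(n: int, seed: int = 0):
    """Hypothetical units with a group label and both potential outcomes known."""
    rng = random.Random(seed)
    true_effect = {"A": 0.5, "B": 2.0}  # hypothetical group-specific effects
    units = []
    for _ in range(n):
        g = rng.choice(["A", "B"])
        y0 = rng.gauss(10.0, 1.0)
        y1 = y0 + true_effect[g] + rng.gauss(0.0, 0.1)
        units.append((g, y0, y1))
    return units


def gate(units):
    """GATE for each group: mean of Y(1) - Y(0) over the units in that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, y0, y1 in units:
        sums[g] += y1 - y0
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


estimates = gate(simulate_units(2000))
print(estimates)
```

With real observational data only one potential outcome is observed per unit, which is why the approach described above models the distribution of the potential outcomes given covariates and treatment level.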
Affiliation(s)
- Dafne Zorzetto
  - Department of Statistics, University of Padova, Padova 35121, Italy
- Falco J Bargagli-Stoffi
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
- Antonio Canale
  - Department of Statistics, University of Padova, Padova 35121, Italy
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
5
Wade S. Bayesian cluster analysis. Philos Trans A Math Phys Eng Sci 2023; 381:20220149. [PMID: 36970819 PMCID: PMC10041359 DOI: 10.1098/rsta.2022.0149] [Received: 07/22/2022] [Accepted: 01/03/2023] [Indexed: 06/18/2023]
Abstract
Bayesian cluster analysis offers substantial benefits over algorithmic approaches by providing not only point estimates but also uncertainty in the clustering structure and patterns within each cluster. An overview of Bayesian cluster analysis is provided, including both model-based and loss-based approaches, along with a discussion of the importance of the choice of kernel or loss and of the prior specification. Advantages are demonstrated in an application to cluster cells and discover latent cell types in single-cell RNA sequencing data to study embryonic cellular development. Lastly, we focus on the ongoing debate between finite and infinite mixtures in a model-based approach and on robustness to model misspecification. While much of the debate and asymptotic theory focuses on the marginal posterior of the number of clusters, we empirically show that quite different behaviour is obtained when estimating the full clustering structure. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
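A common loss-based summary of a posterior over clusterings starts from the posterior similarity matrix, whose (i, j) entry is the fraction of MCMC samples in which items i and j share a cluster. The sketch below, assuming a small hypothetical list of sampled label vectors, picks the sampled clustering that minimizes Binder's loss with equal misclassification costs; up to a constant, this equals the squared distance between a clustering's co-clustering indicators and the similarity matrix. Function names are hypothetical, and other losses discussed in this literature (such as variation of information) would need a different computation.

```python
# Illustrative sketch; function names and the sampled labels are hypothetical.
from itertools import combinations


def similarity_matrix(samples: list[list[int]]) -> list[list[float]]:
    """psm[i][j] = fraction of samples in which items i and j are co-clustered."""
    n = len(samples[0])
    psm = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    psm[i][j] += 1.0 / len(samples)
    return psm


def expected_binder_loss(labels: list[int], psm: list[list[float]]) -> float:
    """Posterior expected Binder loss (equal costs) of a clustering, over pairs i < j."""
    loss = 0.0
    for i, j in combinations(range(len(labels)), 2):
        together = 1.0 if labels[i] == labels[j] else 0.0
        loss += abs(together - psm[i][j])
    return loss


def best_sampled_clustering(samples: list[list[int]]) -> list[int]:
    """Return the sampled clustering with the smallest expected Binder loss."""
    psm = similarity_matrix(samples)
    return min(samples, key=lambda c: expected_binder_loss(c, psm))


# Hypothetical MCMC output: 4 draws of cluster labels for 5 items.
draws = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
print(best_sampled_clustering(draws))  # prints [0, 0, 1, 1, 1]
```

Restricting the search to sampled clusterings keeps the example short; in practice the point estimate is often sought with a dedicated search over partitions.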
Affiliation(s)
- S. Wade
  - School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, James Clerk Maxwell Building, Edinburgh, UK
6
Covariate dependent Beta-GOS process. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107662] [Indexed: 11/23/2022]
7
Baerenbold O, Meis M, Martínez-Hernández I, Euán C, Burr WS, Tremper A, Fuller G, Pirani M, Blangiardo M. A dependent Bayesian Dirichlet process model for source apportionment of particle number size distribution. Environmetrics 2023; 34:e2763. [PMID: 37035022 PMCID: PMC10077992 DOI: 10.1002/env.2763] [Received: 06/29/2022] [Accepted: 08/16/2022] [Indexed: 06/19/2023]
Abstract
The relationship between particle exposure and health risks has been well established in recent years. Particulate matter (PM) is made up of different components coming from several sources, which might have different levels of toxicity. Hence, identifying these sources is an important task in order to implement effective policies to improve air quality and population health. The problem of identifying sources of particulate pollution has already been studied in the literature. However, current methods require an a priori specification of the number of sources and do not include information on covariates in the source allocations. Here, we propose a novel Bayesian nonparametric approach to overcome these limitations. In particular, we model source contribution using a Dirichlet process as a prior for source profiles, which allows us to estimate the number of components that contribute to particle concentration rather than fixing this number beforehand. To better characterize the sources, we also include meteorological variables (wind speed and direction) as covariates within the allocation process via a flexible Gaussian kernel. We apply the model to apportion the particle number size distribution measured near London Gatwick Airport (UK) in 2019. When analyzing these data, we are able to identify the most common PM sources, as well as new sources that have not been identified with the commonly used methods.
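One way to see why a Dirichlet process prior avoids fixing the number of sources in advance is Sethuraman's stick-breaking construction: mixture weights are produced by repeatedly breaking off a Beta(1, α) fraction of what remains of a unit stick. A truncated sketch, where the concentration `alpha`, the truncation level, and the function name are hypothetical choices; it generates prior weights only and omits the source profiles and the covariate-dependent Gaussian kernel allocation described above.

```python
# Illustrative sketch; alpha, truncation level, and names are hypothetical.
import random


def stick_breaking_weights(alpha: float, truncation: int, seed: int = 0) -> list[float]:
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes what is left, so weights sum to 1
    return weights


w = stick_breaking_weights(alpha=2.0, truncation=25, seed=1)
# How many components carry 95% of the mass varies from draw to draw;
# it is random under the prior rather than fixed in advance.
cumulative, k95 = 0.0, 0
for weight in sorted(w, reverse=True):
    cumulative += weight
    k95 += 1
    if cumulative >= 0.95:
        break
print(k95, round(sum(w), 6))
```

Larger values of `alpha` spread the mass over more components, so more sources are expected a priori; the data then inform how many components are actually used.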
Affiliation(s)
- Oliver Baerenbold
  - Department of Epidemiology and Biostatistics, MRC Centre for Environment and Health, Imperial College, London, UK
- Melanie Meis
  - Department of Atmospheric and Oceanic Sciences, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Centro del Mar y la Atmósfera y los Océanos (CIMA-UBA-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
- Carolina Euán
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Wesley S. Burr
  - Department of Mathematics, Trent University, Peterborough, Ontario, Canada
- Anja Tremper
  - Department of Epidemiology and Biostatistics, MRC Centre for Environment and Health, Imperial College, London, UK
- Gary Fuller
  - Department of Epidemiology and Biostatistics, MRC Centre for Environment and Health, Imperial College, London, UK
- Monica Pirani
  - Department of Epidemiology and Biostatistics, MRC Centre for Environment and Health, Imperial College, London, UK
- Marta Blangiardo
  - Department of Epidemiology and Biostatistics, MRC Centre for Environment and Health, Imperial College, London, UK
8
Zito A, Rigon T, Ovaskainen O, Dunson DB. Bayesian Modeling of Sequential Discoveries. J Am Stat Assoc 2022; 118:2521-2532. [PMID: 38501061 PMCID: PMC10947068 DOI: 10.1080/01621459.2022.2060835] [Received: 12/09/2020] [Accepted: 03/23/2022] [Indexed: 10/18/2022]
Abstract
We aim to model the appearance of distinct tags in a sequence of labeled objects. Common examples of this type of data include words in a corpus or distinct species in a sample. These sequential discoveries are often summarized via accumulation curves, which count the number of distinct entities observed in an increasingly large set of objects. We propose a novel Bayesian method for species sampling modeling by directly specifying the probability of a new discovery, thereby allowing for flexible specifications. The asymptotic behavior and finite-sample properties of such an approach are extensively studied. Interestingly, our enlarged class of sequential processes includes highly tractable special cases. We present a subclass of models characterized by appealing theoretical and computational properties, including one that has the same discovery probability as the Dirichlet process. Moreover, due to strong connections with logistic regression models, the latter subclass can naturally account for covariates. We finally test our proposal on both synthetic and real data, with special emphasis on a large fungal biodiversity study in Finland. Supplementary materials for this article are available online.
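Under a Dirichlet process with concentration α, after n objects have been observed the probability that the next object carries a new tag is α/(α + n), regardless of how the previous objects are spread across tags. The sketch below simulates an accumulation curve from this discovery probability and computes its expectation, the sum of α/(α + i) for i from 0 to n − 1. The value of `alpha` and the function names are hypothetical, and the covariate-dependent, logistic-type discovery probabilities proposed in the paper are not implemented.

```python
# Illustrative sketch; alpha and function names are hypothetical.
import random


def dp_discovery_prob(n: int, alpha: float) -> float:
    """Probability that object n + 1 is a new tag after n objects, under a Dirichlet process."""
    return alpha / (alpha + n)


def simulate_accumulation(n_objects: int, alpha: float, seed: int = 0) -> list[int]:
    """Simulated accumulation curve: distinct tags seen after 1, 2, ..., n_objects objects."""
    rng = random.Random(seed)
    distinct, curve = 0, []
    for n in range(n_objects):
        if rng.random() < dp_discovery_prob(n, alpha):
            distinct += 1
        curve.append(distinct)
    return curve


def expected_distinct(n_objects: int, alpha: float) -> float:
    """Expected number of distinct tags after n_objects objects."""
    return sum(dp_discovery_prob(i, alpha) for i in range(n_objects))


print(simulate_accumulation(10, alpha=2.0), expected_distinct(10, alpha=2.0))
```

For example, with α = 1 the expected number of distinct tags after three objects is 1 + 1/2 + 1/3 = 11/6. Specifying the discovery probability directly, as the abstract describes, amounts to replacing `dp_discovery_prob` with a more flexible function of n (and possibly covariates).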
Affiliation(s)
- Alessandro Zito
  - Department of Statistical Science, Duke University, Durham, NC
- Tommaso Rigon
  - Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy
- Otso Ovaskainen
  - Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
  - Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- David B. Dunson
  - Department of Statistical Science, Duke University, Durham, NC
9
Lijoi A, Prünster I, Rebaudo G. Flexible clustering via hidden hierarchical Dirichlet priors. Scand Stat Theory Appl 2022. [DOI: 10.1111/sjos.12578] [Indexed: 11/29/2022]
Affiliation(s)
- Antonio Lijoi
  - Department of Decision Sciences and BIDSA, Bocconi University, Via Röntgen 1, 20136 Milan, Italy
- Igor Prünster
  - Department of Decision Sciences and BIDSA, Bocconi University, Via Röntgen 1, 20136 Milan, Italy
- Giovanni Rebaudo
  - Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX 78712-1823, USA
10
Wehrhahn C, Barrientos AF, Jara A. Dependent Bayesian nonparametric modeling of compositional data using random Bernstein polynomials. Electron J Stat 2022. [DOI: 10.1214/22-ejs2002] [Indexed: 11/19/2022]
Affiliation(s)
- Claudia Wehrhahn
  - Department of Statistics, University of California Santa Cruz, Santa Cruz, USA
- Alejandro Jara
  - MiDaS-Center for the Discovery of Structures in Complex Data and Department of Statistics, Pontificia Universidad Católica de Chile, Santiago, Chile