1
Wade S. Bayesian cluster analysis. Philos Trans A Math Phys Eng Sci 2023; 381:20220149. [PMID: 36970819 PMCID: PMC10041359 DOI: 10.1098/rsta.2022.0149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 01/03/2023] [Indexed: 06/18/2023]
Abstract
Bayesian cluster analysis offers substantial benefits over algorithmic approaches by providing not only point estimates but also uncertainty in the clustering structure and patterns within each cluster. An overview of Bayesian cluster analysis is provided, including both model-based and loss-based approaches, along with a discussion on the importance of the kernel or loss selected and prior specification. Advantages are demonstrated in an application to cluster cells and discover latent cell types in single-cell RNA sequencing data to study embryonic cellular development. Lastly, we focus on the ongoing debate between finite and infinite mixtures in a model-based approach and robustness to model misspecification. While much of the debate and asymptotic theory focuses on the marginal posterior of the number of clusters, we empirically show that quite a different behaviour is obtained when estimating the full clustering structure. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
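The loss-based route to a point estimate of the clustering can be illustrated with a minimal sketch. It assumes MCMC output is available as an (S, n) array of cluster labels, and the function names are hypothetical. The code computes the posterior similarity matrix (the posterior probability that each pair of items is co-clustered) and then picks, among the sampled partitions, the one with the smallest posterior expected Binder loss under unit costs. Restricting the search to sampled partitions is a simplification made here for brevity, not necessarily the procedure used in the article.

```python
import numpy as np


def posterior_similarity(samples):
    """Posterior co-clustering probabilities from an (S, n) array of MCMC label draws."""
    samples = np.asarray(samples)
    # co[s, i, j] is True when items i and j share a cluster in draw s
    co = samples[:, :, None] == samples[:, None, :]
    return co.mean(axis=0)


def expected_binder_loss(labels, psm):
    """Posterior expected Binder loss (unit costs) of a single partition."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    iu = np.triu_indices(len(labels), k=1)  # each unordered pair i < j counted once
    return np.abs(same[iu] - psm[iu]).sum()


def binder_point_estimate(samples):
    """Return the sampled partition with the smallest expected Binder loss, plus the PSM."""
    samples = np.asarray(samples)
    psm = posterior_similarity(samples)
    losses = [expected_binder_loss(s, psm) for s in samples]
    return samples[int(np.argmin(losses))], psm


# Three toy draws of cluster labels for four items
draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
estimate, psm = binder_point_estimate(draws)
print(estimate)  # [0 0 1 1]
```

The similarity matrix also conveys the uncertainty the abstract emphasises: here items 0 and 1 are co-clustered with probability 2/3 rather than with certainty.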
Affiliation(s)
- S. Wade
- School of Mathematics and Maxwell Institute for Mathematical Sciences, University of Edinburgh, James Clerk Maxwell Building, Edinburgh, UK
2
Franzolini B, Lijoi A, Prünster I. Model selection for maternal hypertensive disorders with symmetric hierarchical Dirichlet processes. Ann Appl Stat 2023. [DOI: 10.1214/22-aoas1628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Affiliation(s)
- Beatrice Franzolini
- Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR)
3
Barone R, Dalla Valle L. Bayesian Nonparametric Modelling of Conditional Multidimensional Dependence Structures. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2173604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
4
Yan Y, Luo X. Bayesian Tree-Structured Two-Level Clustering for Nested Data Analysis. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2130927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2022]
Affiliation(s)
- Yinqiao Yan
- Institute of Statistics and Big Data, Renmin University of China
- Xiangyu Luo
- Institute of Statistics and Big Data, Renmin University of China
5
Wei Y, Nguyen X. Convergence of de Finetti’s mixing measure in latent structure models for observed exchangeable sequences. Ann Stat 2022. [DOI: 10.1214/21-aos2120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Yun Wei
- Department of Mathematics, University of Michigan
6
Riva-Palacio A, Leisen F, Griffin J. Survival Regression Models With Dependent Bayesian Nonparametric Priors. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2020.1864381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Alan Riva-Palacio
- Departamento de Probabilidad y Estadística, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Fabrizio Leisen
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
- Jim Griffin
- Department of Statistical Science, University College London, London, UK
7
D'Angelo L, Canale A, Yu Z, Guindani M. Bayesian nonparametric analysis for the detection of spikes in noisy calcium imaging data. Biometrics 2022. [PMID: 35191539 DOI: 10.1111/biom.13626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/18/2021] [Accepted: 01/13/2022] [Indexed: 11/30/2022]
Abstract
Recent advancements in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake behaving animals through the analysis of intracellular calcium signals. An ongoing challenge is deconvolving the temporal signals to extract the spike trains from the noisy calcium signals' time series. In this manuscript, we propose a nested Bayesian finite mixture specification that allows the estimation of spiking activity and, simultaneously, the reconstruction of the distributions of the calcium transient spikes' amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spikes' intensity values are also clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activities.
Affiliation(s)
- Laura D'Angelo
- Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy
- Antonio Canale
- Department of Statistical Sciences, University of Padova, Padova, Italy
- Zhaoxia Yu
- Department of Statistics, University of California, Irvine, Irvine, U.S.A.
- Michele Guindani
- Department of Statistics, University of California, Irvine, Irvine, U.S.A.
8
Quintana FA, Müller P, Jara A, MacEachern SN. The Dependent Dirichlet Process and Related Models. Stat Sci 2022. [DOI: 10.1214/20-sts819] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
Affiliation(s)
- Fernando A. Quintana
- Fernando A. Quintana is Professor, Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile, and Deputy Director, ANID—Millennium Science Initiative Program—Millennium Nucleus Center for the Discovery of Structures in Co
- Peter Müller
- Peter Müller is Professor, Department of Statistics and Data Science, University of Texas at Austin, Austin, Texas, USA
- Alejandro Jara
- Alejandro Jara is Associate Professor, Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile, and Director, ANID—Millennium Science Initiative Program—Millennium Nucleus Center for the Discovery of Structures in Compl
- Steven N. MacEachern
- Steven N. MacEachern is Professor, Department of Statistics, Ohio State University, Columbus, Ohio, USA
9
Lijoi A, Prünster I, Rebaudo G. Flexible clustering via hidden hierarchical Dirichlet priors. Scand Stat Theory Appl 2022. [DOI: 10.1111/sjos.12578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Antonio Lijoi
- Department of Decision Sciences and BIDSA Bocconi University via Röntgen 1 Milan 20136 Italy
- Igor Prünster
- Department of Decision Sciences and BIDSA Bocconi University via Röntgen 1 Milan 20136 Italy
- Giovanni Rebaudo
- Department of Statistics and Data Sciences University of Texas at Austin Austin 78712‐1823 TX USA
10
Franssen SEMP, van der Vaart AW. Bernstein-von Mises theorem for the Pitman-Yor process of nonnegative type. Electron J Stat 2022. [DOI: 10.1214/22-ejs2077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
11
Nieto-Barajas LE. A class of dependent Dirichlet processes via latent multinomial processes. STATISTICS-ABINGDON 2021. [DOI: 10.1080/02331888.2021.1991348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
12
Di Benedetto G, Caron F, Teh YW. Nonexchangeable random partition models for microclustering. Ann Stat 2021. [DOI: 10.1214/20-aos2003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
13
Denti F, Camerlenghi F, Guindani M, Mira A. A Common Atoms Model for the Bayesian Nonparametric Analysis of Nested Data. J Am Stat Assoc 2021; 118:405-416. [PMID: 37089274 PMCID: PMC10120855 DOI: 10.1080/01621459.2021.1933499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/13/2020] [Revised: 05/12/2021] [Accepted: 05/19/2021] [Indexed: 09/30/2022]
Abstract
The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested common atoms model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.
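The two-layer clustering in this abstract can be illustrated generatively. This is a minimal sketch under stated assumptions, not the authors' implementation or their nested slice sampler: it assumes truncated stick-breaking weights, N(0, 3²) observational atoms, and unit-variance Gaussian kernels, and the function names and truncation levels (`n_dist`, `n_atoms`) are hypothetical. Units are first assigned to distributional clusters; each distributional cluster then places its own weights on a single shared set of observational atoms, which is the "common atoms" feature that lets observational clusters be matched across units.

```python
import numpy as np


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights; the last stick absorbs the leftover mass."""
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * leftover


def simulate_common_atoms(n_units, n_obs, alpha=1.0, beta=1.0,
                          n_dist=10, n_atoms=20, seed=0):
    """Draw nested data from a truncated common-atoms structure with Gaussian kernels."""
    rng = np.random.default_rng(seed)
    # Observational atoms: one set, shared by every distributional cluster
    atoms = rng.normal(0.0, 3.0, size=n_atoms)
    # Each distributional cluster reweights the same atoms with its own weights
    omega = np.stack([stick_breaking(beta, n_atoms, rng) for _ in range(n_dist)])
    # Weights over distributional clusters, then one distributional label per unit
    pi = stick_breaking(alpha, n_dist, rng)
    dist_labels = rng.choice(n_dist, size=n_units, p=pi)
    # Observational labels for every observation in every unit
    obs_labels = np.stack([rng.choice(n_atoms, size=n_obs, p=omega[k]) for k in dist_labels])
    y = rng.normal(atoms[obs_labels], 1.0)  # unit-variance Gaussian kernel
    return y, dist_labels, obs_labels, atoms


y, S, M, atoms = simulate_common_atoms(n_units=5, n_obs=100)
print(y.shape)  # (5, 100)
```

Because `atoms` is drawn once, two units in different distributional clusters can still share observational clusters; they differ only in the weights `omega[k]`.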
Affiliation(s)
- Francesco Denti
- Department of Statistics, University of California, Irvine, CA
- Federico Camerlenghi
- Department of Economics, Management and Statistics, University of Milano - Bicocca, Milan, Italy
- Antonietta Mira
- Università della Svizzera italiana, Lugano, Switzerland
- University of Insubria, Como, Italy
14
15
Hart B, Guindani M, Malone S, Fiecas M. A nonparametric Bayesian model for estimating spectral densities of resting-state EEG twin data. Biometrics 2020; 78:313-323. [PMID: 33058149 DOI: 10.1111/biom.13393] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/29/2019] [Revised: 09/16/2020] [Accepted: 10/05/2020] [Indexed: 11/27/2022]
Abstract
Electroencephalography (EEG) is a noninvasive neuroimaging modality that captures electrical brain activity many times per second. We seek to estimate power spectra from EEG data that were gathered for 557 adolescent twin pairs through the Minnesota Twin Family Study (MTFS). Typically, spectral analysis methods treat time series from each subject separately, and independent spectral densities are fit to each time series. Since the EEG data were collected on twins, it is reasonable to assume that the time series have similar underlying characteristics, so borrowing information across subjects can significantly improve estimation. We propose a Nested Bernstein Dirichlet prior model to estimate the power spectrum of the EEG signal for each subject by smoothing periodograms within and across subjects while requiring minimal user input for tuning parameters. Furthermore, we leverage the MTFS twin study design to estimate the heritability of EEG power spectra in the hope of establishing new endophenotypes. Through simulation studies designed to mimic the MTFS, we show that our method outperforms a set of other popular methods.
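The two building blocks named in this abstract, the periodogram and a Bernstein polynomial density, can be sketched as follows. This is an illustration under assumptions, not the authors' model: it uses the raw periodogram with a 1/(2πn) normalisation and the standard Beta-mixture form of a Bernstein polynomial on [0, 1]; the nested Dirichlet prior on the mixture weights and all posterior computation are omitted, and the function names are hypothetical.

```python
import numpy as np
from math import comb


def periodogram(x):
    """Raw periodogram at the Fourier frequencies in [0, pi], after removing the mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.rfft(x - x.mean())
    freqs = 2.0 * np.pi * np.arange(len(dft)) / n
    return freqs, np.abs(dft) ** 2 / (2.0 * np.pi * n)


def bernstein_density(u, weights):
    """Bernstein polynomial density on [0, 1]: a mixture of Beta(j, k - j + 1) densities."""
    u = np.asarray(u, dtype=float)
    k = len(weights)
    dens = np.zeros_like(u)
    for j, w in enumerate(weights, start=1):
        # Beta(j, k - j + 1) density written as k * C(k-1, j-1) * u^(j-1) * (1-u)^(k-j)
        dens += w * k * comb(k - 1, j - 1) * u ** (j - 1) * (1.0 - u) ** (k - j)
    return dens


t = np.arange(64)
freqs, pgram = periodogram(np.cos(2 * np.pi * 5 * t / 64))
print(int(np.argmax(pgram)))  # 5
# Rescale frequencies to [0, 1] before evaluating a Bernstein-type spectral shape
shape = bernstein_density(freqs / np.pi, np.full(4, 0.25))
```

With equal weights the Bernstein mixture is exactly the uniform density, so `shape` is a vector of ones; non-uniform weights, which a Dirichlet-type prior would randomise, bend the curve toward the periodogram's shape.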
Affiliation(s)
- Brian Hart
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
- Michele Guindani
- Department of Statistics, University of California Irvine, Irvine, California, USA
- Stephen Malone
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, USA
- Mark Fiecas
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
16
Camerlenghi F, Dunson DB, Lijoi A, Prünster I, Rodríguez A. Latent Nested Nonparametric Priors (with Discussion). Bayesian Anal 2019; 14:1303-1356. [PMID: 35978607 PMCID: PMC9381042 DOI: 10.1214/19-ba1169] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 06/15/2023]
Abstract
Discrete random structures are important tools in Bayesian nonparametrics and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and, then, normalizing to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov Chain Monte Carlo sampler for Bayesian inferences. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
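The step "add common and group-specific completely random measures, then normalize" can be illustrated with a finite-dimensional gamma approximation. This is a simplified sketch under stated assumptions, not the latent nested process of the paper: the group-specific measures here are drawn independently instead of from a nested process, jumps are Gamma(mass / n_atoms, 1), atoms are N(0, 1), and all names are hypothetical. What it does show is how the shared component induces dependence: every group's random probability measure places mass on the same common atoms.

```python
import numpy as np


def gamma_crm(total_mass, n_atoms, rng):
    """Finite approximation of a gamma CRM: Gamma(total_mass / n_atoms, 1) jumps at N(0, 1) atoms."""
    jumps = rng.gamma(total_mass / n_atoms, 1.0, size=n_atoms)
    atoms = rng.normal(0.0, 1.0, size=n_atoms)
    return jumps, atoms


def common_plus_specific(n_groups, common_mass=1.0, specific_mass=1.0, n_atoms=50, seed=0):
    """For each group, normalize (common CRM + group-specific CRM) into a probability measure."""
    rng = np.random.default_rng(seed)
    common_jumps, common_atoms = gamma_crm(common_mass, n_atoms, rng)
    measures = []
    for _ in range(n_groups):
        jumps, atoms = gamma_crm(specific_mass, n_atoms, rng)
        all_jumps = np.concatenate([common_jumps, jumps])
        all_atoms = np.concatenate([common_atoms, atoms])
        # Normalizing the summed measure yields a random probability measure per group
        measures.append((all_jumps / all_jumps.sum(), all_atoms))
    return measures


groups = common_plus_specific(n_groups=3)
# Share of each group's probability mass carried by the common atoms (first 50 entries)
print([round(float(w[:50].sum()), 3) for w, _ in groups])
```

Raising `common_mass` relative to `specific_mass` pushes the groups toward homogeneity, while shrinking it moves them toward independence, mirroring the range of dependence described in the abstract.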
Affiliation(s)
- Federico Camerlenghi
- Department of Economics, Management and Statistics, University of Milano - Bicocca, Piazza dell'Ateneo Nuovo 1, 20126 Milano, Italy
- Also affiliated to Collegio Carlo Alberto, Torino and BIDSA, Bocconi University, Milano, Italy
- David B Dunson
- Department of Statistical Science, Duke University, Durham, NC 27708-0251 U.S.A
- Antonio Lijoi
- Department of Decision Sciences and BIDSA, Bocconi University, via Röntgen 1, 20136 Milano, Italy
- Igor Prünster
- Department of Decision Sciences and BIDSA, Bocconi University, via Röntgen 1, 20136 Milano, Italy
- Abel Rodríguez
- Department of Applied Mathematics and Statistics, University of California at Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, U.S.A
17
Christensen J, Ma L. A Bayesian hierarchical model for related densities by using Pólya trees. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12346] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Affiliation(s)
- Li Ma
- Duke University, Durham, USA
18
Nipoti B, Jara A, Guindani M. A Bayesian semiparametric partially PH model for clustered time-to-event data. Scand Stat Theory Appl 2018. [DOI: 10.1111/sjos.12332] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Bernardo Nipoti
- School of Computer Science and Statistics, Trinity College, Dublin, Ireland
- Alejandro Jara
- Department of Statistics, Pontificia Universidad Católica de Chile, Santiago, Chile
- Michele Guindani
- Department of Statistics, University of California, Irvine, CA, USA