1
Grabski IN, De Vito R, Trippa L, Parmigiani G. Bayesian combinatorial MultiStudy factor analysis. Ann Appl Stat 2023; 17:2212-2235. [PMID: 37786772 PMCID: PMC10543692 DOI: 10.1214/22-aoas1715]
Abstract
Mutations in the BRCA1 and BRCA2 genes are known to be highly associated with breast cancer. Identifying both shared and unique transcript expression patterns in blood samples from these groups can shed insight into if and how the disease mechanisms differ among individuals by mutation status, but this is challenging in the high-dimensional setting. A recent method, Bayesian Multi-Study Factor Analysis (BMSFA), identifies latent factors common to all studies (or equivalently, groups) and latent factors specific to individual studies. However, BMSFA does not allow for factors shared by more than one but less than all studies. This is critical in our context, as we may expect some but not all signals to be shared by BRCA1- and BRCA2-mutation carriers but not necessarily other high-risk groups. We extend BMSFA by introducing a new method, Tetris, for Bayesian combinatorial multi-study factor analysis, which identifies latent factors that any combination of studies or groups can share. We model the subsets of studies that share latent factors with an Indian Buffet Process, and offer a way to summarize uncertainty in the sharing patterns using credible balls. We test our method with an extensive range of simulations, and showcase its utility not only in dimension reduction but also in covariance estimation. When applied to transcript expression data from high-risk families grouped by mutation status, Tetris reveals the features and pathways characterizing each group and the sharing patterns among them. Finally, we further extend Tetris to discover groupings of samples when group labels are not provided, which can elucidate additional structure in these data.
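The abstract above models which studies share each latent factor with an Indian Buffet Process (IBP). As a rough illustration only, the following Python sketch draws a binary study-by-factor sharing matrix from the standard IBP generative scheme. It is not the authors' Tetris implementation; the function names `sample_poisson` and `sample_ibp` and the concentration parameter `alpha` are hypothetical names chosen for this sketch.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) variate with Knuth's multiplication method.

    Adequate for the small rates used here; not tuned for large rates.
    """
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_studies, alpha, seed=0):
    """Sample a binary matrix (rows: studies, columns: factors) from an IBP.

    Study i (1-based) picks each existing factor k with probability
    counts[k] / i, then adds Poisson(alpha / i) brand-new factors.
    `alpha` is an assumed concentration parameter.
    """
    rng = random.Random(seed)
    counts = []  # counts[k]: number of studies that use factor k so far
    rows = []    # rows[i]: indices of the factors used by study i
    for i in range(1, num_studies + 1):
        chosen = [k for k, m in enumerate(counts) if rng.random() < m / i]
        num_new = sample_poisson(alpha / i, rng)
        chosen.extend(range(len(counts), len(counts) + num_new))
        counts.extend([0] * num_new)
        for k in chosen:
            counts[k] += 1
        rows.append(set(chosen))
    num_factors = len(counts)
    return [[1 if k in row else 0 for k in range(num_factors)] for row in rows]


if __name__ == "__main__":
    for row in sample_ibp(num_studies=4, alpha=2.0, seed=0):
        print(row)
```

In the resulting matrix, a column with a single 1 corresponds to a study-specific factor, a column of all 1s to a factor common to every study, and anything in between to a factor shared by a subset of studies.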
Affiliation(s)
- Isabella N Grabski
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Roberta De Vito
  - Department of Biostatistics and Data Science Initiative, Brown University, Providence, RI
- Lorenzo Trippa
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
- Giovanni Parmigiani
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
2
Frühwirth-Schnatter S. Generalized cumulative shrinkage process priors with applications to sparse Bayesian factor analysis. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2023; 381:20220148. [PMID: 36970824 DOI: 10.1098/rsta.2022.0148] [Received: 07/26/2022] [Accepted: 02/06/2023]
Abstract
The paper discusses shrinkage priors which impose increasing shrinkage in a sequence of parameters. We review the cumulative shrinkage process (CUSP) prior of Legramanti et al. (Legramanti et al. 2020 Biometrika 107, 745-752. (doi:10.1093/biomet/asaa008)), which is a spike-and-slab shrinkage prior where the spike probability is stochastically increasing and constructed from the stick-breaking representation of a Dirichlet process prior. As a first contribution, this CUSP prior is extended by involving arbitrary stick-breaking representations arising from beta distributions. As a second contribution, we prove that exchangeable spike-and-slab priors, which are popular and widely used in sparse Bayesian factor analysis, can be represented as a finite generalized CUSP prior, which is easily obtained from the decreasing order statistics of the slab probabilities. Hence, exchangeable spike-and-slab shrinkage priors imply increasing shrinkage as the column index in the loading matrix increases, without imposing explicit order constraints on the slab probabilities. An application to sparse Bayesian factor analysis illustrates the usefulness of the findings of this paper. A new exchangeable spike-and-slab shrinkage prior based on the triple gamma prior of Cadonna et al. (Cadonna et al. 2020 Econometrics 8, 20. (doi:10.3390/econometrics8020020)) is introduced and shown to be helpful for estimating the unknown number of factors in a simulation study. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Sylvia Frühwirth-Schnatter
  - Department of Finance, Accounting and Statistics, Institute for Statistics and Mathematics, WU Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna, Austria
3
Casa A, O’Callaghan TF, Murphy TB. Parsimonious Bayesian factor analysis for modelling latent structures in spectroscopy data. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1597]
Affiliation(s)
- Alessandro Casa
  - School of Mathematics & Statistics, University College Dublin
4
Yang D, Choi T, Lavigne E, Chung Y. Non‐parametric Bayesian covariate‐dependent multivariate functional clustering: An application to time‐series data for multiple air pollutants. J R Stat Soc Ser C Appl Stat 2022. [DOI: 10.1111/rssc.12589]
Affiliation(s)
- Daewon Yang
  - Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Taeryon Choi
  - Department of Statistics, Korea University, Seoul, South Korea
- Eric Lavigne
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada
  - Air Sectors Assessment and Exposure Science Division, Health Canada, Ottawa, Canada
- Yeonseung Chung
  - Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
5
Lee J, Jo S, Lee J. Robust sparse Bayesian infinite factor models. Comput Stat 2022. [DOI: 10.1007/s00180-022-01208-5]
6
De Vito R, Bellio R, Trippa L, Parmigiani G. Bayesian multistudy factor analysis for high-throughput biological data. Ann Appl Stat 2021. [DOI: 10.1214/21-aoas1456]
Affiliation(s)
- Ruggero Bellio
  - Department of Economics and Statistics, University of Udine
- Lorenzo Trippa
  - Department of Data Science, Dana Farber Cancer Institute
7
Abstract
Summary
Factorization models express a statistical object of interest in terms of a collection of simpler objects. For example, a matrix or tensor can be expressed as a sum of rank-one components. However, in practice, it can be challenging to infer the relative impact of the different components as well as the number of components. A popular idea is to include infinitely many components having impact decreasing with the component index. This article is motivated by two limitations of existing methods: (i) the lack of careful consideration of the within component sparsity structure; and (ii) no accommodation for grouped variables and other non-exchangeable structures. We propose a general class of infinite factorization models that address these limitations. Theoretical support is provided, practical gains are shown in simulation studies, and an ecology application focusing on modelling bird species occurrence is discussed.
Affiliation(s)
- L Schiavon
  - Department of Statistical Sciences, University of Padova, Via Cesare Battisti 241, 35121 Padova, Italy
- A Canale
  - Department of Statistical Sciences, University of Padova, Via Cesare Battisti 241, 35121 Padova, Italy
- D B Dunson
  - Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27708, U.S.A
8
Roy A, Lavine I, Herring AH, Dunson DB. Perturbed factor analysis: accounting for group differences in exposure profiles. Ann Appl Stat 2021; 15:1386-1404. [PMID: 36324423 PMCID: PMC9624461 DOI: 10.1214/20-aoas1435]
Abstract
In this article we investigate group differences in phthalate exposure profiles using NHANES data. Phthalates are a family of industrial chemicals used in plastics and as solvents. There is increasing evidence of adverse health effects of exposure to phthalates on reproduction and neurodevelopment and concern about racial disparities in exposure. We would like to identify a single set of low-dimensional factors summarizing exposure to different chemicals, while allowing differences across groups. Improving on current multigroup additive factor models, we propose a class of Perturbed Factor Analysis (PFA) models that assume a common factor structure after perturbing the data via multiplication by a group-specific matrix. Bayesian inference algorithms are defined using a matrix normal hierarchical model for the perturbation matrices. The resulting model is just as flexible as current approaches in allowing arbitrarily large differences across groups but has substantial advantages that we illustrate in simulation studies. Applying PFA to NHANES data, we learn common factors summarizing exposures to phthalates, while showing clear differences across groups.
Affiliation(s)
- Isaac Lavine
  - Department of Statistical Science, Duke University
9
Moran KR, Dunson D, Wheeler MW, Herring AH. Bayesian joint modeling of chemical structure and dose response curves. Ann Appl Stat 2021; 15:1405-1430. [DOI: 10.1214/21-aoas1461]
Affiliation(s)
- David Dunson
  - Department of Statistical Science, Duke University
- Matthew W. Wheeler
  - Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences
10
Moran KR, Turner EL, Dunson D, Herring AH. Bayesian hierarchical factor regression models to infer cause of death from verbal autopsy data. J R Stat Soc Ser C Appl Stat 2021; 70:532-557. [PMID: 34334826 PMCID: PMC8320757 DOI: 10.1111/rssc.12468]
Abstract
In low-resource settings where vital registration of death is not routine it is often of critical interest to determine and study the cause of death (COD) for individuals and the cause-specific mortality fraction (CSMF) for populations. Post-mortem autopsies, considered the gold standard for COD assignment, are often difficult or impossible to implement due to deaths occurring outside the hospital, expense, and/or cultural norms. For this reason, Verbal Autopsies (VAs) are commonly conducted, consisting of a questionnaire administered to next of kin recording demographic information, known medical conditions, symptoms, and other factors for the decedent. This article proposes a novel class of hierarchical factor regression models that avoid restrictive assumptions of standard methods, allow both the mean and covariance to vary with COD category, and can include covariate information on the decedent, region, or events surrounding death. Taking a Bayesian approach to inference, this work develops an MCMC algorithm and validates the FActor Regression for Verbal Autopsy (FARVA) model in simulation experiments. An application of FARVA to real VA data shows improved goodness-of-fit and better predictive performance in inferring COD and CSMF over competing methods. Code and a user manual are made available at https://github.com/kelrenmor/farva.
Affiliation(s)
- Kelly R. Moran
  - Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Elizabeth L. Turner
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
  - Duke Global Health Institute, Duke University, Durham, NC, USA
- David Dunson
  - Department of Statistical Science, Duke University, Durham, NC, USA
  - Department of Mathematics, Duke University, Durham, NC, USA
- Amy H. Herring
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
  - Duke Global Health Institute, Duke University, Durham, NC, USA
  - Department of Statistical Science, Duke University, Durham, NC, USA
11
Zeng S, Rosenbaum S, Alberts SC, Archie EA, Li F. Causal mediation analysis for sparse and irregular longitudinal data. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1427]
Affiliation(s)
- Shuxi Zeng
  - Department of Statistical Science, Duke University
- Susan C. Alberts
  - Departments of Biology and Evolutionary Anthropology, Duke University
- Fan Li
  - Department of Statistical Science, Duke University
12
Bayesian Matrix Completion Approach to Causal Inference with Panel Data. Journal of Statistical Theory and Practice 2021. [DOI: 10.1007/s42519-021-00188-x]
13
Legramanti S, Durante D, Dunson DB. Bayesian cumulative shrinkage for infinite factorizations. Biometrika 2020; 107:745-752. [PMID: 32831355 DOI: 10.1093/biomet/asaa008] [Received: 02/11/2019]
Abstract
The dimension of the parameter space is typically unknown in a variety of models that rely on factorizations. For example, in factor analysis the number of latent factors is not known and has to be inferred from the data. Although classical shrinkage priors are useful in such contexts, increasing shrinkage priors can provide a more effective approach that progressively penalizes expansions with growing complexity. In this article we propose a novel increasing shrinkage prior, called the cumulative shrinkage process, for the parameters that control the dimension in overcomplete formulations. Our construction has broad applicability and is based on an interpretable sequence of spike-and-slab distributions which assign increasing mass to the spike as the model complexity grows. Using factor analysis as an illustrative example, we show that this formulation has theoretical and practical advantages relative to current competitors, including an improved ability to recover the model dimension. An adaptive Markov chain Monte Carlo algorithm is proposed, and the performance gains are outlined in simulations and in an application to personality data.
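To make the phrase "increasing mass to the spike" concrete, the following Python sketch computes a sequence of spike probabilities from a stick-breaking construction, as described for the cumulative shrinkage process in entry 2 of this list. It is a minimal sketch, not the authors' code: the function name `cusp_spike_probabilities`, the truncation level `num_components`, and the choice of Beta(1, alpha) sticks (the Dirichlet process case) are assumptions made for illustration.

```python
import random


def cusp_spike_probabilities(num_components, alpha, seed=0):
    """Return spike probabilities pi_1 <= pi_2 <= ... <= pi_H.

    Sticks v_l ~ Beta(1, alpha) give weights w_l = v_l * prod_{m<l} (1 - v_m),
    and pi_h is the cumulative sum of w_1..w_h. That sum equals
    1 - prod_{l<=h} (1 - v_l), which is the form computed here.
    """
    rng = random.Random(seed)
    remaining = 1.0  # running product of (1 - v_l): mass not yet assigned
    spike_probs = []
    for _ in range(num_components):
        v = rng.betavariate(1.0, alpha)
        remaining *= 1.0 - v
        spike_probs.append(1.0 - remaining)
    return spike_probs


if __name__ == "__main__":
    # Later components are increasingly likely to be shrunk to the spike.
    for h, p in enumerate(cusp_spike_probabilities(8, alpha=5.0, seed=1), start=1):
        print(f"component {h}: spike probability {p:.3f}")
```

Because each stick removes a fraction of the remaining mass, the spike probability can only grow with the component index, which is what penalizes higher-order factors more heavily.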
Affiliation(s)
- Sirio Legramanti
  - Department of Decision Sciences, Bocconi University, Via Röntgen 1, 20136 Milan, Italy
- Daniele Durante
  - Department of Decision Sciences, Bocconi University, Via Röntgen 1, 20136 Milan, Italy
- David B Dunson
  - Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27707, USA
14
Durante D, Dunson DB, Vogelstein JT. Rejoinder: Nonparametric Bayes Modeling of Populations of Networks. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1395643]
Affiliation(s)
- Daniele Durante
  - Department of Decision Sciences, Bocconi University, Milan, Italy
- David B. Dunson
  - Department of Statistical Science, Duke University, Durham, NC
- Joshua T. Vogelstein
  - Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
  - Child Mind Institute, New York, NY