1
Xi X, Ruffieux H. A modeling framework for detecting and leveraging node-level information in Bayesian network inference. Biostatistics 2024:kxae021. [PMID: 38916966 DOI: 10.1093/biostatistics/kxae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 03/11/2024] [Accepted: 06/02/2024] [Indexed: 06/27/2024]
Abstract
Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
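As background for the terms used in this abstract, the standard Gaussian graphical model relationship between the precision matrix and conditional independence can be written as below. This is textbook material, not the authors' hierarchical model; the symbols $y$, $\Omega$ and $\omega_{ij}$ are notation assumed here for illustration.

```latex
y \sim \mathcal{N}_p\!\left(0, \Omega^{-1}\right), \qquad
\omega_{ij} = 0 \iff y_i \perp\!\!\!\perp y_j \mid y_{-\{i,j\}}, \qquad
\rho_{ij \mid \mathrm{rest}} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}.
```

Under this notation, a sparse precision matrix corresponds to a sparse conditional independence graph, and one common reading of "hub" is a node $i$ whose row of $\Omega$ has many nonzero off-diagonal entries.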
Affiliation(s)
- Xiaoyue Xi
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR, United Kingdom
- Hélène Ruffieux
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR, United Kingdom
2
Mulgrave JJ, Ghosal S. Bayesian analysis of nonparanormal graphical models using rank-likelihood. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2022.06.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
3
Rejoinder to the discussion of “Bayesian graphical models for modern biological applications”. STAT METHOD APPL-GER 2022. [DOI: 10.1007/s10260-022-00634-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
4
Dai W, Hu T, Jin B, Shi X. Incorporating grouping information into Bayesian Gaussian graphical model selection. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2022.2053864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Wei Dai
- Department of Statistics and Finance, University of Science and Technology of China, Anhui, China
- Taizhong Hu
- Department of Statistics and Finance, University of Science and Technology of China, Anhui, China
- Baisuo Jin
- Department of Statistics and Finance, University of Science and Technology of China, Anhui, China
- Xiaoping Shi
- Irving K. Barber School of Arts and Sciences, University of British Columbia, Kelowna, British Columbia, Canada
5
Mulgrave JJ, Ghosal S. Regression‐based Bayesian estimation and structure learning for nonparanormal graphical models. Stat Anal Data Min 2022; 15:611-629. [PMID: 36090618 PMCID: PMC9455150 DOI: 10.1002/sam.11576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
A nonparanormal graphical model is a semiparametric generalization of a Gaussian graphical model for continuous variables in which it is assumed that the variables follow a Gaussian graphical model only after some unknown smooth monotone transformations. We consider a Bayesian approach to inference in a nonparanormal graphical model in which we put priors on the unknown transformations through a random series based on B‐splines. We use a regression formulation to construct the likelihood through the Cholesky decomposition on the underlying precision matrix of the transformed variables and put shrinkage priors on the regression coefficients. We apply a plug‐in variational Bayesian algorithm for learning the sparse precision matrix and compare the performance to a posterior Gibbs sampling scheme in a simulation study. We finally apply the proposed methods to a microarray dataset. The proposed methods have better performance as the dimension increases, and in particular, the variational Bayesian approach has the potential to speed up the estimation in the Bayesian nonparanormal graphical model without the Gaussianity assumption while retaining the information to construct the graph.
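The nonparanormal idea (Gaussian structure after unknown monotone transformations) can be illustrated with a minimal sketch. Note that this swaps in rank-based normal scores and a plain inverse sample covariance; it is not the B-spline prior, Cholesky regression formulation, or variational algorithm described in the abstract. The function names `normal_scores` and `partial_correlations` and the simulated data are hypothetical and used only for illustration.

```python
from statistics import NormalDist

import numpy as np


def normal_scores(x):
    """Map each column of x to rank-based normal scores (assumes no ties)."""
    n = x.shape[0]
    # A double argsort gives 0-based ranks within each column.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    # Scale ranks into (0, 1) so the normal quantile is finite.
    u = ranks / (n + 1)
    return np.vectorize(NormalDist().inv_cdf)(u)


def partial_correlations(z):
    """Partial correlations from the inverse sample covariance (needs n > p)."""
    omega = np.linalg.inv(np.cov(z, rowvar=False))
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated latent Gaussian data: variables 0 and 1 are dependent,
# variable 2 is independent of both.
rng = np.random.default_rng(0)
latent = rng.multivariate_normal(
    mean=np.zeros(3),
    cov=[[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]],
    size=500,
)
observed = np.exp(latent)  # a monotone distortion of the Gaussian data
pcor = partial_correlations(normal_scores(observed))
```

Because ranks are unchanged by strictly increasing transformations, the normal scores of `observed` and `latent` coincide, which is why the dependence between the first two variables can still be recovered from the distorted data.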
Affiliation(s)
- Jami J. Mulgrave
- Department of Statistics, North Carolina State University, Raleigh, North Carolina
- Subhashis Ghosal
- Department of Statistics, North Carolina State University, Raleigh, North Carolina
6
Osborne N, Peterson CB, Vannucci M. Latent Network Estimation and Variable Selection for Compositional Data Via Variational EM. J Comput Graph Stat 2022; 31:163-175. [PMID: 36776345 PMCID: PMC9909885 DOI: 10.1080/10618600.2021.1935971] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/21/2022]
Abstract
Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have those two challenges been addressed simultaneously. In this article, we seek to develop a novel method to simultaneously estimate network interactions and associations to relevant covariates for count data, and specifically for compositional data, which have a fixed sum constraint. We use a hierarchical Bayesian model with latent layers and employ spike-and-slab priors for both edge and covariate selection. For posterior inference, we develop a novel variational inference scheme with an expectation-maximization step, to enable efficient estimation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in its accuracy of network recovery. We show the practical utility of our model via an application to microbiome data. The human microbiome has been shown to contribute to many of the functions of the human body, and also to be linked with a number of diseases. In our application, we seek to better understand the interaction between microbes and relevant covariates, as well as the interaction of microbes with each other. We call our algorithm simultaneous inference for networks and covariates and provide a Python implementation, which is available online.
Affiliation(s)
- Christine B. Peterson
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
7
Kuang Y, Xie J. Distributed testing on mutual independence of massive multivariate data. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.2006232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Yongxin Kuang
- School of Mathematics and Statistics, Henan University, Kaifeng, P.R. China
- Junshan Xie
- School of Mathematics and Statistics, Henan University, Kaifeng, P.R. China
8
Bayesian inference of clustering and multiple Gaussian graphical models selection. J Korean Stat Soc 2021. [DOI: 10.1007/s42952-021-00147-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
9
Li ZR, McCormick TH, Clark SJ. Using Bayesian Latent Gaussian Graphical Models to Infer Symptom Associations in Verbal Autopsies. Bayesian Analysis 2020; 15:781-807. [PMID: 33273996 PMCID: PMC7709479 DOI: 10.1214/19-ba1172] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 05/29/2023]
Abstract
Learning dependence relationships among variables of mixed types provides insights in a variety of scientific settings and is a well-studied problem in statistics. Existing methods, however, typically rely on copious, high quality data to accurately learn associations. In this paper, we develop a method for scientific settings where learning dependence structure is essential, but data are sparse and have a high fraction of missing values. Specifically, our work is motivated by survey-based cause of death assessments known as verbal autopsies (VAs). We propose a Bayesian approach to characterize dependence relationships using a latent Gaussian graphical model that incorporates informative priors on the marginal distributions of the variables. We demonstrate such information can improve estimation of the dependence structure, especially in settings with little training data. We show that our method can be integrated into existing probabilistic cause-of-death assignment algorithms and improves model performance while recovering dependence patterns between symptoms that can inform efficient questionnaire design in future data collection.
Affiliation(s)
- Zehang Richard Li
- Department of Biostatistics, Yale School of Public Health, New Haven, CT
- Tyler H McCormick
- Department of Statistics and Department of Sociology, University of Washington, Seattle, WA
- Samuel J Clark
- Department of Sociology, The Ohio State University, Columbus, OH
10
Peterson CB, Osborne N, Stingo FC, Bourgeat P, Doecke JD, Vannucci M. Bayesian modeling of multiple structural connectivity networks during the progression of Alzheimer's disease. Biometrics 2020; 76:1120-1132. [PMID: 32026459 DOI: 10.1111/biom.13235] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/20/2018] [Revised: 01/10/2020] [Accepted: 01/14/2020] [Indexed: 11/29/2022]
Abstract
Alzheimer's disease is the most common neurodegenerative disease. The aim of this study is to infer structural changes in brain connectivity resulting from disease progression using cortical thickness measurements from a cohort of participants who were healthy controls, had mild cognitive impairment, or had Alzheimer's disease. For this purpose, we develop a novel approach for inference of multiple networks with related edge values across groups. Specifically, we infer a Gaussian graphical model for each group within a joint framework, where we rely on Bayesian hierarchical priors to link the precision matrix entries across groups. Our proposal differs from existing approaches in that it flexibly learns which groups have the most similar edge values, and accounts for the strength of connection (rather than only edge presence or absence) when sharing information across groups. Our results identify key alterations in structural connectivity that may reflect disruptions to the healthy brain, such as decreased connectivity within the occipital lobe with increasing disease severity. We also illustrate the proposed method through simulations, where we demonstrate its performance in structure learning and precision matrix estimation with respect to alternative approaches.
Affiliation(s)
- Nathan Osborne
- Department of Statistics, Rice University, Houston, Texas
- Francesco C Stingo
- Department of Statistics, Computer Science, Applications "G. Parenti", University of Florence, Florence, Italy
- Pierrick Bourgeat
- Australian eHealth Research Centre, CSIRO Health and Biosecurity, Herston, Queensland, Australia
- James D Doecke
- Australian eHealth Research Centre, CSIRO Health and Biosecurity, Herston, Queensland, Australia
11
Li ZR, McCormick TH, Clark SJ. Bayesian Joint Spike-and-Slab Graphical Lasso. Proceedings of Machine Learning Research 2019; 97:3877-3885. [PMID: 33521648 PMCID: PMC7845917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
In this article, we propose a new class of priors for Bayesian inference with multiple Gaussian graphical models. We introduce Bayesian treatments of two popular procedures, the group graphical lasso and the fused graphical lasso, and extend them to a continuous spike-and-slab framework to allow self-adaptive shrinkage and model selection simultaneously. We develop an EM algorithm that performs fast and dynamic explorations of posterior modes. Our approach selects sparse models efficiently and automatically with substantially smaller bias than would be induced by alternative regularization procedures. The performance of the proposed methods is demonstrated through simulation and two real data examples.
Affiliation(s)
- Zehang Richard Li
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Tyler H. McCormick
- Department of Statistics, University of Washington, Seattle, Washington, USA
- Department of Sociology, University of Washington, Seattle, Washington, USA
- Samuel J. Clark
- Department of Sociology, Ohio State University, Columbus, Ohio, USA