1
|
Xi X, Ruffieux H. A modeling framework for detecting and leveraging node-level information in Bayesian network inference. Biostatistics 2024:kxae021. [PMID: 38916966 DOI: 10.1093/biostatistics/kxae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 03/11/2024] [Accepted: 06/02/2024] [Indexed: 06/27/2024]
Abstract
Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
Affiliation(s)
- Xiaoyue Xi
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR, United Kingdom
- Hélène Ruffieux
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR, United Kingdom
|
2
|
Niu Y, Ni Y, Pati D, Mallick BK. Covariate-Assisted Bayesian Graph Learning for Heterogeneous Data. J Am Stat Assoc 2023; 119:1985-1999. [PMID: 39507103 PMCID: PMC11536292 DOI: 10.1080/01621459.2023.2233744] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/04/2021] [Revised: 06/01/2023] [Accepted: 06/25/2023] [Indexed: 11/08/2024]
Abstract
In a traditional Gaussian graphical model, data homogeneity is routinely assumed with no extra variables affecting the conditional independence. In modern genomic datasets, there is an abundance of auxiliary information, which often gets under-utilized in determining the joint dependency structure. In this article, we consider a Bayesian approach to model undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates. Building on product partition models, we propose a novel covariate-dependent Gaussian graphical model that allows graphs to vary with covariates so that observations whose covariates are similar share a similar undirected graph. To efficiently embed Gaussian graphical models into our proposed framework, we explore both Gaussian likelihood and pseudo-likelihood functions. For Gaussian likelihood, a G-Wishart distribution is used as a natural conjugate prior, and for the pseudo-likelihood, a product of Gaussian conditionals is used. Moreover, the proposed model has large prior support and is flexible to approximate any v-Hölder conditional variance-covariance matrices with v ∈ (0, 1]. We further show that based on the theory of fractional likelihood, the rate of posterior contraction is minimax optimal assuming the true density to be a Gaussian mixture with a known number of components. The efficacy of the approach is demonstrated via simulation studies and an analysis of a protein network for a breast cancer dataset assisted by mRNA gene expression as covariates.
Affiliation(s)
- Yabo Niu
- Department of Mathematics, University of Houston
- Yang Ni
- Department of Statistics, Texas A&M University
|
3
|
Zuber V, Lewin A, Levin MG, Haglund A, Ben-Aicha S, Emanueli C, Damrauer S, Burgess S, Gill D, Bottolo L. Multi-response Mendelian randomization: Identification of shared and distinct exposures for multimorbidity and multiple related disease outcomes. Am J Hum Genet 2023; 110:1177-1199. [PMID: 37419091 PMCID: PMC10357504 DOI: 10.1016/j.ajhg.2023.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 06/11/2023] [Accepted: 06/11/2023] [Indexed: 07/09/2023]
Abstract
The existing framework of Mendelian randomization (MR) infers the causal effect of one or multiple exposures on one single outcome. It is not designed to jointly model multiple outcomes, as would be necessary to detect causes of more than one outcome and would be relevant to model multimorbidity or other related disease outcomes. Here, we introduce multi-response Mendelian randomization (MR2), an MR method specifically designed for multiple outcomes to identify exposures that cause more than one outcome or, conversely, exposures that exert their effect on distinct responses. MR2 uses a sparse Bayesian Gaussian copula regression framework to detect causal effects while estimating the residual correlation between summary-level outcomes, i.e., the correlation that cannot be explained by the exposures, and vice versa. We show both theoretically and in a comprehensive simulation study how unmeasured shared pleiotropy induces residual correlation between outcomes irrespective of sample overlap. We also reveal how non-genetic factors that affect more than one outcome contribute to their correlation. We demonstrate that by accounting for residual correlation, MR2 has higher power to detect shared exposures causing more than one outcome. It also provides more accurate causal effect estimates than existing methods that ignore the dependence between related responses. Finally, we illustrate how MR2 detects shared and distinct causal exposures for five cardiovascular diseases in two applications considering cardiometabolic and lipidomic exposures and uncovers residual correlation between summary-level outcomes reflecting known relationships between cardiovascular diseases.
Affiliation(s)
- Verena Zuber
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK; MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK; UK Dementia Research Institute, Imperial College London, London, UK
- Alex Lewin
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Michael G Levin
- Division of Cardiovascular Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Corporal Michael J. Crescenz VA Medical Center, Philadelphia, USA
- Alexander Haglund
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK
- Soumaya Ben-Aicha
- National Heart and Lung Institute, Imperial College London, London, UK
- Costanza Emanueli
- National Heart and Lung Institute, Imperial College London, London, UK
- Scott Damrauer
- Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Corporal Michael J. Crescenz VA Medical Center, Philadelphia, USA
- Stephen Burgess
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK; Cardiovascular Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Dipender Gill
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK; Chief Scientific Advisor Office, Research and Early Development, Novo Nordisk, Copenhagen, Denmark
- Leonardo Bottolo
- Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge, UK; Alan Turing Institute, London, UK; MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
|
4
|
Yang Y, Xia S, Yang H. Multivariate sparse Laplacian shrinkage for joint estimation of two graphical structures. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
5
|
Huang YJ, Mukherjee R, Hsiao CK. Probabilistic edge inference of gene networks with Markov random field-based Bayesian learning. Front Genet 2022; 13:1034946. [DOI: 10.3389/fgene.2022.1034946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/24/2022] [Indexed: 11/11/2022]
Abstract
Current algorithms for gene regulatory network construction based on Gaussian graphical models focus on the deterministic decision of whether an edge exists. Both the probabilistic inference of edge existence and the relative strength of edges are often overlooked, either because the computational algorithms cannot account for this uncertainty or because it is not straightforward to implement. In this study, we combine the Bayesian Markov random field and the conditional autoregressive (CAR) model to tackle these two tasks simultaneously. The uncertainty of edge existence and the relative strength of edges can be measured and quantified based on a Bayesian model such as the CAR model and the spike-and-slab lasso prior. In addition, the strength of the edges can be utilized to prioritize the importance of the edges in a network graph. Simulations and a glioblastoma cancer study were carried out to assess the proposed model's performance and to compare it with existing methods when a binary decision is of interest. The proposed approach shows stable performance and may provide novel structures with biological insights.
|
6
|
Multivariate phenotype analysis enables genome-wide inference of mammalian gene function. PLoS Biol 2022; 20:e3001723. [PMID: 35944064 PMCID: PMC9391051 DOI: 10.1371/journal.pbio.3001723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2020] [Revised: 08/19/2022] [Accepted: 06/22/2022] [Indexed: 11/23/2022]
Abstract
The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines. There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype-gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene-phenotype map with good sensitivity. A factor analysis of the MV model's fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases. 
Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability.
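As a quick arithmetic check of the hit counts quoted in the abstract above, the following Python sketch recomputes the two percentages and the ratio of MV to UV hits. The variable names are illustrative, and reading the 7.5-fold figure as a simple ratio of hit counts is an assumption; the paper may estimate the power increase differently.

```python
# Counts quoted in the abstract (observed data measurements and hits).
total_measurements = 302_997
uv_hits = 4_256    # univariate (UV) model
mv_hits = 31_843   # multivariate (MV) model

# Hit rates as percentages of the observed measurements.
uv_rate_pct = 100.0 * uv_hits / total_measurements
mv_rate_pct = 100.0 * mv_hits / total_measurements

# Assumption: the fold increase is read as the plain ratio of hit counts.
fold_increase = mv_hits / uv_hits

print(f"UV hit rate: {uv_rate_pct:.1f}%")
print(f"MV hit rate: {mv_rate_pct:.1f}%")
print(f"MV / UV hits: {fold_increase:.1f}-fold")
```

Rounded to one decimal place, these values agree with the 1.4%, 10.5% and 7.5-fold figures reported in the abstract.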
|
7
|
|
8
|
Samanta S, Khare K, Michailidis G. A generalized likelihood-based Bayesian approach for scalable joint regression and covariance selection in high dimensions. Stat Comput 2022; 32:47. [PMID: 36713060 PMCID: PMC9881595 DOI: 10.1007/s11222-022-10102-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Accepted: 04/27/2022] [Indexed: 06/05/2023]
Abstract
The paper addresses joint sparsity selection in the regression coefficient matrix and the error precision (inverse covariance) matrix for high-dimensional multivariate regression models in the Bayesian paradigm. The selected sparsity patterns are crucial to help understand the network of relationships between the predictor and response variables, as well as the conditional relationships among the latter. While Bayesian methods have the advantage of providing natural uncertainty quantification through posterior inclusion probabilities and credible intervals, current Bayesian approaches either restrict to specific sub-classes of sparsity patterns and/or are not scalable to settings with hundreds of responses and predictors. Bayesian approaches which only focus on estimating the posterior mode are scalable, but do not generate samples from the posterior distribution for uncertainty quantification. Using a bi-convex regression based generalized likelihood and spike-and-slab priors, we develop an algorithm called Joint Regression Network Selector (JRNS) for joint regression and covariance selection which (a) can accommodate general sparsity patterns, (b) provides posterior samples for uncertainty quantification, and (c) is scalable and orders of magnitude faster than the state-of-the-art Bayesian approaches providing uncertainty quantification. We demonstrate the statistical and computational efficacy of the proposed approach on synthetic data and through the analysis of selected cancer data sets. We also establish high-dimensional posterior consistency for one of the developed algorithms.
|
9
|
Molinari M, Cremaschi A, De Iorio M, Chaturvedi N, Hughes AD, Tillin T. Bayesian nonparametric modelling of multiple graphs with an application to ethnic metabolic differences. J R Stat Soc Ser C Appl Stat 2022. [DOI: 10.1111/rssc.12570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Andrea Cremaschi
- Singapore Institute of Clinical Sciences, Agency for Science, Technology and Research, Singapore, Singapore
- Maria De Iorio
- Department of Statistical Science, UCL, London, UK
- Singapore Institute of Clinical Sciences, Agency for Science, Technology and Research, Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Yale-NUS College, Singapore, Singapore
- Nishi Chaturvedi
- Department of Population Science & Experimental Medicine, Institute of Cardiovascular Science, UCL, London, UK
- Alun D. Hughes
- Department of Population Science & Experimental Medicine, Institute of Cardiovascular Science, UCL, London, UK
- Therese Tillin
- Department of Population Science & Experimental Medicine, Institute of Cardiovascular Science, UCL, London, UK
|
10
|
Nie L, Ročková V. Bayesian Bootstrap Spike-and-Slab LASSO. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2025815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Lizhen Nie
- Department of Statistics, University of Chicago
- Veronika Ročková
- Econometrics and Statistics and James S. Kemper Faculty Scholar at the Booth School of Business, University of Chicago
|
11
|
Zhang R, Ghosh M. Ultra high-dimensional multivariate posterior contraction rate under shrinkage priors. J Multivar Anal 2022. [DOI: 10.1016/j.jmva.2021.104835] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
|
12
|
Osborne N, Peterson CB, Vannucci M. Latent Network Estimation and Variable Selection for Compositional Data Via Variational EM. J Comput Graph Stat 2022; 31:163-175. [PMID: 36776345 PMCID: PMC9909885 DOI: 10.1080/10618600.2021.1935971] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 10/21/2022]
Abstract
Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have those two challenges been addressed simultaneously. In this article, we seek to develop a novel method to simultaneously estimate network interactions and associations to relevant covariates for count data, and specifically for compositional data, which have a fixed sum constraint. We use a hierarchical Bayesian model with latent layers and employ spike-and-slab priors for both edge and covariate selection. For posterior inference, we develop a novel variational inference scheme with an expectation-maximization step, to enable efficient estimation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in its accuracy of network recovery. We show the practical utility of our model via an application to microbiome data. The human microbiome has been shown to contribute to many of the functions of the human body, and also to be linked with a number of diseases. In our application, we seek to better understand the interaction between microbes and relevant covariates, as well as the interaction of microbes with each other. We call our algorithm simultaneous inference for networks and covariates and provide a Python implementation, which is available online.
Affiliation(s)
- Christine B. Peterson
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
|
13
|
Liu Y, Baggerly KA, Orouji E, Manyam G, Chen H, Lam M, Davis JS, Lee MS, Broom BM, Menter DG, Rai K, Kopetz S, Morris JS. Methylation-eQTL Analysis in Cancer Research. Bioinformatics 2021; 37:4014-4022. [PMID: 34117863 PMCID: PMC9188481 DOI: 10.1093/bioinformatics/btab443] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2020] [Revised: 03/15/2021] [Accepted: 06/11/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION: DNA methylation is a key epigenetic factor regulating gene expression. While promoter methylation has been well studied, recent publications have revealed that functionally important methylation also occurs in intergenic and distal regions, and varies across genes and tissue types. Given the growing importance of inter-platform integrative genomic analyses, there is an urgent need to develop methods to discover and characterize gene-level relationships between methylation and expression. RESULTS: We introduce a novel sequential penalized regression approach to identify methylation-expression quantitative trait loci (methyl-eQTLs), a term that we have coined to represent, for each gene and tissue type, a sparse set of CpG loci best explaining gene expression and accompanying weights indicating direction and strength of association. Using TCGA and MD Anderson colorectal cohorts to build and validate our models, we demonstrate our strategy better explains expression variability than current commonly used gene-level methylation summaries. The methyl-eQTLs identified by our approach can be used to construct gene-level methylation summaries that are maximally correlated with gene expression for use in integrative models, and produce a tissue-specific summary of which genes appear to be strongly regulated by methylation. Our results introduce an important resource to the biomedical community for integrative genomics analyses involving DNA methylation. AVAILABILITY AND IMPLEMENTATION: We produce an R Shiny app (https://rstudio-prd-c1.pmacs.upenn.edu/methyl-eQTL/) that interactively presents methyl-eQTL results for colorectal, breast, and pancreatic cancer. The source R code for this work is provided in the supplement. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yusha Liu
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
- Keith A Baggerly
- Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Elias Orouji
- Department of Genomic Medicine, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Ganiraju Manyam
- Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Huiqin Chen
- Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Michael Lam
- Department of Gastrointestinal Medical Oncology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Jennifer S Davis
- Department of Epidemiology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Michael S Lee
- Department of Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Bradley M Broom
- Department of Bioinformatics and Computational Biology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- David G Menter
- Department of Gastrointestinal Medical Oncology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Kunal Rai
- Department of Genomic Medicine, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Scott Kopetz
- Department of Gastrointestinal Medical Oncology, M.D. Anderson Cancer Center, Houston, TX 77030, USA
- Jeffrey S Morris
- Department of Biostatistics, Epidemiology and Informatics, The University of Pennsylvania, Philadelphia, PA 19104-6021, USA
|
14
|
Li Y, Datta J, Craig BA, Bhadra A. Joint mean–covariance estimation via the horseshoe. J Multivar Anal 2021. [DOI: 10.1016/j.jmva.2020.104716] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
|
15
|
|
16
|
Alexopoulos A, Bottolo L. Bayesian Variable Selection for Gaussian Copula Regression Models. J Comput Graph Stat 2020; 30:578-593. [PMID: 37051045 PMCID: PMC7614421 DOI: 10.1080/10618600.2020.1840997] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Abstract
We develop a novel Bayesian method to select important predictors in regression models with multiple responses of diverse types. A sparse Gaussian copula regression model is used to account for the multivariate dependencies between any combination of discrete and/or continuous responses and their association with a set of predictors. We utilize the parameter expansion for data augmentation strategy to construct a Markov chain Monte Carlo algorithm for the estimation of the parameters and the latent variables of the model. Based on a centered parametrization of the Gaussian latent variables, we design a fixed-dimensional proposal distribution to update jointly the latent binary vectors of important predictors and the corresponding non-zero regression coefficients. For Gaussian responses and for outcomes that can be modeled as a dependent version of a Gaussian response, this proposal leads to a Metropolis-Hastings step that allows an efficient exploration of the predictors' model space. The proposed strategy is tested on simulated data and applied to real data sets in which the responses consist of low-intensity counts, binary, ordinal and continuous variables.
Affiliation(s)
- Leonardo Bottolo
- Department of Medical Genetics, University of Cambridge, UK
- The Alan Turing Institute, London, UK
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
|
17
|
Affiliation(s)
- Edward I. George
- Department of Statistics, University of Pennsylvania, Philadelphia, PA
|
18
|
Bai R, Moran GE, Antonelli JL, Chen Y, Boland MR. Spike-and-Slab Group Lassos for Grouped Regression and Sparse Generalized Additive Models. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1765784] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Affiliation(s)
- Ray Bai
- Department of Statistics, University of South Carolina, Columbia, SC
- Gemma E. Moran
- Data Science Institute, Columbia University, New York, NY
- Yong Chen
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA
- Mary R. Boland
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA
|
19
|
Cao X, Ding L, Mersha TB. Joint variable selection and network modeling for detecting eQTLs. Stat Appl Genet Mol Biol 2020; 19. [PMID: 32078577 DOI: 10.1515/sagmb-2019-0032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In this study, we compare the three most recent statistical methods for joint variable selection and covariance estimation, with application to detecting expression quantitative trait loci (eQTL) and estimating gene networks, and introduce a new hierarchical Bayesian method to be included in the comparison. Unlike the traditional univariate regression approach in eQTL, all four methods correlate phenotypes and genotypes by multivariate regression models that incorporate the dependence information among phenotypes, and use Bayesian multiplicity adjustment to avoid multiple testing burdens raised by traditional multiple testing correction methods. We present the performance of three methods (MSSL: Multivariate Spike and Slab Lasso, SSUR: Sparse Seemingly Unrelated Bayesian Regression, and OBFBF: Objective Bayes Fractional Bayes Factor), along with the proposed JDAG (Joint estimation via a Gaussian Directed Acyclic Graph model) method, through simulation experiments and publicly available HapMap real data, taking asthma as an example. Compared with existing methods, JDAG identified networks with higher sensitivity and specificity under row-wise sparse settings. JDAG requires less execution time in small-to-moderate dimensions, but is not currently applicable to high dimensional data. The eQTL analysis in asthma data identified a number of known gene regulations, such as those involving STARD3, IKZF3 and PGAP3, all reported in asthma studies. The code of the proposed method is freely available at GitHub (https://github.com/xuan-cao/Joint-estimation-for-eQTL).
Affiliation(s)
- Xuan Cao
- Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221, USA
- Lili Ding
- Division of Biostatistics and Epidemiology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH 45229, USA
- Tesfaye B Mersha
- Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH 45229, USA
|
20
|
Bhadra A, Datta J, Li Y, Polson N. Horseshoe Regularisation for Machine Learning in Complex and Deep Models. Int Stat Rev 2020. [DOI: 10.1111/insr.12360] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/01/2022]
Affiliation(s)
- Anindya Bhadra
- Department of Statistics, Purdue University, 250 N. University St., West Lafayette, IN 47907, USA
- Jyotishka Datta
- Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72704, USA
- Yunfan Li
- Department of Statistics, Purdue University, 250 N. University St., West Lafayette, IN 47907, USA
- Nicholas Polson
- Booth School of Business, University of Chicago, 5807 S. Woodlawn Ave., Chicago, IL 60637, USA
|
21
|
Li ZR, McCormick TH. An Expectation Conditional Maximization approach for Gaussian graphical models. J Comput Graph Stat 2019; 28:767-777. [PMID: 33033426 PMCID: PMC7540244 DOI: 10.1080/10618600.2019.1609976] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/17/2017] [Revised: 04/02/2019] [Accepted: 04/09/2019] [Indexed: 10/26/2022]
Abstract
Bayesian graphical models are a useful tool for understanding dependence relationships among many variables, particularly in situations with external prior information. In high-dimensional settings, the space of possible graphs becomes enormous, rendering even state-of-the-art Bayesian stochastic search computationally infeasible. We propose a deterministic alternative to estimate Gaussian and Gaussian copula graphical models using an Expectation Conditional Maximization (ECM) algorithm, extending the EM approach from Bayesian variable selection to graphical model estimation. We show that the ECM approach enables fast posterior exploration under a sequence of mixture priors, and can incorporate multiple sources of information.
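To make the idea of an expectation step under a mixture prior concrete, here is a minimal Python sketch. It assumes a two-component zero-mean normal mixture (a narrow "spike" with standard deviation v0 and a wide "slab" with standard deviation v1) on a single off-diagonal precision entry omega, with prior slab weight pi. The abstract above does not spell out the prior, so this form, the function names and the parameter names are all illustrative assumptions rather than the authors' implementation.

```python
import math

def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def slab_probability(omega, v0, v1, pi):
    """Conditional probability that omega was drawn from the slab.

    Assumed prior (hypothetical, for illustration):
        omega | slab  ~ N(0, v1^2)  with probability pi
        omega | spike ~ N(0, v0^2)  with probability 1 - pi
    """
    slab = pi * normal_pdf(omega, v1)
    spike = (1.0 - pi) * normal_pdf(omega, v0)
    return slab / (slab + spike)

# An entry at zero is attributed mostly to the spike,
# while an entry far from zero is attributed to the slab.
p_zero = slab_probability(0.0, v0=0.05, v1=1.0, pi=0.5)
p_far = slab_probability(1.0, v0=0.05, v1=1.0, pi=0.5)
print(p_zero, p_far)
```

In an EM-type scheme these probabilities would be recomputed from the current precision estimate at each iteration and then used as weights in the maximization steps. For very large |omega| both densities can underflow to zero in floating point, so a production version would work on the log scale.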
|
22
|
Li Y, Craig BA, Bhadra A. The Graphical Horseshoe Estimator for Inverse Covariance Matrices. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2019.1575744] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/27/2022]
Affiliation(s)
- Yunfan Li
- Department of Statistics, Purdue University, West Lafayette, IN
- Bruce A. Craig
- Department of Statistics, Purdue University, West Lafayette, IN
- Anindya Bhadra
- Department of Statistics, Purdue University, West Lafayette, IN
|