1
Park HG. Bayesian estimation of covariate assisted principal regression for brain functional connectivity. Biostatistics 2024:kxae023. [PMID: 38981041 DOI: 10.1093/biostatistics/kxae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 03/25/2024] [Accepted: 06/02/2024] [Indexed: 07/11/2024]
Abstract
This paper presents a Bayesian reformulation of covariate-assisted principal regression for covariance matrix outcomes to identify low-dimensional components in the covariance associated with covariates. By introducing a geometric approach to the covariance matrices and leveraging Euclidean geometry, we estimate dimension reduction parameters and model covariance heterogeneity based on covariates. This method enables joint estimation and uncertainty quantification of relevant model parameters associated with heteroscedasticity. We demonstrate our approach through simulation studies and apply it to analyze associations between covariates and brain functional connectivity using data from the Human Connectome Project.
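For orientation, the covariate-assisted principal (CAP) regression model that this paper reformulates is usually written, for a projection direction γ and a subject-level covariance Σ_i, as log(γᵀ Σ_i γ) = x_iᵀ β. This form comes from the original CAP proposal and is not stated in the abstract, so treat it as an assumption here. The Python sketch below is only a crude two-step frequentist illustration, not the paper's Bayesian estimator. It holds γ fixed rather than estimating it, computes each subject's log projected sample variance, and regresses that value on the covariates by least squares. All function names and the simulated data are hypothetical.

```python
import numpy as np

def projected_log_variance(samples, gamma):
    """log(gamma' S_i gamma) for each subject's sample covariance S_i.

    samples: list of (T_i, p) arrays, one per subject (rows = time points).
    gamma:   (p,) projection direction, held fixed in this sketch.
    """
    out = []
    for Y in samples:
        S = np.cov(Y, rowvar=False)          # p x p sample covariance of subject i
        out.append(np.log(gamma @ S @ gamma))
    return np.array(out)

def fit_cap_two_step(samples, X, gamma):
    """Least-squares fit of log(gamma' S_i gamma) = x_i' beta (plug-in, not Bayesian)."""
    z = projected_log_variance(samples, gamma)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

# Hypothetical simulation: the variance along gamma = e_1 equals exp(x_i' beta_true).
rng = np.random.default_rng(0)
n, p, T = 200, 3, 500
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])   # intercept + one covariate
beta_true = np.array([0.5, 1.0])
gamma = np.array([1.0, 0.0, 0.0])
samples = []
for i in range(n):
    sd = np.ones(p)
    sd[0] = np.exp(0.5 * X[i] @ beta_true)   # sd = sqrt(exp(x_i' beta_true))
    samples.append(rng.normal(size=(T, p)) * sd)

beta_hat = fit_cap_two_step(samples, X, gamma)
print(beta_hat)   # should be close to beta_true
```

In the actual CAP setting, γ is unknown and must be estimated jointly with β. The Bayesian reformulation additionally yields uncertainty for both, which this plug-in sketch does not attempt.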
Affiliation(s)
- Hyung G Park
- Division of Biostatistics, Department of Population Health, New York University Grossman School of Medicine, 180 Madison Ave., New York, NY 10016, USA

2
Gan D, Yin G, Zhang YD. The GR2D2 estimator for the precision matrices. Brief Bioinform 2022; 23:6731716. [PMID: 36184191 DOI: 10.1093/bib/bbac426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 08/31/2022] [Accepted: 09/02/2022] [Indexed: 12/14/2022]
Abstract
Biological networks, which summarize the regulatory interactions and other relationships between different molecules, are important for the analysis of human diseases. Understanding and constructing networks for molecules such as DNA, RNA and proteins can help elucidate the mechanisms of complex biological systems. Gaussian graphical models (GGMs) are popular tools for estimating biological networks. Nonetheless, reconstructing GGMs from high-dimensional datasets is still challenging, because current methods do not handle the sparsity and high dimensionality of such datasets well. Here, we develop a new GGM, called the GR2D2 (Graphical $R^2$-induced Dirichlet Decomposition) model, based on the R2D2 priors for linear models. In addition, we provide a data-augmented block Gibbs sampler. The R code is available at https://github.com/RavenGan/GR2D2. The GR2D2 estimator shows superior performance in estimating precision matrices compared with existing techniques in various simulation settings. When the true precision matrix is sparse and high-dimensional, GR2D2 provides the estimates with the smallest information divergence from the underlying truth. We also compare the GR2D2 estimator with the graphical horseshoe estimator on five cancer RNA-seq gene expression datasets grouped by three cancer types. Our results show that GR2D2 successfully identifies common cancer pathways and cancer-specific pathways for each dataset.
Affiliation(s)
- Dailin Gan
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana, USA
- Guosheng Yin
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Yan Dora Zhang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China; Centre for PanorOmic Sciences, The University of Hong Kong, Hong Kong SAR, China

3
Münch MM, van de Wiel MA, van der Vaart AW, Peeters CFW. Semi-supervised empirical Bayes group-regularized factor regression. Biom J 2022; 64:1289-1306. [PMID: 35730912 PMCID: PMC9796498 DOI: 10.1002/bimj.202100105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Revised: 03/16/2022] [Accepted: 03/20/2022] [Indexed: 01/01/2023]
Abstract
The features in a high-dimensional biomedical prediction problem are often well described by low-dimensional latent variables (or factors). We use this to include unlabeled features and additional information on the features when building a prediction model. Such additional feature information is often available in biomedical applications. Examples are annotation of genes, metabolites, or p-values from a previous study. We employ a Bayesian factor regression model that jointly models the features and the outcome using Gaussian latent variables. We fit the model using a computationally efficient variational Bayes method, which scales to high dimensions. We use the extra information to set up a prior model for the features in terms of hyperparameters, which are then estimated through empirical Bayes. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.
Affiliation(s)
- Magnus M. Münch
- Department of Epidemiology & Data Science, Amsterdam UMC, Amsterdam, The Netherlands; Mathematical Institute, Leiden University, Leiden, The Netherlands
- Mark A. van de Wiel
- Department of Epidemiology & Data Science, Amsterdam UMC, Amsterdam, The Netherlands; MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
- Carel F. W. Peeters
- Department of Epidemiology & Data Science, Amsterdam UMC, Amsterdam, The Netherlands; Mathematical & Statistical Methods Group (Biometris), Wageningen University & Research, Wageningen, The Netherlands

4
Jiang R, Singh P, Wrede F, Hellander A, Petzold L. Identification of dynamic mass-action biochemical reaction networks using sparse Bayesian methods. PLoS Comput Biol 2022; 18:e1009830. [PMID: 35100263 PMCID: PMC8830701 DOI: 10.1371/journal.pcbi.1009830] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2021] [Revised: 02/10/2022] [Accepted: 01/12/2022] [Indexed: 11/18/2022]
Abstract
Identifying the reactions that govern a dynamical biological system is a crucial but challenging task in systems biology. In this work, we present a data-driven method to infer the underlying biochemical reaction system governing a set of observed species concentrations over time. We formulate the problem as a regression over a large, but limited, mass-action constrained reaction space and utilize sparse Bayesian inference via the regularized horseshoe prior to produce robust, interpretable biochemical reaction networks, along with uncertainty estimates of parameters. The resulting systems of chemical reactions and posteriors inform the biologist of potentially several reaction systems that can be further investigated. We demonstrate the method on two examples of recovering the dynamics of an unknown reaction system, to illustrate the benefits of improved accuracy and information obtained.
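The regularized horseshoe named above is commonly parameterized as in Piironen and Vehtari (2017): β_j ~ N(0, τ² λ̃_j²) with λ̃_j² = c² λ_j² / (c² + τ² λ_j²) and half-Cauchy local scales λ_j. The abstract does not give the exact form used, so this parameterization is an assumption. The Python sketch below draws coefficients from that prior with the global scale τ and slab width c held fixed. It shows the key design choice: small coefficients are shrunk as under the ordinary horseshoe, while c caps the prior scale of large ones. Function names and parameter values are illustrative.

```python
import numpy as np

def regularized_horseshoe_sd(lam, tau, c):
    """Prior sd tau * lambda_tilde_j of the regularized horseshoe.

    lambda_tilde_j^2 = c^2 lambda_j^2 / (c^2 + tau^2 lambda_j^2):
    approximately tau * lambda_j when tau * lambda_j << c (ordinary horseshoe),
    and approaching c from below when tau * lambda_j >> c (Gaussian slab of width c).
    """
    lam = np.asarray(lam, dtype=float)
    lam_tilde_sq = c**2 * lam**2 / (c**2 + tau**2 * lam**2)
    return tau * np.sqrt(lam_tilde_sq)

def draw_regularized_horseshoe(p, tau, c, rng):
    """Draw p coefficients from the prior with tau and c held fixed (assumption)."""
    lam = np.abs(rng.standard_cauchy(p))      # local scales ~ half-Cauchy(0, 1)
    sd = regularized_horseshoe_sd(lam, tau, c)
    return rng.normal(0.0, sd), sd

rng = np.random.default_rng(1)
beta, sd = draw_regularized_horseshoe(10_000, tau=0.1, c=2.0, rng=rng)
print(np.median(np.abs(beta)), sd.max())   # most draws near zero; every sd is below c = 2
```

In a full analysis, τ and c would receive their own priors and be inferred jointly with the reaction coefficients. Fixing them here only isolates the shape of the prior.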
Affiliation(s)
- Richard Jiang
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, California, United States of America
- Prashant Singh
- Department of Information Technology, Uppsala University, Uppsala, Sweden
- Fredrik Wrede
- Department of Information Technology, Uppsala University, Uppsala, Sweden
- Andreas Hellander
- Department of Information Technology, Uppsala University, Uppsala, Sweden
- Linda Petzold
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, California, United States of America

5
Veerman JR, Leday GGR, van de Wiel MA. Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models. Commun Stat Simul Comput 2022. [DOI: 10.1080/03610918.2019.1646760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
Affiliation(s)
- Jurre R. Veerman
- Department of Epidemiology & Biostatistics, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Mathematical Institute, Leiden University, Leiden, The Netherlands
- Mark A. van de Wiel
- Department of Epidemiology & Biostatistics, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- MRC Biostatistics Unit, Cambridge University, Cambridge, UK

6
Münch MM, Peeters CFW, Van Der Vaart AW, Van De Wiel MA. Adaptive group-regularized logistic elastic net regression. Biostatistics 2021; 22:723-737. [PMID: 31886488 PMCID: PMC8596493 DOI: 10.1093/biostatistics/kxz062] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/16/2019] [Revised: 12/04/2019] [Accepted: 12/05/2019] [Indexed: 12/27/2022]
Abstract
In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) p-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward. We propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical–variational Bayes framework. Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.
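The Python sketch below makes the idea of one penalty parameter per feature group concrete. It evaluates a penalized logistic objective in which every feature inherits the penalty λ_g of its group. The parameterization Σ_g λ_g Σ_{j∈g} (α|β_j| + (1 − α)β_j²/2) is a generic group-wise elastic net chosen for illustration and may differ from the one used by gren. In addition, gren estimates the penalty parameters within its empirical–variational Bayes framework, whereas here they are simply given. Function names are hypothetical.

```python
import numpy as np

def group_elastic_net_penalty(beta, groups, lambdas, alpha=0.5):
    """Sum over features of lambda_g * (alpha*|b_j| + (1 - alpha)/2 * b_j^2).

    groups:  (p,) integer group label of each feature (e.g. from external information).
    lambdas: (G,) penalty parameter of each group; larger means stronger shrinkage.
    alpha:   lasso/ridge mixing weight, shared by all groups in this sketch.
    """
    lam = np.asarray(lambdas, dtype=float)[np.asarray(groups)]   # per-feature penalty
    beta = np.asarray(beta, dtype=float)
    return np.sum(lam * (alpha * np.abs(beta) + 0.5 * (1.0 - alpha) * beta**2))

def penalized_logistic_objective(beta, X, y, groups, lambdas, alpha=0.5):
    """Negative logistic log-likelihood plus the group-wise elastic net penalty."""
    eta = X @ beta
    nll = np.sum(np.logaddexp(0.0, eta) - y * eta)   # sum of log(1 + e^eta) - y * eta
    return nll + group_elastic_net_penalty(beta, groups, lambdas, alpha)

# Two groups: the first (e.g. features with small p-values in a previous study)
# is penalized lightly, the second heavily.
beta = np.array([1.0, -2.0, 0.5])
groups = np.array([0, 0, 1])
lambdas = [1.0, 10.0]
print(group_elastic_net_penalty(beta, groups, lambdas))   # 0.75 + 2.0 + 3.125 = 5.875
```

If the grouping is informative, the relevant group ends up with the smaller λ_g. Its features are then shrunk less, which is the mechanism behind the improved classification and feature selection described above.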
Affiliation(s)
- Magnus M Münch
- Department of Epidemiology & Biostatistics, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, PO Box 7057, 1007 MB Amsterdam, The Netherlands and Mathematical Institute, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands
- Carel F W Peeters
- Department of Epidemiology & Biostatistics, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, PO Box 7057, 1007 MB Amsterdam, The Netherlands
- Aad W Van Der Vaart
- Mathematical Institute, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands
- Mark A Van De Wiel
- Department of Epidemiology & Biostatistics, Amsterdam Public Health Research Institute, Amsterdam University Medical Centers, PO Box 7057, 1007 MB Amsterdam, The Netherlands and MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK

7
Münch MM, van de Wiel MA, Richardson S, Leday GGR. Drug sensitivity prediction with normal inverse Gaussian shrinkage informed by external data. Biom J 2020; 63:289-304. [PMID: 33155717 PMCID: PMC7891636 DOI: 10.1002/bimj.201900371] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2019] [Revised: 04/30/2020] [Accepted: 06/03/2020] [Indexed: 11/09/2022]
Abstract
In precision medicine, a common problem is drug sensitivity prediction from cancer tissue cell lines. These problems entail modelling multivariate drug responses on high-dimensional molecular feature sets, typically in more than 1000 cell lines. The dimensions of the problem require specialised models and estimation methods. In addition, external information on both the drugs and the features is often available. We propose to model the drug responses through a linear regression with shrinkage enforced through a normal inverse Gaussian prior. We let the prior depend on the external information, and estimate the model and external information dependence in an empirical-variational Bayes framework. We demonstrate the usefulness of this model both in a simulated setting and on the publicly available Genomics of Drug Sensitivity in Cancer data.
Affiliation(s)
- Magnus M Münch
- Department of Epidemiology & Biostatistics, Amsterdam UMC, VU University, Amsterdam, The Netherlands; Mathematical Institute, Leiden University, Leiden, The Netherlands; MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Mark A van de Wiel
- Department of Epidemiology & Biostatistics, Amsterdam UMC, VU University, Amsterdam, The Netherlands; MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Sylvia Richardson
- MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Gwenaël G R Leday
- MRC Biostatistics Unit, University of Cambridge, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, United Kingdom

8
Effects of repetitive Iodine thyroid blocking on the foetal brain and thyroid in rats: a systems biology approach. Sci Rep 2020; 10:10839. [PMID: 32616734 PMCID: PMC7331645 DOI: 10.1038/s41598-020-67564-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2019] [Accepted: 06/03/2020] [Indexed: 12/20/2022]
Abstract
A single administration of an iodine thyroid blocking agent is usually sufficient to protect the thyroid from radioactive iodine and prevent thyroid cancer. Repeated administration of stable iodine (rKI) may be necessary during prolonged or repeated exposure to radioactive iodine. We previously showed that rKI for eight days offers protection without toxic effects in adult rats. However, the effect of rKI administration on the developing foetus is unknown, especially on brain development, although a correlation between impaired maternal thyroid status and a decrease in the intelligence quotient of the progeny has been observed. This study revealed distinct gene expression profiles between the progeny of rats receiving either rKI or saline during pregnancy. To understand the implications of these differentially expressed (DE) genes, a systems biology approach was used to construct networks for each organ using three different techniques: Bayesian statistics, sPLS-DA and manual construction of a Process Descriptive (PD) network. The PD network showed DE genes from both organs participating in the same cellular processes that affect mitophagy and neuronal outgrowth. This work may help to evaluate the doctrine for using rKI in case of repetitive or prolonged exposure to radioactive particles upon nuclear accidents.

9
Panchal V, Linder DF. Reverse engineering gene networks using global-local shrinkage rules. Interface Focus 2019; 10:20190049. [PMID: 31897291 DOI: 10.1098/rsfs.2019.0049] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 11/13/2019] [Indexed: 12/26/2022]
Abstract
Inferring gene regulatory networks from high-throughput 'omics' data has proven to be a computationally demanding task of critical importance. Frequently, the classical methods break down owing to the curse of dimensionality, and popular strategies to overcome this are typically based on regularized versions of the classical methods. However, these approaches rely on loss functions that may not be robust and usually do not allow for the incorporation of prior information in a straightforward way. Fully Bayesian methods are equipped to handle both of these shortcomings quite naturally, and they offer the potential for improvements in network structure learning. We propose a Bayesian hierarchical model to reconstruct gene regulatory networks from time-series gene expression data, such as those common in perturbation experiments of biological systems. The proposed methodology uses global-local shrinkage priors for posterior selection of regulatory edges and relaxes the common normal likelihood assumption in order to allow for heavy-tailed data, which were shown in several of the cited references to severely impact network inference. We provide a sufficient condition for posterior propriety and derive an efficient Markov chain Monte Carlo via Gibbs sampling in the electronic supplementary material. We describe a novel way to detect multiple scales based on the corresponding posterior quantities. Finally, we demonstrate the performance of our approach in a simulation study and compare it with existing methods on real data from a T-cell activation study.
Affiliation(s)
- Viral Panchal
- Department of Mathematics and Statistics, University of North Carolina Wilmington, Wilmington, NC 28403, USA
- Daniel F Linder
- Medical College of Georgia, Augusta University, Augusta, GA 30912, USA

10

11
Affiliation(s)
- Peter D. Hoff
- Department of Statistical Science, Duke University, Durham, NC

12
Leday GGR, Richardson S. Fast Bayesian inference in large Gaussian graphical models. Biometrics 2019; 75:1288-1298. [PMID: 31009060 PMCID: PMC6916355 DOI: 10.1111/biom.13064] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/04/2018] [Revised: 03/27/2019] [Accepted: 03/28/2019] [Indexed: 11/27/2022]
Abstract
Despite major methodological developments, Bayesian inference in Gaussian graphical models remains challenging in high dimension due to the tremendous size of the model space. This article proposes a method to infer the marginal and conditional independence structures between variables by multiple testing, which bypasses the exploration of the model space. Specifically, we introduce closed-form Bayes factors under the Gaussian conjugate model to evaluate the null hypotheses of marginal and conditional independence between variables. Their computation for all pairs of variables is shown to be extremely efficient, thereby allowing us to address large problems with thousands of nodes as required by modern applications. Moreover, we derive exact tail probabilities from the null distributions of the Bayes factors. These allow the use of any multiplicity correction procedure to control error rates for incorrect edge inclusion. We demonstrate the proposed approach on various simulated examples as well as on a large gene expression data set from The Cancer Genome Atlas.
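Once each candidate edge has a p-value, any multiplicity correction can be applied. The paper obtains these p-values from exact tail probabilities of its Bayes factors, which are not reproduced here. The Python sketch below uses the Benjamini-Hochberg step-up procedure as one such choice, applied to the upper triangle of a hypothetical symmetric matrix of edge p-values. The function names and inputs are illustrative.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m       # q * k / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest k with p_(k) <= q * k / m
        reject[order[: k + 1]] = True              # reject the k smallest p-values
    return reject

def select_edges(pmat, q=0.05):
    """Apply BH to the upper-triangular edge p-values of a symmetric p x p matrix."""
    iu = np.triu_indices(pmat.shape[0], k=1)
    keep = benjamini_hochberg(pmat[iu], q)
    return [(int(i), int(j)) for i, j, k in zip(*iu, keep) if k]

# Hypothetical edge p-values for three variables.
pmat = np.ones((3, 3))
pmat[0, 1] = pmat[1, 0] = 0.001
pmat[0, 2] = pmat[2, 0] = 0.5
pmat[1, 2] = pmat[2, 1] = 0.02
print(select_edges(pmat))   # [(0, 1), (1, 2)]
```

Controlling the false discovery rate, rather than the family-wise error rate, is a common choice when thousands of node pairs are tested. The abstract leaves the choice of procedure open.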
Affiliation(s)
- Gwenaël G R Leday
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Sylvia Richardson
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK

13
van de Wiel MA, Te Beest DE, Münch MM. Learning from a lot: Empirical Bayes for high-dimensional model-based prediction. Scand Stat Theory Appl 2019; 46:2-25. [PMID: 31007342 PMCID: PMC6472625 DOI: 10.1111/sjos.12335] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/09/2017] [Revised: 01/24/2018] [Accepted: 03/22/2018] [Indexed: 12/21/2022]
Abstract
Empirical Bayes is a versatile approach to "learn from a lot" in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss "formal" empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed "co-data". In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes-full Bayes ridge regression approach for estimation of the posterior predictive interval.
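As a minimal illustration of "formal" empirical Bayes, that is, maximizing the marginal likelihood over a prior parameter, consider the textbook normal-means model y_j ~ N(β_j, σ²) with prior β_j ~ N(0, τ²) and known σ². This example is our own choice and not necessarily the linear-model estimator studied in the paper. Because y_j ~ N(0, σ² + τ²) marginally, the maximizer over τ² ≥ 0 is max(0, mean(y_j²) − σ²). Each posterior mean then shrinks y_j by the factor τ̂²/(τ̂² + σ²). The Python sketch below shows that the estimate of the single prior parameter τ² stabilizes as the number of variables p grows. The function name is hypothetical.

```python
import numpy as np

def eb_normal_means(y, sigma2=1.0):
    """Empirical Bayes for y_j ~ N(beta_j, sigma2), beta_j ~ N(0, tau2), sigma2 known.

    Marginally y_j ~ N(0, sigma2 + tau2), so the marginal-likelihood maximizer over
    tau2 >= 0 is max(0, mean(y^2) - sigma2). Returns (tau2_hat, posterior means).
    """
    y = np.asarray(y, dtype=float)
    tau2_hat = max(0.0, float(np.mean(y**2)) - sigma2)
    shrink = tau2_hat / (tau2_hat + sigma2)      # posterior mean of beta_j is shrink * y_j
    return tau2_hat, shrink * y

# The estimate of tau2 stabilizes as p grows (true tau2 = 2 in this simulation).
rng = np.random.default_rng(2)
for p in (10, 100, 100_000):
    beta = rng.normal(0.0, np.sqrt(2.0), p)
    y = beta + rng.normal(0.0, 1.0, p)
    print(p, eb_normal_means(y)[0])
```

With many variables but only one prior parameter, the marginal likelihood pins down τ² well, even though each individual β_j is estimated from a single observation.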
Affiliation(s)
- Mark A. van de Wiel
- Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, The Netherlands
- Department of Mathematics, VU University, Amsterdam, The Netherlands
- Dennis E. Te Beest
- Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, The Netherlands
- Magnus M. Münch
- Department of Epidemiology and Biostatistics, Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, The Netherlands
- Mathematical Institute, Faculty of Science, Leiden University, Leiden, The Netherlands

14
Kpogbezan GB, van der Vaart AW, van Wieringen WN, Leday GGR, van de Wiel MA. An empirical Bayes approach to network recovery using external knowledge. Biom J 2017; 59:932-947. [PMID: 28393396 DOI: 10.1002/bimj.201600090] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/16/2016] [Revised: 11/22/2016] [Accepted: 12/04/2016] [Indexed: 11/12/2022]
Abstract
Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks, such knowledge may come, for instance, from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simultaneous Equation Model, we develop an appealing Empirical Bayes (EB) procedure that automatically assesses the agreement of the used prior knowledge with the data at hand. We use a variational Bayes method to approximate the posterior densities and compare its accuracy with that of a Gibbs sampling strategy. Our method is computationally fast and can outperform known competitors. In a simulation study, we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In particular, the edges of the recovered network have superior reproducibility (compared to that of competitors) over resampled versions of the data.
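A simultaneous-equation view of a network regresses each variable on all the others, and prior knowledge can enter through how strongly each coefficient is shrunk. The Python sketch below illustrates that structure with plain ridge regressions, using a smaller penalty for edges present in a hypothetical prior network. The two penalty values are fixed by hand. The paper's EB procedure instead learns from the data how far to trust the prior knowledge, and it approximates the posteriors with variational Bayes. Names, penalties, and the simulated chain are assumptions for illustration.

```python
import numpy as np

def nodewise_ridge(Y, prior_adj, lam_in=1.0, lam_out=10.0):
    """Regress each column of Y on all other columns with per-coefficient ridge penalties.

    Y:         (n, p) centered data matrix.
    prior_adj: (p, p) boolean prior network; prior edges get penalty lam_in, others lam_out.
    Returns B with B[j, k] = coefficient of variable k in the equation for variable j.
    """
    n, p = Y.shape
    B = np.zeros((p, p))
    for j in range(p):
        others = np.array([k for k in range(p) if k != j])
        Xj = Y[:, others]
        pen = np.where(prior_adj[j, others], lam_in, lam_out)   # ridge penalty per coefficient
        B[j, others] = np.linalg.solve(Xj.T @ Xj + np.diag(pen), Xj.T @ Y[:, j])
    return B

def edge_scores(B):
    """Symmetric edge strength: average absolute coefficient over the two equations."""
    return 0.5 * (np.abs(B) + np.abs(B.T))

# Hypothetical chain x0 -> x1 -> x2: x0 and x2 are conditionally independent given x1.
rng = np.random.default_rng(4)
n = 500
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
Y = np.column_stack([x0, x1, x2])
Y -= Y.mean(axis=0)
prior = np.zeros((3, 3), dtype=bool)
prior[0, 1] = prior[1, 0] = True          # the prior "knows" only the edge x0 - x1
S = edge_scores(nodewise_ridge(Y, prior))
print(np.round(S, 2))                      # S[0, 2] should be much smaller than S[0, 1], S[1, 2]
```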
Affiliation(s)
- Gino B Kpogbezan
- Department of Mathematics, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
- Aad W van der Vaart
- Department of Mathematics, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
- Wessel N van Wieringen
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, 1007 MB Amsterdam, The Netherlands
- Gwenaël G R Leday
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Forvie Site, Cambridge, CB2 0SR, United Kingdom
- Mark A van de Wiel
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, 1007 MB Amsterdam, The Netherlands

15
Leday GGR, de Gunst MCM, Kpogbezan GB, van der Vaart AW, van Wieringen WN, van de Wiel MA. Gene Network Reconstruction using Global-Local Shrinkage Priors. Ann Appl Stat 2017; 11:41-68. [PMID: 28408966 DOI: 10.1214/16-aoas990] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 01/03/2023]
Abstract
Reconstructing a gene network from high-throughput molecular data is an important but challenging task, as the number of parameters to estimate can easily be much larger than the sample size. A conventional remedy is to regularize or penalize the model likelihood. In network models, this is often done locally in the neighbourhood of each node or gene. However, estimation of the many regularization parameters is often difficult and can result in large statistical uncertainties. In this paper we propose to combine local regularization with global shrinkage of the regularization parameters to borrow strength between genes and improve inference. We employ a simple Bayesian model with non-sparse, conjugate priors to facilitate the use of fast variational approximations to posteriors. We discuss empirical Bayes estimation of hyper-parameters of the priors, and propose a novel approach to rank-based posterior thresholding. Using extensive model- and data-based simulations, we demonstrate that the proposed inference strategy outperforms popular (sparse) methods, yields more stable edges, and is more reproducible. The proposed method, termed ShrinkNet, is then applied to Glioblastoma to investigate the interactions between genes associated with patient survival.
Affiliation(s)
- Gwenaël G R Leday
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge CB2 0SR, United Kingdom
- Mathisca C M de Gunst
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
- Gino B Kpogbezan
- Mathematical Institute, Faculty of Science, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
- Aad W van der Vaart
- Department of Epidemiology and Biostatistics, VU University Medical Center, PO Box 7057, 1007 MB Amsterdam, The Netherlands
- Wessel N van Wieringen
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, PO Box 7057, 1007 MB Amsterdam, The Netherlands
- Mark A van de Wiel
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, PO Box 7057, 1007 MB Amsterdam, The Netherlands