1. Oh J, Chang C, Long Q. Accounting for technical noise in Bayesian graphical models of single-cell RNA-sequencing data. Biostatistics 2022; 24:161-176. [PMID: 34520533 DOI: 10.1093/biostatistics/kxab011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2020] [Revised: 03/16/2021] [Accepted: 03/17/2021] [Indexed: 12/16/2022]
Abstract
Single-cell RNA-sequencing (scRNAseq) data contain a high level of noise, especially in the form of zero-inflation, that is, the presence of an excessively large number of zeros. This is largely due to dropout events and amplification biases that occur in the preparation stage of single-cell experiments. Recent scRNAseq experiments have been augmented with unique molecular identifiers (UMI) and External RNA Control Consortium (ERCC) molecules which can be used to account for zero-inflation. However, most of the current methods on graphical models are developed under the assumption of the multivariate Gaussian distribution or its variants, and thus they are not able to adequately account for an excessively large number of zeros in scRNAseq data. In this article, we propose a single-cell latent graphical model (scLGM), a Bayesian hierarchical model for estimating the conditional dependency network among genes using scRNAseq data. Taking advantage of UMI and ERCC data, scLGM explicitly models the two sources of zero-inflation. Our simulation study and real data analysis demonstrate that the proposed approach outperforms several existing methods.
Affiliation(s)
- Jihwan Oh
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
- Changgee Chang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
- Qi Long
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
2. Molinari M, Cremaschi A, De Iorio M, Chaturvedi N, Hughes A, Tillin T. Bayesian dynamic network modelling: an application to metabolic associations in cardiovascular diseases. J Appl Stat 2022; 51:114-138. [PMID: 38179161 PMCID: PMC10763914 DOI: 10.1080/02664763.2022.2116746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2020] [Accepted: 08/14/2022] [Indexed: 10/14/2022]
Abstract
We propose a novel approach to the estimation of multiple Graphical Models to analyse temporal patterns of association among a set of metabolites over different groups of patients. Our motivating application is the Southall And Brent REvisited (SABRE) study, a tri-ethnic cohort study conducted in the UK. We are interested in identifying potential ethnic differences in metabolite levels and associations as well as their evolution over time, with the aim of gaining a better understanding of different risk of cardio-metabolic disorders across ethnicities. Within a Bayesian framework, we employ a nodewise regression approach to infer the structure of the graphs, borrowing information across time as well as across ethnicities. The response variables of interest are metabolite levels measured at two time points and for two ethnic groups, Europeans and South-Asians. We use nodewise regression to estimate the high-dimensional precision matrices of the metabolites, imposing sparsity on the regression coefficients through the dynamic horseshoe prior, thus favouring sparser graphs. We provide the code to fit the proposed model using the software Stan, which performs posterior inference using Hamiltonian Monte Carlo sampling, as well as a detailed description of a block Gibbs sampling scheme.
Affiliation(s)
- Marco Molinari
- Department of Statistical Science, University College London, London, UK
- Maria De Iorio
- Department of Statistical Science, University College London, London, UK
- Singapore Institute for Clinical Sciences, A*STAR, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Nishi Chaturvedi
- Department of Population Science and Experimental Medicine, University College London, London, UK
- Alun Hughes
- Department of Population Science and Experimental Medicine, University College London, London, UK
- Therese Tillin
- Department of Population Science and Experimental Medicine, University College London, London, UK
3. Joint learning of multiple Granger causal networks via non-convex regularizations: Inference of group-level brain connectivity. Neural Netw 2022; 149:157-171. [DOI: 10.1016/j.neunet.2022.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2021] [Revised: 01/09/2022] [Accepted: 02/06/2022] [Indexed: 11/23/2022]
4. Jin J, Wang Y. T2-DAG: a powerful test for differentially expressed gene pathways via graph-informed structural equation modeling. Bioinformatics 2022; 38:1005-1014. [PMID: 34755844 PMCID: PMC8796375 DOI: 10.1093/bioinformatics/btab770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 11/01/2021] [Accepted: 11/04/2021] [Indexed: 02/04/2023]
Abstract
MOTIVATION: A major task in genetic studies is to identify genes related to human diseases and traits to understand functional characteristics of genetic mutations and enhance patient diagnosis. Compared with marginal analyses of individual genes, identification of gene pathways, i.e. a set of genes with known interactions that collectively contribute to specific biological functions, can provide more biologically meaningful results. Such gene pathway analysis can be formulated into a high-dimensional two-sample testing problem. Given the typically limited sample size of gene expression datasets, most existing two-sample tests tend to have compromised powers because they ignore or only inefficiently incorporate the auxiliary pathway information on gene interactions. RESULTS: We propose T2-DAG, a Hotelling's T²-type test for detecting differentially expressed gene pathways, which efficiently leverages the auxiliary pathway information on gene interactions from existing pathway databases through a linear structural equation model. We further establish its asymptotic distribution under pertinent assumptions. Simulation studies under various scenarios show that T2-DAG outperforms several representative existing methods with well-controlled type-I error rates and substantially improved powers, even with incomplete or inaccurate pathway information or unadjusted confounding effects. We also illustrate the performance of T2-DAG in an application to detect differentially expressed KEGG pathways between different stages of lung cancer. AVAILABILITY AND IMPLEMENTATION: The R (R Development Core Team, 2021) package T2DAG, which implements the proposed T2-DAG test, is available on GitHub at https://github.com/Jin93/T2DAG. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jin Jin
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Yue Wang
- School of Mathematical and Natural Sciences, Arizona State University, Glendale, AZ 85306, USA
5. Wang Z, Kaseb AO, Amin HM, Hassan MM, Wang W, Morris JS. Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer. J Am Stat Assoc 2022; 117:533-546. [PMID: 36090952 PMCID: PMC9454401 DOI: 10.1080/01621459.2021.2000866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2019] [Revised: 07/13/2021] [Accepted: 10/24/2021] [Indexed: 10/19/2022]
Abstract
It is well-established that interpatient heterogeneity in cancer may significantly affect genomic data analyses and in particular, network topologies. Most existing graphical model methods estimate a single population-level graph for genomic or proteomic network. In many investigations, these networks depend on patient-specific indicators that characterize the heterogeneity of individual networks across subjects with respect to subject-level covariates. Examples include assessments of how the network varies with patient-specific prognostic scores or comparisons of tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. We evaluate our model performance through simulation studies focused on comparing tumor and normal graphs while adjusting for tumor purity. In application to a dataset of proteomic measurements on plasma samples from patients with hepatocellular carcinoma (HCC), we ascertain how blood protein networks vary with disease severity, as measured by HepatoScore, a novel biomarker signature measuring disease severity. Our case study shows that the network connectivity increases with HepatoScore and a set of hub genes as well as important gene connections are identified under different HepatoScore, which may provide important biological insights to the development of precision therapies for HCC.
Affiliation(s)
- Zeya Wang
- Department of Statistics, Rice University; Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center
- Veerabhadran Baladandayuthapani
- Department of Biostatistics, University of Michigan
- Ahmed O Kaseb
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center
- Hesham M Amin
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center
- Manal M Hassan
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center
- Wenyi Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center
- Jeffrey S Morris
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
6. Lee KY, Ji D, Li L, Constable T, Zhao H. Conditional Functional Graphical Models. J Am Stat Assoc 2021; 118:257-271. [PMID: 37193511 PMCID: PMC10181795 DOI: 10.1080/01621459.2021.1924178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2019] [Revised: 03/01/2021] [Accepted: 04/22/2021] [Indexed: 10/21/2022]
Abstract
Graphical modeling of multivariate functional data is becoming increasingly important in a wide variety of applications. The changes of graph structure can often be attributed to external variables, such as the diagnosis status or time, the latter of which gives rise to the problem of dynamic graphical modeling. Most existing methods focus on estimating the graph by aggregating samples, but largely ignore the subject-level heterogeneity due to the external variables. In this article, we introduce a conditional graphical model for multivariate random functions, where we treat the external variables as conditioning set, and allow the graph structure to vary with the external variables. Our method is built on two new linear operators, the conditional precision operator and the conditional partial correlation operator, which extend the precision matrix and the partial correlation matrix to both the conditional and functional settings. We show that their nonzero elements can be used to characterize the conditional graphs, and develop the corresponding estimators. We establish the uniform convergence of the proposed estimators and the consistency of the estimated graph, while allowing the graph size to grow with the sample size, and accommodating both completely and partially observed data. We demonstrate the efficacy of the method through both simulations and a study of brain functional connectivity network.
Affiliation(s)
- Kuang-Yao Lee
- Department of Statistical Science, Temple University, Philadelphia, PA
- Dingjue Ji
- Department of Biostatistics, Yale University, New Haven, CT
- Lexin Li
- Division of Biostatistics, University of California, Berkeley, CA
- Todd Constable
- Department of Biostatistics, Yale University, New Haven, CT
- Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT
7. Jia B, Liang F. Fast hybrid Bayesian integrative learning of multiple gene regulatory networks for type 1 diabetes. Biostatistics 2021; 22:233-249. [PMID: 33838043 PMCID: PMC8035990 DOI: 10.1093/biostatistics/kxz027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/26/2018] [Revised: 06/01/2019] [Accepted: 06/23/2019] [Indexed: 11/12/2022]
Abstract
Motivated by the study of the molecular mechanism underlying type 1 diabetes with gene expression data collected from both patients and healthy controls at multiple time points, we propose a hybrid Bayesian method for jointly estimating multiple dependent Gaussian graphical models with data observed under distinct conditions, which avoids inversion of high-dimensional covariance matrices and thus can be executed very fast. We prove the consistency of the proposed method under mild conditions. The numerical results indicate the superiority of the proposed method over existing ones in both estimation accuracy and computational efficiency. Extension of the proposed method to joint estimation of multiple mixed graphical models is straightforward.
Affiliation(s)
- Bochao Jia
- Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN, USA
- Faming Liang
- Department of Statistics, Purdue University, West Lafayette, IN, USA
8. Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
- Department of Statistics, University of California, Berkeley
9. Jiang D, Armour CR, Hu C, Mei M, Tian C, Sharpton TJ, Jiang Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front Genet 2019; 10:995. [PMID: 31781153 PMCID: PMC6857202 DOI: 10.3389/fgene.2019.00995] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 02/13/2019] [Accepted: 09/18/2019] [Indexed: 12/21/2022]
Abstract
The advent of large-scale microbiome studies affords newfound analytical opportunities to understand how these communities of microbes operate and relate to their environment. However, the analytical methodology needed to model microbiome data and integrate them with other data constructs remains nascent. This emergent analytical toolset frequently ports over techniques developed in other multi-omics investigations, especially the growing array of statistical and computational techniques for integrating and representing data through networks. While network analysis has emerged as a powerful approach to modeling microbiome data, oftentimes by integrating these data with other types of omics data to discern their functional linkages, it is not always evident if the statistical details of the approach being applied are consistent with the assumptions of microbiome data or how they impact data interpretation. In this review, we overview some of the most important network methods for integrative analysis, with an emphasis on methods that have been applied or have great potential to be applied to the analysis of multi-omics integration of microbiome data. We compare advantages and disadvantages of various statistical tools, assess their applicability to microbiome data, and discuss their biological interpretability. We also highlight on-going statistical challenges and opportunities for integrative network analysis of microbiome data.
Affiliation(s)
- Duo Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Courtney R Armour
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Chenxiao Hu
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Meng Mei
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Chuan Tian
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Thomas J Sharpton
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
10. Abbas-Aghababazadeh F, Mo Q, Fridley BL. Statistical genomics in rare cancer. Semin Cancer Biol 2019; 61:1-10. [PMID: 31437624 DOI: 10.1016/j.semcancer.2019.08.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/28/2019] [Revised: 08/14/2019] [Accepted: 08/17/2019] [Indexed: 12/26/2022]
Abstract
Rare cancers make up more than 20% of cancer cases. Due to their rarity, less research has been conducted on rare cancers, resulting in worse outcomes for patients with rare cancers compared to common cancers. The ability to study rare cancers is impaired by the difficulty of collecting a large enough set of patients to complete an adequately powered genomic study. In this manuscript we outline analytical approaches and public genomic datasets that have been used in genomic studies of rare cancers. These statistical analysis approaches and study designs include: gene set / pathway analyses, pedigree and consortium studies, meta-analysis or horizontal integration, and integration of multiple types of genomic information or vertical integration. We also discuss some of the publicly available resources that can be leveraged in rare cancer genomic studies.
Affiliation(s)
- Qianxing Mo
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL, 33612, USA.
- Brooke L Fridley
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL, 33612, USA.
11. Luo X, Wei Y. Nonparametric Bayesian learning of heterogeneous dynamic transcription factor networks. Ann Appl Stat 2018. [DOI: 10.1214/17-aoas1129] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
12. Tan LSL, Jasra A, De Iorio M, Ebbels TMD. Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks. Ann Appl Stat 2017. [DOI: 10.1214/17-aoas1076] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]
13. Gibberd AJ, Nelson JDB. Regularized Estimation of Piecewise Constant Gaussian Graphical Models: The Group-Fused Graphical Lasso. J Comput Graph Stat 2017. [DOI: 10.1080/10618600.2017.1302340] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/28/2023]
Affiliation(s)
- Alexander J. Gibberd
- Department of Statistical Science, University College London, Bloomsbury, London, United Kingdom
- James D. B. Nelson
- Department of Statistical Science, University College London, Bloomsbury, London, United Kingdom
14. Lin Z, Wang T, Yang C, Zhao H. On joint estimation of Gaussian graphical models for spatial and temporal data. Biometrics 2017; 73:769-779. [PMID: 28099997 DOI: 10.1111/biom.12650] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 10/01/2015] [Revised: 10/01/2016] [Accepted: 12/01/2016] [Indexed: 11/29/2022]
Abstract
In this article, we first propose a Bayesian neighborhood selection method to estimate Gaussian Graphical Models (GGMs). We show the graph selection consistency of this method in the sense that the posterior probability of the true model converges to one. When there are multiple groups of data available, instead of estimating the networks independently for each group, joint estimation of the networks may utilize the shared information among groups and lead to improved estimation for each individual network. Our method is extended to jointly estimate GGMs in multiple groups of data with complex structures, including spatial data, temporal data, and data with both spatial and temporal structures. Markov random field (MRF) models are used to efficiently incorporate the complex data structures. We develop and implement an efficient algorithm for statistical inference that enables parallel computing. Simulation studies suggest that our approach achieves better accuracy in network estimation compared with methods not incorporating spatial and temporal dependencies when there are shared structures among the networks, and that it performs comparably well otherwise. Finally, we illustrate our method using the human brain gene expression microarray dataset, where the expression levels of genes are measured in different brain regions across multiple time periods.
Affiliation(s)
- Zhixiang Lin
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, U.S.A.; Department of Statistics, Stanford University, Stanford, California, U.S.A.
- Tao Wang
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- Can Yang
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Hongyu Zhao
- Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, U.S.A.
15. Differential network analysis from cross-platform gene expression data. Sci Rep 2016; 6:34112. [PMID: 27677586 PMCID: PMC5039701 DOI: 10.1038/srep34112] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 06/24/2016] [Accepted: 09/07/2016] [Indexed: 01/18/2023]
Abstract
Understanding how the structure of gene dependency network changes between two patient-specific groups is an important task for genomic research. Although many computational approaches have been proposed to undertake this task, most of them estimate correlation networks from group-specific gene expression data independently without considering the common structure shared between different groups. In addition, with the development of high-throughput technologies, we can collect gene expression profiles of same patients from multiple platforms. Therefore, inferring differential networks by considering cross-platform gene expression profiles will improve the reliability of network inference. We introduce a two dimensional joint graphical lasso (TDJGL) model to simultaneously estimate group-specific gene dependency networks from gene expression profiles collected from different platforms and infer differential networks. TDJGL can borrow strength across different patient groups and data platforms to improve the accuracy of estimated networks. Simulation studies demonstrate that TDJGL provides more accurate estimates of gene networks and differential networks than previous competing approaches. We apply TDJGL to the PI3K/AKT/mTOR pathway in ovarian tumors to build differential networks associated with platinum resistance. The hub genes of our inferred differential networks are significantly enriched with known platinum resistance-related genes and include potential platinum resistance-related genes.
16. Li B, Chun H, Zhao H. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis. J Am Stat Assoc 2014; 109:1188-1204. [PMID: 26401064 PMCID: PMC4577248 DOI: 10.1080/01621459.2014.882842] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 03/01/2013] [Revised: 11/01/2013] [Indexed: 10/25/2022]
Abstract
We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.
Affiliation(s)
- Bing Li
- Professor of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, PA 16802
- Hyonho Chun
- Assistant Professor of Statistics, Purdue University, 250 N. University Street, West Lafayette, IN 47907
- Hongyu Zhao
- Professor of Biostatistics, Yale University, Suite 503, 300 George Street, New Haven, CT 06410