1
Bernal V, Soancatl-Aguilar V, Bulthuis J, Guryev V, Horvatovich P, Grzegorczyk M. GeneNetTools: tests for Gaussian graphical models with shrinkage. Bioinformatics 2022; 38:5049-5054. [PMID: 36179082 PMCID: PMC9665865 DOI: 10.1093/bioinformatics/btac657] [Received: 06/22/2022] [Revised: 09/14/2022] [Accepted: 09/29/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Gaussian graphical models (GGMs) are network representations of random variables (as nodes) and their partial correlations (as edges). GGMs overcome the challenges of high-dimensional data analysis by using shrinkage methodologies, and they have therefore become useful for reconstructing gene regulatory networks from gene-expression profiles. However, it is often ignored that the partial correlations are 'shrunk' and cannot be compared or assessed directly. Accurate (differential) network analyses therefore need to account for the number of variables, the sample size and the shrinkage value; otherwise, the analysis and its biological interpretation become biased. To date, there are no appropriate methods that account for these factors. RESULTS We derive the statistical properties of the partial correlation obtained with the Ledoit-Wolf shrinkage. Our results provide a toolbox for (differential) network analyses consisting of (i) confidence intervals, (ii) a test for zero partial correlation (null effects) and (iii) a test to compare partial correlations. Our novel (parametric) methods account for the number of variables, the sample size and the shrinkage values. Additionally, they are computationally fast, simple to implement and require only basic statistical knowledge. Our simulations show that the novel tests perform better than DiffNetFDR, a recently published alternative, in terms of the trade-off between true and false positives. The methods are demonstrated on synthetic data and two gene-expression datasets from Escherichia coli and Mus musculus. AVAILABILITY AND IMPLEMENTATION The R package with the methods and the R script with the analysis are available at https://github.com/V-Bernal/GeneNetTools. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
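To make concrete why 'shrunk' partial correlations cannot be compared directly, the following minimal Python sketch shrinks a fixed correlation matrix toward the identity and recomputes one partial correlation for several shrinkage values. It is an illustration under stated assumptions, not the GeneNetTools implementation: the shrinkage target (identity), the fixed values of `lam`, the toy matrix `corr` and the helper names `invert` and `shrunk_partial_correlations` are all hypothetical, and in practice the shrinkage value would be estimated from the data.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def shrunk_partial_correlations(corr, lam):
    """Partial correlations from (1 - lam) * corr + lam * I (assumed shrinkage form)."""
    n = len(corr)
    shrunk = [[(1.0 - lam) * corr[i][j] + (lam if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    omega = invert(shrunk)  # precision matrix of the shrunk correlation
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]


# Toy correlation matrix for three variables (illustrative values only).
corr = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.5],
        [0.3, 0.5, 1.0]]

results = {}
for lam in (0.0, 0.2, 0.5):
    results[lam] = shrunk_partial_correlations(corr, lam)[0][1]
    print(f"lambda={lam:.1f}  partial correlation(1,2)={results[lam]:.3f}")
```

In this toy example the same underlying correlation structure yields a smaller partial correlation as the shrinkage value grows, which is the reason two networks estimated with different shrinkage values cannot be compared edge by edge without a correction.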
Affiliation(s)
- Victor Bernal
  - Center of Information Technology, University of Groningen, Groningen 9747 AJ, The Netherlands; Department of Mathematics, Bernoulli Institute, University of Groningen, Groningen 9747 AG, The Netherlands
- Jonas Bulthuis
  - Center of Information Technology, University of Groningen, Groningen 9747 AJ, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen 9713 AV, The Netherlands
2
Wang Z, Kaseb AO, Amin HM, Hassan MM, Wang W, Morris JS. Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer. J Am Stat Assoc 2022; 117:533-546. [PMID: 36090952 PMCID: PMC9454401 DOI: 10.1080/01621459.2021.2000866] [Received: 02/18/2019] [Revised: 07/13/2021] [Accepted: 10/24/2021] [Indexed: 10/19/2022]
Abstract
It is well established that interpatient heterogeneity in cancer may significantly affect genomic data analyses and, in particular, network topologies. Most existing graphical model methods estimate a single population-level graph for a genomic or proteomic network. In many investigations, these networks depend on patient-specific indicators that characterize the heterogeneity of individual networks across subjects with respect to subject-level covariates. Examples include assessments of how the network varies with patient-specific prognostic scores or comparisons of tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. We evaluate our model performance through simulation studies focused on comparing tumor and normal graphs while adjusting for tumor purity. In application to a dataset of proteomic measurements on plasma samples from patients with hepatocellular carcinoma (HCC), we ascertain how blood protein networks vary with disease severity, as measured by HepatoScore, a novel biomarker signature. Our case study shows that network connectivity increases with HepatoScore, and a set of hub genes as well as important gene connections are identified at different HepatoScore levels, which may provide important biological insights for the development of precision therapies for HCC.
Affiliation(s)
- Zeya Wang
  - Department of Statistics, Rice University; Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Veerabhadran Baladandayuthapani; Department of Biostatistics, University of Michigan
- Ahmed O Kaseb
  - Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center
- Hesham M Amin
  - Department of Hematopathology, The University of Texas MD Anderson Cancer Center
- Manal M Hassan
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center
- Wenyi Wang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center
- Jeffrey S Morris
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
3
Tan YT, Ou-Yang L, Jiang X, Yan H, Zhang XF. Identifying Gene Network Rewiring Based on Partial Correlation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:513-521. [PMID: 32750866 DOI: 10.1109/tcbb.2020.3002906] [Indexed: 06/11/2023]
Abstract
Learning how gene regulatory networks change under different conditions is an important task. Several Gaussian graphical model-based methods have been proposed for this task; they infer differential networks from gene expression data. However, most existing methods define the differential network as the difference of precision matrices, which may include false differential edges caused by changes in conditional variances. In addition, prior information about the condition-specific networks and the differential networks can be obtained from other domains, and it is useful to incorporate this information into differential network analysis. In this study, we propose a new differential network analysis method to address these challenges. Instead of using the precision matrices, we define the differential network as the difference of partial correlations, which excludes the spurious differential edges due to changes in conditional variances. Furthermore, prior information from multiple hypothesis testing is incorporated using a weighted fused penalty. Simulation studies show that our method outperforms the competing methods. We also apply our method to identify the differential network between luminal A and basal-like subtypes of breast cancers and the differential network between acute myeloid leukemia tumors and normal samples. The hub genes in the differential networks identified by our method carry out important biological functions.
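The distinction between precision-matrix differences and partial-correlation differences can be sketched in a few lines of Python. The example below is only an illustration, not the authors' method: the toy precision matrix `omega_a`, the scaling factors and the helper names `partial_correlations` and `rescale` are assumptions made for this sketch. It uses the standard identity that the partial correlation between variables i and j equals minus the (i, j) precision entry divided by the square root of the product of the two diagonal entries.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into a partial correlation matrix."""
    p = len(precision)
    return [[1.0 if i == j
             else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(p)] for i in range(p)]


def rescale(precision, scales):
    """Return D * precision * D for diagonal D; only conditional variances change."""
    p = len(precision)
    return [[scales[i] * precision[i][j] * scales[j] for j in range(p)]
            for i in range(p)]


# Condition A: a toy precision matrix (illustrative values only).
omega_a = [[2.0, -0.8, 0.0],
           [-0.8, 2.0, -0.5],
           [0.0, -0.5, 2.0]]
# Condition B: same dependence structure, but variable 2 has a different
# conditional variance (its diagonal precision entry is multiplied by 4).
omega_b = rescale(omega_a, [1.0, 2.0, 1.0])

precision_diff = [[omega_b[i][j] - omega_a[i][j] for j in range(3)] for i in range(3)]

pcor_a = partial_correlations(omega_a)
pcor_b = partial_correlations(omega_b)
pcor_diff = [[pcor_b[i][j] - pcor_a[i][j] for j in range(3)] for i in range(3)]

print("precision difference for edge (1,2):", precision_diff[0][1])
print("partial correlation difference for edge (1,2):", pcor_diff[0][1])
```

Here the precision entries of the two conditions differ on every edge involving variable 2, while the partial correlations are identical, so a differential network defined on partial correlations reports no change for what is purely a change of scale.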
4
Kim B, Liu S, Kolar M. Two-sample inference for high-dimensional Markov networks. J R Stat Soc Series B Stat Methodol 2021. [DOI: 10.1111/rssb.12446] [Indexed: 11/28/2022]
Affiliation(s)
- Byol Kim
  - Department of Statistics, The University of Chicago, Chicago, Illinois, USA
- Song Liu
  - School of Mathematics, University of Bristol, Bristol, UK
  - The Alan Turing Institute, London, UK
- Mladen Kolar
  - Booth School of Business, The University of Chicago, Chicago, Illinois, USA
5
Dai R, Kolar M. Inference for high-dimensional varying-coefficient quantile regression. Electron J Stat 2021. [DOI: 10.1214/21-ejs1919] [Indexed: 11/19/2022]
Affiliation(s)
- Ran Dai
  - Department of Biostatistics, University of Nebraska Medical Center, USA
- Mladen Kolar
  - Booth School of Business, University of Chicago, USA
6
Zhang XF, Ou-Yang L, Yang S, Hu X, Yan H. DiffNetFDR: differential network analysis with false discovery rate control. Bioinformatics 2019; 35:3184-3186. [PMID: 30689728 DOI: 10.1093/bioinformatics/btz051] [Received: 05/18/2018] [Revised: 01/10/2019] [Accepted: 01/20/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY To identify biological network rewiring under different conditions, we develop a user-friendly R package, named DiffNetFDR, that implements two methods for testing differences between Gaussian graphical models. Compared to existing tools, our methods have the following features: (i) they are based on Gaussian graphical models, which can capture changes in conditional dependencies; (ii) they determine the tuning parameters in a data-driven manner; (iii) they use a multiple testing procedure to control the overall false discovery rate; and (iv) they define the differential network based on partial correlation coefficients, so that spurious differential edges caused by changes in conditional variances can be excluded. We also develop a Shiny application to provide easier analysis and visualization. Simulation studies are conducted to evaluate the performance of our methods. We also apply our methods to two real gene expression datasets. The effectiveness of our methods is validated by the biological significance of the identified differential networks. AVAILABILITY AND IMPLEMENTATION R package and Shiny app are available at https://github.com/Zhangxf-ccnu/DiffNetFDR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
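As a rough illustration of what controlling the false discovery rate over candidate differential edges involves, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-edge p-values. This is a standard, swapped-in technique chosen for illustration; the abstract does not state which multiple testing procedure DiffNetFDR uses, and the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_k = rank
    # Reject every hypothesis whose p-value is among the k smallest.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate differential edge.
edge_p_values = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6]
selected = benjamini_hochberg(edge_p_values, alpha=0.05)
print(selected)
```

The step-up rule matters: an edge is kept whenever its p-value is among the k smallest and the k-th smallest p-value passes its threshold, even if some smaller p-values individually miss theirs.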
Affiliation(s)
- Xiao-Fei Zhang
  - Department of Statistics, School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Le Ou-Yang
  - Department of Electronic Engineering, Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen, China
- Shuo Yang
  - Department of Respiratory Medicine, Wuhan Number 1 Hospital, Wuhan, China
- Xiaohua Hu
  - Department of Information Science, College of Computing and Informatics, Drexel University, Philadelphia, USA
- Hong Yan
  - Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
7
Statistics in the Genomic Era. Genes (Basel) 2020; 11:genes11040443. [PMID: 32325634 PMCID: PMC7230157 DOI: 10.3390/genes11040443] [Received: 04/14/2020] [Accepted: 04/15/2020] [Indexed: 11/29/2022] Open
8
Pan Y, Mai Q. Efficient computation for differential network analysis with applications to quadratic discriminant analysis. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106884] [Indexed: 01/18/2023]
9
Zhang Q. Testing Differential Gene Networks under Nonparanormal Graphical Models with False Discovery Rate Control. Genes (Basel) 2020; 11:E167. [PMID: 32033447 PMCID: PMC7073847 DOI: 10.3390/genes11020167] [Received: 01/20/2020] [Revised: 01/27/2020] [Accepted: 01/30/2020] [Indexed: 11/16/2022] Open
Abstract
The nonparanormal graphical model has emerged as an important tool for modeling dependency structure between variables because it is flexible to non-Gaussian data while maintaining the good interpretability and computational convenience of Gaussian graphical models. In this paper, we consider the problem of detecting differential substructure between two nonparanormal graphical models with false discovery rate control. We construct a new statistic based on a truncated estimator of the unknown transformation functions, together with a bias-corrected sample covariance. Furthermore, we show that the new test statistic converges to the same distribution as its oracle counterpart does. Both synthetic data and real cancer genomic data are used to illustrate the promise of the new method. Our proposed testing framework is simple and scalable, facilitating its applications to large-scale data. The computational pipeline has been implemented in the R package DNetFinder, which is freely available through the Comprehensive R Archive Network.
Affiliation(s)
- Qingyang Zhang
  - Department of Mathematical Sciences, University of Arkansas, Arkansas, AR 72701, USA
10
Zhang Q. Direct estimation of differential networks under high-dimensional nonparanormal graphical models. Can J Stat 2019. [DOI: 10.1002/cjs.11526] [Indexed: 11/09/2022]
Affiliation(s)
- Qingyang Zhang
  - Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR, U.S.A.
11
Zhang Q, Du Y. Model-free feature screening for categorical outcomes: Nonlinear effect detection and false discovery rate control. PLoS One 2019; 14:e0217463. [PMID: 31150453 PMCID: PMC6544247 DOI: 10.1371/journal.pone.0217463] [Received: 08/23/2018] [Accepted: 05/13/2019] [Indexed: 11/19/2022] Open
Abstract
Feature screening has become a real prerequisite for the analysis of high-dimensional genomic data, as it is effective in reducing dimensionality and removing redundant features. However, existing methods for feature screening have been mostly relying on the assumptions of linear effects and independence (or weak dependence) between features, which might be inappropriate in real practice. In this paper, we consider the problem of selecting continuous features for a categorical outcome from high-dimensional data. We propose a powerful statistical procedure that consists of two steps, a nonparametric significance test based on edge count and a multiple testing procedure with dependence adjustment for false discovery rate control. The new method presents two novelties. First, the edge-count test directly targets distributional difference between groups, therefore it is sensitive to nonlinear effects. Second, we relax the independence assumption and adapt Efron's procedure to adjust for the dependence between features. The performance of the proposed procedure, in terms of statistical power and false discovery rate, is illustrated by simulated data. We apply the new method to three genomic datasets to identify genes associated with colon, cervical and prostate cancers.
Affiliation(s)
- Qingyang Zhang
  - Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR, United States of America
- Yuchun Du
  - Department of Biological Sciences, University of Arkansas, Fayetteville, AR, United States of America
12
He Y, Ji J, Xie L, Zhang X, Xue F. A new insight into underlying disease mechanism through semi-parametric latent differential network model. BMC Bioinformatics 2018; 19:493. [PMID: 30591011 PMCID: PMC6309076 DOI: 10.1186/s12859-018-2461-2] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND In genomic studies, investigating how the structure of a genetic network differs between two experimental conditions is an interesting but challenging problem, especially in high-dimensional settings. The existing literature mostly focuses on differential network modelling for continuous data. However, in real applications, we may encounter discrete or mixed data, which calls for a unified differential network model for various data types. RESULTS We propose a unified latent Gaussian copula differential network model, which provides a deeper understanding of the unknown mechanism than a model restricted to the observed variables. Adaptive rank-based estimation approaches are proposed under the assumption that the true differential network is sparse. The adaptive estimation approaches do not require the precision matrices to be sparse, and thus allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both the estimation of the difference of the precision matrices and differential structure recovery, which means that the extra modelling flexibility comes at almost no cost in statistical efficiency. Besides the theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with other state-of-the-art methods. The results show that the proposed methods work well for various data types. The proposed method is then applied to gene expression data associated with lung cancer to illustrate its empirical usefulness. CONCLUSIONS The proposed latent variable differential network models allow for various data types and are thus more flexible; they also provide a deeper understanding of the unknown mechanism than a model restricted to the observed variables. Theoretical analysis, numerical simulation and real application all demonstrate the advantages of latent differential network modelling, which is therefore highly recommended.
Affiliation(s)
- Yong He
  - School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
- Jiadong Ji
  - School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York, 10065 USA
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016 USA
- Xinsheng Zhang
  - School of Management, Fudan University, Shanghai, 200433 China
- Fuzhong Xue
  - School of Public Health, Shandong University, Jinan, 250012 China