1
Yan H, Lu S, Zhang S. The cluster D-trace loss for differential network analysis. J Appl Stat 2023; 51:1843-1860. [PMID: 39071251 PMCID: PMC11271130 DOI: 10.1080/02664763.2023.2245178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 07/29/2023] [Indexed: 07/30/2024]
Abstract
A growing literature suggests that gene expression can be greatly altered in disease conditions, and identifying those changes will improve the understanding of complex diseases such as cancers or diabetes. A prevailing direction in the analysis of gene expression studies changes in gene pathways, which comprise sets of related genes. Therefore, introducing structured exploration to differential analysis of gene expression networks may lead to meaningful discoveries. The topic of this paper is differential network analysis, which focuses on capturing the differences between two or more precision matrices. We discuss the connection between the thresholding method and the D-trace loss method for differential network analysis when the precision matrices share common connected components. Based on this connection, we further propose the cluster D-trace loss method, which directly estimates the differential network and achieves model selection consistency. Simulation studies demonstrate its improved performance and computational efficiency. Finally, the usefulness of our proposed estimator is demonstrated by a real-data analysis on non-small cell lung cancer.
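As a rough illustration of the D-trace loss named in this abstract, the Python sketch below assumes the commonly used symmetrized form, L(Δ) = ¼[tr(Σ_X Δ Σ_Y Δ) + tr(Σ_Y Δ Σ_X Δ)] − tr(Δ(Σ_X − Σ_Y)). The abstract does not state the formula, so this form, the function names and the toy matrices are all assumptions, and neither the clustering step nor the penalty of the proposed method is implemented. The sketch only checks numerically that the loss is stationary at the difference of the two precision matrices and that, when both precision matrices share the same block structure, the difference is block diagonal too.

```python
import numpy as np


def d_trace_loss(delta, sigma_x, sigma_y):
    """Symmetrized D-trace loss (assumed form) for a candidate difference `delta`."""
    quad = 0.25 * (
        np.trace(sigma_x @ delta @ sigma_y @ delta)
        + np.trace(sigma_y @ delta @ sigma_x @ delta)
    )
    return quad - np.trace(delta @ (sigma_x - sigma_y))


def d_trace_grad(delta, sigma_x, sigma_y):
    """Gradient of the loss above with respect to a symmetric `delta`."""
    return 0.5 * (sigma_x @ delta @ sigma_y + sigma_y @ delta @ sigma_x) - (sigma_x - sigma_y)


def block_diag2(a, b):
    """Hypothetical helper: place two 2x2 blocks on the diagonal of a 4x4 matrix."""
    out = np.zeros((4, 4))
    out[:2, :2] = a
    out[2:, 2:] = b
    return out


# Toy precision matrices sharing two connected components (blocks).
# Only the first block differs between the two conditions.
shared = np.array([[1.5, -0.3], [-0.3, 1.5]])
omega_x = block_diag2(np.array([[2.0, 0.5], [0.5, 2.0]]), shared)
omega_y = block_diag2(np.array([[2.0, -0.5], [-0.5, 2.0]]), shared)
sigma_x, sigma_y = np.linalg.inv(omega_x), np.linalg.inv(omega_y)

# Candidate minimizer: the difference of the precision matrices.
delta_star = omega_y - omega_x
grad_norm = np.linalg.norm(d_trace_grad(delta_star, sigma_x, sigma_y))
loss_at_star = d_trace_loss(delta_star, sigma_x, sigma_y)
loss_perturbed = d_trace_loss(delta_star + 0.1 * np.eye(4), sigma_x, sigma_y)

print("gradient norm at delta_star:", grad_norm)
print("loss at delta_star < loss at perturbed point:", loss_at_star < loss_perturbed)
print("rows of the unchanged block are zero:", np.allclose(delta_star[2:, :], 0.0))
```

Because the difference is zero outside the block that changes, a per-component estimate would in principle only need to look inside shared components, which is the intuition the abstract points to.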
Affiliation(s)
- Han Yan
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, People's Republic of China
  - Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, People's Republic of China
  - Pazhou Lab, Guangzhou, People's Republic of China
- Shuhan Lu
  - Department of Mathematics, University of California, Los Angeles, CA, USA
- Sanguo Zhang
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, People's Republic of China
  - Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, People's Republic of China
  - Pazhou Lab, Guangzhou, People's Republic of China
2
Galindez G, Sadegh S, Baumbach J, Kacprowski T, List M. Network-based approaches for modeling disease regulation and progression. Comput Struct Biotechnol J 2022; 21:780-795. [PMID: 36698974 PMCID: PMC9841310 DOI: 10.1016/j.csbj.2022.12.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2022] [Revised: 12/14/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
Molecular interaction networks lay the foundation for studying how biological functions are controlled by the complex interplay of genes and proteins. Investigating perturbed processes using biological networks has been instrumental in uncovering mechanisms that underlie complex disease phenotypes. Rapid advances in omics technologies have prompted the generation of high-throughput datasets, enabling large-scale, network-based analyses. Consequently, various modeling techniques, including network enrichment, differential network extraction, and network inference, have proven to be useful for gaining new mechanistic insights. We provide an overview of recent network-based methods and their core ideas to facilitate the discovery of disease modules or candidate mechanisms. Knowledge generated from these computational efforts will benefit biomedical research, especially drug development and precision medicine. We further discuss current challenges and provide perspectives in the field, highlighting the need for more integrative and dynamic network approaches to model disease development and progression.
Affiliation(s)
- Gihanna Galindez
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Sepideh Sadegh
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
3
Leng J, Wu LY. Interaction-based transcriptome analysis via differential network inference. Brief Bioinform 2022; 23:6768051. [PMID: 36274239 PMCID: PMC9677477 DOI: 10.1093/bib/bbac466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/13/2022] [Accepted: 09/28/2022] [Indexed: 12/14/2022]
Abstract
Gene-based transcriptome analysis, such as differential expression analysis, can identify the key factors driving disease development, cell differentiation and other biological processes. However, this is not enough because basic life activities are mainly driven by the interactions between genes. Although many differential network inference methods already exist for identifying differential gene interactions, most studies still use only the node information in the network for downstream analyses. To gain insight into differential gene interactions, we should perform interaction-based transcriptome analysis (IBTA) instead of gene-based analysis after obtaining the differential networks. In this paper, we illustrate a workflow of IBTA by developing a Co-hub Differential Network inference (CDN) algorithm and a novel interaction-based metric, pivot APC2. We confirmed the superior performance of CDN through simulation experiments compared with other popular differential network inference algorithms. Furthermore, three case studies are given using colorectal cancer, COVID-19 and triple-negative breast cancer datasets to demonstrate the ability of our interaction-based analytical process to uncover causative mechanisms.
Affiliation(s)
- Jiacheng Leng
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ling-Yun Wu
  - Corresponding author. Ling-Yun Wu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China. E-mail:
4
Bernal V, Soancatl-Aguilar V, Bulthuis J, Guryev V, Horvatovich P, Grzegorczyk M. GeneNetTools: tests for Gaussian graphical models with shrinkage. Bioinformatics 2022; 38:5049-5054. [PMID: 36179082 PMCID: PMC9665865 DOI: 10.1093/bioinformatics/btac657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 09/14/2022] [Accepted: 09/29/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Gaussian graphical models (GGMs) are network representations of random variables (as nodes) and their partial correlations (as edges). GGMs overcome the challenges of high-dimensional data analysis by using shrinkage methodologies. Therefore, they have become useful to reconstruct gene regulatory networks from gene-expression profiles. However, it is often ignored that the partial correlations are 'shrunk' and that they cannot be compared/assessed directly. Therefore, accurate (differential) network analyses need to account for the number of variables, the sample size, and also the shrinkage value; otherwise, the analysis and its biological interpretation would be biased. To date, there are no appropriate methods to account for these factors and address these issues. RESULTS We derive the statistical properties of the partial correlation obtained with the Ledoit-Wolf shrinkage. Our result provides a toolbox for (differential) network analyses as (i) confidence intervals, (ii) a test for zero partial correlation (null-effects) and (iii) a test to compare partial correlations. Our novel (parametric) methods account for the number of variables, the sample size and the shrinkage values. Additionally, they are computationally fast, simple to implement and require only basic statistical knowledge. Our simulations show that the novel tests perform better than DiffNetFDR, a recently published alternative, in terms of the trade-off between true and false positives. The methods are demonstrated on synthetic data and two gene-expression datasets from Escherichia coli and Mus musculus. AVAILABILITY AND IMPLEMENTATION The R package with the methods and the R script with the analysis are available in https://github.com/V-Bernal/GeneNetTools. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
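To make the point about 'shrunk' partial correlations concrete, here is a minimal Python sketch. It assumes shrinkage of the correlation matrix toward the identity, R* = (1 − λ)R + λI, with a hand-picked intensity λ; the function names and the toy matrix are illustrative assumptions, and none of the tests or confidence intervals from the paper or its R package are reproduced. It only shows that the same underlying correlation structure yields different partial correlations at different shrinkage values, which is why they cannot be compared directly.

```python
import numpy as np


def partial_correlations(corr):
    """Partial correlations from a correlation matrix via its inverse."""
    omega = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def shrink_correlation(corr, lam):
    """Shrink a correlation matrix toward the identity (assumed target)."""
    return (1.0 - lam) * corr + lam * np.eye(corr.shape[0])


# Toy equicorrelation matrix with pairwise correlation 0.5.
r = 0.5
corr = np.full((3, 3), r)
np.fill_diagonal(corr, 1.0)

pcor_raw = partial_correlations(corr)
pcor_shrunk = partial_correlations(shrink_correlation(corr, lam=0.5))

print("partial correlation without shrinkage:", pcor_raw[0, 1])
print("partial correlation with lam = 0.5:   ", pcor_shrunk[0, 1])
```

For an equicorrelation matrix with three variables the partial correlation is r / (1 + r), so the unshrunk value is 1/3 and the shrunk value (pairwise correlation 0.25) is 0.2.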
Affiliation(s)
- Victor Bernal
  - Center of Information Technology, University of Groningen, Groningen 9747 AJ, The Netherlands
  - Department of Mathematics, Bernoulli Institute, University of Groningen, Groningen 9747 AG, The Netherlands
- Jonas Bulthuis
  - Center of Information Technology, University of Groningen, Groningen 9747 AJ, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen 9713 AV, The Netherlands
5
Chen L, Wan H, He Q, He S, Deng M. Statistical Methods for Microbiome Compositional Data Network Inference: A Survey. J Comput Biol 2022; 29:704-723. [PMID: 35404093 DOI: 10.1089/cmb.2021.0406] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Microbes can be found almost everywhere in the world. They are not isolated, but rather interact with each other and establish connections with their living environments. Studying these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A widely used approach toward this objective involves the inference of microbiome interaction networks. However, owing to the compositional, high-dimensional, sparse, and heterogeneous nature of observed microbial data, applying network inference methods to estimate their associations is challenging. In addition, external environmental interference and biological concerns make network inference more difficult. In this article, we provide a comprehensive review of emerging microbiome interaction network inference methods. According to various research targets, estimated networks are divided into four main categories: correlation networks, conditional correlation networks, mixture networks, and differential networks. Their assumptions, high-level ideas, advantages, as well as limitations, are presented in this review. Since real microbial interactions can be complex and dynamic, no unifying method has, to date, captured all the aspects of interest. In addition, we discuss the challenges currently confronting the study of microbial interactions and future prospects. Finally, we point out several feasible directions of microbial network inference analysis and highlight that future research requires the joint promotion of statistical computation methods and experimental techniques.
Affiliation(s)
- Liang Chen
  - School of Mathematical Sciences, Peking University, Beijing, China
- Hui Wan
  - School of Mathematical Sciences, Peking University, Beijing, China
- Qiuyan He
  - School of Mathematical Sciences, Peking University, Beijing, China
- Shun He
  - School of Mathematical Sciences, Peking University, Beijing, China
- Minghua Deng
  - School of Mathematical Sciences, Peking University, Beijing, China
  - Center for Statistical Science, Peking University, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing, China
6
Smith J, Arashi M, Bekker A. Empowering differential networks using Bayesian analysis. PLoS One 2022; 17:e0261193. [PMID: 35077451 PMCID: PMC8789149 DOI: 10.1371/journal.pone.0261193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 11/24/2021] [Indexed: 11/19/2022]
Abstract
Differential networks (DN) are important tools for modeling the changes in conditional dependencies between multiple samples. A Bayesian approach for estimating DNs, from the classical viewpoint, is introduced with a computationally efficient threshold selection for graphical model determination. The algorithm separately estimates the precision matrices of the DN using the Bayesian adaptive graphical lasso procedure. Synthetic experiments illustrate that the Bayesian DN performs exceptionally well in numerical accuracy and graphical structure determination in comparison to state of the art methods. The proposed method is applied to South African COVID-19 data to investigate the change in DN structure between various phases of the pandemic.
Affiliation(s)
- Jarod Smith
  - Department of Statistics, University of Pretoria, Pretoria, South Africa
- Mohammad Arashi
  - Department of Statistics, University of Pretoria, Pretoria, South Africa
  - Department of Statistics, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Iran
- Andriëtte Bekker
  - Department of Statistics, University of Pretoria, Pretoria, South Africa
7
Tan YT, Ou-Yang L, Jiang X, Yan H, Zhang XF. Identifying Gene Network Rewiring Based on Partial Correlation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:513-521. [PMID: 32750866 DOI: 10.1109/tcbb.2020.3002906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
It is an important task to learn how gene regulatory networks change under different conditions. Several Gaussian graphical model-based methods have been proposed to deal with this task by inferring differential networks from gene expression data. However, most existing methods define the differential networks as the difference of precision matrices, which may include false differential edges caused by changes in conditional variances. In addition, prior information about the condition-specific networks and the differential networks can be obtained from other domains. It is useful to incorporate prior information into differential network analysis. In this study, we propose a new differential network analysis method to address the above challenges. Instead of using the precision matrices, we define the differential networks as the difference of partial correlations, which can exclude the spurious differential edges due to changes in conditional variances. Furthermore, prior information from multiple hypothesis testing is incorporated using a weighted fused penalty. Simulation studies show that our method outperforms the competing methods. We also apply our method to identify the differential network between luminal A and basal-like subtypes of breast cancers and the differential network between acute myeloid leukemia tumors and normal samples. The hub genes in the differential networks identified by our method carry out important biological functions.
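The distinction between the two definitions of a differential network can be seen in a small numerical example. The Python sketch below uses hypothetical toy matrices and function names (nothing here is taken from the paper's method or its penalty): the second condition differs from the first only by a rescaling of one variable, which changes its conditional variance. The difference of precision matrices then shows a nonzero "edge", while the difference of partial correlations is exactly zero.

```python
import numpy as np


def partial_correlations(omega):
    """Partial correlations from a precision matrix `omega`."""
    scale = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Toy precision matrix for condition 1 (diagonally dominant, hence positive definite).
omega_1 = np.array([
    [2.0, -0.8, 0.0],
    [-0.8, 2.0, -0.6],
    [0.0, -0.6, 2.0],
])

# Condition 2: variable 0 is measured on half the scale, so its conditional
# variance changes, but the dependence structure itself does not.
d = np.diag([2.0, 1.0, 1.0])
omega_2 = d @ omega_1 @ d

prec_diff = omega_2 - omega_1
pcor_diff = partial_correlations(omega_2) - partial_correlations(omega_1)

print("precision difference, entry (0, 1):", prec_diff[0, 1])
print("max |partial correlation difference|:", np.abs(pcor_diff).max())
```

The entry (0, 1) of the precision difference is −0.8 even though the partial correlation between variables 0 and 1 is unchanged, which is the kind of spurious differential edge the abstract describes.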
8
Liu C, Cai D, Zeng W, Huang Y. Inferring Differential Networks by Integrating Gene Expression Data With Additional Knowledge. Front Genet 2021; 12:760155. [PMID: 34858477 PMCID: PMC8632038 DOI: 10.3389/fgene.2021.760155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 10/13/2021] [Indexed: 11/23/2022]
Abstract
Evidence increasingly indicates the involvement of gene network rewiring in disease development and cell differentiation. With the accumulation of high-throughput gene expression data, it is now possible to infer the changes of gene networks between two different states or cell types via computational approaches. However, the distribution diversity of multi-platform gene expression data and the sparseness and high noise rate of single-cell RNA sequencing (scRNA-seq) data raise new challenges for existing differential network estimation methods. Furthermore, most existing methods rely purely on gene expression data and ignore the additional information provided by existing biological knowledge. In this study, to address these challenges, we propose a general framework, named weighted joint sparse penalized D-trace model (WJSDM), to infer differential gene networks by integrating multi-platform gene expression data and multiple sources of prior biological knowledge. Firstly, a non-paranormal graphical model is employed to tackle gene expression data with missing values. Then we propose a weighted group bridge penalty to integrate multi-platform gene expression data and various existing biological knowledge. Experimental results on synthetic data demonstrate the effectiveness of our method in inferring differential networks. We apply our method to the gene expression data of ovarian cancer and the scRNA-seq data of circulating tumor cells of prostate cancer, and infer the differential networks associated with platinum resistance of ovarian cancer and anti-androgen resistance of prostate cancer. By analyzing the estimated differential networks, we find some important biological insights about the mechanisms underlying platinum resistance of ovarian cancer and anti-androgen resistance of prostate cancer.
Affiliation(s)
- Chen Liu
  - Department of Chemotherapy, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- WuCha Zeng
  - Department of Chemotherapy, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
- Yun Huang
  - Department of Geriatric Medicine, The First Affiliated Hospital of Fujian Medical University, Fuzhou, China
9
Ji J, He Y, Liu L, Xie L. Brain connectivity alteration detection via matrix-variate differential network model. Biometrics 2021; 77:1409-1421. [PMID: 32829503 PMCID: PMC7900256 DOI: 10.1111/biom.13359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2019] [Revised: 08/10/2020] [Accepted: 08/14/2020] [Indexed: 10/23/2022]
Abstract
Brain functional connectivity reveals the synchronization of brain systems through correlations in neurophysiological measures of brain activities. Growing evidence now suggests that the brain connectivity network experiences alterations with the presence of numerous neurological disorders, thus differential brain network analysis may provide new insights into disease pathologies. The data from neurophysiological measurement are often multidimensional and in a matrix form, posing a challenge in brain connectivity analysis. Existing graphical model estimation methods either assume a vector normal distribution that in essence requires the columns of the matrix data to be independent or fail to address the estimation of differential networks across different populations. To tackle these issues, we propose an innovative matrix-variate differential network (MVDN) model. We exploit the D-trace loss function and a Lasso-type penalty to directly estimate the spatial differential partial correlation matrix and use an alternating direction method of multipliers algorithm for the optimization problem. Theoretical and simulation studies demonstrate that MVDN significantly outperforms other state-of-the-art methods in dynamic differential network analysis. We illustrate with a functional connectivity analysis of an attention deficit hyperactivity disorder dataset. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies.
Affiliation(s)
- Jiadong Ji
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yong He
  - Institute for Financial Studies, Shandong University, Jinan, China
- Lei Liu
  - Division of Biostatistics, Washington University in St. Louis, U.S.A
- Lei Xie
  - The Graduate Center, The City University of New York, New York, 10016, U.S.A
  - Department of Computer Science, Hunter College, The City University of New York, New York, 10065, U.S.A
10
Leng J, Wu LY. Importance-Penalized Joint Graphical Lasso (IPJGL): differential network inference via GGMs. Bioinformatics 2021; 38:770-777. [PMID: 34718410 PMCID: PMC8756181 DOI: 10.1093/bioinformatics/btab751] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/12/2021] [Revised: 10/03/2021] [Accepted: 10/27/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Differential network inference is a fundamental and challenging problem to reveal gene interactions and regulation relationships under different conditions. Many algorithms have been developed for this problem; however, they do not consider the differences between the importance of genes, which may not fit the real-world situation. Different genes have different mutation probabilities, and the vital genes associated with basic life activities have less fault tolerance to mutation. Equally treating all genes may bias the results of differential network inference. Thus, it is necessary to consider the importance of genes in the models of differential network inference. RESULTS Based on the Gaussian graphical model with adaptive gene importance regularization, we develop a novel Importance-Penalized Joint Graphical Lasso method (IPJGL) for differential network inference. The presented method is validated by the simulation experiments as well as the real datasets. Furthermore, to precisely evaluate the results of differential network inference, we propose a new metric named APC2 for the differential levels of gene pairs. We apply IPJGL to analyze the TCGA colorectal and breast cancer datasets and find some candidate cancer genes with significant survival analysis results, including SOST for colorectal cancer and RBBP8 for breast cancer. We also conduct further analysis based on the interactions in the Reactome database and confirm the utility of our method. AVAILABILITY AND IMPLEMENTATION R source code of Importance-Penalized Joint Graphical Lasso is freely available at https://github.com/Wu-Lab/IPJGL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiacheng Leng
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
11
Tu JJ, Ou-Yang L, Zhu Y, Yan H, Qin H, Zhang XF. Differential network analysis by simultaneously considering changes in gene interactions and gene expression. Bioinformatics 2021; 37:4414-4423. [PMID: 34245246 DOI: 10.1093/bioinformatics/btab502] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/14/2021] [Revised: 06/13/2021] [Accepted: 07/05/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Differential network analysis is an important tool to investigate the rewiring of gene interactions under different conditions. Several computational methods have been developed to estimate differential networks from gene expression data, but most of them do not consider that gene network rewiring may be driven by the differential expression of individual genes. New differential network analysis methods that simultaneously take account of the changes in gene interactions and changes in expression levels are needed. RESULTS In this paper, we propose a differential network analysis method that considers the differential expression of individual genes when identifying differential edges. First, two hypothesis test statistics are used to quantify changes in partial correlations between gene pairs and changes in expression levels for individual genes. Then, an optimization framework is proposed to combine the two test statistics so that the resulting differential network has a hierarchical property, where a differential edge can be considered only if at least one of the two involved genes is differentially expressed. Simulation results indicate that our method outperforms current state-of-the-art methods. We apply our method to identify the differential networks between the luminal A and basal-like subtypes of breast cancer and those between acute myeloid leukemia and normal samples. Hub nodes in the differential networks estimated by our method, including both differentially and non-differentially expressed genes, have important biological functions. AVAILABILITY The source code is available at https://github.com/Zhangxf-ccnu/chNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jia-Juan Tu
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
- Le Ou-Yang
  - College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Yuan Zhu
  - School of Automation, China University of Geosciences, Wuhan, 430074, China
  - Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, China University of Geosciences, Wuhan, 430074, China
- Hong Yan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Hong Qin
  - Department of Statistics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
12
scLink: Inferring Sparse Gene Co-expression Networks from Single-cell Expression Data. Genomics Proteomics & Bioinformatics 2021; 19:475-492. [PMID: 34252628 PMCID: PMC8896229 DOI: 10.1016/j.gpb.2020.11.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/28/2020] [Revised: 10/23/2020] [Accepted: 12/26/2020] [Indexed: 11/23/2022]
Abstract
A system-level understanding of the regulation and coordination mechanisms of gene expression is essential for studying the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell type-specific manner. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and construct sparse gene co-expression networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis. The scLink R package is available at https://github.com/Vivianstats/scLink.
13
Xu T, Ou-Yang L, Yan H, Zhang XF. Time-Varying Differential Network Analysis for Revealing Network Rewiring over Cancer Progression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1632-1642. [PMID: 31647444 DOI: 10.1109/tcbb.2019.2949039] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
To reveal how gene regulatory networks change over cancer development, multiple time-varying differential networks between adjacent cancer stages should be estimated simultaneously. Since the network rewiring may be driven by the perturbation of certain individual genes, there may be some hub nodes shared by these differential networks. Although several methods have been developed to estimate differential networks from gene expression data, most of them are designed for estimating a single differential network, which neglects the similarities between different differential networks. In this article, we propose a new Gaussian graphical model-based method to jointly estimate multiple time-varying differential networks for identifying network rewiring over cancer development. A D-trace loss is used to determine the differential networks. A tree-structured group Lasso penalty is designed to identify the common hub nodes shared by different differential networks and the specific hub nodes unique to individual differential networks. Simulation experiment results demonstrate that our method outperforms other state-of-the-art techniques in most cases. We also apply our method to The Cancer Genome Atlas data to explore gene network rewiring over different breast cancer stages. Hub nodes in the estimated differential networks rediscover well-known genes associated with the development and progression of breast cancer.
14
Ou-Yang L, Cai D, Zhang XF, Yan H. WDNE: an integrative graphical model for inferring differential networks from multi-platform gene expression data with missing values. Brief Bioinform 2021; 22:6272792. [PMID: 33975339 DOI: 10.1093/bib/bbab086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/03/2021] [Revised: 02/14/2021] [Accepted: 02/23/2021] [Indexed: 11/14/2022]
Abstract
The mechanisms controlling biological processes, such as the development of disease or cell differentiation, can be investigated by examining changes in the networks of gene dependencies between states in the process. High-throughput experimental methods, like microarray and RNA sequencing, have been widely used to gather gene expression data, which paves the way to infer gene dependencies based on computational methods. However, most differential network analysis methods are designed to deal with fully observed data, whereas missing values, such as the dropout events in single-cell RNA-sequencing data, are frequent. New methods are needed to take account of these missing values. Moreover, since the changes of gene dependencies may be driven by certain perturbed genes, considering the changes in gene expression levels may promote the identification of gene network rewiring. In this study, a novel weighted differential network estimation (WDNE) model is proposed to handle multi-platform gene expression data with missing values and take account of changes in gene expression levels. Simulation studies demonstrate that WDNE outperforms state-of-the-art differential network estimation methods. When WDNE is applied to infer differential gene networks associated with drug resistance in ovarian tumors, cell differentiation and breast tumor heterogeneity, the hub genes in the estimated differential gene networks can provide important insights into the underlying mechanisms. Furthermore, a Matlab toolbox, differential network analysis toolbox, was developed to implement the WDNE model and visualize the estimated differential networks.
Collapse
Affiliation(s)
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Dehan Cai
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, 999077, China
| | - Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
| | - Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, 999077, China
|
15
|
Kontio JAJ, Pyhäjärvi T, Sillanpää MJ. Model guided trait-specific co-expression network estimation as a new perspective for identifying molecular interactions and pathways. PLoS Comput Biol 2021; 17:e1008960. [PMID: 33939702 PMCID: PMC8118548 DOI: 10.1371/journal.pcbi.1008960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Revised: 05/13/2021] [Accepted: 04/13/2021] [Indexed: 11/19/2022] Open
Abstract
A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion that cross-exploits the strengths of individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial amount of intrinsic noise and collinearities, often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiates the method's high efficiency. As proof-of-concept, this synergistic approach is successfully applied in survival analysis, with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.
Affiliation(s)
- Juho A. J. Kontio
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
| | - Tanja Pyhäjärvi
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Department of Forest Sciences, University of Helsinki, Helsinki, Finland
| | - Mikko J. Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
|
16
|
Chen H, Guo Y, He Y, Ji J, Liu L, Shi Y, Wang Y, Yu L, Zhang X. Simultaneous differential network analysis and classification for matrix-variate data with application to brain connectivity. Biostatistics 2021; 23:967-989. [PMID: 33769450 DOI: 10.1093/biostatistics/kxab007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2020] [Revised: 02/20/2021] [Accepted: 02/22/2021] [Indexed: 01/03/2023] Open
Abstract
Growing evidence has shown that the brain connectivity network experiences alterations for complex diseases such as Alzheimer's disease (AD). Network comparison, also known as differential network analysis, is thus particularly powerful to reveal the disease pathologies and identify clinical biomarkers for medical diagnoses (classification). Data from neurophysiological measurements are multidimensional and in matrix form. The naive vectorization method is not sufficient as it ignores the structural information within the matrix. In this article, we adopt the Kronecker product covariance matrices framework to capture both spatial and temporal correlations of the matrix-variate data while the temporal covariance matrix is treated as a nuisance parameter. By recognizing that the strengths of network connections may vary across subjects, we develop an ensemble-learning procedure, which identifies the differential interaction patterns of brain regions between the case group and the control group and conducts medical diagnosis (classification) of the disease simultaneously. Simulation studies are conducted to assess the performance of the proposed method. We apply the proposed procedure to the functional connectivity analysis of a functional magnetic resonance imaging study on AD. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies, and satisfactory out-of-sample classification performance is achieved for medical diagnosis of AD.
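The Kronecker product covariance structure mentioned in this abstract can be illustrated with a short numerical sketch. It is not the authors' estimator or ensemble procedure; the dimensions, the helper `random_spd`, and all variable names are assumptions chosen for illustration. It only checks the standard identity that, for a matrix-variate observation X = A Z Bᵀ with i.i.d. standard normal Z, the covariance of the column-stacked vec(X) is the Kronecker product of the temporal and spatial covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 4  # hypothetical sizes: 3 brain regions (rows), 4 time points (columns)


def random_spd(dim, rng):
    """Return a random symmetric positive definite matrix (illustrative helper)."""
    a = rng.standard_normal((dim, dim))
    return a @ a.T + dim * np.eye(dim)


sigma_spatial = random_spd(p, rng)   # p x p covariance across regions
sigma_temporal = random_spd(q, rng)  # q x q covariance across time points

# Matrix-normal construction: X = A Z B^T with A A^T = sigma_spatial,
# B B^T = sigma_temporal, and Z having i.i.d. N(0, 1) entries.
a = np.linalg.cholesky(sigma_spatial)
b = np.linalg.cholesky(sigma_temporal)

# For column-stacking vec, vec(A Z B^T) = (B kron A) vec(Z).
z = rng.standard_normal((p, q))
vec_x = (a @ z @ b.T).flatten(order="F")
transform = np.kron(b, a)
vec_identity_holds = np.allclose(vec_x, transform @ z.flatten(order="F"))

# Hence Cov(vec(X)) = (B kron A)(B kron A)^T = sigma_temporal kron sigma_spatial.
cov_vec = transform @ transform.T
kron_cov = np.kron(sigma_temporal, sigma_spatial)

# Free parameters: unstructured covariance of vec(X) versus Kronecker structure.
n_unstructured = (p * q) * (p * q + 1) // 2
n_kronecker = p * (p + 1) // 2 + q * (q + 1) // 2

print(vec_identity_holds)             # True
print(np.allclose(cov_vec, kron_cov))  # True
print(n_unstructured, n_kronecker)     # 78 16
```

The parameter counts show why the structured model is attractive for matrix-variate data: the Kronecker form needs far fewer parameters than an unstructured covariance of the vectorized observation.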
Affiliation(s)
- Hao Chen
- School of Statistics, Shandong University of Finance and Economics, Jinan, 250014, China
| | - Ying Guo
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
| | - Yong He
- Institute for Financial Studies, Shandong University, Jinan, 250100, China
| | - Jiadong Ji
- Institute for Financial Studies, Shandong University, Jinan, 250100, China
| | - Lei Liu
- Division of Biostatistics, Washington University in St. Louis, St. Louis, MO 63110, USA
| | - Yufeng Shi
- Institute for Financial Studies, Shandong University, Jinan, 250100, China
| | - Yikai Wang
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
| | - Long Yu
- Department of Statistics, School of Management, Fudan University, Shanghai, 200433, China
| | - Xinsheng Zhang
- Department of Statistics, School of Management, Fudan University, Shanghai, 200433, China
|
17
|
Shojaie A. Differential Network Analysis: A Statistical Perspective. Wiley Interdiscip Rev Comput Stat 2021; 13:e1508. [PMID: 37050915 PMCID: PMC10088462 DOI: 10.1002/wics.1508] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/09/2019] [Accepted: 03/03/2020] [Indexed: 11/06/2022]
Abstract
Networks effectively capture interactions among components of complex systems, and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks undergo changes over time, and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of complex diseases. They have also been used to gain insight into mechanisms of disease initiation and progression. Primarily motivated by biological applications, this article provides a review of recent statistical machine learning methods for inferring networks and identifying changes in their structures.
Affiliation(s)
- Ali Shojaie
- Department of Biostatistics, University of Washington, Seattle WA
|
18
|
Bar H, Bang S. A mixture model to detect edges in sparse co-expression graphs with an application for comparing breast cancer subtypes. PLoS One 2021; 16:e0246945. [PMID: 33571253 PMCID: PMC7877669 DOI: 10.1371/journal.pone.0246945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2020] [Accepted: 01/28/2021] [Indexed: 11/19/2022] Open
Abstract
We develop a method to recover a gene network's structure from co-expression data, measured in terms of normalized Pearson's correlation coefficients between gene pairs. We treat these co-expression measurements as weights in the complete graph in which nodes correspond to genes. To decide which edges exist in the gene network, we fit a three-component mixture model such that the observed weights of 'null edges' follow a normal distribution with mean 0, and the non-null edges follow a mixture of two lognormal distributions, one for positively- and one for negatively-correlated pairs. We show that this so-called L2N mixture model outperforms other methods in terms of power to detect edges, and it allows control of the false discovery rate. Importantly, our method makes no assumptions about the true network structure. We demonstrate our method, which is implemented in an R package called edgefinder, using a large dataset consisting of expression values of 12,750 genes obtained from 1,616 women. We infer the gene network structure by cancer subtype, and find insightful subtype characteristics. For example, we find thirteen pathways which are enriched in each of the cancer groups but not in the Normal group, with two of the pathways associated with autoimmune diseases and two others with graft rejection. We also find specific characteristics of different breast cancer subtypes. For example, the Luminal A network includes a single, highly connected cluster of genes, which is enriched in the human diseases category, and in the Her2 subtype network we find a distinct, and highly interconnected cluster which is uniquely enriched in drug metabolism pathways.
Affiliation(s)
- Haim Bar
- Department of Statistics, University of Connecticut, Storrs, CT, United States of America
| | - Seojin Bang
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States of America
|
19
|
Na S, Kolar M, Koyejo O. Estimating differential latent variable graphical models with applications to brain connectivity. Biometrika 2020. [DOI: 10.1093/biomet/asaa066] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
Summary
Differential graphical models are designed to represent the difference between the conditional dependence structures of two groups, and thus are of particular interest for scientific investigations. Motivated by modern applications, this manuscript considers an extended setting where each group is generated by a latent variable Gaussian graphical model. Due to the existence of latent factors, the differential network is decomposed into sparse and low-rank components, both of which are symmetric indefinite matrices. We estimate these two components simultaneously using a two-stage procedure: (i) an initialization stage, which computes a simple, consistent estimator, and (ii) a convergence stage, implemented using a projected alternating gradient descent algorithm applied to a nonconvex objective, initialized using the output of the first stage. We prove that given the initialization, the estimator converges linearly with a nontrivial, minimax optimal statistical error. Experiments on synthetic and real data illustrate that the proposed nonconvex procedure outperforms existing methods.
Affiliation(s)
- S Na
- Department of Statistics, University of Chicago, 5747 South Ellis Avenue, Chicago, Illinois 60637, U.S.A
| | - M Kolar
- Booth School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, Illinois 60637, U.S.A
| | - O Koyejo
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, Illinois 61801, U.S.A
|
20
|
Zhang J, Liu J, Lee D, Lou S, Chen Z, Gürsoy G, Gerstein M. DiNeR: a Differential graphical model for analysis of co-regulation Network Rewiring. BMC Bioinformatics 2020; 21:281. [PMID: 32615918 PMCID: PMC7333332 DOI: 10.1186/s12859-020-03605-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/06/2019] [Accepted: 06/15/2020] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control gene expression. Alterations in groups of TF-binding profiles (i.e. "co-binding changes") can affect the co-regulating associations between TFs (i.e. "rewiring the co-regulator network"). This, in turn, can potentially drive downstream expression changes, phenotypic variation, and even disease. However, quantification of co-regulatory network rewiring has not been comprehensively studied. RESULTS To address this, we propose DiNeR, a computational method to directly construct a differential TF co-regulation network from paired disease-to-normal ChIP-seq data. Specifically, DiNeR uses a graphical model to capture the gained and lost edges in the co-regulation network. Then, it adopts a stability-based, sparsity-tuning criterion, sub-sampling the complete binding profiles to remove spurious edges, to report only significant co-regulation alterations. Finally, DiNeR highlights hubs in the resultant differential network as key TFs associated with disease. We assembled genome-wide binding profiles of 104 TFs in the K562 and GM12878 cell lines, which loosely model the transition between normal and cancerous states in chronic myeloid leukemia (CML). In total, we identified 351 significantly altered TF co-regulation pairs. In particular, we found that the co-binding of the tumor suppressor BRCA1 and RNA polymerase II, a well-known transcriptional pair in healthy cells, was disrupted in tumors. Thus, DiNeR successfully extracted hub regulators and discovered well-known risk genes. CONCLUSIONS Our method DiNeR makes it possible to quantify changes in co-regulatory networks and identify alterations to TF co-binding patterns, highlighting key disease regulators.
Affiliation(s)
- Jing Zhang
- Department of Computer Science, University of California, Irvine, CA, 92617, USA
| | - Jason Liu
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Donghoon Lee
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Shaoke Lou
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Zhanlin Chen
- Department of Molecular Cellular and Developmental Biology, Yale University, New Haven, CT, 06520, USA
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA
| | - Gamze Gürsoy
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Mark Gerstein
- Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, 06520, USA.
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA.
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
|
21
|
Zhang XF, Ou-Yang L, Yang S, Hu X, Yan H. DiffNetFDR: differential network analysis with false discovery rate control. Bioinformatics 2020; 35:3184-3186. [PMID: 30689728 DOI: 10.1093/bioinformatics/btz051] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/18/2018] [Revised: 01/10/2019] [Accepted: 01/20/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY To identify biological network rewiring under different conditions, we develop a user-friendly R package, named DiffNetFDR, to implement two methods developed for testing differences between Gaussian graphical models. Compared to existing tools, our methods have the following features: (i) they are based on Gaussian graphical models which can capture the changes of conditional dependencies; (ii) they determine the tuning parameters in a data-driven manner; (iii) they take a multiple testing procedure to control the overall false discovery rate; and (iv) they define the differential network based on partial correlation coefficients so that the spurious differential edges caused by variation in conditional variances can be excluded. We also develop a Shiny application to provide easier analysis and visualization. Simulation studies are conducted to evaluate the performance of our methods. We also apply our methods to two real gene expression datasets. The effectiveness of our methods is validated by the biological significance of the identified differential networks. AVAILABILITY AND IMPLEMENTATION R package and Shiny app are available at https://github.com/Zhangxf-ccnu/DiffNetFDR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
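Feature (iv) can be made concrete with a small numerical sketch. It does not use or reproduce the DiffNetFDR package; the function name `partial_correlations` and the toy precision matrices are assumptions for illustration. The sketch relies on the standard identity ρ_ij = −ω_ij / sqrt(ω_ii ω_jj) relating a precision matrix Ω to partial correlations, and shows that rescaling one variable changes precision entries without changing any partial correlation.

```python
import numpy as np


def partial_correlations(omega):
    """Convert a precision matrix into a partial correlation matrix.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j,
    with ones placed on the diagonal.
    """
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


# Hypothetical precision matrix for condition 1 (three genes).
omega1 = np.array([[2.0, -0.8, 0.0],
                   [-0.8, 2.0, -0.5],
                   [0.0, -0.5, 1.5]])

# Condition 2: gene 0 is rescaled by a factor of 2, which changes only its
# conditional variance. If X2 = S X1, the precision becomes S^-1 Omega S^-1.
s_inv = np.diag([0.5, 1.0, 1.0])
omega2 = s_inv @ omega1 @ s_inv

precision_diff = omega2 - omega1
partial_corr_diff = partial_correlations(omega2) - partial_correlations(omega1)

print(np.round(precision_diff, 3))          # nonzero entries in row/column 0
print(np.allclose(partial_corr_diff, 0.0))  # True
```

A differential network defined on the raw precision difference would flag the edge between genes 0 and 1 here, while one defined on partial correlations would not.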
Affiliation(s)
- Xiao-Fei Zhang
- Department of Statistics, School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
| | - Le Ou-Yang
- Department of Electronic Engineering, Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen, China
| | - Shuo Yang
- Department of Respiratory Medicine, Wuhan Number 1 Hospital, Wuhan, China
| | - Xiaohua Hu
- Department of Information Science, College of Computing and Informatics, Drexel University, Philadelphia, USA
| | - Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
|
22
|
Pan Y, Mai Q. Efficient computation for differential network analysis with applications to quadratic discriminant analysis. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106884] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/18/2023]
|
23
|
Yuan R, Ou-Yang L, Hu X, Zhang XF. Identifying Gene Network Rewiring Using Robust Differential Graphical Model with Multivariate t-Distribution. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:712-718. [PMID: 30802872 DOI: 10.1109/tcbb.2019.2901473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Identifying gene network rewiring under different biological conditions is important for understanding the mechanisms underlying complex diseases. Gaussian graphical models, which assume the data follow the multivariate normal distribution, are widely used to identify gene network rewiring. However, the normality assumption often fails in practice, since the data are generally contaminated by extreme outliers. In this study, we propose a new robust differential graphical model to identify gene network rewiring between two conditions based on the multivariate t-distribution. The multivariate t-distribution is more robust to outliers than the normal distribution since it has heavy tails and allows values far from the mean. A fused lasso penalty is used to borrow information across conditions to improve the results. We develop an expectation maximization algorithm to solve the optimization model. Experimental results on simulated data show that our method outperforms the state-of-the-art methods. Our method is also applied to identify gene network rewiring between luminal A and basal-like subtypes of breast cancer, and gene network rewiring between the proneural and mesenchymal subtypes of glioblastoma. Several key genes which drive gene network rewiring are discovered.
|
24
|
Wu Y, Li T, Liu X, Chen L. Differential network inference via the fused D-trace loss with cross variables. Electron J Stat 2020. [DOI: 10.1214/20-ejs1691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
25
|
Ou-Yang L, Zhang XF, Zhao XM, Wang DD, Wang FL, Lei B, Yan H. Joint Learning of Multiple Differential Networks With Latent Variables. IEEE Trans Cybern 2019; 49:3494-3506. [PMID: 29994625 DOI: 10.1109/tcyb.2018.2845838] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Graphical models have been widely used to learn the conditional dependence structures among random variables. In many controlled experiments, such as the studies of disease or drug effectiveness, learning the structural changes of graphical models under two different conditions is of great importance. However, most existing graphical models are developed for estimating a single graph and are based on the tacit assumption that there are no missing relevant variables, which wastes the common information provided by multiple heterogeneous data sets and underestimates the influence of latent/unobserved relevant variables. In this paper, we propose a joint differential network analysis (JDNA) model to jointly estimate multiple differential networks with latent variables from multiple data sets. The JDNA model is built on a penalized D-trace loss function, with group lasso or generalized fused lasso penalties. We implement a proximal gradient-based alternating direction method of multipliers to tackle the corresponding convex optimization problems. Extensive simulation experiments demonstrate that the JDNA model outperforms state-of-the-art methods in estimating the structural changes of graphical models. Moreover, a series of experiments on several real-world data sets have been performed, and the experimental results consistently show that our proposed JDNA model is effective in identifying differential networks under different conditions.
|
26
|
Tang Z, Yu Z, Wang C. A fast iterative algorithm for high-dimensional differential network. Comput Stat 2019. [DOI: 10.1007/s00180-019-00915-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
27
|
He S, Deng M. Direct interaction network and differential network inference from compositional data via lasso penalized D-trace loss. PLoS One 2019; 14:e0207731. [PMID: 31339885 PMCID: PMC6655598 DOI: 10.1371/journal.pone.0207731] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/01/2018] [Accepted: 07/02/2019] [Indexed: 11/30/2022] Open
Abstract
The development of high-throughput sequencing technologies for 16S rRNA gene profiling provides higher quality compositional data for microbe communities. Inferring the direct interaction network under a specific condition and understanding how the network structure changes between two different environmental or genetic conditions are two important topics in biological studies. However, the compositional nature and high dimensionality of the data are challenging in the context of network and differential network recovery. To address this problem, we propose two new loss functions that incorporate the data transformations developed for compositional data analysis into the D-trace loss for network and differential network estimation, respectively. The sparse matrix estimators are defined as the minimizers of the corresponding lasso penalized losses. Our method is characterized by its straightforward application based on the ADMM algorithm for numerical solution. Simulations show that the proposed method outperforms other state-of-the-art methods in network and differential network inference under different scenarios. Finally, as an illustration, our method is applied to a mouse skin microbiome dataset.
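For readers unfamiliar with the D-trace loss for differential networks, a minimal sketch follows. It uses the form commonly given in the literature, L_D(Δ) = ¼(⟨Σ₁Δ, ΔΣ₂⟩ + ⟨Σ₂Δ, ΔΣ₁⟩) − ⟨Δ, Σ₁ − Σ₂⟩, whose unpenalized minimizer is Δ = Σ₂⁻¹ − Σ₁⁻¹. The compositional-data transformations, the lasso penalty, and the ADMM solver described in this abstract are not included, and the function names and toy matrices are assumptions for illustration only.

```python
import numpy as np


def d_trace_loss(delta, sigma1, sigma2):
    """Unpenalized D-trace loss for a differential network delta."""
    quad = 0.25 * (np.sum((sigma1 @ delta) * (delta @ sigma2))
                   + np.sum((sigma2 @ delta) * (delta @ sigma1)))
    return quad - np.sum(delta * (sigma1 - sigma2))


def d_trace_gradient(delta, sigma1, sigma2):
    """Gradient of the D-trace loss with respect to delta (symmetric inputs)."""
    return (0.5 * (sigma1 @ delta @ sigma2 + sigma2 @ delta @ sigma1)
            - (sigma1 - sigma2))


# Hypothetical precision matrices: condition 2 loses the edge between
# variables 0 and 1 that is present in condition 1.
omega1 = np.array([[2.0, -0.8, 0.0],
                   [-0.8, 2.0, -0.5],
                   [0.0, -0.5, 1.5]])
omega2 = omega1.copy()
omega2[0, 1] = omega2[1, 0] = 0.0

# Population covariances (in practice these would be sample estimates).
sigma1 = np.linalg.inv(omega1)
sigma2 = np.linalg.inv(omega2)

# The true differential network is sparse: a single changed edge.
delta_star = omega2 - omega1

grad_norm = np.linalg.norm(d_trace_gradient(delta_star, sigma1, sigma2))
loss_at_truth = d_trace_loss(delta_star, sigma1, sigma2)
loss_at_zero = d_trace_loss(np.zeros((3, 3)), sigma1, sigma2)

print(grad_norm < 1e-10)             # True: the gradient vanishes at delta_star
print(loss_at_truth < loss_at_zero)  # True: delta_star beats the zero matrix
```

Because the loss depends on the covariances only through matrix products, the precision difference can be targeted directly without inverting either sample covariance, which is the usual motivation given for this family of estimators.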
Affiliation(s)
- Shun He
- School of Mathematical Sciences, Peking University, Beijing, 10087, P.R. China
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, 10087, P.R. China
- Center for Statistical Science, Peking University, Beijing, 10087, P.R. China
|
28
|
Wu N, Huang J, Zhang XF, Ou-Yang L, He S, Zhu Z, Xie W. Weighted Fused Pathway Graphical Lasso for Joint Estimation of Multiple Gene Networks. Front Genet 2019; 10:623. [PMID: 31396259 PMCID: PMC6662592 DOI: 10.3389/fgene.2019.00623] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/01/2019] [Accepted: 06/13/2019] [Indexed: 01/17/2023] Open
Abstract
Gene regulatory networks (GRNs) are often inferred based on Gaussian graphical models that could identify the conditional dependence among genes by estimating the corresponding precision matrix. Classical Gaussian graphical models are usually designed for single network estimation and ignore existing knowledge such as pathway information. Therefore, they can neither make use of the common information shared by multiple networks, nor can they utilize useful prior information to guide the estimation. In this paper, we propose a new weighted fused pathway graphical lasso (WFPGL) to jointly estimate multiple networks by incorporating prior knowledge derived from known pathways and gene interactions. Based on the assumption that two genes are less likely to be connected if they do not participate together in any pathways, a pathway-based constraint is considered in our model. Moreover, we introduce a weighted fused lasso penalty in our model to take into account prior gene interaction data and common information shared by multiple networks. Our model is optimized based on the alternating direction method of multipliers (ADMM). Experiments on synthetic data demonstrate that our method outperforms five other state-of-the-art graphical models. We then apply our model to two real datasets. Hub genes in our identified state-specific networks show some shared and specific patterns, which indicates the efficiency of our model in revealing the underlying mechanisms of complex diseases.
Affiliation(s)
- Nuosi Wu
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Jiang Huang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| | - Xiao-Fei Zhang
- School of Mathematics and Statistics, Central China Normal University, Wuhan, China
| | - Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China
| | - Shan He
- School of Computer Science, University of Birmingham, Birmingham, United Kingdom
| | - Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| | - Weixin Xie
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
|
29
|
He Y, Ji J, Xie L, Zhang X, Xue F. A new insight into underlying disease mechanism through semi-parametric latent differential network model. BMC Bioinformatics 2018; 19:493. [PMID: 30591011 PMCID: PMC6309076 DOI: 10.1186/s12859-018-2461-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND In genomic studies, investigating how the structure of a genetic network differs between two experimental conditions is a very interesting but challenging problem, especially in high-dimensional settings. The existing literature mostly focuses on differential network modelling for continuous data. However, in real applications, we may encounter discrete data or mixed data, which urges us to propose a unified differential network modelling for various data types. RESULTS We propose a unified latent Gaussian copula differential network model which provides deeper understanding of the unknown mechanism than that among the observed variables. Adaptive rank-based estimation approaches are proposed with the assumption that the true differential network is sparse. The adaptive estimation approaches do not require precision matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both the difference of the precision matrices estimation and differential structure recovery, which means that the extra modeling flexibility comes at almost no cost of statistical efficiency. Besides theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with some other state-of-the-art methods. The results show that the proposed methods work quite well for various data types. The proposed method is then applied to gene expression data associated with lung cancer to illustrate its empirical usefulness. CONCLUSIONS The proposed latent variable differential network models allow for various data types and are thus more flexible; they also provide deeper understanding of the unknown mechanism than that among the observed variables. Theoretical analysis, numerical simulation and real application all demonstrate the great advantages of the latent differential network modelling, which is thus highly recommended.
Affiliation(s)
- Yong He
- School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
| | - Jiadong Ji
- School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
| | - Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, 10065 USA
- Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016 USA
| | - Xinsheng Zhang
- School of Management, Fudan University, Shanghai, 200433 China
| | - Fuzhong Xue
- School of Public Health, Shandong University, Jinan, 250012 China
|
30
|
Xu T, Ou-Yang L, Hu X, Zhang XF. Identifying Gene Network Rewiring by Integrating Gene Expression and Gene Network Data. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:2079-2085. [PMID: 29994068 DOI: 10.1109/tcbb.2018.2809603] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
Exploring the rewiring pattern of gene regulatory networks between different pathological states is an important task in bioinformatics. Although a number of computational approaches have been developed to infer differential networks from high-throughput data, most of them only focus on gene expression data. The valuable static gene regulatory network data accumulated in recent biomedical researches are neglected. In this study, we propose a new Gaussian graphical model-based method to infer differential networks by integrating gene expression and static gene regulatory network data. We first evaluate the empirical performance of our method by comparing with the state-of-the-art methods using simulation data. We also apply our method to The Cancer Genome Atlas data to identify gene network rewiring between ovarian cancers with different platinum responses, and rewiring between breast cancers of luminal A subtype and basal-like subtype. Hub genes in the estimated differential networks rediscover known genes associated with platinum resistance in ovarian cancer and signatures of the breast cancer intrinsic subtypes.
|
31
|
Tu JJ, Ou-Yang L, Hu X, Zhang XF. Identifying gene network rewiring by combining gene expression and gene mutation data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 16:1042-1048. [PMID: 29993891 DOI: 10.1109/tcbb.2018.2834529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Understanding how gene dependency networks rewire between different disease states is an important task in genomic research. Although many computational methods have been proposed to undertake this task via differential network analysis, most of them are designed for a single predefined data type. With the development of high-throughput technologies, gene activity measurements can be collected from different aspects (e.g., mRNA expression and DNA mutation). Different data types might share some common characteristics while also having certain unique properties. New methods are needed to explore the similarities and differences between differential networks estimated from different data types. In this study, we develop a new differential network inference model which identifies gene network rewiring by combining gene expression and gene mutation data. Similarities and differences between data types are learned via a group bridge penalty function. Simulation studies demonstrate that our method consistently outperforms the competing methods. We also apply our method to identify gene network rewiring associated with ovarian cancer platinum resistance. Some differential edges are common to both data types, and some are unique to individual data types. Hub genes in the differential networks inferred by our method play important roles in ovarian cancer drug resistance.
|
32
|
Zhang XF, Ou-Yang L, Yang S, Hu X, Yan H. DiffGraph: an R package for identifying gene network rewiring using differential graphical models. Bioinformatics 2017; 34:1571-1573. [DOI: 10.1093/bioinformatics/btx836] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 10/04/2017] [Accepted: 12/21/2017] [Indexed: 01/28/2023] Open
Affiliation(s)
- Xiao-Fei Zhang
- Department of Statistics, School of Mathematics and Statistics, Central China Normal University, Wuhan, China
| | - Le Ou-Yang
- Department of Electronic Engineering, College of Information Engineering, Shenzhen University, Shenzhen, China
| | - Shuo Yang
- Department of Respiratory Medicine, Wuhan Number 1 Hospital, Wuhan, China
| | - Xiaohua Hu
- Department of Computer Science, School of Computer, Central China Normal University, Wuhan, China
- Department of Information Science, College of Computing and Informatics, Drexel University, Philadelphia, USA
| | - Hong Yan
- Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
| |
|