1
Lin W, Ji J, Zhu Y, Li M, Zhao J, Xue F, Yuan Z. PMINR: Pointwise Mutual Information-Based Network Regression - With Application to Studies of Lung Cancer and Alzheimer's Disease. Front Genet 2020; 11:556259. [PMID: 33193633 PMCID: PMC7594515 DOI: 10.3389/fgene.2020.556259] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/27/2020] [Accepted: 08/12/2020] [Indexed: 11/13/2022] Open
Abstract
Complex diseases are believed to be the consequence of intracellular network(s) involving a range of factors. An improved understanding of a disease-predisposing biological network could lead to better identification of genes and pathways that confer disease risk and therefore inform drug development. The group difference in biological networks, which are often characterized as graphs of nodes and edges, is attributable to the effects of these nodes and edges. Here we introduced pointwise mutual information (PMI) as a measure of the connection between a pair of nodes with either a linear relationship or nonlinear dependence. We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in nodes or edges) linked to a disease outcome. Through simulation studies with various sample sizes and inter-node correlation structures, we showed that PMINR can accurately identify these changes with higher power than current methods and is robust to network topology. Finally, we evaluated the practical performance of PMINR using publicly available data on lung cancer and gene methylation data on aging and Alzheimer’s disease. We concluded that PMI is able to capture the generic inter-node correlation pattern in biological networks, and that PMINR is a powerful and efficient approach for biological network analysis.
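As a minimal sketch of the quantity named in this abstract: pointwise mutual information for a pair of outcomes (x, y) is log[p(x, y) / (p(x) p(y))], which is zero under independence and positive when the pair co-occurs more often than independence would predict. The snippet below estimates it from counts of discrete observations. The plug-in estimator, the function name `pointwise_mutual_information`, and the toy data are illustrative assumptions; the abstract does not describe how PMINR estimates PMI, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter


def pointwise_mutual_information(pairs):
    """Estimate PMI for each observed (x, y) pair from empirical frequencies.

    Assumption: x and y are discrete; continuous data would need a density
    estimate instead of counts.
    """
    n = len(pairs)
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    marginal_y = Counter(y for _, y in pairs)
    return {
        (x, y): math.log((count / n) / ((marginal_x[x] / n) * (marginal_y[y] / n)))
        for (x, y), count in joint.items()
    }


# Hypothetical toy data: two node states that always agree (strong dependence).
coupled = [("hi", "hi"), ("hi", "hi"), ("lo", "lo"), ("lo", "lo")]
pmi_coupled = pointwise_mutual_information(coupled)

# Hypothetical toy data: every combination equally often (independence).
independent = [("hi", "hi"), ("hi", "lo"), ("lo", "hi"), ("lo", "lo")]
pmi_independent = pointwise_mutual_information(independent)
```

In the coupled example each observed pair has joint probability 0.5 and marginals of 0.5, so its PMI is log 2; in the independent example every pair has PMI 0.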
Affiliation(s)
- Weiqiang Lin
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jiadong Ji
  - Department of Data Science, School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yuchen Zhu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Mingzhuo Li
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jinghua Zhao
  - Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
2
Zhu Y, Ji J, Lin W, Li M, Liu L, Zhu H, Xue F, Li X, Zhou X, Yuan Z. MCC-SP: a powerful integration method for identification of causal pathways from genetic variants to complex disease. BMC Genet 2020; 21:90. [PMID: 32847502 PMCID: PMC7477886 DOI: 10.1186/s12863-020-00899-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/22/2020] [Accepted: 08/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genome-wide association studies (GWAS) have successfully identified genetic susceptibility variants for complex diseases. However, the underlying mechanism of such associations remains largely unknown. Most disease-associated genetic variants have been shown to reside in noncoding regions, leading to the hypothesis that regulation of gene expression may be the primary biological mechanism. Current methods for characterizing how gene expression mediates the effect of a genetic variant on disease often analyze one gene at a time and ignore the network structure. The impact of a genetic variant can propagate to other genes along the links in the network, and then to the final disease. There could be multiple pathways from the genetic variant to the final disease, each having a chain structure in which the first node is one specific SNP (single nucleotide polymorphism) variant and the last is the disease outcome. One key but inadequately addressed question is how to measure the between-node connection strength and rank the effects of such chain-type pathways, which can provide statistical evidence for prioritizing pathways for potential drug development in a cost-effective manner. RESULTS We first introduce the maximal correlation coefficient (MCC) to represent the between-node connection, and then integrate MCC with the K shortest paths algorithm to rank and identify the potential pathways from genetic variant to disease. A pathway importance score (PIS) is further provided to quantify the importance of each pathway. We term this method "MCC-SP". Various simulations are conducted to illustrate that MCC is a better measurement of the between-node connection strength than other quantities, including Pearson correlation, Spearman correlation, distance correlation, mutual information, and the maximal information coefficient. Finally, we applied MCC-SP to analyze one real dataset from the Religious Orders Study and the Memory and Aging Project, and successfully detected two typical pathways from APOE genotype to Alzheimer's disease (AD) through gene expression enriched in the Alzheimer's disease pathway. CONCLUSIONS MCC-SP has powerful and robust performance in identifying the pathway(s) from the genetic variant to the disease. The source code of MCC-SP is freely available at GitHub ( https://github.com/zhuyuchen95/ADnet ).
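The idea of combining a between-node connection strength with a K shortest paths search can be sketched as follows. This is only an illustration under stated assumptions, not the MCC-SP code (which lives in the repository linked above): the strengths are made-up numbers in (0, 1] standing in for MCC values, each strength s is converted to a cost of -log(s) so that the lowest-cost chain is the one with the largest product of strengths, and the product is used as a stand-in score rather than the paper's PIS. The node names `G1` and `G2` and the function `k_strongest_paths` are hypothetical.

```python
import heapq
import math


def k_strongest_paths(strength, source, target, k):
    """Return up to k simple paths from source to target, strongest first.

    strength: dict {u: {v: s}} with 0 < s <= 1 on each directed edge.
    Paths are explored in order of increasing total cost, where the cost of
    an edge is -log(s); non-negative costs guarantee that complete paths are
    popped from the heap in order of decreasing product of strengths.
    """
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((math.exp(-cost), path))
            continue
        for nxt, s in strength.get(node, {}).items():
            if nxt not in path:  # keep paths simple (no repeated nodes)
                heapq.heappush(heap, (cost - math.log(s), path + [nxt]))
    return found


# Hypothetical network: one variant, two genes, one disease outcome.
toy_strength = {
    "SNP": {"G1": 0.9, "G2": 0.5},
    "G1": {"G2": 0.7, "AD": 0.8},
    "G2": {"AD": 0.9},
}
top_paths = k_strongest_paths(toy_strength, "SNP", "AD", k=2)
```

Here the two strongest chains are SNP → G1 → AD (product 0.72) and SNP → G1 → G2 → AD (product 0.567); the direct chain through G2 alone (0.45) ranks third.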
Affiliation(s)
- Yuchen Zhu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Jiadong Ji
  - Department of Data Science, School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
- Weiqiang Lin
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Mingzhuo Li
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Lu Liu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Huanhuan Zhu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109 USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Xiujun Li
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109 USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109 USA
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012 Shandong China
3
Li H, Geng Z, Sun X, Yu Y, Xue F. A novel path-specific effect statistic for identifying the differential specific paths in systems epidemiology. BMC Genet 2020; 21:85. [PMID: 32770935 PMCID: PMC7414699 DOI: 10.1186/s12863-020-00876-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2019] [Accepted: 06/25/2020] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Biological pathways play an important role in the occurrence, development and recovery of complex diseases, such as cancers, which are multifactorial complex diseases that are generally caused by mutation of multiple genes or dysregulation of pathways. RESULTS We propose a path-specific effect statistic (PSE) to detect the differential specific paths under two conditions (e.g. case vs. control groups, exposure vs. non-exposure groups). In observational studies, the path-specific effect can be obtained by separately calculating the average causal effect of each directed edge, adjusting for the parent nodes of the nodes on the specific path, and multiplying these edge effects under each condition. Theoretical proofs and a series of simulations are conducted to validate the path-specific effect statistic. Applications are also performed to evaluate its practical performance. A series of simulation studies show that, compared with other methods, the Type I error rates of PSE with permutation tests are more stable at the nominal level of 0.05 and that PSE can accurately detect the differential specific paths. Specifically, the power increases as the path-specific effects and their differences between the two conditions grow. In addition, the power of PSE is robust to variation in the parent or child nodes of the nodes on specific paths. In an application to real data on glioblastoma multiforme (GBM), we successfully identified 14 positive specific pathways in the mTOR pathway contributing to the survival time of patients with GBM. All code for automatically searching for specific paths linking two continuous variables and for the adjustment sets, as well as for the PSE statistic, can be found in the supplementary materials. CONCLUSION The proposed PSE statistic can accurately detect the differential specific pathways contributing to complex disease and thus potentially provides new insights and ways to unlock the black box of disease mechanisms.
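The multiplication step described in this abstract can be illustrated with a short sketch. It assumes the per-edge effects have already been estimated under each condition (for example by regressions that adjust for parent nodes, which is not shown here), and it simply multiplies them along a path and compares the two conditions. The function name `path_effect`, the node names, and the numbers are hypothetical; the authors' actual statistic and its permutation test are in their supplementary materials and are not reproduced here.

```python
from math import prod


def path_effect(edge_effects, path):
    """Multiply the effects of consecutive directed edges along a path.

    edge_effects: dict {(parent, child): effect}, assumed to be estimated
    beforehand with suitable adjustment for parent nodes.
    """
    return prod(edge_effects[(u, v)] for u, v in zip(path, path[1:]))


# Hypothetical edge effects for the path X -> M -> Y under two conditions.
case_effects = {("X", "M"): 0.5, ("M", "Y"): 0.4}
control_effects = {("X", "M"): 0.5, ("M", "Y"): 0.1}

path = ["X", "M", "Y"]
case_pse = path_effect(case_effects, path)        # 0.5 * 0.4
control_pse = path_effect(control_effects, path)  # 0.5 * 0.1
pse_difference = case_pse - control_pse
```

In this toy setting the path effect is 0.2 in cases and 0.05 in controls, a difference of 0.15; deciding whether such a difference is statistically significant would require a test such as the permutation procedure mentioned in the abstract.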
Affiliation(s)
- Hongkai Li
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
- Zhi Geng
  - School of Mathematical Sciences, Peking University, Beijing, 100000 People’s Republic of China
- Xiaoru Sun
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
- Yuanyuan Yu
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
- Fuzhong Xue
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
4
Chen H, He Y, Ji J, Shi Y. A Machine Learning Method for Identifying Critical Interactions Between Gene Pairs in Alzheimer's Disease Prediction. Front Neurol 2019; 10:1162. [PMID: 31736866 PMCID: PMC6834789 DOI: 10.3389/fneur.2019.01162] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/29/2019] [Accepted: 10/15/2019] [Indexed: 12/26/2022] Open
Abstract
Background: Alzheimer's disease (AD) is the most common type of dementia. Scientists have discovered that the causes of AD may include a combination of genetic, lifestyle, and environmental factors, but the exact cause has not yet been elucidated. Effective strategies to prevent and treat AD therefore remain elusive. Research on the genetic causes of AD has mainly focused on individual genes, but growing evidence has shown that complex diseases are usually affected by the interaction of genes in a network. Few studies have focused on the interactions and correlations between genes and how they are gradually destroyed or disappear during AD progression. Differential network analysis has been recognized as an essential tool for identifying the underlying pathogenic mechanisms and significant genes for prediction analysis. We therefore aim to conduct a differential network analysis to reveal potential networks involved in the neuropathogenesis of AD and identify genes for AD prediction. Methods: In this paper, we selected 365 samples from the Religious Orders Study and the Rush Memory and Aging Project, including 193 clinically and neuropathologically confirmed AD subjects and 172 no cognitive impairment (NCI) controls. Then, we selected 158 genes belonging to the AD pathway (hsa05010) of the Kyoto Encyclopedia of Genes and Genomes. We employed a machine learning method, namely, joint density-based non-parametric differential interaction network analysis and classification (JDINAC), in the analysis of gene expression data (RNA-seq data). We searched for the differential networks in the RNA-seq data with a pathological diagnosis of AD. Finally, an optimal prediction model was built through cross-validation, which showed good discrimination and calibration for AD prediction. Results: We used JDINAC to derive a gene co-expression network and to explore the relationship between the interaction of gene pairs and AD, and the top 10 differential gene pairs were identified. We then compared the prediction performance of JDINAC with that of prediction methods based on individual genes. JDINAC provides better classification accuracy than the latest methods, such as random forest and penalized logistic regression. Conclusions: The interaction between gene pairs is related to AD and can provide more insight than the individual genes in AD prediction.
Affiliation(s)
- Hao Chen
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yong He
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Jiadong Ji
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yufeng Shi
  - School of Statistics, Shandong University of Finance and Economics, Jinan, China
  - Institute for Financial Studies and School of Mathematics, Shandong University, Jinan, China
5
He Y, Ji J, Xie L, Zhang X, Xue F. A new insight into underlying disease mechanism through semi-parametric latent differential network model. BMC Bioinformatics 2018; 19:493. [PMID: 30591011 PMCID: PMC6309076 DOI: 10.1186/s12859-018-2461-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND In genomic studies, investigating how the structure of a genetic network differs between two experimental conditions is an interesting but challenging problem, especially in high-dimensional settings. The existing literature mostly focuses on differential network modelling for continuous data. However, in real applications, we may encounter discrete or mixed data, which calls for a unified differential network model for various data types. RESULTS We propose a unified latent Gaussian copula differential network model which provides a deeper understanding of the unknown mechanism than that among the observed variables. Adaptive rank-based estimation approaches are proposed under the assumption that the true differential network is sparse. The adaptive estimation approaches do not require the precision matrices to be sparse, and thus allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both estimation of the difference of the precision matrices and differential structure recovery, which means that the extra modelling flexibility comes at almost no cost in statistical efficiency. Besides theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with some other state-of-the-art methods. The results show that the proposed methods work quite well for various data types. The proposed method is then applied to gene expression data associated with lung cancer to illustrate its empirical usefulness. CONCLUSIONS The proposed latent variable differential network models allow for various data types and are thus more flexible; they also provide a deeper understanding of the unknown mechanism than that among the observed variables. Theoretical analysis, numerical simulation and real application all demonstrate the great advantages of latent differential network modelling, which is therefore highly recommended.
Affiliation(s)
- Yong He
  - School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
- Jiadong Ji
  - School of Statistics, Shandong University of Finance and Economics, Jinan, 250014 China
- Lei Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York, 10065 USA
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, 10016 USA
- Xinsheng Zhang
  - School of Management, Fudan University, Shanghai, 200433 China
- Fuzhong Xue
  - School of Public Health, Shandong University, Jinan, 250012 China
6
Ji J, He D, Feng Y, He Y, Xue F, Xie L. JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics 2018; 33:3080-3087. [PMID: 28582486 DOI: 10.1093/bioinformatics/btx360] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/14/2016] [Accepted: 06/01/2017] [Indexed: 12/26/2022] Open
Abstract
Motivation A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. These limitations are restrictive in real applications. Results We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust for confounding factors, without assuming a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated the tumor and normal samples with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. Availability and implementation R scripts available at https://github.com/jijiadong/JDINAC. Contact lxie@iscb.org. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiadong Ji
  - Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
- Di He
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA
- Yang Feng
  - Department of Statistics, Columbia University, New York, NY 10027, USA
- Yong He
  - Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan 250012, China
- Lei Xie
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA
  - Department of Computer Science, Hunter College, The City University of New York, NY 10065, USA
7
Al-Taie Z, Thanintorn N, Ersoy I, Kholod O, Taylor K, Hammer R, Shin D. REDESIGN: RDF-based Differential Signaling Framework for Precision Medicine Analytics. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2018; 2017:35-44. [PMID: 29888036 PMCID: PMC5961787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Pathway-based analysis holds promise to be instrumental in precision and personalized medicine analytics. However, the majority of pathway-based analysis methods utilize "fixed" or "rigid" data sets that limit their ability to account for complex biological inter-dependencies. Here, we present REDESIGN: RDF-based Differential Signaling Pathway informatics framework. The distinctive feature of REDESIGN is that it is designed to run on "flexible" ontology-enabled data sets of curated signal transduction pathway maps to uncover highly explanatory differential pathway mechanisms at the gene-to-gene level. Experiments on two morphoproteomic cases demonstrated REDESIGN's capability to generate actionable hypotheses in precision/personalized medicine analytics.
Affiliation(s)
- Zainab Al-Taie
  - MU Informatics Institute, University of Missouri, Columbia, MO
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO
- Nattapon Thanintorn
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO
- Ilker Ersoy
  - MU Informatics Institute, University of Missouri, Columbia, MO
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO
- Olha Kholod
  - MU Informatics Institute, University of Missouri, Columbia, MO
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO
- Kristen Taylor
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO
- Richard Hammer
  - MU Informatics Institute, University of Missouri, Columbia, MO
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO
- Dmitriy Shin
  - MU Informatics Institute, University of Missouri, Columbia, MO
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO
8
A powerful weighted statistic for detecting group differences of directed biological networks. Sci Rep 2016; 6:34159. [PMID: 27686331 PMCID: PMC5054825 DOI: 10.1038/srep34159] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/04/2016] [Accepted: 09/08/2016] [Indexed: 12/15/2022] Open
Abstract
Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than a single biomolecule. Different physiological conditions such as cases and controls may manifest as different networks. Statistical comparison between biological networks can provide not only new insight into the disease mechanism but also statistical guidance for drug development. However, the methods developed in previous studies are inadequate to capture the changes in both the nodes and edges, and often ignore the network structure. In this study, we present a powerful weighted statistical test for group differences of directed biological networks, which is independent of the network attributes and can capture the changes in both the nodes and edges, while simultaneously accounting for the network structure by putting more weight on the differences of nodes located at relatively more important positions. Simulation studies illustrate that this method performs better than previous ones under various sample sizes and network structures. One application to GWAS of leprosy successfully identifies the specific gene interaction network contributing to leprosy. Another real data analysis significantly identifies a new biological network, which is related to acute myeloid leukemia. One potential network responsible for lung cancer has also been significantly detected. The source R code is available on our website.
9
Yuan Z, Ji J, Zhang T, Liu Y, Zhang X, Chen W, Xue F. A novel chi-square statistic for detecting group differences between pathways in systems epidemiology. Stat Med 2016; 35:5512-5524. [PMID: 27605026 DOI: 10.1002/sim.7094] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/19/2016] [Revised: 08/01/2016] [Accepted: 08/16/2016] [Indexed: 12/15/2022]
Abstract
Traditional epidemiology often pays more attention to the identification of a single factor than to the pathway that is related to a disease, and therefore, it is difficult to explore the disease mechanism. Systems epidemiology aims to integrate putative lifestyle exposures and biomarkers extracted from multiple omics platforms to offer new insights into the pathway mechanisms that underlie disease at the human population level. One key but inadequately addressed question is how to develop powerful statistics to identify whether one candidate pathway is associated with a disease. Bearing in mind that a pathway difference can result from not only changes in the nodes but also changes in the edges, we propose a novel statistic for detecting group differences between pathways, which, in principle, captures the node changes and edge changes while simultaneously accounting for the pathway structure. The proposed test has been proven to follow the chi-square distribution, and various simulations have shown it has better performance than other existing methods. Integrating genome-wide DNA methylation data, we analyzed one real data set from the Bogalusa cohort study and significantly identified a potential pathway, Smoking → SOCS3 → PIK3R1, which was strongly associated with abdominal obesity. The proposed test was powerful and efficient at identifying pathway differences between two groups, and it can be extended to other disciplines that involve statistical comparisons between pathways. The source code in R is available on our website. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan, 250012, Shandong, China
- Jiadong Ji
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan, 250012, Shandong, China
- Tao Zhang
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan, 250012, Shandong, China
  - Department of Epidemiology, Tulane University Health Sciences Center, Tulane University, New Orleans, LA, U.S.A
- Yi Liu
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan, 250012, Shandong, China
- Xiaoshuai Zhang
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan, 250012, Shandong, China
- Wei Chen
  - Department of Epidemiology, Tulane University Health Sciences Center, Tulane University, New Orleans, LA, U.S.A
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan, 250012, Shandong, China
10
Ji J, Yuan Z, Zhang X, Xue F. A powerful score-based statistical test for group difference in weighted biological networks. BMC Bioinformatics 2016; 17:86. [PMID: 26867929 PMCID: PMC4751708 DOI: 10.1186/s12859-016-0916-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 09/11/2015] [Accepted: 01/29/2016] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than a single biomolecule. A key but inadequately addressed issue is how to test possible differences of the networks between two groups. Group-level comparison of network properties may shed light on underlying disease mechanisms and benefit the design of drug targets for complex diseases. We therefore proposed a powerful score-based statistic to detect group differences in weighted networks, which simultaneously captures the vertex changes and edge changes. RESULTS Simulation studies indicated that the proposed network difference measure (NetDifM) was stable and outperformed other existing methods under various sample sizes and network topologies. One application to real GWAS data on leprosy successfully identified the specific gene interaction network contributing to leprosy. For additional gene expression data on ovarian cancer, two candidate subnetworks, the PI3K-AKT and Notch signaling pathways, were considered and identified respectively. CONCLUSIONS The proposed method, accounting for the vertex changes and edge changes simultaneously, is valid and powerful for capturing the group difference of biological networks.
Affiliation(s)
- Jiadong Ji
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.
- Xiaoshuai Zhang
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.