1
Lin W, Ji J, Zhu Y, Li M, Zhao J, Xue F, Yuan Z. PMINR: Pointwise Mutual Information-Based Network Regression - With Application to Studies of Lung Cancer and Alzheimer's Disease. Front Genet 2020; 11:556259. [PMID: 33193633 PMCID: PMC7594515 DOI: 10.3389/fgene.2020.556259] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/27/2020] [Accepted: 08/12/2020] [Indexed: 11/13/2022] Open
Abstract
Complex diseases are believed to be the consequence of intracellular network(s) involving a range of factors. An improved understanding of a disease-predisposing biological network could lead to better identification of genes and pathways that confer disease risk and therefore inform drug development. The group difference in biological networks, as is often characterized by graphs of nodes and edges, is attributable to effects of these nodes and edges. Here we introduced pointwise mutual information (PMI) as a measure of the connection between a pair of nodes with either a linear relationship or nonlinear dependence. We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in node or edge) linking a disease outcome. Through simulation studies with various sample sizes and inter-node correlation structures, we showed that PMINR can accurately identify these changes with higher power than current methods and be robust to the network topology. Finally, we illustrated, with publicly available data on lung cancer and gene methylation data on aging and Alzheimer’s disease, an evaluation of the practical performance of PMINR. We concluded that PMI is able to capture the generic inter-node correlation pattern in biological networks, and PMINR is a powerful and efficient approach for biological network analysis.
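As a rough illustration of the PMI measure named in this abstract, the sketch below computes pointwise mutual information, log(p(x, y) / (p(x) p(y))), from empirical frequencies of paired discrete states. The discretised node states, the function name, and the sample data are assumptions made for illustration; the abstract does not say how PMINR estimates the underlying probabilities.

```python
import math
from collections import Counter

def pointwise_mutual_information(pairs):
    """Return {(x, y): PMI} for each observed pair, using empirical frequencies."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    return {
        (x, y): math.log((count / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), count in joint.items()
    }

# Hypothetical binarised states of two nodes across eight samples.
samples = [(1, 1), (1, 1), (1, 1), (0, 0), (0, 0), (0, 0), (1, 0), (0, 1)]
pmi = pointwise_mutual_information(samples)
```

Pairs that co-occur more often than independence would predict receive a positive PMI, and pairs that co-occur less often receive a negative one.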
Affiliation(s)
- Weiqiang Lin
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jiadong Ji
  - Department of Data Science, School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yuchen Zhu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Mingzhuo Li
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jinghua Zhao
  - Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
2
Li H, Geng Z, Sun X, Yu Y, Xue F. A novel path-specific effect statistic for identifying the differential specific paths in systems epidemiology. BMC Genet 2020; 21:85. [PMID: 32770935 PMCID: PMC7414699 DOI: 10.1186/s12863-020-00876-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2019] [Accepted: 06/25/2020] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Biological pathways play an important role in the occurrence, development and recovery of complex diseases, such as cancers, which are multifactorial complex diseases that are generally caused by mutation of multiple genes or dysregulation of pathways. RESULTS: We propose a path-specific effect statistic (PSE) to detect the differential specific paths under two conditions (e.g. case vs. control groups, exposure vs. non-exposure groups). In observational studies, the path-specific effect can be obtained by separately calculating the average causal effect of each directed edge, adjusting for the parent nodes of the nodes on the specific path, and multiplying these edge effects under each condition. Theoretical proofs and a series of simulations are conducted to validate the path-specific effect statistic. Applications are also performed to evaluate its practical performance. The simulation studies show that the Type I error rates of PSE with permutation tests are more stable at the nominal level of 0.05 than those of other methods, and that PSE can accurately detect the differential specific paths. Specifically, the power increases as the path-specific effects and their differences between the two conditions grow. In addition, the power of PSE is robust to variation in the parent or child nodes of the nodes on specific paths. In an application to real data on glioblastoma multiforme (GBM), we successfully identified 14 positive specific pathways in the mTOR pathway contributing to the survival time of patients with GBM. All code for automatically searching for specific paths linking two continuous variables and for the adjustment set, as well as for the PSE statistic, can be found in the supplementary materials. CONCLUSION: The proposed PSE statistic can accurately detect the differential specific pathways contributing to complex disease and thus potentially provides new insights and ways to unlock the black box of disease mechanisms.
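The multiplication step described in this abstract can be sketched as follows. This is a minimal illustration that assumes the per-edge causal effects have already been estimated for each condition; the node names, the numeric effects, and the use of a simple difference as the between-condition contrast are all hypothetical, since the abstract does not give the exact form of the PSE statistic.

```python
from math import prod

def path_effect(edge_effects, path):
    """Multiply the edge effects along consecutive nodes of `path`."""
    return prod(edge_effects[(a, b)] for a, b in zip(path, path[1:]))

# Hypothetical directed path and per-edge effects under two conditions.
path = ["X", "M1", "M2", "Y"]
effects_case = {("X", "M1"): 0.5, ("M1", "M2"): 0.4, ("M2", "Y"): 0.5}
effects_control = {("X", "M1"): 0.5, ("M1", "M2"): 0.2, ("M2", "Y"): 0.5}

# Assumed contrast: difference of the two path-level products.
difference = path_effect(effects_case, path) - path_effect(effects_control, path)
```

In practice the significance of such a contrast would be assessed with a permutation test, as the abstract indicates, by recomputing it after shuffling the condition labels.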
Affiliation(s)
- Hongkai Li
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
- Zhi Geng
  - School of Mathematical Sciences, Peking University, Beijing, 100000 People’s Republic of China
- Xiaoru Sun
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
- Yuanyuan Yu
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
- Fuzhong Xue
  - Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, 250000 People’s Republic of China
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, People’s Republic of China
3
Effectiveness of the Execution and Prevention of Metric-Based Adversarial Attacks on Social Network Data †. INFORMATION 2020. [DOI: 10.3390/info11060306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
Observed social networks are often considered as proxies for underlying social networks. The analysis of observed networks oftentimes involves the identification of influential nodes via various centrality measures. This paper brings insights from research on adversarial attacks on machine learning systems to the domain of social networks by studying strategies by which an adversary can minimally perturb the observed network structure to achieve their target function of modifying the ranking of a target node according to centrality measures. This can represent the attempt of an adversary to boost or demote the degree to which others perceive individual nodes as influential or powerful. We study the impact of adversarial attacks on targets and victims, and identify metric-based security strategies to mitigate such attacks. We conduct a series of controlled experiments on synthetic network data to identify attacks that allow the adversary to achieve their objective with a single move. We then replicate the experiments with empirical network data. We run our experiments on common network topologies and use common centrality measures. We identify a small set of moves that result in the adversary achieving their objective. This set is smaller for decreasing centrality measures than for increasing them. For both synthetic and empirical networks, we observe that larger networks are less prone to adversarial attacks than smaller ones. Adversarial moves have a higher impact on cellular and small-world networks, while random and scale-free networks are harder to perturb. Also, empirical networks are harder to attack than synthetic networks. Using correlation analysis on our experimental results, we identify how combining measures with low correlation can aid in reducing the effectiveness of adversarial moves. Our results also advance the knowledge about the robustness of centrality measures to network perturbations. 
The notion of changing social network data to yield adversarial outcomes has practical implications, e.g., for information diffusion on social media, influence and power dynamics in social systems, and developing solutions to improving network security.
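The single-move attack idea in this abstract can be sketched as below: the adversary adds one edge and checks how the target's centrality rank changes. This sketch assumes degree centrality, an undirected network stored as adjacency sets, and moves limited to adding an edge incident to the target; the helper names and the toy network are hypothetical, and the paper studies a wider set of measures and moves.

```python
def degree_rank(adj, node):
    """1-based rank of `node` by degree; tied nodes share the best rank."""
    degrees = {v: len(neighbours) for v, neighbours in adj.items()}
    return 1 + sum(1 for d in degrees.values() if d > degrees[node])

def add_edge(adj, u, v):
    """Return a copy of the network with the undirected edge u-v added."""
    new_adj = {k: set(neighbours) for k, neighbours in adj.items()}
    new_adj[u].add(v)
    new_adj[v].add(u)
    return new_adj

def best_single_addition(adj, target):
    """Pick the one new neighbour that gives the target its best rank."""
    candidates = [v for v in adj if v != target and v not in adj[target]]
    return min(
        candidates,
        key=lambda v: (degree_rank(add_edge(adj, target, v), target), v),
    )

# Hypothetical five-node undirected network.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
move = best_single_addition(adj, "e")
new_rank = degree_rank(add_edge(adj, "e", move), "e")
```

Here node `e` starts last by degree; connecting it to the hub `a` raises its own degree without promoting a competitor past it, which is why that move gives the best rank.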
4
Jin S, Zeng X, Xia F, Huang W, Liu X. Application of deep learning methods in biological networks. Brief Bioinform 2020; 22:1902-1917. [PMID: 32363401 DOI: 10.1093/bib/bbaa043] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 12/25/2019] [Revised: 02/19/2020] [Accepted: 03/05/2020] [Indexed: 01/07/2023] Open
Abstract
The increase in biological data and the formation of various biomolecule interaction databases enable us to obtain diverse biological networks. These biological networks provide a wealth of raw materials for further understanding of biological systems, the discovery of complex diseases and the search for therapeutic drugs. However, the increase in data also increases the difficulty of biological network analysis. Therefore, algorithms that can handle large, heterogeneous and complex data are needed to better analyze the data of these network structures and mine their useful information. Deep learning is a branch of machine learning that extracts more abstract features from a larger set of training data. Through the establishment of an artificial neural network with a hierarchical structure, deep learning can extract and screen the input information layer by layer and has representation learning ability. Improved deep learning algorithms can be used to process complex and heterogeneous graph data structures and are increasingly being applied to the mining of network data. In this paper, we first introduce the deep learning models used for network data. Afterwards, we summarize the application of deep learning to biological networks. Finally, we discuss the future development prospects of this field.
5
Franks AM, Markowetz F, Airoldi EM. Refining cellular pathway models using an ensemble of heterogeneous data sources. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
  - Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
  - Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
  - Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
6
Ji J, He D, Feng Y, He Y, Xue F, Xie L. JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics 2018; 33:3080-3087. [PMID: 28582486 DOI: 10.1093/bioinformatics/btx360] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/14/2016] [Accepted: 06/01/2017] [Indexed: 12/26/2022] Open
Abstract
Motivation A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. They are restrictive in real application. Results We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust confounding factors, without the need of the assumption of a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than that achieved by other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated the tumor and normal sample with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. Availability and implementation R scripts available at https://github.com/jijiadong/JDINAC. Contact lxie@iscb.org. 
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiadong Ji
  - Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
- Di He
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA
- Yang Feng
  - Department of Statistics, Columbia University, New York, NY 10027, USA
- Yong He
  - Department of Mathematical Statistics, School of Statistics, Shandong University of Finance and Economics, Jinan 250014, China
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Shandong University, Jinan 250012, China
- Lei Xie
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, NY 10016, USA
  - Department of Computer Science, Hunter College, The City University of New York, NY 10065, USA
7
A powerful weighted statistic for detecting group differences of directed biological networks. Sci Rep 2016; 6:34159. [PMID: 27686331 PMCID: PMC5054825 DOI: 10.1038/srep34159] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/04/2016] [Accepted: 09/08/2016] [Indexed: 12/15/2022] Open
Abstract
Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than a single biomolecule. Different physiological conditions such as cases and controls may manifest as different networks. Statistical comparison between biological networks can provide not only new insight into the disease mechanism but also statistical guidance for drug development. However, the methods developed in previous studies are inadequate to capture the changes in both the nodes and edges, and often ignore the network structure. In this study, we present a powerful weighted statistical test for group differences of directed biological networks, which is independent of the network attributes and can capture the changes in both the nodes and edges, while simultaneously accounting for the network structure by putting more weight on the differences of nodes located at relatively more important positions. Simulation studies illustrate that this method performed better than previous ones under various sample sizes and network structures. One application to GWAS of leprosy successfully identifies the specific gene interaction network contributing to leprosy. Another real data analysis identifies a new biological network that is significantly related to acute myeloid leukemia. One potential network responsible for lung cancer has also been detected as significant. The source R code is available on our website.
8
Ji J, Yuan Z, Zhang X, Xue F. A powerful score-based statistical test for group difference in weighted biological networks. BMC Bioinformatics 2016; 17:86. [PMID: 26867929 PMCID: PMC4751708 DOI: 10.1186/s12859-016-0916-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 09/11/2015] [Accepted: 01/29/2016] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than a single biomolecule. A key but inadequately addressed issue is how to test possible differences of the networks between two groups. Group-level comparison of network properties may shed light on underlying disease mechanisms and benefit the design of drug targets for complex diseases. We therefore proposed a powerful score-based statistic to detect group difference in weighted networks, which simultaneously captures the vertex changes and edge changes. RESULTS Simulation studies indicated that the proposed network difference measure (NetDifM) was stable and outperformed existing methods under various sample sizes and network topologies. An application to real GWAS data on leprosy successfully identified the specific gene interaction network contributing to leprosy. For additional gene expression data on ovarian cancer, two candidate subnetworks, the PI3K-AKT and Notch signaling pathways, were considered and identified respectively. CONCLUSIONS The proposed method, accounting for the vertex changes and edge changes simultaneously, is valid and powerful for capturing the group difference of biological networks.
Affiliation(s)
- Jiadong Ji
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.
- Zhongshang Yuan
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.
- Xiaoshuai Zhang
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Shandong University, PO Box 100, Jinan, 250012, Shandong, China.
9
Ruan D, Young A, Montana G. Differential analysis of biological networks. BMC Bioinformatics 2015; 16:327. [PMID: 26453322 PMCID: PMC4600256 DOI: 10.1186/s12859-015-0735-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/18/2015] [Accepted: 08/18/2015] [Indexed: 12/13/2022] Open
Abstract
Background In cancer research, the comparison of gene expression or DNA methylation networks inferred from healthy controls and patients can lead to the discovery of biological pathways associated with the disease. As a cancer progresses, its signalling and control networks are subject to some degree of localised re-wiring. Being able to detect disrupted interaction patterns induced by the presence or progression of the disease can lead to the discovery of novel molecular diagnostic and prognostic signatures. Currently there is a lack of scalable statistical procedures for two-network comparisons aimed at detecting localised topological differences. Results We propose the dGHD algorithm, a methodology for detecting differential interaction patterns in two-network comparisons. The algorithm relies on a statistic, the Generalised Hamming Distance (GHD), for assessing the degree of topological difference between networks and evaluating its statistical significance. dGHD builds on a non-parametric permutation testing framework but achieves computational efficiency through an asymptotic normal approximation. Conclusions We show that the GHD is able to detect more subtle topological differences compared to a standard Hamming distance between networks. This results in the dGHD algorithm achieving high performance in simulation studies as measured by sensitivity and specificity. An application to the problem of detecting differential DNA co-methylation subnetworks associated with ovarian cancer demonstrates the potential benefits of the proposed methodology for discovering network-derived biomarkers associated with a trait of interest.
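For orientation, the standard Hamming distance that this abstract uses as its baseline can be sketched as below: the fraction of node pairs whose edge status differs between two networks on the same node set. This is not the GHD statistic itself, whose definition the abstract does not give; the function name and the two toy adjacency matrices are hypothetical.

```python
def hamming_distance(a, b):
    """Fraction of unordered node pairs whose edge status differs.

    `a` and `b` are symmetric 0/1 adjacency matrices over the same nodes.
    """
    n = len(a)
    differing = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if a[i][j] != b[i][j]
    )
    return differing / (n * (n - 1) / 2)

# Hypothetical four-node networks: a chain 0-1-2-3 versus a star centred on 1.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
star = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
distance = hamming_distance(chain, star)
```

The two toy networks differ on the pairs (1, 3) and (2, 3), so two of the six possible pairs disagree.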
Affiliation(s)
- Da Ruan
  - Department of Biomedical Engineering, King's College London, London, SE1 7EH, UK.
- Alastair Young
  - Department of Biomedical Engineering, King's College London, London, SE1 7EH, UK.
- Giovanni Montana
  - Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.
  - Department of Biomedical Engineering, King's College London, London, SE1 7EH, UK.
10
Oates CJ, Amos R, Spencer SEF. Quantifying the multi-scale performance of network inference algorithms. Stat Appl Genet Mol Biol 2015; 13:611-31. [PMID: 25153244 DOI: 10.1515/sagmb-2014-0012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/18/2023]
Abstract
Graphical models are widely used to study complex multivariate biological systems. Network inference algorithms aim to reverse-engineer such models from noisy experimental data. It is common to assess such algorithms using techniques from classifier analysis. These metrics, based on ability to correctly infer individual edges, possess a number of appealing features including invariance to rank-preserving transformation. However, regulation in biological systems occurs on multiple scales and existing metrics do not take into account the correctness of higher-order network structure. In this paper novel performance scores are presented that share the appealing properties of existing scores, whilst capturing ability to uncover regulation on multiple scales. Theoretical results confirm that performance of a network inference algorithm depends crucially on the scale at which inferences are to be made; in particular strong local performance does not guarantee accurate reconstruction of higher-order topology. Applying these scores to a large corpus of data from the DREAM5 challenge, we undertake a data-driven assessment of estimator performance. We find that the "wisdom of crowds" network, that demonstrated superior local performance in the DREAM5 challenge, is also among the best performing methodologies for inference of regulation on multiple length scales.
11
Ji J, Yuan Z, Zhang X, Li F, Xu J, Liu Y, Li H, Wang J, Xue F. Detection for pathway effect contributing to disease in systems epidemiology with a case-control design. BMJ Open 2015; 5:e006721. [PMID: 25596199 PMCID: PMC4298111 DOI: 10.1136/bmjopen-2014-006721] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023] Open
Abstract
OBJECTIVES Identification of pathway effects responsible for specific diseases has been one of the essential tasks in systems epidemiology. Despite some advance in procedures for distinguishing specific pathway (or network) topology between different disease status, statistical inference at a population level remains unsolved and further development is still needed. To identify the specific pathways contributing to diseases, we attempt to develop powerful statistics which can capture the complex relationship among risk factors. SETTING AND PARTICIPANTS Acute myeloid leukaemia (AML) data obtained from 133 adults (98 patients and 35 controls; 47% female). RESULTS Simulation studies indicated that the proposed Pathway Effect Measures (PEM) were stable; bootstrap-based methods outperformed the others, with bias-corrected bootstrap CI method having the highest power. Application to real data of AML successfully identified the specific pathway (Treg→TGFβ→Th17) effect contributing to AML with p values less than 0.05 under various methods and the bias-corrected bootstrap CI (-0.214 to -0.020). It demonstrated that Th17-Treg correlation balance was impaired in patients with AML, suggesting that Th17-Treg imbalance potentially plays a role in the pathogenesis of AML. CONCLUSIONS The proposed bootstrap-based PEM are valid and powerful for detecting the specific pathway effect contributing to disease, thus potentially providing new insight into the underlying mechanisms and ways to study the disease effects of specific pathways more comprehensively.
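The bootstrap confidence intervals mentioned in this abstract can be illustrated with the simplified sketch below. It uses a plain percentile bootstrap rather than the bias-corrected variant the authors favour, and it stands in for the pathway effect with the product of two ordinary least-squares slopes along a three-node chain. The function names, the simulated data, and the effect sizes 0.8 and -0.6 are assumptions; the actual PEM definition is not given in the abstract.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def pathway_effect(a, b, c):
    """Product of the a->b and b->c slopes (a simplified chain effect)."""
    return slope(a, b) * slope(b, c)

def bootstrap_ci(a, b, c, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the chain effect."""
    rng = random.Random(seed)
    n = len(a)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ra, rb, rc = ([v[i] for i in idx] for v in (a, b, c))
        if len(set(ra)) < 2 or len(set(rb)) < 2:
            continue  # skip degenerate resamples with zero variance
        estimates.append(pathway_effect(ra, rb, rc))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical simulated measurements for a chain such as Treg -> TGF-beta -> Th17.
rng = random.Random(1)
treg = [rng.gauss(0, 1) for _ in range(60)]
tgfb = [0.8 * t + rng.gauss(0, 0.5) for t in treg]
th17 = [-0.6 * g + rng.gauss(0, 0.5) for g in tgfb]
ci_low, ci_high = bootstrap_ci(treg, tgfb, th17)
```

An interval that excludes zero would be read as evidence of a pathway effect, which is how the abstract interprets its reported interval of -0.214 to -0.020.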
Affiliation(s)
- Jiadong Ji
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan, China
- Zhongshang Yuan
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan, China
- Xiaoshuai Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan, China
- Fangyu Li
  - Department of Neurology, Capital Medical University, Xuanwu Hospital, Beijing, China
- Jing Xu
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan, China
- Ying Liu
  - Department of Public Health and Clinical Medicine, Umea University, Umea, Sweden
- Hongkai Li
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan, China
- Jia Wang
  - School of Mathematics, Shandong University, Jinan, China
- Fuzhong Xue
  - Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan, China