1
Zhang Y, Lin X, Gao Z, Wang T, Dong K, Zhang J. An omics data analysis method based on feature linear relationship and graph convolutional network. J Biomed Inform 2023; 145:104479. [PMID: 37634557 DOI: 10.1016/j.jbi.2023.104479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 07/26/2023] [Accepted: 08/23/2023] [Indexed: 08/29/2023]
Abstract
Biological networks are known to be highly modular, and the dysfunction of network modules may cause disease. Identifying key modules from omics data and building classification models on them can advance research on disease diagnosis and prognosis. However, when modules are applied in downstream analyses such as discriminating disease states, most methods use only the node information and ignore node interactions and topological information, which may lead to false positives and limit model performance. In this study, we propose an omics data analysis method based on feature linear relationship and graph convolutional network (LCNet). LCNet uses the differences in feature linear relationships that arise during disease development to characterize physiological and pathological changes and to construct a differential linear relation network, which is simple and interpretable from the perspective of feature linear relationships. A greedy strategy is developed to search for highly interactive modules with strong discrimination ability. To fully utilize the information in the detected modules, personalized sub-graphs are defined for each sample based on the modules, and graph convolutional network (GCN) classifiers are trained to predict the sample labels. Experimental results on public datasets show the superiority of LCNet in classification performance. For breast cancer metabolic data, the metabolites identified by LCNet are involved in important pathways. Thus, LCNet can identify module biomarkers through feature linear relationships and a greedy strategy, and label samples through personalized sub-graphs and GCNs. It provides a new way of using node (molecule) information and topological information in the defined modules for better disease classification.
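The idea of a differential network built from changes in pairwise linear relationships can be illustrated with a minimal sketch. The abstract does not state how LCNet measures a "feature linear relationship", so the use of Pearson correlation, the function names, the threshold, and the toy data below are all assumptions made for illustration and are not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def differential_network(control, disease, threshold=0.5):
    """Keep feature pairs whose correlation shifts by at least `threshold`.

    `control` and `disease` map a feature name to its values across samples.
    The threshold is a hypothetical parameter, not taken from the paper.
    """
    edges = {}
    for f1, f2 in combinations(sorted(control), 2):
        shift = abs(pearson(disease[f1], disease[f2]) - pearson(control[f1], control[f2]))
        if shift >= threshold:
            edges[(f1, f2)] = shift
    return edges


# Made-up toy data: A and B rise together in controls but move oppositely in disease.
control = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [5, 3, 6, 4]}
disease = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [5, 3, 6, 4]}
edges = differential_network(control, disease)
print(edges)
```

In this toy case only the A-B pair changes its relationship between groups, so it is the only edge retained. Module search and GCN classification, which the abstract also describes, are outside the scope of this sketch.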
Affiliation(s)
- Yanhui Zhang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Xiaohui Lin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Zhenbo Gao
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Tianxiang Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Kunjie Dong
  - School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
- Jianjun Zhang
  - Cancer Hospital of Dalian University of Technology (Liaoning Cancer Hospital & Institute), Liaoning, China
2
Zhang Y, Chang X, Xia J, Huang Y, Sun S, Chen L, Liu X. Identifying network biomarkers of cancer by sample-specific differential network. BMC Bioinformatics 2022; 23:230. [PMID: 35705908 PMCID: PMC9202129 DOI: 10.1186/s12859-022-04772-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/01/2021] [Accepted: 06/02/2022] [Indexed: 02/08/2023] Open
Abstract
Abundant datasets generated by various big science projects on diseases present great challenges and opportunities and have helped to unfold the complexity of diseases. The discovery of disease-associated molecular networks for each individual, based on reference networks, plays an important role in personalized therapy and precision treatment of cancer. However, there are no effective ways to distinguish the consistency of different reference networks. In this study, we developed a statistical method, the sample-specific differential network (SSDN), to construct and analyze such networks based on the gene expression of a single sample against a reference dataset. We proved that the SSDN is structurally consistent even with different reference datasets, provided that the reference datasets satisfy certain conditions. The SSDN can also be used to identify patient-specific disease modules or network biomarkers and to predict the potential driver genes of a tumor sample.
Affiliation(s)
- Yu Zhang
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou, 310024, China; School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance & Economics, Bengbu, 233030, China
- Jie Xia
  - Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Science, Shanghai, 200031, China
- Yanhong Huang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Shaoyan Sun
  - School of Mathematics and Statistics, Ludong University, Yantai, 264025, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou, 310024, China; Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Science, Shanghai, 200031, China; West China Biomedical Big Data Center, Med-X center for informatics, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoping Liu
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China; Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou, 310024, China; School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
3
Song TZ, Zhen XC, Gao W, Zhu W. Identification of potential driving genes in prostatic cancer using complex network analysis. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2022. [DOI: 10.1080/21681163.2021.2015722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- TaekWon Zeyuan Song
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai, People’s Republic of China
- Xiao-Cong Zhen
  - School of Mathematics and Statistics, Shandong University (Weihai), Weihai, People’s Republic of China
- Wensuo Gao
  - Medical Department, Weishan People’s Hospital, Jining, People’s Republic of China
- Wenyan Zhu
  - Medical Department, Weishan People’s Hospital, Jining, People’s Republic of China
4
Gao Y, Chang X, Xia J, Sun S, Mu Z, Liu X. Identification of HCC-Related Genes Based on Differential Partial Correlation Network. Front Genet 2021; 12:672117. [PMID: 34335688 PMCID: PMC8320536 DOI: 10.3389/fgene.2021.672117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2021] [Accepted: 05/20/2021] [Indexed: 01/01/2023] Open
Abstract
Hepatocellular carcinoma (HCC) is one of the most common causes of cancer-related death, but its pathogenesis is still unclear. Because the disease involves multiple biological processes, systematic identification of disease genes and module biomarkers can provide a better understanding of disease mechanisms. In this study, we developed a network-based approach to integrate multi-omics data and discover disease-related genes. We applied our method to HCC data from The Cancer Genome Atlas (TCGA) database and obtained a functional module with 15 disease-related genes as network biomarkers. The results of classification and hierarchical clustering demonstrate that the identified functional module can effectively distinguish the disease group from the control group with both supervised and unsupervised methods. In brief, this computational method for identifying potential functional disease modules could be useful for disease diagnosis and for further mechanistic study of complex diseases.
Affiliation(s)
- Yuyao Gao
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou, China
  - School of Mathematics and Statistics, Shandong University, Weihai, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, China
- Jie Xia
  - Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shaoyan Sun
  - School of Mathematics and Statistics, Ludong University, Yantai, China
- Zengchao Mu
  - School of Mathematics and Statistics, Shandong University, Weihai, China
- Xiaoping Liu
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou, China
  - School of Mathematics and Statistics, Shandong University, Weihai, China
5
Differential metabolic network construction for personalized medicine: Study of type 2 diabetes mellitus patients' response to gliclazide-modified-release-treated. J Biomed Inform 2021; 118:103796. [PMID: 33932596 DOI: 10.1016/j.jbi.2021.103796] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/25/2020] [Revised: 02/26/2021] [Accepted: 04/26/2021] [Indexed: 11/21/2022]
Abstract
Individual variation in genetic and environmental factors can cause differences in metabolic phenotypes, which may affect patients' drug responses. In-depth exploration of patients' responses to therapeutic agents is a crucial and urgent task in the study of personalized treatment. Using machine learning methods to discover biomarkers for suitability evaluation can provide deep insight into the mechanisms of disease therapy and facilitate the development of personalized medicine. To find important metabolic network signals for predicting patients' drug responses, a novel method referred to as differential metabolic network construction (DMNC) was proposed. In DMNC, changes in metabolite concentration ratios between different pathological states are measured to construct differential metabolic networks, which can be used to advance clinical decision-making. In this study, DMNC was applied to characterize the responses of type 2 diabetes mellitus (T2DM) patients to gliclazide modified-release (MR) therapy. Two T2DM metabolomics datasets from different batches of subjects treated with gliclazide MR were analyzed in depth. A network biomarker was defined to assess the patients' suitability for gliclazide MR. It was effective in distinguishing significant responders from nonsignificant responders, achieving area under the curve values of 0.893 and 1.000 for the discovery and validation sets, respectively. Compared with the metabolites selected by other methods, the network biomarker selected by DMNC reflected patients' metabolic responses to gliclazide MR therapy more stably and precisely, thereby contributing to personalized medicine for T2DM patients. The better performance of DMNC validated its potential for identifying network biomarkers that characterize responses to therapeutic treatments and provide valuable information for personalized medicine.
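Measuring how metabolite ratios change between two states can be sketched as follows. The abstract does not give the statistic DMNC uses, so the mean log-ratio difference, the `min_shift` cutoff, the function names, and the metabolite names and values are hypothetical choices made only to illustrate the general idea.

```python
from itertools import combinations
from math import log


def mean_log_ratio(group, m1, m2):
    """Mean of log(m1 / m2) over the samples in one group."""
    ratios = [log(a / b) for a, b in zip(group[m1], group[m2])]
    return sum(ratios) / len(ratios)


def differential_ratio_edges(state_a, state_b, min_shift=0.5):
    """Keep metabolite pairs whose mean log-ratio differs between two states.

    Each state maps a metabolite name to positive concentrations across samples.
    `min_shift` is an illustrative cutoff, not a value from the paper.
    """
    edges = {}
    for m1, m2 in combinations(sorted(state_a), 2):
        shift = mean_log_ratio(state_b, m1, m2) - mean_log_ratio(state_a, m1, m2)
        if abs(shift) >= min_shift:
            edges[(m1, m2)] = shift
    return edges


# Made-up concentrations: only "glucose" doubles between the two states.
state_a = {"alanine": [1.0, 1.0], "glucose": [4.0, 4.0], "lactate": [2.0, 2.0]}
state_b = {"alanine": [1.0, 1.0], "glucose": [8.0, 8.0], "lactate": [2.0, 2.0]}
ratio_edges = differential_ratio_edges(state_a, state_b)
print(ratio_edges)
```

Working on the log scale makes the shift symmetric: swapping numerator and denominator only flips its sign, so the ordering of a pair does not affect whether the edge is kept.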
6
Zhang Y, Chang X, Liu X. Inference of gene regulatory networks using pseudo-time series data. Bioinformatics 2021; 37:2423-2431. [PMID: 33576787 DOI: 10.1093/bioinformatics/btab099] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/04/2020] [Revised: 01/18/2021] [Accepted: 02/10/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on verification with a specific data set. Moreover, because of the complexity and diversity of biological networks, it is difficult to establish directed topological networks that suit both time-series and non-time-series datasets. RESULTS: Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression), to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice, using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation, to produce a linear and monotonic pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. AVAILABILITY: The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
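Using lagged regression to suggest a direction of regulation can be shown with a small sketch. This is not the GNIPLR code (which is available at the URL above); it assumes an already ordered series, a lag of one step, and a simple comparison of the R² of the two lagged linear fits, and the function names and data are invented for illustration.

```python
def r_squared(x, y):
    """R^2 of a simple linear fit of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def lagged_direction(x, y, lag=1):
    """Compare how well x at time t explains y at t+lag, and vice versa."""
    forward = r_squared(x[:-lag], y[lag:])   # x leads y
    backward = r_squared(y[:-lag], x[lag:])  # y leads x
    if forward > backward:
        label = "x -> y"
    elif backward > forward:
        label = "y -> x"
    else:
        label = "undetermined"
    return label, forward, backward


# Made-up series in which y follows x with a one-step delay: y[t+1] = 2 * x[t].
x = [1, 2, 4, 3, 5, 7, 6]
y = [0, 2, 4, 8, 6, 10, 14]
label, forward, backward = lagged_direction(x, y)
print(label, round(forward, 3), round(backward, 3))
```

A higher lagged fit in one direction is only suggestive of regulation; it does not rule out confounding by a third gene.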
Affiliation(s)
- Yuelei Zhang
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
- Xiao Chang
  - Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, 233030, China
- Xiaoping Liu
  - Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310012, China; School of Mathematics and Statistics, Shandong University, Weihai, Shandong, 264209, China
7
Identification of Long Noncoding RNA Biomarkers for Hepatocellular Carcinoma Using Single-Sample Networks. BioMed Research International 2020; 2020:8579651. [PMID: 33299877 PMCID: PMC7700720 DOI: 10.1155/2020/8579651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2020] [Revised: 10/19/2020] [Accepted: 10/29/2020] [Indexed: 02/07/2023]
Abstract
Objective: Many studies have found that long noncoding RNAs (lncRNAs) are differentially expressed in hepatocellular carcinoma (HCC) and closely associated with the occurrence and prognosis of HCC. Since patients with HCC are usually diagnosed at late stages, more effective biomarkers for early diagnosis and prognostic prediction are urgently needed. Methods: The RNA-seq data of liver hepatocellular carcinoma (LIHC) were downloaded from The Cancer Genome Atlas (TCGA). Differentially expressed lncRNAs and mRNAs were obtained using the edgeR package. The single-sample networks of the 371 tumor samples were constructed to identify the candidate lncRNA biomarkers. Univariate Cox regression analysis was performed to further select the potential lncRNA biomarkers. By multivariate Cox regression analysis, a 3-lncRNA-based risk score model was established on the training set. Then, the survival prediction ability of the 3-lncRNA-based risk score model was evaluated on the testing set and the entire set. Function enrichment analyses were performed using Metascape. Results: Three lncRNAs (RP11-150O12.3, RP11-187E13.1, and RP13-143G15.4) were identified as the potential lncRNA biomarkers for LIHC. The 3-lncRNA-based risk model had a good survival prediction ability for patients with LIHC. Multivariate Cox regression analysis showed that the 3-lncRNA-based risk score was an independent predictor of survival in patients with LIHC. Function enrichment analysis indicated that the three lncRNAs may be associated with LIHC via their involvement in many known cancer-associated biological functions. Conclusion: This study could provide novel insights for identifying lncRNA biomarkers for LIHC at the molecular network level.
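A risk score built from a multivariate Cox model is conventionally a weighted sum of expression values, with the fitted regression coefficients as weights. The sketch below shows only that arithmetic. The coefficients, the placeholder feature names (`lnc1` to `lnc3`), the expression values, and the median cutoff used to split samples are all invented for illustration; the abstract reports none of them.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over the model's features."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def stratify(samples, coefficients):
    """Score every sample and label it 'high' or 'low' relative to the median score."""
    scores = {sid: risk_score(expr, coefficients) for sid, expr in samples.items()}
    cutoff = median(scores.values())  # median split is an assumption of this sketch
    groups = {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}
    return scores, cutoff, groups


# Hypothetical coefficients and expression values, not taken from the study.
coefficients = {"lnc1": 0.5, "lnc2": -0.3, "lnc3": 0.2}
samples = {
    "s1": {"lnc1": 2.0, "lnc2": 1.0, "lnc3": 0.0},
    "s2": {"lnc1": 0.0, "lnc2": 2.0, "lnc3": 1.0},
    "s3": {"lnc1": 1.0, "lnc2": 1.0, "lnc3": 1.0},
    "s4": {"lnc1": 4.0, "lnc2": 0.0, "lnc3": 0.0},
}
scores, cutoff, groups = stratify(samples, coefficients)
print(groups)
```

In practice the coefficients would come from fitting the Cox model on the training set, and the resulting groups would then be compared on the testing set with a survival analysis.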
8
Predicting Functional Modules of Liver Cancer Based on Differential Network Analysis. Interdiscip Sci 2019; 11:636-644. [DOI: 10.1007/s12539-018-0314-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/13/2018] [Revised: 11/19/2018] [Accepted: 12/10/2018] [Indexed: 11/27/2022]
9
Optimizing miRNA-module diagnostic biomarkers of gastric carcinoma via integrated network analysis. PLoS One 2018; 13:e0198445. [PMID: 29879180 PMCID: PMC5991748 DOI: 10.1371/journal.pone.0198445] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/29/2018] [Accepted: 05/18/2018] [Indexed: 12/17/2022] Open
Abstract
Several microRNAs (miRNAs) have been suggested as novel biomarkers for diagnosing gastric cancer (GC) at an early stage, but the single-marker strategy may ignore co-regulatory relationships and lead to low diagnostic specificity. Thus, multi-target modular diagnostic biomarkers are urgently needed. In this study, a Zsummary and NetSVM-based method was used to identify GC-related hub miRNAs and activated modules from clinical miRNA co-expression networks. The NetSVM-based sub-network consisting of the top 20 hub miRNAs reached a high sensitivity and specificity of 0.94 and 0.82, respectively. The Zsummary algorithm identified an activated module (miR-486, miR-451, miR-185, and miR-600) that might serve as a diagnostic biomarker of GC. Three members of this module were previously suggested as biomarkers of GC, and its 24 target genes were significantly enriched in pathways directly related to cancer. The weighted diagnostic ROC AUC of this module was 0.838, and an optimized module unit (miR-451 and miR-185) obtained a higher value of 0.904, both of which were higher than those of individual miRNAs. With further validation, these hub miRNAs and this module have the potential to become robust biomarkers for early diagnosis of GC. Moreover, such modular analysis may offer valuable insights into multi-target approaches to cancer diagnosis and treatment.
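The diagnostic metrics quoted in this abstract (sensitivity, specificity, ROC AUC) can be computed from classifier scores as in the sketch below. It uses the standard pairwise definition of AUC, the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The scores and the 0.5 threshold are made up, and the weighting scheme behind the paper's "weighted diagnostic ROC AUC" is not reproduced here.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Call a sample positive when its score is at or above the threshold."""
    true_pos = sum(1 for c in case_scores if c >= threshold)
    true_neg = sum(1 for h in control_scores if h < threshold)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Made-up module scores for three cases and three controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
auc = roc_auc(cases, controls)
sens, spec = sensitivity_specificity(cases, controls, threshold=0.5)
print(round(auc, 3), round(sens, 3), round(spec, 3))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.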
10
Cooper-Knock J, Green C, Altschuler G, Wei W, Bury JJ, Heath PR, Wyles M, Gelsthorpe C, Highley JR, Lorente-Pons A, Beck T, Doyle K, Otero K, Traynor B, Kirby J, Shaw PJ, Hide W. A data-driven approach links microglia to pathology and prognosis in amyotrophic lateral sclerosis. Acta Neuropathol Commun 2017; 5:23. [PMID: 28302159 PMCID: PMC5353945 DOI: 10.1186/s40478-017-0424-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 02/15/2017] [Accepted: 03/06/2017] [Indexed: 12/12/2022] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease that lacks a predictive and broadly applicable biomarker. Continued focus on mutation-specific upstream mechanisms has yet to predict disease progression in the clinic. Utilising cellular pathology common to the majority of ALS patients, we implemented an objective transcriptome-driven approach to develop noninvasive prognostic biomarkers for disease progression. Genes expressed in laser-captured motor neurons in direct correlation (Spearman rank correlation, p < 0.01) with counts of neuropathology were developed into co-expression network modules. Screening modules using three gene sets representing rate of disease progression and upstream genetic association with ALS led to the prioritisation of a single module enriched for immune response to motor neuron degeneration. Genes in the network module are important for microglial activation and predict disease progression in genetically heterogeneous ALS cohorts: expression of three genes in peripheral lymphocytes (LILRA2, ITGB2 and CEBPD) differentiates patients with rapidly and slowly progressive disease, suggesting promise as a blood-derived biomarker. TREM2 is a member of the network module, and the level of soluble TREM2 protein in cerebrospinal fluid is shown to predict survival when measured in late-stage disease (Spearman rank correlation, p = 0.01). Our data-driven systems approach has, for the first time, directly linked microglia to the development of motor neuron pathology. LILRA2, ITGB2 and CEBPD represent peripherally accessible candidate biomarkers, and TREM2 provides a broadly applicable therapeutic target for ALS.
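Spearman rank correlation, the statistic this abstract uses to relate gene expression to pathology counts, is the Pearson correlation of the ranks of the two variables. A minimal sketch follows, with tied values given their average rank. The function names and data are illustrative only, and the p-value calculation reported in the abstract is not included.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


# Made-up values standing in for pathology counts and a gene's expression.
pathology_counts = [1, 2, 3, 4, 5]
expression = [2, 1, 4, 3, 5]
rho = spearman(pathology_counts, expression)
print(round(rho, 3))
```

Because it works on ranks, the statistic captures any monotonic association and is less sensitive to outliers than Pearson correlation on the raw values.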