1
Wang XF, Huang L, Wang Y, Guan RC, You ZH, Sheng N, Xie XP, Hou WJ. Multi-view learning framework for predicting unknown types of cancer markers via directed graph neural networks fitting regulatory networks. Brief Bioinform 2024; 25:bbae546. [PMID: 39470307 PMCID: PMC11514060 DOI: 10.1093/bib/bbae546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 09/02/2024] [Accepted: 10/11/2024] [Indexed: 10/30/2024]
Abstract
The discovery of diagnostic and therapeutic biomarkers for complex diseases, especially cancer, has long been a central challenge in molecular association prediction research, offering promising avenues for advancing the understanding of complex diseases. To this end, researchers have developed various network-based prediction techniques targeting specific molecular associations. However, limitations imposed by reductionism and network representation learning have led existing studies to focus narrowly on high prediction efficiency within a single association type, thereby glossing over the discovery of unknown types of associations. Additionally, effectively utilizing network structure to fit the interaction properties of regulatory networks and combining this with case-specific biomarker validation remains an unresolved issue in cancer biomarker prediction methods. To overcome these limitations, we propose a multi-view learning framework, CeRVE, based on directed graph neural networks (DGNN) for predicting unknown types of cancer biomarkers. CeRVE effectively extracts and integrates subgraph information through multi-view feature learning. Subsequently, CeRVE utilizes DGNN to simulate the entire regulatory network, propagating node attribute features and extracting various interaction relationships between molecules. Furthermore, CeRVE constructs a comparative analysis matrix of three cancers and adjacent normal tissues from The Cancer Genome Atlas and identifies multiple types of potential cancer biomarkers through differential expression analysis of mRNA, microRNA, and long noncoding RNA. Computational testing of multiple types of biomarkers for 72 cancers demonstrates that CeRVE exhibits superior performance in cancer biomarker prediction, providing a powerful tool and insightful approach for AI-assisted disease biomarker discovery.
Affiliation(s)
- Xin-Fei Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Ren-Chu Guan
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Youyi West Road, Xi’an, 710072, China
- Nan Sheng
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Xu-Ping Xie
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
- Wen-Ju Hou
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, No. 2699, Qianjin Street, Changchun, 130012, China
2
Li HY, Chen HY, Wang L, Song SJ, You ZH, Yan X, Yu JQ. A structural deep network embedding model for predicting associations between miRNA and disease based on molecular association network. Sci Rep 2021; 11:12640. [PMID: 34135401 PMCID: PMC8209151 DOI: 10.1038/s41598-021-91991-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/29/2020] [Accepted: 04/30/2021] [Indexed: 02/05/2023]
Abstract
Previous studies have indicated that miRNAs play an important role in human biological processes, especially in disease. However, constrained by current biotechnology, only a small fraction of miRNA-disease associations has been verified by biological experiments. This has prompted more and more researchers to develop efficient, high-precision computational methods for predicting potential miRNA-disease associations. Based on the assumption that molecules are related to each other in human physiological processes, we developed a novel structural deep network embedding model (SDNE-MDA) for predicting miRNA-disease associations using a molecular association network. Specifically, the SDNE-MDA model first integrates miRNA attribute information via the Chaos Game Representation (CGR) algorithm and disease attribute information via disease semantic similarity. Second, we extract features by structural deep network embedding from the heterogeneous molecular association network. Then, a comprehensive feature descriptor is constructed by combining attribute information and behavior information. Finally, a Convolutional Neural Network (CNN) is adopted to train and classify these feature descriptors. In the five-fold cross-validation experiment, SDNE-MDA achieved an AUC of 0.9447 with a prediction accuracy of 87.38% on the HMDD v3.0 dataset. To further verify the performance of SDNE-MDA, we compared it with different feature extraction models and classifier models. Moreover, case studies on three important human diseases, Breast Neoplasms, Kidney Neoplasms, and Lymphoma, were carried out with the proposed model. As a result, 47, 46 and 46 of the top-50 predicted disease-related miRNAs were confirmed by independent databases. These results suggest that SDNE-MDA is a reliable computational tool for predicting potential miRNA-disease associations.
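As an illustration of the CGR step named in this abstract, the following is a minimal sketch of a chaos game representation for an RNA sequence. The corner assignment (A, C, G, U on the unit square), the grid resolution `k`, and the function names `cgr_points` and `cgr_grid` are assumptions made for this example; the abstract does not specify how SDNE-MDA configures CGR.

```python
from typing import List, Tuple

# Corner assignment follows a common CGR convention; it is an assumption here,
# not a detail taken from the SDNE-MDA paper.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Map each nucleotide to a point by moving halfway toward its corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper().replace("T", "U"):
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_grid(seq: str, k: int = 2) -> List[List[int]]:
    """Count CGR points in a 2^k x 2^k grid, giving a fixed-size attribute matrix."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical short sequence used only to exercise the functions.
    for row in cgr_grid("UGAGGUAGUAGGUUGUAUAGUU", k=2):
        print(row)
```

Flattening the grid gives a fixed-length vector regardless of sequence length, which is one reason such a representation is convenient as a sequence attribute feature.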
Affiliation(s)
- Hao-Yuan Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116 China
- Hai-Yan Chen
- Xinjiang Autonomous Region Tax Service, State Taxation Administration, Urumqi, 830011 China
- Lei Wang
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011 China
- Shen-Jian Song
- Science & Technology Department of Xinjiang Uygur Autonomous Region, Urumqi, 830011 China
- Zhu-Hong You
- Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011 China
- Xin Yan
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116 China
- Jin-Qian Yu
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116 China
3
Zhao BW, You ZH, Hu L, Guo ZH, Wang L, Chen ZH, Wong L. A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning. Cancers (Basel) 2021; 13:2111. [PMID: 33925568 PMCID: PMC8123765 DOI: 10.3390/cancers13092111] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/27/2021] [Revised: 04/20/2021] [Accepted: 04/22/2021] [Indexed: 11/22/2022]
Abstract
Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates quickly. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture both the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), while the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. The proposed model is also expected to bring new inspiration and provide novel perspectives to relevant researchers.
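To make the DeepWalk component more concrete, here is a rough sketch of its first stage: truncated uniform random walks over a graph. The skip-gram training that turns walks into embeddings is omitted. The adjacency-dictionary format, the function name `random_walks`, the parameter defaults, and the toy drug/target node names are all assumptions for illustration and are not taken from LGDTI.

```python
import random
from typing import Dict, Hashable, List


def random_walks(
    adj: Dict[Hashable, List[Hashable]],
    walk_length: int = 5,
    walks_per_node: int = 2,
    seed: int = 0,
) -> List[List[Hashable]]:
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Hypothetical toy graph with two drugs sharing one target.
    toy = {"drug1": ["target1"], "target1": ["drug1", "drug2"], "drug2": ["target1"]}
    for w in random_walks(toy, walk_length=4, walks_per_node=1):
        print(w)
```

Nodes that co-occur in many walks end up with similar embeddings once a skip-gram model is trained on these sequences, which is how walks longer than one hop carry high-order neighbor information.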
Affiliation(s)
- Bo-Wei Zhao
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhu-Hong You
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Lun Hu
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhen-Hao Guo
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Lei Wang
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
- Zhan-Heng Chen
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Leon Wong
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
4
Zhao BW, You ZH, Wong L, Zhang P, Li HY, Wang L. MGRL: Predicting Drug-Disease Associations Based on Multi-Graph Representation Learning. Front Genet 2021; 12:657182. [PMID: 34054920 PMCID: PMC8153989 DOI: 10.3389/fgene.2021.657182] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2021] [Accepted: 03/15/2021] [Indexed: 11/13/2022]
Abstract
Drug repositioning is an application-oriented strategy that mines existing drugs for new targets, quickly uncovering new drug-disease associations and reducing the risk of drug discovery in traditional medicine and biology. It is therefore of great significance to design a computational model with high efficiency and accuracy. In this paper, we propose a novel computational method, MGRL, to predict drug-disease associations based on multi-graph representation learning. More specifically, MGRL first uses a graph convolutional network to learn graph representations of drugs and diseases from their self-attributes. Then, a graph embedding algorithm is used to represent the relationships between drugs and diseases. Finally, the two kinds of graph representation learning features are fed into a random forest classifier for training. To the best of our knowledge, this is the first work to construct a multi-graph to extract the characteristics of drugs and diseases to predict drug-disease associations. The experiments show that MGRL achieves an AUC of 0.8506 under five-fold cross-validation, which is significantly better than other existing methods. Case study results show the reliability of the proposed method, which is of great significance for practical applications.
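The five-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a generic index-splitting routine under assumed defaults (shuffled, unstratified, fixed seed); the function name `k_fold_indices` is hypothetical and the abstract does not describe how MGRL builds its folds.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering every sample exactly once as test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    for fold_id, (train, test) in enumerate(k_fold_indices(10, k=5)):
        print(fold_id, sorted(test), len(train))
```

A reported cross-validated AUC is typically the mean of the per-fold scores obtained by training on each `train` set and scoring the matching `test` set.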
Affiliation(s)
- Bo-Wei Zhao
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Ping Zhang
- The School of Computer Sciences, BaoJi University of Arts and Sciences, Baoji, China
- Hao-Yuan Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Lei Wang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
5
Guo ZH, You ZH, Wang YB, Huang DS, Yi HC, Chen ZH. Bioentity2vec: Attribute- and behavior-driven representation for predicting multi-type relationships between bioentities. Gigascience 2020; 9:giaa032. [PMID: 32533701 PMCID: PMC7293023 DOI: 10.1093/gigascience/giaa032] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/07/2019] [Revised: 01/06/2020] [Accepted: 03/13/2020] [Indexed: 01/14/2023]
Abstract
BACKGROUND: The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems.
RESULTS: We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572.
CONCLUSIONS: Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.
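For readers less familiar with the two metrics reported here, the sketch below computes an area under the ROC curve and a precision-recall summary from labels and scores. The AUROC uses the pairwise-ranking definition, and the precision-recall area is approximated by average precision, which is one common estimator and may differ slightly from the interpolation used in the paper. The function names and the toy inputs are assumptions for illustration only.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive sample."""
    ranked: List = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


if __name__ == "__main__":
    # Hypothetical labels and classifier scores.
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print(auroc(y, s), average_precision(y, s))
```

The pairwise AUROC is quadratic in the number of samples, which is fine for a sketch but would normally be replaced by a sort-based computation on large association datasets.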
Affiliation(s)
- Zhen-Hao Guo
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yan-Bin Wang
- School of Cyber Science and Technology, Zhejiang University, Hangzhou 310000, Zhejiang, China
- De-Shuang Huang
- Computer Science Department, Tongji University, Shanghai 200000, China
- Hai-Cheng Yi
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhan-Heng Chen
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China