1
Du L, Gao P, Liu Z, Yin N, Wang X. TMODINET: A trustworthy multi-omics dynamic learning integration network for cancer diagnostic. Comput Biol Chem 2024; 113:108202. [PMID: 39243551 DOI: 10.1016/j.compbiolchem.2024.108202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 07/23/2024] [Accepted: 08/31/2024] [Indexed: 09/09/2024]
Abstract
Multiple types of omics data contain a wealth of biomedical information that reflects different aspects of clinical samples. Integrated multi-omics analysis is more likely to lead to accurate clinical decisions. Existing cancer diagnostic methods based on multi-omics data integration mainly focus on the classification accuracy of the model while neglecting the interpretability of the internal mechanism and the reliability of the results, which are crucial in domains such as precision medicine and the life sciences. To overcome this limitation, we propose a trustworthy multi-omics dynamic learning framework (TMODINET) for cancer diagnosis. The framework employs multi-omics adaptive dynamic learning to process each sample, providing patient-centered personalized diagnosis through self-attention learning over features and modalities. To characterize the correlation between samples, we introduce a graph dynamic learning method that adaptively adjusts the graph structure according to the classification results for graph convolutional network (GCN) learning. Moreover, we use an uncertainty mechanism based on the Dirichlet distribution and Dempster-Shafer theory to estimate uncertainty and integrate multi-omics data at the decision level, ensuring trustworthy cancer diagnosis. Extensive experiments on four real-world multimodal medical datasets are conducted. Compared with state-of-the-art methods, the proposed algorithm shows superior performance and trustworthiness. Our model has great potential for clinical diagnosis.
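The abstract does not spell out how the Dirichlet distribution and Dempster-Shafer theory are combined, so the following is only a minimal sketch of one common formulation (subjective-logic opinions fused with a reduced Dempster rule), not necessarily the authors' exact method. The function names `opinion_from_evidence` and `combine`, and the example evidence values, are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Turn non-negative per-class evidence into (beliefs, uncertainty).

    Assumption: Dirichlet parameters are alpha_k = evidence_k + 1, so the
    belief masses and the uncertainty mass sum to 1.
    """
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def combine(opinion_a, opinion_b):
    """Fuse two opinions over the same classes at the decision level."""
    b1, u1 = opinion_a
    b2, u2 = opinion_b
    n = len(b1)
    # Conflict: belief mass the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(n) for j in range(n) if i != j)
    scale = 1.0 - conflict
    beliefs = [(b1[k] * b2[k] + b1[k] * u2 + b2[k] * u1) / scale for k in range(n)]
    uncertainty = (u1 * u2) / scale
    return beliefs, uncertainty


# Two hypothetical omics modalities voting on a two-class problem.
omics_a = opinion_from_evidence([9.0, 1.0])
omics_b = opinion_from_evidence([4.0, 0.0])
fused_beliefs, fused_uncertainty = combine(omics_a, omics_b)
```

Under this formulation, two modalities that agree produce a fused opinion with lower uncertainty than either modality alone, which is the behaviour a decision-level trust mechanism is meant to provide.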
Affiliation(s)
- Ling Du
  - Department of Software, Tiangong University, Tianjin, China
- Peipei Gao
  - Department of Computer Science and Technology, Tiangong University, Tianjin, China
- Zhuang Liu
  - School of FinTech, Research Center of Applied Finance, Dongbei University of Finance & Economics, Dalian, China
- Nan Yin
  - Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Xiaochao Wang
  - Department of Mathematical Sciences, Tiangong University, Tianjin, China
2
Lan W, Li C, Chen Q, Yu N, Pan Y, Zheng Y, Chen YPP. LGCDA: Predicting CircRNA-Disease Association Based on Fusion of Local and Global Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1413-1422. [PMID: 38607720 DOI: 10.1109/tcbb.2024.3387913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2024]
Abstract
CircRNA has been shown to be involved in the occurrence of many diseases. Several computational frameworks have been proposed to identify circRNA-disease associations. Although existing computational methods have achieved considerable success, they still need improvement because their performance may degrade due to data sparsity and memory overflow. We develop a novel computational framework called LGCDA that predicts circRNA-disease associations by fusing local and global features to address these problems. First, we construct closed local subgraphs using k-hop closed subgraphs and label the subgraphs to obtain rich graph pattern information. Then, the local features are extracted using a graph neural network (GNN). In addition, we fuse the Gaussian interaction profile (GIP) kernel and cosine similarity to obtain global features. Finally, the circRNA-disease association score is predicted by a multilayer perceptron (MLP) based on the local and global features. We perform five-fold cross-validation on five datasets for model evaluation, and our model surpasses other advanced methods.
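The two global similarity measures named above can be sketched as follows. This is an illustrative version using the widely used GIP definition, with the bandwidth normalised by the mean squared norm of the interaction profiles; the abstract does not state the bandwidth or how the two similarities are fused, so those details and the toy `profiles` matrix are assumptions.

```python
import math


def gip_kernel(profiles):
    """Gaussian interaction profile kernel between rows of a 0/1 association matrix."""
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    # Assumed bandwidth: gamma = 1 / (mean squared profile norm).
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    kernel = []
    for i in range(n):
        row = []
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            row.append(math.exp(-gamma * dist_sq))
        kernel.append(row)
    return kernel


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy circRNA-by-disease association matrix (rows are circRNA profiles).
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
gip = gip_kernel(profiles)
cos_same = cosine_similarity(profiles[0], profiles[1])
cos_diff = cosine_similarity(profiles[0], profiles[2])
```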
3
Li R, Su X, Zhang H, Zhang X, Yao Y, Zhou S, Zhang B, Ye M, Lv C. Integration of Diffusion Transformer and Knowledge Graph for Efficient Cucumber Disease Detection in Agriculture. Plants (Basel, Switzerland) 2024; 13:2435. [PMID: 39273919 PMCID: PMC11396938 DOI: 10.3390/plants13172435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Revised: 08/15/2024] [Accepted: 08/30/2024] [Indexed: 09/15/2024]
Abstract
In this study, a deep learning method combining knowledge graph and diffusion Transformer has been proposed for cucumber disease detection. By incorporating the diffusion attention mechanism and diffusion loss function, the research aims to enhance the model's ability to recognize complex agricultural disease features and to address the issue of sample imbalance efficiently. Experimental results demonstrate that the proposed method outperforms existing deep learning models in cucumber disease detection tasks. Specifically, the method achieved a precision of 93%, a recall of 89%, an accuracy of 92%, and a mean average precision (mAP) of 91%, with a frame rate of 57 frames per second (FPS). Additionally, the study successfully implemented model lightweighting, enabling effective operation on mobile devices, which supports rapid on-site diagnosis of cucumber diseases. The research not only optimizes the performance of cucumber disease detection, but also opens new possibilities for the application of deep learning in the field of agricultural disease detection.
Affiliation(s)
- Ruiheng Li
  - China Agricultural University, Beijing 100083, China
- Xiaotong Su
  - China Agricultural University, Beijing 100083, China
- Hang Zhang
  - China Agricultural University, Beijing 100083, China
- Xiyan Zhang
  - China Agricultural University, Beijing 100083, China
- Yifan Yao
  - China Agricultural University, Beijing 100083, China
- Shutian Zhou
  - China Agricultural University, Beijing 100083, China
- Bohan Zhang
  - China Agricultural University, Beijing 100083, China
- Muyang Ye
  - China Agricultural University, Beijing 100083, China
- Chunli Lv
  - China Agricultural University, Beijing 100083, China
4
Xuan P, Wang W, Cui H, Wang S, Nakaguchi T, Zhang T. Mask-Guided Target Node Feature Learning and Dynamic Detailed Feature Enhancement for lncRNA-Disease Association Prediction. J Chem Inf Model 2024; 64:6662-6675. [PMID: 39112431 DOI: 10.1021/acs.jcim.4c00652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Identifying new relevant long noncoding RNAs (lncRNAs) for various human diseases can facilitate the exploration of the causes and progression of these diseases. Recently, several graph inference methods have been proposed to predict disease-related lncRNAs by exploiting the topological structure and node attributes within graphs. However, these methods did not prioritize the target lncRNA and disease nodes over auxiliary nodes like miRNA nodes, potentially limiting their ability to fully utilize the features of the target nodes. We propose a new method, mask-guided target node feature learning and dynamic detailed feature enhancement for lncRNA-disease association prediction (MDLD), to enhance node feature learning for improved lncRNA-disease association prediction. First, we designed a heterogeneous graph masked transformer autoencoder to guide feature learning, focusing more on the features of target lncRNA (disease) nodes. The target nodes were increasingly masked as training progressed, which helps develop a more robust prediction model. Second, we developed a graph convolutional network with dynamic residuals (GCNDR) to learn and integrate the heterogeneous topology and features of all lncRNA, disease, and miRNA nodes. GCNDR employs an interlayer residual strategy and a residual evolution strategy to mitigate oversmoothing caused by multilayer graph convolution. The interlayer residual strategy estimates the importance of node features learned in the previous GCN encoding layer for nodes in the current encoding layer. Additionally, since there are dependencies in the importance of features of individual lncRNA (disease, miRNA) nodes across multiple encoding layers, a gated recurrent unit-based strategy is proposed to encode these dependencies. Finally, we designed a perspective-level attention mechanism to obtain more informative features of lncRNA and disease node pairs from the perspectives of mask-enhanced and dynamic-enhanced node features. 
Cross-validation experimental results demonstrated that MDLD outperformed 10 other state-of-the-art prediction methods. Ablation experiments and case studies on candidate lncRNAs for three diseases further proved the technical contributions of MDLD and its capability to discover disease-related lncRNAs.
Affiliation(s)
- Ping Xuan
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Wei Wang
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
5
Zhang Y, Wang Z, Wei H, Chen M. Exploring potential circRNA biomarkers for cancers based on double-line heterogeneous graph representation learning. BMC Med Inform Decis Mak 2024; 24:159. [PMID: 38844961 PMCID: PMC11157868 DOI: 10.1186/s12911-024-02564-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND Compared with time-consuming and labor-intensive biological validation in vitro or in vivo, computational models can quickly provide high-quality, targeted candidates. Existing computational models face limitations in effectively utilizing sparse local structural information for accurate prediction of circRNA-disease associations. This study addresses this challenge with a proposed method, CDA-DGRL (Prediction of CircRNA-Disease Association based on Double-line Graph Representation Learning), which employs a deep learning framework leveraging graph networks and a dual-line representation model integrating graph node features. METHOD CDA-DGRL comprises several key steps. First, diverse biological information is integrated to compute integrated similarities among circRNAs and diseases, leading to the construction of a heterogeneous network specific to circRNA-disease associations. Second, circRNA and disease node features are derived using sparse autoencoders. Third, a graph convolutional neural network captures the local graph network structure, taking the circRNA-disease heterogeneous network and the node features as input. Fourth, node2vec performs depth-first sampling of the circRNA-disease heterogeneous network to capture the global graph network structure, addressing issues associated with sparse raw data. Finally, the fused local and global graph network structures are fed into an extra trees classifier to identify potential circRNA-disease associations. RESULTS The results, obtained through five-fold cross-validation on the circR2Disease dataset, demonstrate the superiority of CDA-DGRL over existing state-of-the-art models, with an AUC of 0.9866 and an AUPR of 0.9897. Notably, the extra trees (extremely randomized trees) classifier employed in this model outperforms other machine learning classifiers. 
CONCLUSION Thus, CDA-DGRL stands as a promising methodology for reliably identifying circRNA-disease associations, offering potential avenues to alleviate the necessity for extensive traditional biological experiments. The source code and data for this study are available at https://github.com/zywait/CDA-DGRL .
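For readers unfamiliar with the AUC metric reported above, the sketch below shows how an area under the ROC curve can be computed from labels and scores using the pairwise-ranking definition. It is a generic illustration with made-up inputs, not the authors' evaluation code; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predictions for four candidate circRNA-disease pairs.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
mixed = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

In a five-fold cross-validation setting, a value like this would typically be computed on each held-out fold and then averaged.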
Affiliation(s)
- Yi Zhang
  - School of Computer Science and Engineering, Guilin University of Technology, Guilin, 541004, China
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, 541004, China
- ZhenMei Wang
  - School of Big Data, Guangxi Vocational and Technical College, Nanning, 530003, China
- Hanyan Wei
  - Pharmacy School, Guilin Medical University, Guilin, 541004, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421010, China
6
Luo Y, Duan G, Zhao Q, Bi X, Wang J. DTKGIN: Predicting drug-target interactions based on knowledge graph and intent graph. Methods 2024; 226:21-27. [PMID: 38608849 DOI: 10.1016/j.ymeth.2024.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 01/16/2024] [Accepted: 04/09/2024] [Indexed: 04/14/2024] Open
Abstract
Predicting drug-target interactions (DTIs) plays a crucial role in drug discovery and drug development. Considering the high cost and risk of biological experiments, developing computational approaches to explore the interactions between drugs and targets can effectively reduce the time and cost of drug development. Recently, many methods have made significant progress in predicting DTIs. However, existing approaches still suffer from the high sparsity of DTI datasets and the cold-start problem. In this paper, we develop a new model, named DTKGIN, that predicts drug-target interactions via a knowledge graph and an intent graph. Our method can effectively capture biological environment information for targets and drugs by mining their associated relations in the knowledge graph and considering drug-target interactions at a fine-grained level in the intent graph. DTKGIN learns the representations of drugs and targets from the knowledge graph and the intent graph. The probabilities of interactions between drugs and targets are then obtained through the inner product of the drug and target representations. Experimental results show that our proposed method outperforms other state-of-the-art methods in 10-fold cross-validation, especially in cold-start experimental settings. Furthermore, the case studies demonstrate the effectiveness of DTKGIN in predicting potential drug-target interactions. The code is available on GitHub: https://github.com/Royluoyi123/DTKGIN.
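The final scoring step described above, an inner product of the learned drug and target representations, can be illustrated with a short sketch. The abstract does not say how the inner product is mapped to a probability; squashing it with a logistic sigmoid is an assumption here, and the embedding values are invented for the example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def interaction_probability(drug_vec, target_vec):
    """Score a drug-target pair by the inner product of its embeddings.

    Assumption: the raw inner product is passed through a sigmoid to
    obtain a value in (0, 1).
    """
    if len(drug_vec) != len(target_vec):
        raise ValueError("embeddings must have the same dimension")
    score = sum(d * t for d, t in zip(drug_vec, target_vec))
    return sigmoid(score)


# Invented 3-dimensional embeddings.
aligned = interaction_probability([0.5, 1.0, -0.2], [0.4, 0.9, -0.1])
opposed = interaction_probability([0.5, 1.0, -0.2], [-0.4, -0.9, 0.1])
neutral = interaction_probability([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```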
Affiliation(s)
- Yi Luo
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Qichang Zhao
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Xuehua Bi
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
7
Lan W, Liao H, Chen Q, Zhu L, Pan Y, Chen YPP. DeepKEGG: a multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Brief Bioinform 2024; 25:bbae185. [PMID: 38678587 PMCID: PMC11056029 DOI: 10.1093/bib/bbae185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 03/07/2024] [Accepted: 04/09/2024] [Indexed: 05/01/2024] Open
Abstract
Deep learning-based multi-omics data integration methods have the capability to reveal the mechanisms of cancer development, discover cancer biomarkers and identify pathogenic targets. However, current methods ignore the potential correlations between samples in integrating multi-omics data. In addition, providing accurate biological explanations still poses significant challenges due to the complexity of deep learning models. Therefore, there is an urgent need for a deep learning-based multi-omics integration method to explore the potential correlations between samples and provide model interpretability. Herein, we propose a novel interpretable multi-omics data integration method (DeepKEGG) for cancer recurrence prediction and biomarker discovery. In DeepKEGG, a biological hierarchical module is designed for local connections of neuron nodes and model interpretability based on the biological relationship between genes/miRNAs and pathways. In addition, a pathway self-attention module is constructed to explore the correlation between different samples and generate the potential pathway feature representation for enhancing the prediction performance of the model. Lastly, an attribution-based feature importance calculation method is utilized to discover biomarkers related to cancer recurrence and provide a biological interpretation of the model. Experimental results demonstrate that DeepKEGG outperforms other state-of-the-art methods in 5-fold cross validation. Furthermore, case studies also indicate that DeepKEGG serves as an effective tool for biomarker discovery. The code is available at https://github.com/lanbiolab/DeepKEGG.
Affiliation(s)
- Wei Lan
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Haibo Liao
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Qingfeng Chen
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, School of Computer, Electronic and Information, Guangxi University, No. 100 Daxue Road, Xixiangtang District, Nanning 530004, China
- Lingzhi Zhu
  - School of Computer and Information Science, Hunan Institute of Technology, No. 18 Henghua Road, Zhuhui District, Hengyang 421002, China
- Yi Pan
  - School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China
- Yi-Ping Phoebe Chen
  - Department of Computer Science and Information Technology, La Trobe University, Plenty Rd, Bundoora, Melbourne, Victoria 3086, Australia
8
Peng L, Yang Y, Yang C, Li Z, Cheong N. HRGCNLDA: Forecasting of lncRNA-disease association based on hierarchical refinement graph convolutional neural network. Mathematical Biosciences and Engineering: MBE 2024; 21:4814-4834. [PMID: 38872515 DOI: 10.3934/mbe.2024212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
Long non-coding RNA (lncRNA) is considered to be a crucial regulator involved in various human biological processes, including the regulation of tumor immune checkpoint proteins. It has great potential as both a cancer biomarker and a therapeutic target. Nevertheless, conventional biological experimental techniques are both resource-intensive and laborious, making it essential to develop an accurate and efficient computational method to facilitate the discovery of potential links between lncRNAs and diseases. In this study, we proposed HRGCNLDA, a computational approach utilizing hierarchical refinement of graph convolutional neural networks for forecasting potential lncRNA-disease associations. This approach effectively addresses the over-smoothing problem that arises from stacking multiple layers of graph convolutional neural networks. Specifically, HRGCNLDA enhances the layer representation during message propagation and node updates, thereby amplifying the contribution of hidden layers that resemble the ego layer while reducing discrepancies. The results of the experiments showed that HRGCNLDA achieved the highest AUC-ROC (area under the receiver operating characteristic curve, AUC for short) and AUC-PR (area under the precision versus recall curve, AUPR for short) values compared to other methods. Finally, to further demonstrate the reliability and efficacy of our approach, we performed case studies on three prevalent human diseases, namely, breast cancer, lung cancer and gastric cancer.
Affiliation(s)
- Li Peng
  - College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
  - Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan 411201, China
- Yujie Yang
  - College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
- Cheng Yang
  - College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
- Zejun Li
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang 421002, China
- Ngai Cheong
  - Faculty of Applied Sciences, Macao Polytechnic University, Macau 999078, China
9
Lu P, Zhang W, Wu J. AMPCDA: Prediction of circRNA-disease associations by utilizing attention mechanisms on metapaths. Comput Biol Chem 2024; 108:107989. [PMID: 38016366 DOI: 10.1016/j.compbiolchem.2023.107989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 10/24/2023] [Accepted: 11/15/2023] [Indexed: 11/30/2023]
Abstract
Researchers have been building an expanding body of experimental evidence in the biomedical field, revealing prevalent associations between circRNAs and human diseases. These associations offer a new perspective for elucidating etiology and devising innovative therapeutic strategies. In recent years, many computational methods have been introduced to overcome the inefficiency and high cost of conventional laboratory approaches for enumerating possible circRNA-disease associations, but most existing methods still struggle to integrate node embeddings effectively with higher-order neighborhood representations, which may keep predictive accuracy below its optimum. To overcome these constraints, we proposed AMPCDA, a computational technique that uses predefined metapaths to predict circRNA-disease associations. Specifically, an association graph is first built from three source databases and two similarity derivation procedures, and DeepWalk is then applied to the graph to obtain initial feature representations. Vector embeddings of metapath instances, formed by concatenating initial node features, are then fed through a custom encoder. A self-attention section accumulates metapath-specific contributions to each node, which are combined with the node's intrinsic features and passed into a graph attention module; this module provides the input representations for the multilayer perceptron that predicts the final association probability scores. By integrating graph topology features with the node embeddings themselves, AMPCDA effectively leverages information carried by multiple nodes along paths and achieves AUC values of 0.9623, 0.9675, and 0.9711 under 5-fold cross-validation, 10-fold cross-validation, and leave-one-out cross-validation, respectively. 
These results signify substantial accuracy improvements compared to other prediction models. Case study assessments confirm the high predictive accuracy of our proposed technique in identifying circRNA-disease connections, highlighting its value in guiding future biological research to reveal new disease mechanisms.
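DeepWalk, mentioned above as the source of initial node features, starts from uniform random walks over the graph; the walks are later treated as sentences for a skip-gram embedding model. The sketch below illustrates only the walk-generation step on a toy graph. The adjacency list, node names, walk length, and helper names are hypothetical, and the embedding training step is omitted.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:  # dead end: stop early
            break
        walk.append(rng.choice(neighbours))
    return walk


def generate_walks(adjacency, walks_per_node, length, seed=0):
    """Generate several walks from every node, as a DeepWalk-style corpus."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in random order each pass
        for node in nodes:
            walks.append(random_walk(adjacency, node, length, rng))
    return walks


# Toy undirected association graph (hypothetical node names).
adjacency = {
    "circ1": ["diseaseA", "diseaseB"],
    "circ2": ["diseaseA"],
    "diseaseA": ["circ1", "circ2"],
    "diseaseB": ["circ1"],
}
walks = generate_walks(adjacency, walks_per_node=2, length=5)
```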
Affiliation(s)
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- Wenqi Zhang
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- Jinkai Wu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
10
Lan W, Liu M, Chen J, Ye J, Zheng R, Zhu X, Peng W. JLONMFSC: Clustering scRNA-seq data based on joint learning of non-negative matrix factorization and subspace clustering. Methods 2024; 222:1-9. [PMID: 38128706 DOI: 10.1016/j.ymeth.2023.11.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 11/07/2023] [Accepted: 11/29/2023] [Indexed: 12/23/2023] Open
Abstract
The development of single-cell RNA sequencing (scRNA-seq) has provided new perspectives for studying biological problems at the single-cell level. One of the key issues in scRNA-seq data analysis is dividing cells into several clusters to discover the heterogeneity and diversity of cells. However, existing scRNA-seq data are high-dimensional, sparse, and noisy, which challenges existing single-cell clustering methods. In this study, we propose a joint learning framework (JLONMFSC) for clustering scRNA-seq data. In our method, the dimension of the original data is reduced to minimize the effect of noise. In addition, graph-regularized matrix factorization is used to learn the local features. Further, Low-Rank Representation (LRR) subspace clustering is utilized to learn the global features. Finally, joint learning of the local and global features is performed to obtain the clustering results. We compare the proposed algorithm with eight state-of-the-art algorithms for clustering performance on six datasets, and the experimental results demonstrate that JLONMFSC achieves better performance on all datasets. The code is available at https://github.com/lanbiolab/JLONMFSC.
Affiliation(s)
- Wei Lan
  - School of Computer, Electronic and Information, Guangxi University, Nanning, China; Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China
- Mingyang Liu
  - School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jianwei Chen
  - School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Jin Ye
  - School of Computer, Electronic and Information, Guangxi University, Nanning, China
- Ruiqing Zheng
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Xiaoshu Zhu
  - School of Computer Science and Information Security, Guilin University of Science and Technology, Guilin, China
- Wei Peng
  - School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
11
Niu M, Wang C, Zhang Z, Zou Q. A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation. BMC Biol 2024; 22:24. [PMID: 38281919 PMCID: PMC10823650 DOI: 10.1186/s12915-024-01826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 01/11/2024] [Indexed: 01/30/2024] Open
Abstract
BACKGROUND Circular RNAs (circRNAs) have been confirmed to play a vital role in the occurrence and development of diseases. Exploring the relationship between circRNAs and diseases is of far-reaching significance for studying etiopathogenesis and treating diseases. To this end, building on the graph Markov neural network (GMNN) algorithm from our previous work GMNN2CD, we further considered the multisource biological data that affect the association between circRNAs and diseases, developed an updated web server, CircDA, and used human hepatocellular carcinoma (HCC) tissue data to verify its prediction results. RESULTS CircDA is built on a Tumarkov-based deep learning framework. The algorithm regards biomolecules as nodes and the interactions between molecules as edges, abstracts multiomics data, and models them as a heterogeneous biomolecular association network that reflects the complex relationships between different biomolecules. Case studies using literature data from HCC, cervical, and gastric cancers demonstrate that the CircDA predictor can identify missing associations between known circRNAs and diseases. In a quantitative real-time PCR (RT-qPCR) experiment on human HCC tissue samples, five circRNAs were found to be significantly differentially expressed, which shows that CircDA can predict diseases related to new circRNAs. CONCLUSIONS This efficient computational prediction and case analysis with sufficient feedback allows us to identify circRNA-associated diseases and disease-associated circRNAs. Our work provides a method to predict circRNA-associated diseases and can provide guidance for the association of diseases with certain circRNAs. For ease of use, an online prediction server ( http://server.malab.cn/CircDA ) is provided, and the code is open-sourced ( https://github.com/nmt315320/CircDA.git ) for the convenience of algorithm improvement.
Collapse
Affiliation(s)
- Mengting Niu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China
| | - Zhanguo Zhang
- Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1095 Jiefang Avenue, Wuhan, 430030, China.
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 4 Block 2 North Jianshe Road, Chengdu, 610054, China.
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
| |
|
12
|
Wang S, Hui C, Zhang T, Wu P, Nakaguchi T, Xuan P. Graph Reasoning Method Based on Affinity Identification and Representation Decoupling for Predicting lncRNA-Disease Associations. J Chem Inf Model 2023; 63:6947-6958. [PMID: 37906529 DOI: 10.1021/acs.jcim.3c01214] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/02/2023]
Abstract
An increasing number of studies have shown that dysregulation of lncRNAs is related to the occurrence of various diseases. Most previous methods, however, are designed around the homogeneity assumption that the representation of a target lncRNA (or disease) node should be updated by aggregating the attributes of its neighbor nodes. This assumption ignores the affinity nodes that are far from the target node. We present a novel prediction method, GAIRD, to fully leverage the heterogeneous information in the network and the decoupled node features. The first major innovation is a random walk strategy based on breadth-first search and depth-first search. Unlike previous methods that focus only on homogeneous information, our new strategy learns both the homogeneous information within local neighborhoods and the heterogeneous information within higher-order neighborhoods. The second innovation is a representation decoupling module that extracts purer attributes and purer topologies. Third, a module based on group convolution and deep separable convolution is developed to promote pairwise intrachannel and interchannel feature learning. The experimental results show that GAIRD outperforms the compared state-of-the-art methods, and the ablation studies confirm the contributions of the major innovations. We also performed case studies on 3 diseases to further demonstrate the effectiveness of the GAIRD model in applications.
Affiliation(s)
- Shuai Wang
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
| | - Cui Hui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
| | - Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
| | - Peiliang Wu
- School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
| | - Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
| | - Ping Xuan
- Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
| |
|
13
|
Hu X, Liu D, Zhang J, Fan Y, Ouyang T, Luo Y, Zhang Y, Deng L. A comprehensive review and evaluation of graph neural networks for non-coding RNA and complex disease associations. Brief Bioinform 2023; 24:bbad410. [PMID: 37985451 DOI: 10.1093/bib/bbad410] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/01/2023] [Revised: 10/07/2023] [Accepted: 10/25/2023] [Indexed: 11/22/2023] Open
Abstract
Non-coding RNAs (ncRNAs) play a critical role in the occurrence and development of numerous human diseases. Consequently, studying the associations between ncRNAs and diseases has garnered significant attention from researchers in recent years. Various computational methods have been proposed to explore ncRNA-disease relationships, with Graph Neural Network (GNN) emerging as a state-of-the-art approach for ncRNA-disease association prediction. In this survey, we present a comprehensive review of GNN-based models for ncRNA-disease associations. Firstly, we provide a detailed introduction to ncRNAs and GNNs. Next, we delve into the motivations behind adopting GNNs for predicting ncRNA-disease associations, focusing on data structure, high-order connectivity in graphs and sparse supervision signals. Subsequently, we analyze the challenges associated with using GNNs in predicting ncRNA-disease associations, covering graph construction, feature propagation and aggregation, and model optimization. We then present a detailed summary and performance evaluation of existing GNN-based models in the context of ncRNA-disease associations. Lastly, we explore potential future research directions in this rapidly evolving field. This survey serves as a valuable resource for researchers interested in leveraging GNNs to uncover the complex relationships between ncRNAs and diseases.
Affiliation(s)
- Xiaowen Hu
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Dayun Liu
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Jiaxuan Zhang
- Department of Electrical and Computer Engineering, University of California, San Diego, 92093 CA, USA
| | - Yanhao Fan
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Tianxiang Ouyang
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Yue Luo
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| | - Yuanpeng Zhang
- School of Software, Xinjiang University, 830046 Urumqi, China
| | - Lei Deng
- School of Computer Science and Engineering, Central South University, 410075 Changsha, China
| |
|
14
|
Wu J, Ning Z, Ding Y, Wang Y, Peng Q, Fu L. KGETCDA: an efficient representation learning framework based on knowledge graph encoder from transformer for predicting circRNA-disease associations. Brief Bioinform 2023; 24:bbad292. [PMID: 37587836 DOI: 10.1093/bib/bbad292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/27/2023] [Accepted: 07/27/2023] [Indexed: 08/18/2023] Open
Abstract
Recent studies have demonstrated the significant role that circRNA plays in the progression of human diseases. Identifying circRNA-disease associations (CDA) in an efficient manner can offer crucial insights into disease diagnosis. While traditional biological experiments can be time-consuming and labor-intensive, computational methods have emerged as a viable alternative in recent years. However, these methods are often limited by data sparsity and their inability to explore high-order information. In this paper, we introduce a novel method named Knowledge Graph Encoder from Transformer for predicting CDA (KGETCDA). Specifically, KGETCDA first integrates more than 10 databases to construct a large heterogeneous non-coding RNA dataset, which contains multiple relationships between circRNA, miRNA, lncRNA and disease. Then, a biological knowledge graph is created based on this dataset and Transformer-based knowledge representation learning and attentive propagation layers are applied to obtain high-quality embeddings with accurately captured high-order interaction information. Finally, multilayer perceptron is utilized to predict the matching scores of CDA based on their embeddings. Our empirical results demonstrate that KGETCDA significantly outperforms other state-of-the-art models. To enhance user experience, we have developed an interactive web-based platform named HNRBase that allows users to visualize, download data and make predictions using KGETCDA with ease. The code and datasets are publicly available at https://github.com/jinyangwu/KGETCDA.
Affiliation(s)
- Jinyang Wu
- School of Automation Science and Engineering, Xi'an Jiaotong University, 710049, Shaanxi, China
| | - Zhiwei Ning
- School of Automation Science and Engineering, Xi'an Jiaotong University, 710049, Shaanxi, China
| | - Yidong Ding
- School of Automation Science and Engineering, Xi'an Jiaotong University, 710049, Shaanxi, China
| | - Ying Wang
- School of Automation Science and Engineering, Xi'an Jiaotong University, 710049, Shaanxi, China
| | - Qinke Peng
- School of Automation Science and Engineering, Xi'an Jiaotong University, 710049, Shaanxi, China
| | - Laiyi Fu
- School of Automation Science and Engineering, Xi'an Jiaotong University, 710049, Shaanxi, China
- Research Institute of Xi'an Jiaotong University, 311200, Zhejiang, China
- Sichuan Digital Economy Industry Development Research Institute, 610036, Sichuan, China
| |
|
15
|
Ai N, Liang Y, Yuan H, Ouyang D, Xie S, Liu X. GDCL-NcDA: identifying non-coding RNA-disease associations via contrastive learning between deep graph learning and deep matrix factorization. BMC Genomics 2023; 24:424. [PMID: 37501127 PMCID: PMC10373414 DOI: 10.1186/s12864-023-09501-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 07/02/2023] [Indexed: 07/29/2023] Open
Abstract
Non-coding RNAs (ncRNAs) have drawn wide attention in recent years because they play vital roles in life activities. As a good complement to wet-lab experimental methods, computational prediction methods can greatly reduce experimental costs. However, a high rate of false negatives in the data and insufficient use of multi-source information can degrade the performance of computational prediction methods. Furthermore, many computational methods lack robustness and generalization across different datasets. In this work, we propose an effective end-to-end computing framework, called GDCL-NcDA, that combines deep graph learning and deep matrix factorization (DMF) with contrastive learning to identify latent ncRNA-disease associations on diverse multi-source heterogeneous networks (MHNs). The diverse MHNs include different similarity networks and proven associations among ncRNAs (miRNAs, circRNAs, and lncRNAs), genes, and diseases. Firstly, GDCL-NcDA employs a deep graph convolutional network and multiple attention mechanisms to adaptively integrate the multiple sources in the MHNs and reconstruct the ncRNA-disease association graph. Then, GDCL-NcDA utilizes DMF to predict the latent disease-associated ncRNAs based on the reconstructed graphs to reduce the impact of false negatives in the original associations. Finally, GDCL-NcDA uses contrastive learning (CL) to generate a contrastive loss on the reconstructed graphs and the predicted graphs to improve the generalization and robustness of the GDCL-NcDA framework. The experimental results show that GDCL-NcDA outperforms highly related computational methods. Moreover, case studies demonstrate the effectiveness of GDCL-NcDA in identifying associations between diverse ncRNAs and diseases.
Affiliation(s)
- Ning Ai
- Peng Cheng Laboratory, Shenzhen, 518005, Guangdong, China
- School of Computer Science and Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
| | - Yong Liang
- Peng Cheng Laboratory, Shenzhen, 518005, Guangdong, China.
- Pazhou Laboratory (Huangpu), Guangzhou, 510555, Guangdong, China.
| | - Haoliang Yuan
- School of Automation, Guangdong University of Technology, Guangzhou, 510006, Guangdong, China
| | - Dong Ouyang
- Peng Cheng Laboratory, Shenzhen, 518005, Guangdong, China
- School of Computer Science and Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China
| | - Shengli Xie
- Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou, 510000, Guangdong, China
| | - Xiaoying Liu
- Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, Guangdong, 519090, China
| |
|
16
|
Abu-Salih B, AL-Qurishi M, Alweshah M, AL-Smadi M, Alfayez R, Saadeh H. Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities. JOURNAL OF BIG DATA 2023; 10:81. [PMID: 37274445 PMCID: PMC10225120 DOI: 10.1186/s40537-023-00774-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/28/2022] [Accepted: 05/17/2023] [Indexed: 06/06/2023]
Abstract
The incorporation of data analytics in the healthcare industry has made significant progress, driven by the demand for efficient and effective big data analytics solutions. Knowledge graphs (KGs) have proven utility in this arena and are rooted in a number of healthcare applications to furnish better data representation and knowledge inference. However, in conjunction with a lack of a representative KG construction taxonomy, several existing approaches in this designated domain are inadequate and inferior. This paper is the first to provide a comprehensive taxonomy and a bird's eye view of healthcare KG construction. Additionally, a thorough examination of the current state-of-the-art techniques drawn from academic works relevant to various healthcare contexts is carried out. These techniques are critically evaluated in terms of methods used for knowledge extraction, types of the knowledge base and sources, and the incorporated evaluation protocols. Finally, several research findings and existing issues in the literature are reported and discussed, opening horizons for future research in this vibrant area.
Affiliation(s)
| | | | | | - Mohammad AL-Smadi
- Jordan University of Science and Technology, Irbid, Jordan
- Qatar University, Doha, Qatar
| | | | | |
|
17
|
Li S, Chang M, Tong L, Wang Y, Wang M, Wang F. Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front Genet 2023; 13:1023615. [PMID: 36744179 PMCID: PMC9895102 DOI: 10.3389/fgene.2022.1023615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 10/10/2022] [Indexed: 01/21/2023] Open
Abstract
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked to the occurrence and development of the two cancers. Therefore, it is essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) that integrates random walk with restart and Logistic Matrix Factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. We first fuse disease semantic and Gaussian association profile similarities and lncRNA functional and Gaussian association profile similarities. Second, we design a negative selection algorithm to extract negative LncRNA-Disease Associations (LDA) based on random walk. Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare our proposed LDA-RWLMF method with four classical LDA prediction methods, that is, LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results from 5-fold cross validation on the MNDR dataset show that LDA-RWLMF achieves the best AUC value of 0.9312, outperforming the above four LDA prediction methods. Finally, after determining the performance of LDA-RWLMF, we rank all lncRNA biomarkers for each of the two cancers. We find that 48 and 50 lncRNAs have the highest association scores with breast cancer and colorectal cancer, respectively, among all lncRNAs known to associate with them on the MNDR dataset. We predict that lncRNAs HULC and HAR1A could be potential biomarkers for breast cancer and colorectal cancer, respectively, and need biomedical experimental validation.
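The random walk with restart mentioned in this abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function name `random_walk_with_restart`, the restart probability of 0.5, and the toy adjacency matrix are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-9, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seed`.

    `adj` is a square adjacency matrix given as a list of lists (hypothetical toy input).
    """
    n = len(adj)
    # Column-normalise the adjacency matrix so each column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # The restart vector puts all mass on the seed node.
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability (1 - restart), else jump back to the seed.
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2, walking from node 0.
toy_adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
scores = random_walk_with_restart(toy_adj, seed=0)
```

Nodes that end up with low scores relative to a seed are far from it in the network, which is one plausible way such scores could inform the selection of negative samples.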
|
18
|
Lu C, Zhang L, Zeng M, Lan W, Duan G, Wang J. Inferring disease-associated circRNAs by multi-source aggregation based on heterogeneous graph neural network. Brief Bioinform 2023; 24:6960978. [PMID: 36572658 DOI: 10.1093/bib/bbac549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 11/03/2022] [Accepted: 11/11/2022] [Indexed: 12/28/2022] Open
Abstract
Emerging evidence has proved that circular RNAs (circRNAs) are implicated in pathogenic processes. They are regarded as promising biomarkers for diagnosis due to covalently closed loop structures. As opposed to traditional experiments, computational approaches can identify circRNA-disease associations at a lower cost. Aggregating multi-source pathogenesis data helps to alleviate data sparsity and infer potential associations at the system level. The majority of computational approaches construct a homologous network using multi-source data, but they lose the heterogeneity of the data. Effective methods that use the features of multi-source data are considered as a matter of urgency. In this paper, we propose a model (CDHGNN) based on edge-weighted graph attention and heterogeneous graph neural networks for potential circRNA-disease association prediction. The circRNA network, micro RNA network, disease network and heterogeneous network are constructed based on multi-source data. To reflect association probabilities between nodes, an edge-weighted graph attention network model is designed for node features. To assign attention weights to different types of edges and learn contextual meta-path, CDHGNN infers potential circRNA-disease association based on heterogeneous neural networks. CDHGNN outperforms state-of-the-art algorithms in terms of accuracy. Edge-weighted graph attention networks and heterogeneous graph networks have both improved performance significantly. Furthermore, case studies suggest that CDHGNN is capable of identifying specific molecular associations and investigating biomolecular regulatory relationships in pathogenesis. The code of CDHGNN is freely available at https://github.com/BioinformaticsCSU/CDHGNN.
Affiliation(s)
- Chengqian Lu
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
- School of Computer Science, Xiangtan University, Xiangtan, 411105, Hunan, China
| | - Lishen Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
| | - Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
| | - Wei Lan
- School of Computer, Electronic and Information, Guangxi University, Nanning, 530004, Guangxi, China
| | - Guihua Duan
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, Hunan, China
| |
|
19
|
Lan W, Dong Y, Zhang H, Li C, Chen Q, Liu J, Wang J, Chen YPP. Benchmarking of computational methods for predicting circRNA-disease associations. Brief Bioinform 2023; 24:6972300. [PMID: 36611256 DOI: 10.1093/bib/bbac613] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/21/2022] [Revised: 10/29/2022] [Accepted: 12/11/2022] [Indexed: 01/09/2023] Open
Abstract
Accumulating evidence demonstrates that circular RNA (circRNA) plays an important role in human diseases. Identifying circRNA-disease associations can help in the diagnosis of human diseases, but the traditional approach based on biological experiments is time-consuming. To address this limitation, a series of computational methods have been proposed in recent years. However, few works have summarized these methods or compared their performance. In this paper, we divide the existing methods into three categories: information propagation, traditional machine learning and deep learning. Then, the baseline methods in each category are introduced in detail. Further, 5 different datasets are collected, and 14 representative methods of each category are selected and compared in 5-fold cross-validation, 10-fold cross-validation and the de novo experiment. To further evaluate the effectiveness of these methods, six common cancers are selected to compare the number of correctly identified circRNA-disease associations in the top-10, top-20, top-50, top-100 and top-200. In addition, based on the results, observations about the robustness and characteristics of these methods are summarized. Finally, future directions and challenges are discussed.
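The top-k comparison described in this abstract amounts to counting how many known associations appear among the k highest-scored candidates. A minimal sketch follows; the function name `hits_at_k`, the circRNA identifiers, and the scores are hypothetical and are not taken from the benchmark.

```python
def hits_at_k(scores, known, ks=(10, 20, 50, 100, 200)):
    """Count known associations among the top-k candidates for each k.

    `scores` maps a candidate identifier to its predicted score;
    `known` is the set of identifiers confirmed for the disease.
    """
    # Rank candidates from highest to lowest predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {k: sum(1 for candidate in ranked[:k] if candidate in known) for k in ks}


# Hypothetical predictions for one disease.
toy_scores = {"circ_A": 0.9, "circ_B": 0.8, "circ_C": 0.4, "circ_D": 0.1}
toy_known = {"circ_A", "circ_C"}
toy_hits = hits_at_k(toy_scores, toy_known, ks=(1, 2, 3))
```

Reporting the counts at several cut-offs shows whether a method places confirmed associations near the top of its ranking or only recovers them deep in the list.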
Affiliation(s)
- Wei Lan
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Yi Dong
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Hongyu Zhang
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Chunling Li
- School of Computer, Electronic and Information and Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, Guangxi 530004, China
| | - Qingfeng Chen
- School of Computer, Electronic and Information and State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, Guangxi 530004, China
| | - Jin Liu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
| |
|
20
|
Yao D, Nong L, Qin M, Wu S, Yao S. Identifying circRNA-miRNA interaction based on multi-biological interaction fusion. Front Microbiol 2022; 13:987930. [PMID: 36620017 PMCID: PMC9815023 DOI: 10.3389/fmicb.2022.987930] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/06/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
CircRNA is a new type of non-coding RNA with a closed loop structure. More and more biological experiments show that circRNA plays important roles in many diseases by regulating the target genes of miRNA. Therefore, correct identification of the potential interaction between circRNA and miRNA not only helps to understand the mechanism of the disease, but also contributes to the diagnosis, treatment, and prognosis of the disease. In this study, we propose a model (IIMCCMA) by using network embedding and matrix completion to predict the potential interaction of circRNA-miRNA. Firstly, the corresponding adjacency matrix is constructed based on the experimentally verified circRNA-miRNA interaction, circRNA-cancer interaction, and miRNA-cancer interaction. Then, the Gaussian kernel function and the cosine function are used to calculate the circRNA Gaussian interaction profile kernel similarity, circRNA functional similarity, miRNA Gaussian interaction profile kernel similarity, and miRNA functional similarity. In order to reduce the influence of noise and redundant information in known interactions, this model uses network embedding to extract the potential feature vectors of circRNA and miRNA, respectively. Finally, an improved inductive matrix completion algorithm based on the feature vectors of circRNA and miRNA is used to identify potential interactions between circRNAs and miRNAs. The 10-fold cross-validation experiment is utilized to prove the predictive ability of the IIMCCMA. The experimental results show that the AUC value and AUPR value of the IIMCCMA model are higher than other state-of-the-art algorithms. In addition, case studies show that the IIMCCMA model can correctly identify the potential interactions between circRNAs and miRNAs.
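The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract is commonly defined as K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to a bandwidth parameter divided by the mean squared norm of the interaction profiles. The sketch below follows that common definition rather than the authors' code; the function name `gip_kernel`, the default bandwidth of 1.0, and the toy matrix are assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Compute GIP kernel similarity between the rows of a binary interaction matrix.

    Each row of `profiles` is the interaction profile of one entity
    (for example, one circRNA against all miRNAs).
    """
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-gamma * sq_dist(profiles[i], profiles[j])) for j in range(n)]
            for i in range(n)]


# Toy interaction matrix: 3 circRNAs (rows) against 3 miRNAs (columns).
toy_profiles = [[1, 0, 1],
                [1, 0, 1],
                [0, 1, 0]]
toy_kernel = gip_kernel(toy_profiles)
```

Entities with identical interaction profiles get a similarity of 1, and the similarity decays toward 0 as the profiles diverge.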
Affiliation(s)
- Dunwei Yao
- Department of Gastroenterology, The People’s Hospital of Baise, Baise, China
- The Southwest Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China
| | - Lidan Nong
- Department of Child Healthcare, Baise Maternal and Child Hospital, Baise, China
| | - Minzhen Qin
- Department of Gastroenterology, The People’s Hospital of Baise, Baise, China
- The Southwest Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China
| | - Shengbin Wu
- The Southwest Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China
- Department of Pulmonary and Critical Care Medicine, The People's Hospital of Baise, Baise, China
| | - Shunhan Yao
- Medical College of Guangxi University, Nanning, China
- Correspondence: Shunhan Yao
| |
|
21
|
DRGCNCDA: Predicting circRNA-disease interactions based on knowledge graph and disentangled relational graph convolutional network. Methods 2022; 208:35-41. [DOI: 10.1016/j.ymeth.2022.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 09/15/2022] [Accepted: 10/10/2022] [Indexed: 11/06/2022] Open
|
22
|
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine learning-based methods for RNA data analysis—Volume II. Front Genet 2022; 13:1010089. [DOI: 10.3389/fgene.2022.1010089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/20/2022] [Indexed: 12/02/2022] Open
|
23
|
Bang D, Gu J, Park J, Jeong D, Koo B, Yi J, Shin J, Jung I, Kim S, Lee S. A Survey on Computational Methods for Investigation on ncRNA-Disease Association through the Mode of Action Perspective. Int J Mol Sci 2022; 23:ijms231911498. [PMID: 36232792 PMCID: PMC9570358 DOI: 10.3390/ijms231911498] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/26/2022] [Revised: 09/18/2022] [Accepted: 09/26/2022] [Indexed: 02/01/2023] Open
Abstract
Molecular and sequencing technologies have been successfully used in decoding biological mechanisms of various diseases. As revealed by many novel discoveries, the role of non-coding RNAs (ncRNAs) in understanding disease mechanisms is becoming increasingly important. Since ncRNAs primarily act as regulators of transcription, associating ncRNAs with diseases involves multiple inference steps. Leveraging the fast-accumulating high-throughput screening results, a number of computational models predicting ncRNA-disease associations have been developed. These tools suggest novel disease-related biomarkers or therapeutic targetable ncRNAs, contributing to the realization of precision medicine. In this survey, we first introduce the biological roles of different ncRNAs and summarize the databases containing ncRNA-disease associations. Then, we suggest a new trend in recent computational prediction of ncRNA-disease association, which is the mode of action (MoA) network perspective. This perspective includes integrating ncRNAs with mRNA, pathway and phenotype information. In the next section, we describe computational methodologies widely used in this research domain. Existing computational studies are then summarized in terms of their coverage of the MoA network. Lastly, we discuss the potential applications and future roles of the MoA network in terms of integrating biological mechanisms for ncRNA-disease associations.
Affiliation(s)
- Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Jeonghyeon Gu
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
| | - Joonhyeong Park
- Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Korea
| | - Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Bonil Koo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Jungseob Yi
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
| | - Jihye Shin
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Inuk Jung
- Department of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul 08826, Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Korea
- MOGAM Institute for Biomedical Research, Yongin-si 16924, Korea
| | - Sunho Lee
- AIGENDRUG Co., Ltd., Seoul 08826, Korea
- Correspondence:
| |
|
24
|
Li G, Lin Y, Luo J, Xiao Q, Liang C. GGAECDA: predicting circRNA-disease associations using graph autoencoder based on graph representation learning. Comput Biol Chem 2022; 99:107722. [DOI: 10.1016/j.compbiolchem.2022.107722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 06/25/2022] [Accepted: 06/30/2022] [Indexed: 11/27/2022]
|
25
|
Zhang HY, Wang L, You ZH, Hu L, Zhao BW, Li ZW, Li YM. iGRLCDA: identifying circRNA-disease association based on graph representation learning. Brief Bioinform 2022; 23:6552271. [PMID: 35323894 DOI: 10.1093/bib/bbac083] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/19/2022] [Revised: 02/16/2022] [Accepted: 02/17/2022] [Indexed: 12/18/2022] Open
Abstract
As ribonucleic acid sequencing (RNA-seq) and transcript assembly analysis have continued to improve, a novel topology of RNA transcript, called circular RNA (circRNA), was uncovered in the last decade. Recently, researchers have revealed that circRNAs compete with messenger RNA (mRNA) and long noncoding RNA for binding to microRNA in gene regulation. circRNA is therefore assumed to be associated with complex diseases, and discovering these relationships would contribute to medical research. However, identifying associations between circRNA and disease in vitro takes a long time and usually proceeds without direction. In recent years, more and more associations have been verified by experiments. Hence, we propose a computational method named identifying circRNA-disease association based on graph representation learning (iGRLCDA) for predicting potential circRNA-disease associations, which uses a deep learning model combining a graph convolution network (GCN) and graph factorization (GF). In detail, iGRLCDA first derives the hidden features of known circRNA-disease associations using the Gaussian interaction profile (GIP) kernel, combined with disease semantic information, to form a numeric descriptor. It then uses the deep learning model of GCN and GF to extract hidden features from the descriptor. Finally, a random forest classifier is introduced to identify potential circRNA-disease associations. In five-fold cross-validation on the gold standard data, iGRLCDA is strongly competitive with other prediction models, achieving an average area under the receiver operating characteristic curve of 0.9289 and an area under the precision-recall curve of 0.9377. On reviewing the relevant literature, 22 of the top 30 predicted circRNA-disease associations were reported in recently published papers. These results suggest that iGRLCDA can provide reliable circRNA-disease associations for medical research and reduce the blindness of wet-lab experiments.
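The Gaussian interaction profile (GIP) kernel named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it follows the commonly used definition of the kernel, the function name `gip_kernel`, the bandwidth parameter `gamma_prime` and the toy association matrix are all assumptions made for illustration, and the disease semantic information, GCN, GF and random forest steps are not shown.

```python
import numpy as np


def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a binary association matrix.

    assoc[i, j] == 1 means row entity i (e.g. a circRNA) has a known
    association with column entity j (e.g. a disease). `gamma_prime` is a
    hypothetical bandwidth setting; 1.0 is a common default choice.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)  # squared norm of each interaction profile
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq  # normalise bandwidth by the average profile norm
    # Pairwise squared Euclidean distances between interaction profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (assoc @ assoc.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Toy data (made up): 3 circRNAs x 3 diseases.
toy_assoc = np.array([[1, 0, 1],
                      [1, 0, 1],
                      [0, 1, 0]])
circ_sim = gip_kernel(toy_assoc)       # circRNA-circRNA similarity
disease_sim = gip_kernel(toy_assoc.T)  # disease-disease similarity
```

Transposing the association matrix gives the corresponding similarity on the disease side, so one function covers both entity types.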
Affiliation(s)
- Han-Yuan Zhang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lei Wang
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Bo-Wei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Zheng-Wei Li
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Yang-Ming Li
  - College of Engineering Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
|
26
|
Hao X, Chen Q, Pan H, Qiu J, Zhang Y, Yu Q, Han Z, Du X. Enhancing drug-drug interaction prediction by three-way decision and knowledge graph embedding. GRANULAR COMPUTING 2022; 8:67-76. [PMID: 38624759 PMCID: PMC8913867 DOI: 10.1007/s41066-022-00315-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/07/2021] [Accepted: 02/15/2022] [Indexed: 11/30/2022]
Abstract
Drug-drug interaction (DDI) prediction is essential in pharmaceutical research and clinical application. Existing computational methods mainly extract data from multiple resources and treat the task as binary classification. However, this cannot unambiguously draw the boundary between positive and negative samples, owing to the incompleteness and uncertainty of the derived data. A granular computing method called three-way decision has proved effective for making decisions under uncertainty, but it relies on supplementary information to make delayed decisions. Recently, biomedical knowledge graphs have been regarded as an important source of abundant supplementary information about drugs. This paper proposes a three-way decision-based method called 3WDDI, which uses knowledge graph embeddings as supplementary features to enhance DDI prediction. Drug pairs are divided into positive, negative and boundary regions by a Convolutional Neural Network (CNN) according to drug chemical structure features. A delayed decision is then made for objects in the boundary region by integrating knowledge graph embedding features to improve decision accuracy. The empirical results show that 3WDDI achieves up to 0.8922, 0.9614, 0.9582 and 0.8930 for Accuracy, AUPR, AUC and F1-score, respectively, and outperforms several baseline models.
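The three-way split with a delayed decision described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the 3WDDI implementation: the thresholds `alpha` and `beta`, the final cut-off of 0.5, the function names and the example scores are all assumptions, and the CNN and knowledge graph embedding models are represented only by the probabilities they would output.

```python
def three_way_region(prob, alpha=0.8, beta=0.2):
    """Assign a first-stage probability to one of three regions.

    `alpha` and `beta` are hypothetical acceptance and rejection thresholds.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if prob >= alpha:
        return "positive"
    if prob <= beta:
        return "negative"
    return "boundary"  # too uncertain: defer the decision


def decide(first_prob, second_prob, alpha=0.8, beta=0.2, cutoff=0.5):
    """Two-stage decision: use the second score only for boundary cases.

    `first_prob` stands in for a structure-based model's output and
    `second_prob` for a score that also uses supplementary features.
    """
    region = three_way_region(first_prob, alpha, beta)
    if region != "boundary":
        return region
    return "positive" if second_prob >= cutoff else "negative"


# Made-up example scores for three drug pairs: (first_prob, second_prob).
pairs = [(0.95, 0.10), (0.05, 0.90), (0.50, 0.70)]
labels = [decide(p1, p2) for p1, p2 in pairs]
```

Only the third pair falls in the boundary region, so it is the only one whose label depends on the second score.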
Affiliation(s)
- Xinkun Hao
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Qingfeng Chen
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3086 Australia
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Haiming Pan
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Jie Qiu
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Yuxiao Zhang
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Qian Yu
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Zongzhao Han
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
- Xiaojing Du
  - School of Computer, Electronics and Information, Guangxi University, Nanning, 530004 Guangxi China
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, 537000 Guangxi China
|