1
Guo C, Wang X, Ren H. Databases and computational methods for the identification of piRNA-related molecules: A survey. Comput Struct Biotechnol J 2024; 23:813-833. [PMID: 38328006 PMCID: PMC10847878 DOI: 10.1016/j.csbj.2024.01.011] [Received: 09/11/2023] [Revised: 12/31/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024]
Abstract
Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs (ncRNAs) that play important roles in many biological processes and in the diagnosis and treatment of major cancers, making them a hot research topic. This study aims to provide an in-depth review of computational piRNA-related research, including databases and computational models. Herein, we perform literature analysis and use comparative evaluation methods to summarize and analyze three aspects of computational piRNA-related research: (i) computational models for piRNA-related molecular identification tasks, (ii) computational models for piRNA-disease association prediction tasks, and (iii) computational resources and evaluation metrics for these tasks. This study shows that computational piRNA-related research has progressed significantly and exhibited promising performance in recent years, although it still faces emerging challenges such as inconsistent naming systems and a lack of data. Unlike other reviews on piRNA-related identification tasks, which focus on the organization of datasets and computational methods, we pay more attention to the analysis of computational models, algorithms, and performance, aiming to provide valuable references for computational piRNA-related identification tasks. This study will benefit the theoretical development and practical application of piRNAs by improving the understanding of the computational models and resources used to investigate the biological functions and clinical implications of piRNAs.
Affiliation(s)
- Chang Guo
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
- Xiaoli Wang
  - Institute of Reproductive Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Han Ren
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
  - Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou 510420, China
2
Luo L, Tan Z, Wang S. RSANMDA: Resampling based subview attention network for miRNA-disease association prediction. Methods 2024; 230:99-107. [PMID: 39097178 DOI: 10.1016/j.ymeth.2024.07.007] [Received: 06/06/2024] [Revised: 07/16/2024] [Accepted: 07/23/2024] [Indexed: 08/05/2024]
Abstract
Many studies have demonstrated the importance of accurately identifying miRNA-disease associations (MDAs) for understanding disease mechanisms. However, the number of known MDAs is significantly fewer than the unknown pairs. Here, we propose RSANMDA, a subview attention network for predicting MDAs. We first extract miRNA and disease features from multiple similarity matrices. Next, using resampling techniques, we generate different subviews from known MDAs. Each subview undergoes multi-head graph attention to capture its features, followed by semantic attention to integrate features across subviews. Finally, combining raw and training features, we use a multilayer scoring perceptron for prediction. In the experimental section, we conducted comparative experiments with other advanced models on both HMDD v2.0 and HMDD v3.2 datasets. We also performed a series of ablation studies and parameter tuning exercises. Comprehensive experiments conclusively demonstrate the superiority of our model. Case studies on lung, breast, and esophageal cancers further validate our method's predictive capability for identifying disease-related miRNAs.
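The resampling step described above can be pictured with a minimal sketch. The abstract does not specify the resampling technique, so the uniform sampling without replacement used here, the function name `resample_subviews`, the `keep_ratio` parameter, and the toy pair identifiers are all assumptions made for illustration.

```python
import random


def resample_subviews(known_pairs, num_views, keep_ratio, seed=0):
    """Draw `num_views` random subsets ("subviews") of the known pairs.

    Assumption: each subview is a uniform sample without replacement;
    the paper's actual resampling scheme may differ.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    pairs = sorted(known_pairs)        # stable order before sampling
    size = max(1, round(len(pairs) * keep_ratio))
    return [sorted(rng.sample(pairs, size)) for _ in range(num_views)]


# Toy, made-up (miRNA id, disease id) pairs, not real associations.
known = [("m1", "d1"), ("m2", "d1"), ("m3", "d2"), ("m1", "d3")]
views = resample_subviews(known, num_views=3, keep_ratio=0.5)
for index, view in enumerate(views):
    print(index, view)
```

Each subview would then be passed to its own attention encoder before the per-subview features are merged.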
Affiliation(s)
- Longfei Luo
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Zhuokun Tan
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China.
3
Sun W, Zhang P, Zhang W, Xu J, Huang Y, Li L. Synchronous Mutual Learning Network and Asynchronous Multi-Scale Embedding Network for miRNA-Disease Association Prediction. Interdiscip Sci 2024; 16:532-553. [PMID: 38310628 DOI: 10.1007/s12539-023-00602-x] [Received: 10/09/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 02/06/2024]
Abstract
MicroRNA (miRNA) serves as a pivotal regulator of numerous cellular processes, and the identification of miRNA-disease associations (MDAs) is crucial for comprehending complex diseases. Recently, graph neural networks (GNN) have made significant advancements in MDA prediction. However, these methods tend to learn one type of node representation from a single heterogeneous network, ignoring the importance of multiple network topologies and node attributes. Here, we propose SMDAP (Sequence hierarchical modeling-based Mirna-Disease Association Prediction framework), a novel GNN-based framework that incorporates multiple network topologies and various node attributes including miRNA seed and full-length sequences to predict potential MDAs. Specifically, SMDAP consists of two types of MDA representation: following a heterogeneous pattern, we construct a transfer learning-like synchronous mutual learning network to learn the first MDA representation in conjunction with the miRNA seed sequence. Meanwhile, following a homogeneous pattern, we design a subgraph-inspired asynchronous multi-scale embedding network to obtain the second MDA representation based on the miRNA full-length sequence. Subsequently, an adaptive fusion approach is designed to combine the two branches such that we can score the MDAs by the downstream classifier and infer novel MDAs. Comprehensive experiments demonstrate that SMDAP integrates the advantages of multiple network topologies and node attributes into two branch representations. Moreover, the area under the receiver operating characteristic curve is 0.9622 on DB1, which is a 5.06% increase from the baselines. The area under the precision-recall curve is 0.9777, which is a 7.33% increase from the baselines. In addition, case studies on three human cancers validated the predictive performance of SMDAP. Overall, SMDAP represents a powerful tool for MDA prediction.
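The area under the receiver operating characteristic curve reported above can be computed directly from labels and scores. The following is a minimal sketch using the pairwise-ranking definition of AUROC, not the authors' evaluation code; the function name `auroc` and the toy labels and scores are illustrative assumptions.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The quadratic loop is fine for a sketch; a sort-based rank computation would be preferable on full association matrices.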
Affiliation(s)
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Ping Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Li Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China.
4
Wang Z, Wei Z. PT-KGNN: A framework for pre-training biomedical knowledge graphs with graph neural networks. Comput Biol Med 2024; 178:108768. [PMID: 38936076 DOI: 10.1016/j.compbiomed.2024.108768] [Received: 04/09/2024] [Revised: 05/23/2024] [Accepted: 06/15/2024] [Indexed: 06/29/2024]
Abstract
Biomedical knowledge graphs (KGs) serve as comprehensive data repositories that contain rich information about nodes and edges, providing modeling capabilities for complex relationships among biological entities. Many approaches either learn node features through traditional machine learning methods, or leverage graph neural networks (GNNs) to directly learn features of target nodes in the biomedical KGs and utilize them for downstream tasks. Motivated by the pre-training technique in natural language processing (NLP), we propose a framework named PT-KGNN (Pre-Training the biomedical KG with GNNs) to learn embeddings of nodes in a broader context by applying GNNs on the biomedical KG. We design several experiments to evaluate the effectiveness of our proposed framework and the impact of the scale of KGs. Task results consistently improve as the scale of the biomedical KG used for pre-training increases. Pre-training on large-scale biomedical KGs significantly enhances the drug-drug interaction (DDI) and drug-disease association (DDA) prediction performance on the independent dataset. The embeddings derived from a larger biomedical KG have demonstrated superior performance compared to those obtained from a smaller KG. By applying pre-training techniques on biomedical KGs, rich semantic and structural information can be learned, leading to enhanced performance on downstream tasks. It is evident that pre-training techniques hold tremendous potential and wide-ranging applications in bioinformatics.
Affiliation(s)
- Zhenxing Wang
  - School of Data Science, Fudan University, 220 Handan Rd., Shanghai, 200433, China.
- Zhongyu Wei
  - School of Data Science, Fudan University, 220 Handan Rd., Shanghai, 200433, China.
5
Zhang W, Zhang P, Sun W, Xu J, Liao L, Cao Y, Han Y. Improving plant miRNA-target prediction with self-supervised k-mer embedding and spectral graph convolutional neural network. PeerJ 2024; 12:e17396. [PMID: 38799058 PMCID: PMC11122044 DOI: 10.7717/peerj.17396] [Received: 01/08/2024] [Accepted: 04/25/2024] [Indexed: 05/29/2024]
Abstract
Deciphering the targets of microRNAs (miRNAs) in plants is crucial for comprehending their function and the variation in phenotype that they cause. Owing to the highly cell-specific nature of miRNA regulation, recent computational approaches usually utilize expression data to identify the most physiologically relevant targets. Although these methods are effective, they typically require a large sample size and high-depth sequencing to detect potential miRNA-target pairs, thereby limiting their applicability in improving plant breeding. In this study, we propose a novel miRNA-target prediction framework named kmerPMTF (k-mer-based prediction framework for plant miRNA-target). Our framework effectively extracts the latent semantic embeddings of sequences by utilizing k-mer splitting and a deep self-supervised neural network. We construct multiple similarity networks based on k-mer embeddings and employ graph convolutional networks to derive deep representations of miRNAs and targets and calculate the probabilities of potential associations. We evaluated the performance of kmerPMTF on four typical plant datasets: Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, and Prunus persica. The results demonstrate its ability to achieve AUPRC values of 84.9%, 91.0%, 80.1%, and 82.1% in 5-fold cross-validation, respectively. Compared with several state-of-the-art existing methods, our framework achieves better performance on threshold-independent evaluation metrics. Overall, our study provides an efficient and simplified methodology for identifying plant miRNA-target associations, which will contribute to a deeper comprehension of miRNA regulatory mechanisms in plants.
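The k-mer splitting step mentioned above can be sketched as follows. The abstract does not state the window stride or the value of k, so the overlapping stride-1 windows, the function name `split_kmers`, and the example sequence are assumptions for illustration only.

```python
def split_kmers(sequence, k):
    """Split a sequence into overlapping k-mers (stride 1, an assumption).

    Returns an empty list when the sequence is shorter than k.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# A made-up 6-nucleotide RNA fragment split into 3-mers.
print(split_kmers("augcua", 3))  # ['AUG', 'UGC', 'GCU', 'CUA']
```

The resulting k-mer "words" are the kind of tokens a self-supervised embedding model could be trained on.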
Affiliation(s)
- Weihan Zhang
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
  - Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Ping Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, China
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, China
- Liao Liao
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
  - Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Yunpeng Cao
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
  - Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
- Yuepeng Han
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design of Chinese Academy of Sciences, Wuhan, Hubei Province, China
  - Sino-African Joint Research Center, Chinese Academy of Sciences, Wuhan, Hubei Province, China
6
Sheng N, Xie X, Wang Y, Huang L, Zhang S, Gao L, Wang H. A Survey of Deep Learning for Detecting miRNA-Disease Associations: Databases, Computational Methods, Challenges, and Future Directions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:328-347. [PMID: 38194377 DOI: 10.1109/tcbb.2024.3351752] [Indexed: 01/11/2024]
Abstract
MicroRNAs (miRNAs) are an important class of non-coding RNAs that play an essential role in the occurrence and development of various diseases. Identifying the potential miRNA-disease associations (MDAs) can be beneficial in understanding disease pathogenesis. Traditional laboratory experiments are expensive and time-consuming. Computational models have enabled systematic large-scale prediction of potential MDAs, greatly improving the research efficiency. With recent advances in deep learning, it has become an attractive and powerful technique for uncovering novel MDAs. Consequently, numerous MDA prediction methods based on deep learning have emerged. In this review, we first summarize publicly available databases related to miRNAs and diseases for MDA prediction. Next, we outline commonly used miRNA and disease similarity calculation and integration methods. Then, we comprehensively review the 48 existing deep learning-based MDA computation methods, categorizing them into classical deep learning and graph neural network-based techniques. Subsequently, we investigate the evaluation methods and metrics that are frequently used to assess MDA prediction performance. Finally, we discuss the performance trends of different computational methods, point out some problems in current research, and propose 9 potential future research directions. Data resources and recent advances in MDA prediction methods are summarized in the GitHub repository https://github.com/sheng-n/DL-miRNA-disease-association-methods.
7
Liu Y, Zhang R, Dong X, Yang H, Li J, Cao H, Tian J, Zhang Y. DAE-CFR: detecting microRNA-disease associations using deep autoencoder and combined feature representation. BMC Bioinformatics 2024; 25:139. [PMID: 38553698 PMCID: PMC10981315 DOI: 10.1186/s12859-024-05757-y] [Received: 01/09/2024] [Accepted: 03/20/2024] [Indexed: 04/01/2024]
Abstract
BACKGROUND: MicroRNA (miRNA) has been shown to play a key role in the occurrence and progression of diseases, making uncovering miRNA-disease associations vital for disease prevention and therapy. However, traditional laboratory methods for detecting these associations are slow, strenuous, expensive, and uncertain. Although numerous advanced algorithms have emerged, it is still a challenge to develop more effective methods to explore underlying miRNA-disease associations. RESULTS: In this study, we designed a novel approach based on a deep autoencoder and combined feature representation (DAE-CFR) to predict possible miRNA-disease associations. We began by creating integrated similarity matrices of miRNAs and diseases, performing a logistic function transformation, balancing positive and negative samples with k-means clustering, and constructing training samples. Then, a deep autoencoder was used to extract low-dimensional features from two kinds of feature representations for miRNAs and diseases, namely, original association information-based and similarity information-based. Next, we combined the resulting features for each miRNA-disease pair and used a logistic regression (LR) classifier to infer all unknown miRNA-disease interactions. Under five- and tenfold cross-validation (CV) frameworks, DAE-CFR not only outperformed six popular algorithms and nine classifiers, but also demonstrated superior performance on an additional dataset. Furthermore, case studies on three diseases (myocardial infarction, hypertension and stroke) confirmed the validity of DAE-CFR in practice. CONCLUSIONS: DAE-CFR achieved outstanding performance in predicting miRNA-disease associations and can provide evidence to inform biological experiments and clinical therapy.
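As a rough illustration of the logistic function transformation step, the sketch below rescales every entry of a similarity matrix with a logistic curve. The abstract gives neither the functional form nor the constants, so the form 1 / (1 + exp(c·x + d)), the default values of `c` and `d`, and the toy matrix are assumptions and placeholders, not the authors' settings.

```python
import math


def logistic_transform(matrix, c=-15.0, d=math.log(9999.0)):
    """Apply 1 / (1 + exp(c * x + d)) to each similarity value x.

    `c` and `d` are illustrative placeholders (assumptions): with these
    values a similarity of 0 maps to about 0.0001 and larger similarities
    map closer to 1, which sharpens the contrast between weak and strong
    similarities.
    """
    return [[1.0 / (1.0 + math.exp(c * x + d)) for x in row] for row in matrix]


# Toy 2 x 2 similarity matrix with values in [0, 1].
similarity = [[0.0, 0.5], [0.5, 1.0]]
for row in logistic_transform(similarity):
    print([round(value, 4) for value in row])
```

Because the transform is monotonic, it preserves the ranking of similarities while changing their spacing.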
Affiliation(s)
- Yanling Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
  - Department of Mathematics, Changzhi Medical College, Changzhi, China
- Ruiyan Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Xiaojing Dong
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hong Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Jing Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hongyan Cao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Jing Tian
  - Department of Cardiology, First Hospital of Shanxi Medical University, Taiyuan, China.
- Yanbo Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China.
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China.
  - School of Health and Service Management, Shanxi University of Chinese Medicine, Jinzhong, China.
8
Fan Y, Zhang C, Hu X, Huang Z, Xue J, Deng L. SGCLDGA: unveiling drug-gene associations through simple graph contrastive learning. Brief Bioinform 2024; 25:bbae231. [PMID: 38754409 PMCID: PMC11097980 DOI: 10.1093/bib/bbae231] [Received: 01/31/2024] [Revised: 04/15/2024] [Accepted: 04/30/2024] [Indexed: 05/18/2024]
Abstract
Drug repurposing offers a viable strategy for discovering new drugs and therapeutic targets through the analysis of drug-gene interactions. However, traditional experimental methods are plagued by their costliness and inefficiency. Despite graph convolutional network (GCN)-based models' state-of-the-art performance in prediction, their reliance on supervised learning makes them vulnerable to data sparsity, a common challenge in drug discovery, further complicating model development. In this study, we propose SGCLDGA, a novel computational model leveraging graph neural networks and contrastive learning to predict unknown drug-gene associations. SGCLDGA employs GCNs to extract vector representations of drugs and genes from the original bipartite graph. Subsequently, singular value decomposition (SVD) is employed to enhance the graph and generate multiple views. The model performs contrastive learning across these views, optimizing vector representations through a contrastive loss function to better distinguish positive and negative samples. The final step involves utilizing inner product calculations to determine association scores between drugs and genes. Experimental results on the DGIdb4.0 dataset demonstrate SGCLDGA's superior performance compared with six state-of-the-art methods. Ablation studies and case analyses validate the significance of contrastive learning and SVD, highlighting SGCLDGA's potential in discovering new drug-gene associations. The code and dataset for SGCLDGA are freely available at https://github.com/one-melon/SGCLDGA.
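The final scoring step, in which association scores come from inner products of drug and gene vectors, can be sketched in a few lines. This is not code from the linked repository; the function name `association_scores` and the two-dimensional toy embeddings are assumptions standing in for learned representations.

```python
def association_scores(drug_vectors, gene_vectors):
    """Return a matrix whose entry [i][j] is the inner product of
    drug vector i and gene vector j."""
    dims = {len(v) for v in drug_vectors} | {len(v) for v in gene_vectors}
    if len(dims) > 1:
        raise ValueError("all embeddings must share one dimension")
    return [
        [sum(a * b for a, b in zip(drug, gene)) for gene in gene_vectors]
        for drug in drug_vectors
    ]


# Made-up 2-dimensional embeddings for two drugs and two genes.
drugs = [[1.0, 0.0], [0.5, 0.5]]
genes = [[1.0, 2.0], [0.0, 4.0]]
print(association_scores(drugs, genes))  # [[1.0, 0.0], [1.5, 2.0]]
```

Higher scores indicate drug-gene pairs whose embeddings point in similar directions, which is what the contrastive objective is meant to encourage for true associations.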
Affiliation(s)
- Yanhao Fan
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Che Zhang
  - School of Software, Xinjiang University, 830046, Urumqi, China
- Xiaowen Hu
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Zhijian Huang
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Jiameng Xue
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
9
Zhang P, Zhang W, Sun W, Xu J, Hu H, Wang L, Wong L. Identification of gene biomarkers for brain diseases via multi-network topological semantics extraction and graph convolutional network. BMC Genomics 2024; 25:175. [PMID: 38350848 PMCID: PMC10865627 DOI: 10.1186/s12864-024-09967-9] [Received: 09/06/2023] [Accepted: 01/03/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND: Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks is necessary yet challenging to identify pathogenic genes for brain diseases. RESULTS: In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback-Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts whether a gene is associated with brain diseases while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our analysis supported by bioinformatics revealed CAMP as a gene significantly associated with Alzheimer's disease, as identified by M-GBBD. CONCLUSION: Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
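For readers unfamiliar with the quantity being optimized, the sketch below computes the Kullback-Leibler divergence between two discrete distributions. It shows only the standard definition, not the M-GBBD optimization loop; the function name `kl_divergence` and the toy distributions are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute nothing; the result is infinite when
    q_i == 0 for some i with p_i > 0.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


# Two toy feature distributions; the divergence is zero only when they match.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Driving this value toward zero pulls one distribution toward the other, which is the general sense in which a KL term can align features from different networks.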
Affiliation(s)
- Ping Zhang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
  - CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Hubei Hongshan Laboratory, Wuhan, 430074, China
- Weicheng Sun
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hua Hu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
  - Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China.
- Leon Wong
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China.
10
Jin Z, Wang M, Tang C, Zheng X, Zhang W, Sha X, An S. Predicting miRNA-disease association via graph attention learning and multiplex adaptive modality fusion. Comput Biol Med 2024; 169:107904. [PMID: 38181611 DOI: 10.1016/j.compbiomed.2023.107904] [Received: 08/01/2023] [Revised: 12/12/2023] [Accepted: 12/23/2023] [Indexed: 01/07/2024]
Abstract
miRNAs are a class of small non-coding RNA molecules that play important roles in gene regulation. They are crucial for maintaining normal cellular functions, and dysregulation or dysfunction of miRNAs is linked to the onset and advancement of multiple human diseases. Research on miRNAs has unveiled novel avenues in the diagnosis, treatment, and prevention of human diseases. However, clinical trials pose challenges and drawbacks, such as complexity and time-consuming processes, which create obstacles for many researchers. Graph Attention Network (GAT) has shown excellent performance in handling graph-structured data for tasks such as link prediction. Some studies have successfully applied GAT to miRNA-disease association prediction. However, there are several drawbacks to existing methods. Firstly, most of the previous models rely solely on concatenation operations to merge features of miRNAs and diseases, which results in the loss of significant modality-specific information and even the inclusion of redundant information. Secondly, as the number of layers in GAT increases, there is a possibility of excessive smoothing in the feature extraction process, which significantly affects the prediction accuracy. To address these issues and effectively complete miRNA-disease prediction tasks, we propose an innovative model called Multiplex Adaptive Modality Fusion Graph Attention Network (MAMFGAT). MAMFGAT utilizes GAT as the main structure for feature aggregation and incorporates a multi-modal adaptive fusion module to extract features from three interconnected networks: the miRNA-disease association network, the miRNA similarity network, and the disease similarity network. It employs adaptive learning and cross-modality contrastive learning to fuse more effective miRNA and disease feature embeddings, and incorporates multi-modal residual feature fusion to tackle the problem of excessive feature smoothing in GATs.
Finally, we employ a Multi-Layer Perceptron (MLP) model that takes the embeddings of miRNA and disease features as input to anticipate the presence of potential miRNA-disease associations. Extensive experimental results provide evidence of the superior performance of MAMFGAT in comparison to other state-of-the-art methods. To validate the significance of various modalities and assess the efficacy of the designed modules, we performed an ablation analysis. Furthermore, MAMFGAT shows outstanding performance in three cancer case studies, indicating that it is a reliable method for studying the association between miRNA and diseases. The implementation of MAMFGAT can be accessed at the following GitHub repository: https://github.com/zixiaojin66/MAMFGAT-master.
Affiliation(s)
- Zixiao Jin
  - School of Computer, China University of Geosciences, Wuhan, 430074, China.
- Minhui Wang
  - Department of Pharmacy, Lianshui People's Hospital of Kangda College Affiliated to Nanjing Medical University, Huai'an 223300, China.
- Chang Tang
  - School of Computer, China University of Geosciences, Wuhan, 430074, China.
- Xiao Zheng
  - School of Computer, National University of Defense Technology, Changsha, 410073, China.
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
- Xiaofeng Sha
  - Department of Oncology, Huai'an Hongze District People's Hospital, Huai'an, 223100, China.
- Shan An
  - JD Health International Inc., China.
11
Xie GB, Yu JR, Lin ZY, Gu GS, Chen RB, Xu HJ, Liu ZG. Prediction of miRNA-disease associations based on strengthened hypergraph convolutional autoencoder. Comput Biol Chem 2024; 108:107992. [PMID: 38056378 DOI: 10.1016/j.compbiolchem.2023.107992] [Received: 09/21/2023] [Revised: 11/04/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Most existing graph neural network-based methods for predicting miRNA-disease associations rely on initial association matrices to pass messages, but the sparsity of these matrices greatly limits performance. To address this issue and predict potential associations between miRNAs and diseases, we propose a method called strengthened hypergraph convolutional autoencoder (SHGAE). SHGAE leverages multiple layers of strengthened hypergraph neural networks (SHGNN) to obtain robust node embeddings. Within SHGNN, we design a strengthened hypergraph convolutional network module (SHGCN) that enhances original graph associations and reduces matrix sparsity. Additionally, SHGCN expands node receptive fields by utilizing hyperedge features as intermediaries to obtain high-order neighbor embeddings. To improve performance, we also incorporate attention-based fusion of self-embeddings and SHGCN embeddings. SHGAE predicts potential miRNA-disease associations using a multilayer perceptron as the decoder. Across multiple metrics, SHGAE outperforms other state-of-the-art methods in five-fold cross-validation. Furthermore, we evaluate SHGAE on colon and lung neoplasms cases to demonstrate its ability to predict potential associations. Notably, SHGAE also performs well in the analysis of gastric neoplasms without miRNA associations.
Affiliation(s)
- Guo-Bo Xie
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China.
- Jun-Rui Yu
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China.
- Zhi-Yi Lin
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China.
- Guo-Sheng Gu
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China.
- Rui-Bin Chen
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China.
- Hao-Jie Xu
  - School of Computer Science, Guangdong University of Technology, Guangzhou, 510000, China.
- Zhen-Guo Liu
  - Department of Thoracic Surgery, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China.
12
|
Li J, Chen J, Wang Z, Lei X. HoRDA: Learning higher-order structure information for predicting RNA-disease associations. Artif Intell Med 2024; 148:102775. [PMID: 38325924 DOI: 10.1016/j.artmed.2024.102775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 10/16/2023] [Accepted: 01/14/2024] [Indexed: 02/09/2024]
Abstract
CircRNA and miRNA are crucial non-coding RNAs that are associated with diseases. Exploring the associations between RNAs and diseases often requires significant time and financial investment, a burden that has been greatly alleviated by the application of deep learning methods in bioinformatics. However, existing methods often fail to achieve high accuracy and do not generalize across multiple RNA types. Moreover, complex RNA-disease associations hide important higher-order topology information. To address these issues, we learn higher-order structure information for predicting RNA-disease associations (HoRDA). Firstly, the correlations between RNAs and the correlations between diseases are fully explored by combining similarity and a higher-order graph attention network. Then, a higher-order graph convolutional network is constructed to aggregate neighbor information and obtain the representations of RNAs and diseases. Meanwhile, due to the large number of complex and variable higher-order structures in biological networks, we design a higher-order negative sampling strategy to obtain more desirable negative samples. Finally, the obtained embeddings of RNAs and diseases are fed into a logistic regression model to acquire the probabilities of RNA-disease associations. Diverse simulation results demonstrate the superiority of the proposed method. In the end, a case study is conducted on breast neoplasms, colorectal neoplasms, and gastric neoplasms. We validate the proposed higher-order strategies through ablative and exploratory analyses and further demonstrate the practical applicability of HoRDA. HoRDA thus contributes to RNA-disease association prediction.
Affiliation(s)
- Julong Li
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Jianrui Chen
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
- Zhihui Wang
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China
|
13
|
Zhang Y, Chu Y, Lin S, Xiong Y, Wei DQ. ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler. Brief Bioinform 2024; 25:bbae103. [PMID: 38517693 PMCID: PMC10959163 DOI: 10.1093/bib/bbae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/04/2024] [Accepted: 02/23/2024] [Indexed: 03/24/2024] Open
Abstract
Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, unearthing associations between miRNA and diseases can contribute to precise diagnosis and efficacious remediation of medical conditions. The detection of miRNA-disease linkages via computational techniques utilizing biological information has emerged as a cost-effective and highly efficient approach. Here, we introduce a computational framework named ReHoGCNES, designed for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs a homogenous graph convolutional network with regular graph structure (ReHoGCN) encompassing a disease similarity network, a miRNA similarity network and the known MDA network, and was then tested on four experimental tasks. A random edge sampler strategy was utilized to expedite processes and diminish training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both a homogenous graph convolutional network and a heterogeneous graph convolutional network with non-regular graph structure in all four tasks, which implicitly reveals that a steady degree distribution of a graph does play an important role in enhancing model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods on MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. Consequently, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 forecasted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.
Affiliation(s)
- Yufang Zhang
  - School of Mathematical Sciences and SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai 200240, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
- Yanyi Chu
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Shenggeng Lin
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
- Dong-Qing Wei
  - Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan, 473006, China
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, and Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
|
14
|
Yang C, Wang Z, Zhang S, Li X, Wang X, Liu J, Li R, Zeng S. MVNMDA: A Multi-View Network Combing Semantic and Global Features for Predicting miRNA-Disease Association. Molecules 2023; 29:230. [PMID: 38202814 PMCID: PMC10780172 DOI: 10.3390/molecules29010230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/23/2023] [Accepted: 12/28/2023] [Indexed: 01/12/2024] Open
Abstract
A growing body of experimental evidence suggests that microRNAs (miRNAs) are closely associated with specific human diseases and play critical roles in their development and progression. Therefore, identifying miRNAs related to specific diseases is of great significance for disease screening and treatment. In the early stages, the identification of associations between miRNAs and diseases demanded laborious and time-consuming biological experiments that often carried a substantial risk of failure. With the exponential growth in the number of potential miRNA-disease association combinations, traditional biological experimental methods face difficulties in processing massive amounts of data. Hence, developing more efficient computational methods to predict possible miRNA-disease associations and prioritize them is particularly necessary. In recent years, numerous deep learning-based computational methods have been developed and have demonstrated excellent performance. However, most of these methods rely on external databases or tools to compute various auxiliary information. Unfortunately, these external databases or tools often cover only a limited portion of miRNAs and diseases, resulting in many miRNAs and diseases being unable to match with these computational methods. Therefore, there are certain limitations associated with the practical application of these methods. To overcome the above limitations, this study proposes a multi-view computational model called MVNMDA, which predicts potential miRNA-disease associations by integrating features of miRNAs and diseases from local views, global views, and semantic views. Specifically, MVNMDA utilizes known association information to construct initial node features. Then, multiple networks are constructed based on known associations to extract low-dimensional feature embeddings of all nodes. Finally, a cascaded attention classifier is proposed to fuse features from coarse to fine, suppressing noise within the features and making precise predictions. To validate the effectiveness of the proposed method, extensive experiments were conducted on the HMDD v2.0 and HMDD v3.2 datasets. The experimental results demonstrate that MVNMDA achieves better performance compared to other computational methods. Additionally, the case study results further demonstrate the reliable predictive performance of MVNMDA.
Affiliation(s)
- Zhen Wang
  - School of Electronic Information, Xijing University, Xi’an 710123, China; (C.Y.); (S.Z.); (X.L.); (X.W.); (J.L.); (R.L.); (S.Z.)
|
15
|
Liao Q, Fu X, Zhuo L, Chen H. An efficient model for predicting human diseases through miRNA based on multiple-types of contrastive learning. Front Microbiol 2023; 14:1325001. [PMID: 38163075 PMCID: PMC10755968 DOI: 10.3389/fmicb.2023.1325001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 11/16/2023] [Indexed: 01/03/2024] Open
Abstract
Multiple studies have demonstrated that microRNA (miRNA) can be deeply involved in the regulatory mechanism of human microbiota, thereby inducing disease. Developing effective methods to infer potential associations between microRNAs (miRNAs) and diseases can aid early diagnosis and treatment. Recent methods utilize machine learning or deep learning to predict miRNA-disease associations (MDAs), achieving state-of-the-art performance. However, the problem of sparse neighborhoods of nodes due to lack of data has not been well solved. To this end, we propose a new model named MTCL-MDA, which integrates multiple-types of contrastive learning strategies into a graph collaborative filtering model to predict potential MDAs. The model adopts a contrastive learning strategy based on topology, which alleviates the damage to model performance caused by sparse neighborhoods. In addition, the model also adopts a semantic-based contrastive learning strategy, which not only reduces the impact of noise introduced by topology-based contrastive learning, but also enhances the semantic information of nodes. Experimental results show that our model outperforms existing models on all evaluation metrics. Case analysis shows that our model can more accurately identify potential MDA, which is of great significance for the screening and diagnosis of real-life diseases. Our data and code are publicly available at: https://github.com/Lqingquan/MTCL-MDA.
Affiliation(s)
- Qingquan Liao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Hao Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
16
|
Dong B, Sun W, Xu D, Wang G, Zhang T. DAEMDA: A Method with Dual-Channel Attention Encoding for miRNA-Disease Association Prediction. Biomolecules 2023; 13:1514. [PMID: 37892196 PMCID: PMC10604960 DOI: 10.3390/biom13101514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 10/08/2023] [Indexed: 10/29/2023] Open
Abstract
A growing number of studies have shown that aberrant microRNA (miRNA) expression is closely associated with the evolution and development of various complex human diseases. These key biomarkers' identification and observation are significant for gaining a deeper understanding of disease pathogenesis and therapeutic mechanisms. Consequently, pinpointing potential miRNA-disease associations (MDA) has become a prominent bioinformatics subject, encouraging several new computational methods given the advances in graph neural networks (GNN). Nevertheless, these existing methods commonly fail to exploit the network nodes' global feature information, leaving the generation of high-quality embedding representations using graph properties as a critical unsolved issue. Addressing these challenges, we introduce the DAEMDA, a computational method designed to optimize the current models' efficacy. First, we construct similarity and heterogeneous networks involving miRNAs and diseases, relying on experimentally corroborated miRNA-disease association data and analogous information. Then, a newly-fashioned parallel dual-channel feature encoder, designed to better comprehend the global information within the heterogeneous network and generate varying embedding representations, follows this. Ultimately, employing a neural network classifier, we merge the dual-channel embedding representations and undertake association predictions between miRNA and disease nodes. The experimental results of five-fold cross-validation and case studies of major diseases based on the HMDD v3.2 database show that this method can generate high-quality embedded representations and effectively improve the accuracy of MDA prediction.
Affiliation(s)
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China; (B.D.)
- Tianjiao Zhang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China; (B.D.)
|
17
|
Gong H, Zhang D, Zhang X. TOAST: A novel method for identifying topologically associated domains based on graph auto-encoders and clustering. Comput Struct Biotechnol J 2023; 21:4759-4768. [PMID: 37822562 PMCID: PMC10562672 DOI: 10.1016/j.csbj.2023.09.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 09/16/2023] [Accepted: 09/16/2023] [Indexed: 10/13/2023] Open
Abstract
Topologically associated domains (TADs) play a pivotal role in disease detection. This study introduces a novel TADs recognition approach named TOAST, leveraging graph auto-encoders and clustering techniques. TOAST conceptualizes each genomic bin as a node of a graph and employs the Hi-C contact matrix as the graph's adjacency matrix. By employing graph auto-encoders, TOAST generates informative embeddings as features. Subsequently, the unsupervised clustering algorithm HDBSCAN is utilized to assign labels to each genomic bin, facilitating the identification of contiguous regions with the same label as TADs. Our experimental analysis of several simulated Hi-C data sets shows that TOAST can quickly and accurately identify TADs from different types of simulated Hi-C contact matrices, outperforming existing algorithms. We also determined the anchoring ratio of TAD boundaries by analyzing different TAD recognition algorithms, and obtained an average ratio of anchoring CTCF, SMC3, RAD21, POLR2A, H3K36me3, H3K9me3, H3K4me3, H3K4me1, Enhancer, and Promoters of 0.66, 0.47, 0.54, 0.27, 0.24, 0.12, 0.32, 0.41, 0.26, and 0.13, respectively. In conclusion, TOAST is a method that can quickly identify TAD boundary parameters that are easy to understand and have important biological significance. The TOAST web server can be accessed via http://223.223.185.189:4005/. The code of TOAST is available online at https://github.com/ghaiyan/TOAST.
Affiliation(s)
- Haiyan Gong
  - Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China
  - School of Computer and Communication Engineering, Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing, 100083, China
  - Shunde innovation School, University of Science and Technology Beijing, Foshan, 528399, Guangdong, China
- Dawei Zhang
  - Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China
  - Shunde innovation School, University of Science and Technology Beijing, Foshan, 528399, Guangdong, China
- Xiaotong Zhang
  - School of Computer and Communication Engineering, Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing, 100083, China
  - Shunde innovation School, University of Science and Technology Beijing, Foshan, 528399, Guangdong, China
|
18
|
Hu X, Liu D, Zhang J, Fan Y, Ouyang T, Luo Y, Zhang Y, Deng L. A comprehensive review and evaluation of graph neural networks for non-coding RNA and complex disease associations. Brief Bioinform 2023; 24:bbad410. [PMID: 37985451 DOI: 10.1093/bib/bbad410] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/01/2023] [Revised: 10/07/2023] [Accepted: 10/25/2023] [Indexed: 11/22/2023] Open
Abstract
Non-coding RNAs (ncRNAs) play a critical role in the occurrence and development of numerous human diseases. Consequently, studying the associations between ncRNAs and diseases has garnered significant attention from researchers in recent years. Various computational methods have been proposed to explore ncRNA-disease relationships, with Graph Neural Network (GNN) emerging as a state-of-the-art approach for ncRNA-disease association prediction. In this survey, we present a comprehensive review of GNN-based models for ncRNA-disease associations. Firstly, we provide a detailed introduction to ncRNAs and GNNs. Next, we delve into the motivations behind adopting GNNs for predicting ncRNA-disease associations, focusing on data structure, high-order connectivity in graphs and sparse supervision signals. Subsequently, we analyze the challenges associated with using GNNs in predicting ncRNA-disease associations, covering graph construction, feature propagation and aggregation, and model optimization. We then present a detailed summary and performance evaluation of existing GNN-based models in the context of ncRNA-disease associations. Lastly, we explore potential future research directions in this rapidly evolving field. This survey serves as a valuable resource for researchers interested in leveraging GNNs to uncover the complex relationships between ncRNAs and diseases.
Affiliation(s)
- Xiaowen Hu
  - School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Dayun Liu
  - School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Jiaxuan Zhang
  - Department of Electrical and Computer Engineering, University of California, San Diego, 92093 CA, USA
- Yanhao Fan
  - School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Tianxiang Ouyang
  - School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Yue Luo
  - School of Computer Science and Engineering, Central South University, 410075 Changsha, China
- Yuanpeng Zhang
  - School of Software, Xinjiang University, 830046 Urumqi, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, 410075 Changsha, China
|
19
|
Wang S, Li Y, Zhang Y, Pang S, Qiao S, Zhang Y, Wang F. Generative Adversarial Matrix Completion Network based on Multi-Source Data Fusion for miRNA-Disease Associations Prediction. Brief Bioinform 2023; 24:bbad270. [PMID: 37482409 DOI: 10.1093/bib/bbad270] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/19/2023] [Revised: 06/16/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023] Open
Abstract
Numerous biological studies have shown that considering disease-associated micro RNAs (miRNAs) as potential biomarkers or therapeutic targets offers new avenues for the diagnosis of complex diseases. Computational methods have gradually been introduced to reveal disease-related miRNAs. Considering that previous models have not fused sufficiently diverse similarities, that their inappropriate fusion methods may lead to poor quality of the comprehensive similarity network and that their results are often limited by insufficiently known associations, we propose a computational model called Generative Adversarial Matrix Completion Network based on Multi-source Data Fusion (GAMCNMDF) for miRNA-disease association prediction. We create a diverse network connecting miRNAs and diseases, which is then represented using a matrix. The main task of GAMCNMDF is to complete the matrix and obtain the predicted results. The main innovations of GAMCNMDF are reflected in two aspects: GAMCNMDF integrates diverse data sources and employs a nonlinear fusion approach to update the similarity networks of miRNAs and diseases. Also, some additional information is provided to GAMCNMDF in the form of a 'hint' so that GAMCNMDF can work successfully even when complete data are not available. Compared with other methods, the outcomes of 10-fold cross-validation on two distinct databases validate the superior performance of GAMCNMDF with statistically significant results. It is worth mentioning that we apply GAMCNMDF in the identification of underlying small molecule-related miRNAs, yielding outstanding performance results in this specific domain. In addition, two case studies about two important neoplasms show that GAMCNMDF is a promising prediction method.
Affiliation(s)
- ShuDong Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- YunYin Li
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- YuanYuan Zhang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- ShanChen Pang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- SiBo Qiao
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- Yu Zhang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- FuYu Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
|
20
|
Guo Y, Zhou D, Ruan X, Cao J. Variational gated autoencoder-based feature extraction model for inferring disease-miRNA associations based on multiview features. Neural Netw 2023; 165:491-505. [PMID: 37336034 DOI: 10.1016/j.neunet.2023.05.052] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/04/2022] [Revised: 05/19/2023] [Accepted: 05/28/2023] [Indexed: 06/21/2023]
Abstract
MicroRNAs (miRNAs) play critical roles in diverse biological processes of diseases. Inferring potential disease-miRNA associations via computational algorithms enables us to better understand the development and diagnosis of complex human diseases. This work presents a variational gated autoencoder-based feature extraction model to extract complex contextual features for inferring potential disease-miRNA associations. Specifically, our model fuses three different similarities of miRNAs into a comprehensive miRNA network and combines two different similarities of diseases into a comprehensive disease network. Then, a novel graph autoencoder is designed to extract multilevel representations based on variational gate mechanisms from heterogeneous networks of miRNAs and diseases. Finally, a gate-based association predictor is devised to combine multiscale representations of miRNAs and diseases via a novel contrastive cross-entropy function, and then infer disease-miRNA associations. Experimental results indicate that our proposed model achieves remarkable association prediction performance, proving the efficacy of the variational gate mechanism and contrastive cross-entropy loss for inferring disease-miRNA associations.
Affiliation(s)
- Yanbu Guo
  - College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China.
- Dongming Zhou
  - School of Information Science and Engineering, Yunnan University, Kunming 650500, China.
- Xiaoli Ruan
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China.
- Jinde Cao
  - School of Mathematics, Southeast University, Nanjing 211189, China; Yonsei Frontier Lab, Yonsei University, Seoul 03722, South Korea.
|
21
|
Zhu Y, Zhang F, Zhang S, Yi M. Predicting latent lncRNA and cancer metastatic event associations via variational graph auto-encoder. Methods 2023; 211:1-9. [PMID: 36709790 DOI: 10.1016/j.ymeth.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 12/05/2022] [Accepted: 01/20/2023] [Indexed: 01/27/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) have been shown to be closely associated with cancer metastatic events (CMEs, e.g., cancer cell invasion, intravasation, extravasation, proliferation) that collaboratively accelerate malignant cancer spread and cause high mortality rates in patients. Clinical trials may accurately uncover the relationships between lncRNAs and CMEs; however, they are time-consuming and expensive. With the accumulation of data, there is an urgent need to find efficient ways to identify these relationships. Herein, a graph embedding representation-based predictor (VGEA-LCME) for exploring latent lncRNA-CME associations is introduced. In VGEA-LCME, a heterogeneous combined network is constructed by integrating similarity and linkage matrices so as to maintain the internal and external characteristics of the networks, and a variational graph auto-encoder serves as a feature generator to represent an arbitrary lncRNA-CME pair. The final robust prediction is obtained with an ensemble classifier strategy via cross-validation. Experimental comparisons and literature verification show the strong performance of VGEA-LCME, although the similarities between CMEs are challenging to calculate. In addition, VGEA-LCME can further identify organ-specific CMEs. To the best of our knowledge, this is the first computational attempt to discover the potential relationships between lncRNAs and CMEs. It may provide support and new insight for guiding experimental research on metastatic cancers. The source code and data are available at https://github.com/zhuyuan-cug/VGAE-LCME.
Affiliation(s)
- Yuan Zhu
  - School of Automation, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China; Engineering Research Center of Intelligent Technology for Geo-Exploration, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
- Feng Zhang
  - School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
- Shihua Zhang
  - College of Life Science and Health, Wuhan University of Science and Technology, 974 Heping Avenue, Qingshan District, 430081, Wuhan, Hubei, China.
- Ming Yi
  - School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China.
|
22
|
Zhang H, Fang J, Sun Y, Xie G, Lin Z, Gu G. Predicting miRNA-Disease Associations via Node-Level Attention Graph Auto-Encoder. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1308-1318. [PMID: 35503834 DOI: 10.1109/tcbb.2022.3170843] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Indexed: 05/04/2023]
Abstract
Previous studies have confirmed that microRNA (miRNA), a small single-stranded non-coding RNA, participates in various biological processes and plays vital roles in many complex human diseases. Therefore, developing an efficient method to infer potential miRNA-disease associations could greatly help understand operational mechanisms for diseases at the molecular level. However, during these early stages of miRNA-disease prediction, traditional biological experiments are laborious and expensive. Therefore, this study proposes a novel method called AGAEMD (node-level Attention Graph Auto-Encoder to predict potential MiRNA Disease associations). We first create a heterogeneous matrix incorporating miRNA similarity, disease similarity, and known miRNA-disease associations. This matrix is then input into a node-level attention encoder-decoder network, which utilizes low-dimensional dense embeddings to represent nodes and calculate association scores. To verify the effectiveness of the proposed method, we conduct a series of experiments on two benchmark datasets (the Human MicroRNA Disease Database v2.0 and v3.2) and report the averages over 10 runs in comparison with several state-of-the-art methods. Experimental results have demonstrated the excellent performance of AGAEMD in comparison with other methods. Three important diseases (Colon Neoplasms, Lung Neoplasms, Lupus Vulgaris) were examined in case studies. The results confirm the reliable predictive performance of AGAEMD.
|
23
|
Deep Learning with Graph Convolutional Networks: An Overview and Latest Applications in Computational Intelligence. Int J Intell Syst 2023. [DOI: 10.1155/2023/8342104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Convolutional neural networks (CNNs) have received widespread attention due to their powerful modeling capabilities and have been successfully applied in natural language processing, image recognition, and other fields. However, traditional CNNs can only deal with Euclidean spatial data, whereas many real-life scenarios, such as transportation networks, social networks, and reference networks, take the form of graph data. The creation of graph convolution operators and graph pooling is at the heart of migrating CNNs to graph data analysis and processing. With the advancement of the Internet and technology, the graph convolutional network (GCN), as an innovative technology in artificial intelligence (AI), has received more and more attention. GCNs have been widely used in fields such as image processing, intelligent recommender systems, and knowledge-based graphs due to their excellent characteristics in processing non-Euclidean spatial data. At the same time, communication networks have also embraced AI technology in recent years, and AI serves as the brain of the future network and realizes the comprehensive intelligence of the future grid. Many complex communication network problems can be abstracted as graph-based optimization problems and solved by GCNs, thus overcoming the limitations of traditional methods. This survey briefly describes the definition of graph-based machine learning, introduces different types of graph networks, summarizes the application of GCNs in various research fields, analyzes the research status, and gives future research directions.
|
24
|
Gervits A, Sharan R. Predicting genetic interactions, cell line dependencies and drug sensitivities with variational graph auto-encoder. Frontiers in Bioinformatics 2022; 2:1025783. [PMID: 36530386 PMCID: PMC9755598 DOI: 10.3389/fbinf.2022.1025783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 11/21/2022] [Indexed: 09/10/2024] Open
Abstract
Large scale cancer genomics data provide crucial information about the disease and reveal points of intervention. However, systematic data have been collected in specific cell lines and their collection is laborious and costly. Hence, there is a need to develop computational models that can predict such data for any genomic context of interest. Here we develop novel models that build on variational graph auto-encoders and can integrate diverse types of data to provide high quality predictions of genetic interactions, cell line dependencies and drug sensitivities, outperforming previous methods. Our models, data and implementation are available at: https://github.com/aijag/drugGraphNet.
Affiliation(s)
- Roded Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv-Yafo, Israel
|
25
|
Fu Y, Yang R, Zhang L. Association prediction of CircRNAs and diseases using multi-homogeneous graphs and variational graph auto-encoder. Comput Biol Med 2022; 151:106289. [PMID: 36401973 DOI: 10.1016/j.compbiomed.2022.106289] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2022] [Revised: 10/19/2022] [Accepted: 11/06/2022] [Indexed: 11/12/2022]
Abstract
As a non-coding RNA molecule with a closed-loop structure, circular RNA (circRNA) has a tissue-specific and cell-specific expression pattern. It regulates disease development by modulating the expression of disease-related genes. Exploring the circRNA-disease relationship can therefore reveal the molecular mechanisms of disease pathogenesis. Biological experiments for detecting circRNA-disease associations are time-consuming and laborious. Constrained by the sparsity of known circRNA-disease associations, existing algorithms cannot obtain sufficiently complete structural information to represent features accurately. To this end, this paper proposes a new predictor, VGAERF, combining a Variational Graph Auto-Encoder (VGAE) and Random Forest (RF). First, a circRNA homogeneous graph and a disease homogeneous graph are constructed from Gaussian interaction profile (GIP) kernel similarity, semantic similarity, and known circRNA-disease associations. VGAEs with the same structure are employed to extract higher-order features by encoding and decoding the input graph structures. To further increase the completeness of the network structure information, the deep features acquired from the two VGAEs are summed, and the RF, which can handle sparse data, is then trained to perform the prediction task. On the independent test set, the Area Under ROC Curve (AUC), accuracy, and Area Under PR Curve (AUPR) of the proposed method reach 0.9803, 0.9345, and 0.9894, respectively. On the same dataset, the AUC, accuracy, and AUPR of VGAERF are 2.09%, 5.93%, and 1.86% higher than those of the best-performing method (AEDNN). It is anticipated that VGAERF will provide significant information to decipher the molecular mechanisms of circRNA-disease associations and promote the diagnosis of circRNA-related diseases.
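The Gaussian interaction profile (GIP) kernel mentioned in this abstract is commonly computed from the rows of the association matrix. The following is a minimal sketch of that common formulation; the bandwidth rule, the name `gip_kernel`, and the toy matrix are assumptions and may differ from what VGAERF actually uses.

```python
import numpy as np

def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`.

    K[i, j] = exp(-gamma * ||row_i - row_j||^2), where gamma is gamma_prime
    divided by the mean squared norm of the rows (a common convention).
    """
    profiles = np.asarray(assoc, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distance between every pair of rows.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dist)

# Toy circRNA x disease association matrix (invented for illustration).
assoc = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
circ_sim = gip_kernel(assoc)        # circRNA-circRNA similarity
disease_sim = gip_kernel(assoc.T)   # disease-disease similarity
```

Two circRNAs that share more known diseases get a similarity closer to 1, which is why this kernel is often used when no sequence or functional similarity is available.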
Affiliation(s)
- Yao Fu
- The School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, 264209, China
- Runtao Yang
- The School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, 264209, China
- Lina Zhang
- The School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, 264209, China
|
26
|
Wu QW, Cao RF, Xia JF, Ni JC, Zheng CH, Su YS. Extra Trees Method for Predicting LncRNA-Disease Association Based On Multi-Layer Graph Embedding Aggregation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3171-3178. [PMID: 34529571 DOI: 10.1109/tcbb.2021.3113122] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 06/13/2023]
Abstract
Many experimental studies have revealed significant associations between lncRNAs and diseases. Identifying accurate associations will provide a new perspective for disease therapy. Computational methods have been developed to solve this problem, but they have some limitations. In this paper, we propose an accurate method, named MLGCNET, to discover potential lncRNA-disease associations. First, we reconstructed similarity networks for both lncRNAs and diseases using top-k similarity information and constructed a lncRNA-disease heterogeneous network (LDN). Then, we applied a Multi-Layer Graph Convolutional Network to the LDN to obtain latent feature representations of nodes. Finally, an Extra Trees classifier was used to calculate the probability of association between disease and lncRNA. The results of extensive 5-fold cross-validation experiments show that MLGCNET has superior prediction performance compared to state-of-the-art methods. Case studies confirm the performance of our model on specific diseases. All the experimental results demonstrate the effectiveness and practicality of MLGCNET in predicting potential lncRNA-disease associations.
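Reconstructing a similarity network from top-k information usually means keeping only each node's k strongest neighbours. The sketch below shows one plausible reading; the abstract gives no details, so the symmetrisation rule, the name `top_k_similarity`, and the toy matrix are assumptions.

```python
import numpy as np

def top_k_similarity(sim, k):
    """Keep, for each node, only its k most similar neighbours (self excluded)."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    work = sim.copy()
    np.fill_diagonal(work, -np.inf)              # a node is never its own neighbour
    top_idx = np.argsort(-work, axis=1)[:, :k]   # indices of the k largest per row
    mask = np.zeros_like(sim, dtype=bool)
    mask[np.arange(n)[:, None], top_idx] = True
    mask |= mask.T                               # assumption: keep an edge if either end chose it
    return np.where(mask, sim, 0.0)

# Toy lncRNA similarity matrix (invented for illustration).
sim = np.array([
    [1.0, 0.9, 0.1, 0.3],
    [0.9, 1.0, 0.4, 0.2],
    [0.1, 0.4, 1.0, 0.8],
    [0.3, 0.2, 0.8, 1.0],
])
pruned = top_k_similarity(sim, k=1)
```

Pruning weak similarities in this way gives a sparser graph, which tends to reduce noise before a graph convolutional network aggregates neighbour information.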
|
27
|
Yue R, Dutta A. Computational systems biology in disease modeling and control, review and perspectives. NPJ Syst Biol Appl 2022; 8:37. [PMID: 36192551 PMCID: PMC9528884 DOI: 10.1038/s41540-022-00247-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/30/2022] [Accepted: 09/05/2022] [Indexed: 02/02/2023] Open
Abstract
Omics-based approaches have become increasingly influential in identifying disease mechanisms and drug responses. Considering that diseases and drug responses are co-expressed and regulated in the relevant omics data interactions, the traditional way of grabbing omics data from single isolated layers cannot always obtain valuable inference. Also, drugs have adverse effects that may impair patients, and launching new medicines for diseases is costly. To resolve the above difficulties, systems biology is applied to predict potential molecular interactions by integrating omics data from genomic, proteomic, transcriptional, and metabolic layers. Combined with known drug reactions, the resulting models improve medicines' therapeutical performance by re-purposing the existing drugs and combining drug molecules without off-target effects. Based on the identified computational models, drug administration control laws are designed to balance toxicity and efficacy. This review introduces biomedical applications and analyses of interactions among gene, protein and drug molecules for modeling disease mechanisms and drug responses. The therapeutical performance can be improved by combining the predictive and computational models with drug administration designed by control laws. The challenges are also discussed for its clinical uses in this work.
Affiliation(s)
- Rongting Yue
- Department of Electrical and Computer Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT, 06269, USA
- Abhishek Dutta
- Department of Electrical and Computer Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT, 06269, USA
|
28
|
Duan T, Kuang Z, Deng L. SVMMDR: Prediction of miRNAs-drug resistance using support vector machines based on heterogeneous network. Front Oncol 2022; 12:987609. [PMID: 36338674 PMCID: PMC9632662 DOI: 10.3389/fonc.2022.987609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 09/14/2022] [Indexed: 11/21/2022] Open
Abstract
In recent years, miRNAs have been considered potential high-value therapeutic targets because of their complex and delicate mechanisms of gene regulation. Abnormal miRNA expression can cause drug resistance, reducing the therapeutic effect of disease treatment. Revealing the associations between miRNAs and drug resistance can help in the design of effective drugs or possible drug combinations. However, conventional experiments for identifying miRNA-drug resistance associations are time-consuming and costly. It is therefore of practical value to develop an accurate and efficient computational method for predicting miRNA-drug resistance associations. In this paper, a method based on Support Vector Machines (SVM) to predict the association between MiRNA and Drug Resistance (SVMMDR) is proposed. SVMMDR integrates miRNA-drug resistance associations, miRNA sequence similarity, drug chemical structure similarity and other similarities, extracts path-based HeteSim features, and obtains inclined diffusion features through random walk with restart. By combining these features, the prediction score between miRNAs and drug resistance is obtained with the SVM. The innovation of SVMMDR is that the inclined diffusion feature is obtained by an inclined random walk with restart, the node information and path information in the heterogeneous network are integrated, and the SVM is used to predict potential miRNA-drug resistance associations. The average AUC of SVMMDR is 0.978 in 10-fold cross-validation.
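Random walk with restart, which this abstract uses to derive diffusion features, can be sketched as a simple fixed-point iteration. The code below implements the standard algorithm only; the "inclined" variant used by SVMMDR is not described in the abstract and is not reproduced here. The function name, restart probability, and toy graph are assumptions.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at `seed`."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # isolated nodes: avoid division by zero
    transition = adj / col_sums          # column-stochastic transition matrix
    start = np.zeros(adj.shape[0])
    start[seed] = 1.0
    prob = start.copy()
    for _ in range(max_iter):
        updated = (1.0 - restart) * transition @ prob + restart * start
        if np.abs(updated - prob).sum() < tol:
            return updated
        prob = updated
    return prob

# Toy path graph 0 - 1 - 2 - 3 (invented for illustration).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
scores = random_walk_with_restart(adj, seed=0)
```

The resulting vector ranks every node by its proximity to the seed, so running it once per miRNA or drug yields a diffusion feature vector for each node in the heterogeneous network.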
|
29
|
Dong TN, Schrader J, Mücke S, Khosla M. A message passing framework with multiple data integration for miRNA-disease association prediction. Sci Rep 2022; 12:16259. [PMID: 36171337 PMCID: PMC9519928 DOI: 10.1038/s41598-022-20529-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 09/14/2022] [Indexed: 11/08/2022] Open
Abstract
Micro RNA or miRNA is a highly conserved class of non-coding RNA that plays an important role in many diseases. Identifying miRNA-disease associations can pave the way for better clinical diagnosis and finding potential drug targets. We propose a biologically-motivated data-driven approach for the miRNA-disease association prediction, which overcomes the data scarcity problem by exploiting information from multiple data sources. The key idea is to enrich the existing miRNA/disease-protein-coding gene (PCG) associations via a message passing framework, followed by the use of disease ontology information for further feature filtering. The enriched and filtered PCG associations are then used to construct the inter-connected miRNA-PCG-disease network to train a structural deep network embedding (SDNE) model. Finally, the pre-trained embeddings and the biologically relevant features from the miRNA family and disease semantic similarity are concatenated to form the pair input representations to a Random Forest classifier whose task is to predict the miRNA-disease association probabilities. We present large-scale comparative experiments, ablation, and case studies to showcase our approach's superiority. Besides, we make the model prediction results for 1618 miRNAs and 3679 diseases, along with all related information, publicly available at http://software.mpm.leibniz-ai-lab.de/ to foster assessments and future adoption.
Affiliation(s)
- Thi Ngan Dong
- L3S Research Center, Leibniz University of Hannover, Hannover, Germany
- Johanna Schrader
- L3S Research Center, Leibniz University of Hannover, Hannover, Germany
- Stefanie Mücke
- Hannover Unified Biobank (HUB), Hannover Medical School, Hannover, Germany
- Megha Khosla
- Delft University of Technology (TU Delft), Delft, Netherlands
|
30
|
Zhong M, Li F, Chen W. Automatic arrhythmia detection with multi-lead ECG signals based on heterogeneous graph attention networks. Mathematical Biosciences and Engineering: MBE 2022; 19:12448-12471. [PMID: 36654006 DOI: 10.3934/mbe.2022581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Automatic arrhythmia detection is very important for cardiovascular health. It is generally performed by measuring the electrocardiogram (ECG) signals of standard multiple leads. However, the correlations among multiple leads are often ignored. In addition, most existing studies require an extensive and complex feature extraction process. These issues not only lead to the loss of overall lead information but also make detection performance depend on the quality of the features. To address them, this paper proposes a novel multi-lead arrhythmia detection model based on a heterogeneous graph attention network. We model the multi-lead data as a heterogeneous graph to integrate diverse information and construct intra-lead and inter-lead correlations in multi-lead data, providing a reasonable and effective data model. A heterogeneous graph network with a dual-level attention strategy is used to capture the interactions among diverse information and information types. At the same time, our model does not require any feature extraction process for the ECG signals, which avoids complex feature engineering. Extensive experimental results show that multi-lead information and complex correlations can be well captured, confirming that the proposed model yields significant improvements in multi-lead arrhythmia detection.
Affiliation(s)
- MingHao Zhong
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
- Fenghuan Li
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
- Weihong Chen
- School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
|
31
|
Yang M, Huang ZA, Gu W, Han K, Pan W, Yang X, Zhu Z. Prediction of biomarker-disease associations based on graph attention network and text representation. Brief Bioinform 2022; 23:6651308. [PMID: 35901464 DOI: 10.1093/bib/bbac298] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2022] [Revised: 06/28/2022] [Accepted: 06/30/2022] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION The associations between biomarkers and human diseases play a key role in understanding complex pathology and developing targeted therapies. Wet lab experiments for biomarker discovery are costly, laborious and time-consuming. Computational prediction methods can be used to greatly expedite the identification of candidate biomarkers. RESULTS Here, we present a novel computational model named GTGenie for predicting the biomarker-disease associations based on graph and text features. In GTGenie, a graph attention network is utilized to characterize diverse similarities of biomarkers and diseases from heterogeneous information resources. Meanwhile, a pretrained BERT-based model is applied to learn the text-based representation of biomarker-disease relation from biomedical literature. The captured graph and text features are then integrated in a bimodal fusion network to model the hybrid entity representation. Finally, inductive matrix completion is adopted to infer the missing entries for reconstructing relation matrix, with which the unknown biomarker-disease associations are predicted. Experimental results on HMDD, HMDAD and LncRNADisease data sets showed that GTGenie can obtain competitive prediction performance with other state-of-the-art methods. AVAILABILITY The source code of GTGenie and the test data are available at: https://github.com/Wolverinerine/GTGenie.
Affiliation(s)
- Minghao Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
- Zhi-An Huang
- Center for Computer Science and Information Technology, City University of Hong Kong Dongguan Research Institute, Dongguan, China
- Wenhao Gu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China; GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Kun Han
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Wenying Pan
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Xiao Yang
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
|
32
|
Huang D, An J, Zhang L, Liu B. Computational method using heterogeneous graph convolutional network model combined with reinforcement layer for MiRNA-disease association prediction. BMC Bioinformatics 2022; 23:299. [PMID: 35879658 PMCID: PMC9316361 DOI: 10.1186/s12859-022-04843-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A large body of evidence from biological experiments has confirmed that miRNAs play an important role in the progression and development of various complex human diseases. However, traditional experimental methods are expensive and time-consuming. Developing more accurate and efficient methods for predicting potential associations between miRNAs and diseases is therefore a challenging task. RESULTS In this study, we developed a computational model that combines a heterogeneous graph convolutional network with an enhanced layer for miRNA-disease association prediction (HGCNELMDA). The major improvements of our method are that a random walk with restart optimizes the original node features, and that a reinforcement layer added to the hidden layer of the graph convolutional network retains similarity information between nodes in the feature space. In addition, the proposed approach recalculates the influence of neighborhood nodes on target nodes by introducing an attention mechanism. The reliable performance of HGCNELMDA was confirmed by an AUC of 93.47% in global leave-one-out cross-validation (LOOCV) and an average AUC of 93.01% in fivefold cross-validation. We also compared HGCNELMDA with state-of-the-art methods. The comparative results indicated that HGCNELMDA is very promising and may provide a cost-effective alternative for miRNA-disease association prediction. Moreover, we applied HGCNELMDA in three case studies to predict potential miRNAs related to lung cancer, prostate cancer, and pancreatic cancer. The results showed that 48, 50, and 50 of the top 50 predicted miRNAs were supported by experimental association evidence. Therefore, HGCNELMDA is a reliable method for predicting disease-related miRNAs. CONCLUSIONS HGCNELMDA achieved AUCs of 93.47% and 93.01% in LOOCV and fivefold cross-validation, respectively, outperforming other typical methods. In the case studies of lung cancer, prostate cancer, and pancreatic cancer, 48, 50, and 50 of the top 50 predicted candidate miRNAs were verified in the biological database HMDD v2.0, which further confirms the feasibility and effectiveness of our method. To facilitate future research on disease-related miRNAs, we developed a freely available web server for HGCNELMDA at http://124.221.62.44:8080/HGCNELMDA.jsp.
Affiliation(s)
- Dan Huang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
- JiYong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
- Lei Zhang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
- BaiLong Liu
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China
|
33
|
Wang W, Chen H. Predicting miRNA-disease associations based on graph attention networks and dual Laplacian regularized least squares. Brief Bioinform 2022; 23:6645486. [PMID: 35849099 DOI: 10.1093/bib/bbac292] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/30/2022] [Revised: 06/23/2022] [Accepted: 06/26/2022] [Indexed: 01/05/2023] Open
Abstract
Increasing biomedical evidence has proved that the dysregulation of miRNAs is associated with human complex diseases. Identification of disease-related miRNAs is of great importance for disease prevention, diagnosis and remedy. To reduce the time and cost of biomedical experiments, there is a strong incentive to develop efficient computational methods to infer potential miRNA-disease associations. Although many computational approaches have been proposed to address this issue, the prediction accuracy needs to be further improved. In this study, we present a computational framework MKGAT to predict possible associations between miRNAs and diseases through graph attention networks (GATs) using dual Laplacian regularized least squares. We use GATs to learn embeddings of miRNAs and diseases on each layer from initial input features of known miRNA-disease associations, intra-miRNA similarities and intra-disease similarities. We then calculate kernel matrices of miRNAs and diseases based on Gaussian interaction profile (GIP) with the learned embeddings. We further fuse the kernel matrices of each layer and initial similarities with attention mechanism. Dual Laplacian regularized least squares are finally applied for new miRNA-disease association predictions with the fused miRNA and disease kernels. Compared with six state-of-the-art methods by 5-fold cross-validations, our method MKGAT receives the highest AUROC value of 0.9627 and AUPR value of 0.7372. We use MKGAT to predict related miRNAs for three cancers and discover that all the top 50 predicted results in the three diseases are confirmed by existing databases. The excellent performance indicates that MKGAT would be a useful computational tool for revealing disease-related miRNAs.
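Laplacian regularized least squares, the final prediction step named in this abstract, has a well-known closed form for a single kernel. Minimising ||Y - K a||^2 + beta * tr(a^T K L K a) over a gives scores F = K (K + beta L K)^{-1} Y, where L is the normalised graph Laplacian of the kernel K. The sketch below applies that formula on the miRNA side and the disease side and averages the two. The averaging step, the function name `laplacian_rls`, the value of `beta`, and the toy kernels are assumptions; MKGAT's own dual formulation and its fused kernels are not reproduced here.

```python
import numpy as np

def laplacian_rls(kernel, labels, beta=0.5):
    """Closed-form Laplacian regularized least squares for one kernel."""
    kernel = np.asarray(kernel, dtype=float)
    labels = np.asarray(labels, dtype=float)
    inv_sqrt = 1.0 / np.sqrt(kernel.sum(axis=1))
    # Normalised graph Laplacian L = I - D^{-1/2} K D^{-1/2}
    laplacian = np.eye(kernel.shape[0]) - inv_sqrt[:, None] * kernel * inv_sqrt[None, :]
    # F = K (K + beta * L K)^{-1} Y, computed with a linear solve
    coeffs = np.linalg.solve(kernel + beta * laplacian @ kernel, labels)
    return kernel @ coeffs

# Toy kernels and associations: 3 miRNAs, 2 diseases (invented for illustration).
mirna_kernel = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
disease_kernel = np.array([[1.0, 0.4], [0.4, 1.0]])
known = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)

from_mirna = laplacian_rls(mirna_kernel, known)         # shape (3, 2)
from_disease = laplacian_rls(disease_kernel, known.T)   # shape (2, 3)
scores = 0.5 * (from_mirna + from_disease.T)            # assumed combination rule
```

Setting `beta` to zero removes the smoothness penalty and simply reproduces the known labels, so the regularisation strength controls how much score is spread to unlabelled but similar miRNA-disease pairs.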
Affiliation(s)
- Wengang Wang
- School of Software, East China Jiaotong University, Nanchang 330013, China
- Hailin Chen
- School of Software, East China Jiaotong University, Nanchang 330013, China
|
34
|
Identification of MiRNA–Disease Associations Based on Information of Multi-Module and Meta-Path. Molecules 2022; 27:molecules27144443. [PMID: 35889314 PMCID: PMC9321348 DOI: 10.3390/molecules27144443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 07/01/2022] [Accepted: 07/08/2022] [Indexed: 12/10/2022] Open
Abstract
Cumulative research reveals that microRNAs (miRNAs) are involved in many critical biological processes, including cell proliferation, differentiation and apoptosis. It is of great significance to determine the associations between miRNAs and human diseases, which are the basis for finding biomarkers for diagnosis and targets for treatment. To overcome the time-consuming and labor-intensive nature of traditional experiments, a computational method was developed to identify potential associations between miRNAs and diseases based on a graph attention network (GAT) with different meta-path modes and a support vector machine (SVM). First, we constructed a multi-module heterogeneous network based on meta-paths and learned the latent features of the different modules with the GAT. Second, we computed a weighted average of the latent features to obtain the final node representation. Finally, we characterized miRNA–disease association pairs with the node representations and trained an SVM to recognize potential associations. Based on five-fold cross-validation on benchmark datasets, the proposed method achieved an area under the precision–recall curve (AUPR) of 0.9379 and an area under the receiver operating characteristic curve (AUC) of 0.9472. The results demonstrate that our method performs well in practical applications and can provide a reference for the discovery of new biomarkers and therapeutic targets.
|
35
|
Yin MM, Liu JX, Gao YL, Kong XZ, Zheng CH. NCPLP: A Novel Approach for Predicting Microbe-Associated Diseases With Network Consistency Projection and Label Propagation. IEEE Transactions on Cybernetics 2022; 52:5079-5087. [PMID: 33119529 DOI: 10.1109/tcyb.2020.3026652] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 06/11/2023]
Abstract
A growing number of clinical studies have provided substantial evidence of a close relationship between microbes and disease. Thus, it is necessary to infer potential microbe-disease associations. However, traditional approaches validate these associations experimentally, which often consumes a lot of material and time. Hence, more reliable computational methods are needed to predict disease-associated microbes. In this article, an innovative means of predicting microbe-disease associations is proposed, which is based on network consistency projection and label propagation (NCPLP). Whereas most existing algorithms use the Gaussian interaction profile (GIP) kernel similarity as the similarity criterion between microbe pairs and disease pairs, this model uses Medical Subject Headings descriptors to calculate disease semantic similarity. In addition, 16S rRNA gene sequences are used to calculate microbe functional similarity. Based on this gene sequence information, we use two conventional methods (BLAST+ and MEGA7) to assess the similarity between each pair of microbes from different perspectives. In particular, network consistency projection is added to obtain network projection scores from the microbe space and the disease space. Finally, label propagation is used to reliably predict microbes related to diseases. NCPLP achieves better performance on various evaluation indicators and discovers a greater number of potential associations between microbes and diseases. Case studies further confirm the reliable prediction performance of NCPLP. In conclusion, NCPLP is able to discover underlying microbe-disease associations and can support biological studies.
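Label propagation, the last step named in this abstract, repeatedly spreads known labels across a similarity network until the scores stabilise. The sketch below shows the standard iteration F <- alpha * S F + (1 - alpha) * Y with a symmetrically normalised similarity matrix S. It does not include NCPLP's network consistency projection step, and the function name, `alpha`, and toy data are assumptions.

```python
import numpy as np

def label_propagation(sim, labels, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y until convergence."""
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(labels, dtype=float)
    degree = sim.sum(axis=1)
    degree[degree == 0] = 1.0                              # isolated nodes
    inv_sqrt = 1.0 / np.sqrt(degree)
    norm_sim = inv_sqrt[:, None] * sim * inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    scores = labels.copy()
    for _ in range(max_iter):
        updated = alpha * norm_sim @ scores + (1.0 - alpha) * labels
        if np.abs(updated - scores).max() < tol:
            return updated
        scores = updated
    return scores

# Toy microbe similarity and microbe x disease labels (invented for illustration).
microbe_sim = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
known = np.array([[1, 0], [0, 0], [0, 1]], dtype=float)
scores = label_propagation(microbe_sim, known)
```

In this toy example the second microbe has no known disease, yet it receives a higher score for the first disease because it is most similar to the first microbe, which is the behaviour label propagation is meant to provide.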
|
36
|
Lou Z, Cheng Z, Li H, Teng Z, Liu Y, Tian Z. Predicting miRNA-disease associations via learning multimodal networks and fusing mixed neighborhood information. Brief Bioinform 2022; 23:6582005. [PMID: 35524503 DOI: 10.1093/bib/bbac159] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 02/10/2022] [Revised: 03/29/2022] [Accepted: 04/10/2022] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION In recent years, a large number of biological experiments have shown that miRNAs play an important role in understanding disease pathogenesis. The discovery of miRNA-disease associations is beneficial for disease diagnosis and treatment. Since inferring these associations through biological experiments is time-consuming and expensive, researchers have sought to identify them using computational approaches. Graph Convolutional Networks (GCNs), which exhibit excellent performance in link prediction problems, have been successfully used in miRNA-disease association prediction. However, GCNs only consider first-order neighborhood information at each layer and fail to capture information from high-order neighbors when learning miRNA and disease representations through information propagation. How to aggregate information from high-order neighborhoods effectively and explicitly therefore remains challenging. RESULTS To address this challenge, we propose a novel method called mixed neighborhood information for miRNA-disease association (MINIMDA), which fuses mixed high-order neighborhood information of miRNAs and diseases in multimodal networks. First, MINIMDA constructs an integrated miRNA similarity network and an integrated disease similarity network from their respective multisource information. Then, the embedding representations of miRNAs and diseases are obtained by fusing mixed high-order neighborhood information from the multimodal networks, namely the integrated miRNA similarity network, the integrated disease similarity network and the miRNA-disease association network. Finally, we concatenate the multimodal embedding representations of miRNAs and diseases and feed them into a multilayer perceptron (MLP) to predict their underlying associations. Extensive experimental results show that MINIMDA is superior to other state-of-the-art methods overall. Moreover, its outstanding performance in case studies on esophageal cancer, colon tumor and lung cancer further demonstrates the effectiveness of MINIMDA. AVAILABILITY AND IMPLEMENTATION https://github.com/chengxu123/MINIMDA and http://120.79.173.96/.
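The idea of mixing first-order and high-order neighborhood information within one layer can be illustrated by concatenating features propagated over different numbers of hops. The sketch below is a simplified illustration without learned weights; the function names, the self-loop normalisation, and the toy graph are assumptions rather than the MINIMDA architecture.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalisation with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    adj = np.asarray(adj, dtype=float) + np.eye(len(adj))
    inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def mixed_neighborhood_features(adj, features, max_hop=2):
    """Concatenate node features propagated over 0, 1, ..., max_hop hops."""
    a_hat = normalized_adjacency(adj)
    hop_features = [np.asarray(features, dtype=float)]   # hop 0: the node itself
    for _ in range(max_hop):
        hop_features.append(a_hat @ hop_features[-1])    # one more propagation step
    return np.concatenate(hop_features, axis=1)

# Toy path graph 0 - 1 - 2 - 3 with one-hot node features (invented for illustration).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
mixed = mixed_neighborhood_features(adj, np.eye(4), max_hop=2)
```

With one-hot features, the one-hop block of node 0 has no entry for node 2, while the two-hop block does, which shows how a single mixed layer can expose neighbours that a standard one-hop GCN layer would miss.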
Affiliation(s)
- Zhengzheng Lou
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Zhaoxu Cheng
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Hui Li
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
- Zhixia Teng
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Yang Liu
  - Departments of Cerebrovascular Diseases, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou 450000, China
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China
|
37
|
Deng L, Huang Y, Liu X, Liu H. Graph2MDA: a multi-modal variational graph embedding model for predicting microbe-drug associations. Bioinformatics 2022; 38:1118-1125. [PMID: 34864873 DOI: 10.1093/bioinformatics/btab792] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/12/2021] [Revised: 10/22/2021] [Accepted: 11/17/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Accumulated clinical studies show that microbes living in humans interact closely with their hosts and are involved in modulating drug efficacy and drug toxicity. Microbes have become novel targets for the development of antibacterial agents. Therefore, screening of microbe-drug associations can greatly benefit drug research and development. With the increase of microbial genomic and pharmacological datasets, we are motivated to develop an effective computational method to identify new microbe-drug associations. RESULTS In this article, we propose a novel method, Graph2MDA, to predict microbe-drug associations by using a variational graph autoencoder (VGAE). We constructed multi-modal attributed graphs based on multiple features of microbes and drugs, such as molecular structures, microbe genetic sequences and function annotations. Taking the multi-modal attributed graphs as input, the VGAE was trained to learn informative and interpretable latent representations of each node and of the whole graph, and a deep neural network classifier was then used to predict microbe-drug associations. The hyperparameter analysis and model ablation studies showed the sensitivity and robustness of our model. We evaluated our method on three independent datasets, and the experimental results showed that it outperformed six existing state-of-the-art methods. We also explored the meaning of the learned latent representations of drugs and found that the drugs show clear clustering patterns that are significantly consistent with the drug ATC classification. Moreover, we conducted case studies on two microbes and two drugs and found that 75-95% of the predicted associations have been reported in PubMed literature. Our extensive performance evaluations validated the effectiveness of the proposed method. AVAILABILITY AND IMPLEMENTATION Source codes and preprocessed data are available at https://github.com/moen-hyb/Graph2MDA. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yibiao Huang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xuejun Liu
  - School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
- Hui Liu
  - School of Computer Science and Technology, Nanjing Tech University, Nanjing 211816, China
|
38
|
Zhong T, Li Z, You ZH, Nie R, Zhao H. Predicting miRNA-disease associations based on graph random propagation network and attention network. Brief Bioinform 2022; 23:6515233. [PMID: 35079767 DOI: 10.1093/bib/bbab589] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/27/2021] [Revised: 12/07/2021] [Accepted: 12/22/2021] [Indexed: 11/13/2022]
Abstract
Numerous experiments have demonstrated that abnormal expression of microRNAs (miRNAs) in organisms is often accompanied by the emergence of specific diseases. The research of miRNAs can promote the prevention and drug research of specific diseases. However, there are still many undiscovered links between miRNAs and diseases, which greatly limits the research of miRNAs. Therefore, for exploring the unknown miRNA-disease associations, we combine the graph random propagation network based on DropFeature with attention network to propose a novel deep learning model to predict the miRNA-disease associations (GRPAMDA). Specifically, we firstly construct the miRNA-disease heterogeneous graph based on miRNA-disease association information. Secondly, we adopt DropFeature to randomly delete the features of nodes in the graph and then perform propagation operations to enhance the features of miRNA and disease nodes. Thirdly, we employ the attention mechanism to fuse the features of random propagation by aggregating the enhanced neighbor features of miRNA and disease nodes. Finally, miRNA-disease association scores are generated by a fully connected layer. The average area under the curve of GRPAMDA model based on 5-fold cross-validation is 93.46% on HMDD v2.0. Case studies of esophageal tumors, lymphomas and prostate tumors show that 48, 47 and 46 of the top 50 miRNAs associated with these diseases are confirmed by dbDEMC and miR2Disease database, respectively. In short, the GRPAMDA model can be used as a valuable method to study miRNA-disease associations.
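The second step of this abstract (randomly deleting node features, then propagating) can be sketched as follows. This is only an illustration under stated assumptions, not the GRPAMDA code: features are masked entry by entry, surviving entries are rescaled, propagation is a plain neighbor average, and a simple mean over several random masks stands in for the attention-based fusion. All function names and the toy data are hypothetical.

```python
import random


def drop_feature(features, drop_rate, rng):
    """Zero each feature entry with probability drop_rate and rescale the survivors."""
    keep = 1.0 - drop_rate
    return [[(v / keep if rng.random() >= drop_rate else 0.0) for v in row]
            for row in features]


def propagate(adj, features):
    """Average each node's features with those of its neighbours (self included)."""
    n, d = len(features), len(features[0])
    out = []
    for i in range(n):
        neighbours = [i] + [j for j in range(n) if adj[i][j] and j != i]
        out.append([sum(features[j][c] for j in neighbours) / len(neighbours)
                    for c in range(d)])
    return out


def random_propagation(adj, features, drop_rate=0.5, samples=4, seed=0):
    """Average the propagated features over several independent random masks."""
    rng = random.Random(seed)
    n, d = len(features), len(features[0])
    total = [[0.0] * d for _ in range(n)]
    for _ in range(samples):
        augmented = propagate(adj, drop_feature(features, drop_rate, rng))
        for i in range(n):
            for c in range(d):
                total[i][c] += augmented[i][c] / samples
    return total


# Toy path graph 0 - 1 - 2 with two features per node (hypothetical data).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 2.0],
            [0.0, 4.0],
            [3.0, 0.0]]
enhanced = random_propagation(adj, features, drop_rate=0.5, samples=8, seed=42)
```

With `drop_rate=0.0` the function reduces to ordinary propagation, which is a convenient sanity check for the masking logic.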
Affiliation(s)
- Tangbo Zhong
  - Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Zhengwei Li
  - Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Ru Nie
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Huan Zhao
  - Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
|
39
|
Fu H, Huang F, Liu X, Qiu Y, Zhang W. MVGCN: data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks. Bioinformatics 2022; 38:426-434. [PMID: 34499148 DOI: 10.1093/bioinformatics/btab651] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/24/2021] [Revised: 08/07/2021] [Accepted: 09/06/2021] [Indexed: 02/05/2023]
Abstract
MOTIVATION There are various interaction/association bipartite networks in biomolecular systems. Identifying unobserved links in biomedical bipartite networks helps to understand the underlying molecular mechanisms of human complex diseases and thus benefits the diagnosis and treatment of diseases. Although a great number of computational methods have been proposed to predict links in biomedical bipartite networks, most of them heavily depend on features and structures involving the bioentities in one specific bipartite network, which limits the generalization capacity of applying the models to other bipartite networks. Meanwhile, bioentities usually have multiple features, and how to leverage them has also been challenging. RESULTS In this study, we propose a novel multi-view graph convolution network (MVGCN) framework for link prediction in biomedical bipartite networks. We first construct a multi-view heterogeneous network (MVHN) by combining the similarity networks with the biomedical bipartite network, and then perform a self-supervised learning strategy on the bipartite network to obtain node attributes as initial embeddings. Further, a neighborhood information aggregation (NIA) layer is designed for iteratively updating the embeddings of nodes by aggregating information from inter- and intra-domain neighbors in every view of the MVHN. Next, we combine embeddings of multiple NIA layers in each view, and integrate multiple views to obtain the final node embeddings, which are then fed into a discriminator to predict the existence of links. Extensive experiments show MVGCN performs better than or on par with baseline methods and has the generalization capacity on six benchmark datasets involving three typical tasks. AVAILABILITY AND IMPLEMENTATION Source code and data can be downloaded from https://github.com/fuhaitao95/MVGCN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Haitao Fu
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Feng Huang
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xuan Liu
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yang Qiu
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
40
|
Bamunu Mudiyanselage T, Lei X, Senanayake N, Zhang Y, Pan Y. Predicting CircRNA disease associations using novel node classification and link prediction models on Graph Convolutional Networks. Methods 2021; 198:32-44. [PMID: 34748953 DOI: 10.1016/j.ymeth.2021.10.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/25/2021] [Revised: 09/21/2021] [Accepted: 10/22/2021] [Indexed: 12/17/2022]
Abstract
Accumulated studies have shown that circular RNAs (circRNAs) are closely related to many complex human diseases. Because of this close relationship, circRNAs can serve as biomarkers for disease diagnosis and as therapeutic targets. However, the number of experimentally verified circRNA-disease associations is still small, and wet-lab experiments are constrained by their small scale and their cost in time and labour. Therefore, effective computational methods are required to predict associations between circRNAs and diseases, which can then serve as promising candidates for small-scale biological and clinical experiments. In this paper, we propose novel computational models based on Graph Convolutional Networks (GCNs) for potential circRNA-disease association prediction. Most existing prediction methods use shallow learning algorithms. In contrast, the proposed models combine the strengths of deep learning and graphs. First, they integrate multi-source similarity information into the association network. Next, the models predict potential associations using graph convolution, which exploits the relational knowledge in that network structure. Two circRNA-disease association prediction models, GCN-based Node Classification (GCN-NC) and GCN-based Link Prediction (GCN-LP), are introduced in this work; they demonstrate promising results in various experiments and outperform other existing methods. Further, a case study shows that some of the predicted results were confirmed by published literature and that all top results could be verified using gene-gene interaction networks.
Affiliation(s)
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China.
- Nipuna Senanayake
  - Department of Computer Science, Georgia State University, Atlanta, USA.
- Yanqing Zhang
  - Department of Computer Science, Georgia State University, Atlanta, USA.
- Yi Pan
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
|
41
|
Yi HC, You ZH, Huang DS, Kwoh CK. Graph representation learning in bioinformatics: trends, methods and applications. Brief Bioinform 2021; 23:6361044. [PMID: 34471921 DOI: 10.1093/bib/bbab340] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 06/24/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022]
Abstract
Graph is a natural data structure for describing complex systems, which contains a set of objects and relationships. Ubiquitous real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, succeeds in vast bioinformatics scenarios with data represented in Euclidean domain. However, rich relational information between biological elements is retained in the non-Euclidean biomedical graphs, which is not learning friendly to classic machine learning methods. Graph representation learning aims to embed graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently raised widespread interest in both machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from molecular level to genomics, pharmaceutical and healthcare systems level. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.
Affiliation(s)
- Hai-Cheng Yi
  - Chinese Academy of Sciences, Xinjiang Technical Institute of Physics and Chemistry, Urumqi 830011, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Chee Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
|
42
|
Pan Y, Lei X, Zhang Y. Association predictions of genomics, proteinomics, transcriptomics, microbiome, metabolomics, pathomics, radiomics, drug, symptoms, environment factor, and disease networks: A comprehensive approach. Med Res Rev 2021; 42:441-461. [PMID: 34346083 DOI: 10.1002/med.21847] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/26/2020] [Revised: 05/22/2021] [Accepted: 07/07/2021] [Indexed: 12/12/2022]
Abstract
Multi-omics research, covering genomics, proteinomics, transcriptomics, microbiome, metabolomics, pathomics, and radiomics, is currently a hot spot. The relationship between multi-omics data, drugs, and diseases has received extensive attention from researchers. At the same time, multi-omics can support the prediction of disease diagnosis, prognosis, and treatment. In essence, these research entities, such as genes, RNAs, proteins, microbes, metabolites, and pathways, as well as pathological and medical imaging data, can all be represented as networks at different levels, and some computer science and biology scholars have tried to use computational methods to explore the potential relationships between biological entities. We summarize a comprehensive research strategy: build a multi-omics heterogeneous network covering multimodal data, and use currently popular computational methods to make predictions. In this study, we first introduce methods for calculating the similarity of biological entities at the data level, and then discuss multimodal data fusion and feature extraction methods. We also summarize existing studies that have used such a framework for calculation and prediction. Finally, the challenges and opportunities at this stage are discussed. We hope that our review will help scholars who are interested in bioinformatics, biomedical imaging, and computer research.
Affiliation(s)
- Yi Pan
  - Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yuchen Zhang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
|
43
|
Zhang XM, Liang L, Liu L, Tang MJ. Graph Neural Networks and Their Current Applications in Bioinformatics. Front Genet 2021; 12:690049. [PMID: 34394185 PMCID: PMC8360394 DOI: 10.3389/fgene.2021.690049] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 04/02/2021] [Accepted: 05/28/2021] [Indexed: 12/22/2022]
Abstract
Graph neural networks (GNNs), as a branch of deep learning in non-Euclidean space, perform particularly well in various tasks that process graph structure data. With the rapid accumulation of biological network data, GNNs have also become an important tool in bioinformatics. In this research, a systematic survey of GNNs and their advances in bioinformatics is presented from multiple perspectives. We first introduce some commonly used GNN models and their basic principles. Then, three representative tasks are proposed based on the three levels of structural information that can be learned by GNNs: node classification, link prediction, and graph generation. Meanwhile, according to the specific applications for various omics data, we categorize and discuss the related studies in three aspects: disease prediction, drug discovery, and biomedical imaging. Based on the analysis, we provide an outlook on the shortcomings of current studies and point out their developing prospect. Although GNNs have achieved excellent results in many biological tasks at present, they still face challenges in terms of low-quality data processing, methodology, and interpretability and have a long road ahead. We believe that GNNs are potentially an excellent method that solves various biological problems in bioinformatics research.
Affiliation(s)
- Xiao-Meng Zhang
  - School of Information, Yunnan Normal University, Kunming, China
- Li Liang
  - School of Information, Yunnan Normal University, Kunming, China
- Lin Liu
  - School of Information, Yunnan Normal University, Kunming, China
  - Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- Ming-Jing Tang
  - Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
  - School of Life Sciences, Yunnan Normal University, Kunming, China
|
44
|
Li A, Deng Y, Tan Y, Chen M. A novel miRNA-disease association prediction model using dual random walk with restart and space projection federated method. PLoS One 2021; 16:e0252971. [PMID: 34138933 PMCID: PMC8211179 DOI: 10.1371/journal.pone.0252971] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/30/2021] [Accepted: 05/26/2021] [Indexed: 12/27/2022]
Abstract
A large number of studies have shown that the variation and dysregulation of miRNAs are important causes of disease. The recognition of disease-related miRNAs has become an important topic in biological research. However, the identification of disease-related miRNAs by biological experiments is expensive and time-consuming. Thus, computational models that predict disease-related miRNAs must be developed. A novel network projection-based dual random walk with restart (NPRWR) model was used to predict potential disease-related miRNAs. The NPRWR model estimates miRNA-disease associations by using dual random walk with restart and then refines the predictions with network projection technology. Leave-one-out cross validation (LOOCV) was adopted to evaluate the prediction performance of NPRWR. The results show that the area under the receiver operating characteristic curve (AUC) of NPRWR was 0.9029, which is superior to that of other advanced miRNA-disease association prediction methods. In addition, lung and kidney neoplasms were selected for a case study. Among the top 50 predicted miRNAs, 50 and 49, respectively, have been confirmed in databases or relevant literature. Moreover, NPRWR can be used to make predictions for isolated diseases and new miRNAs. LOOCV and the case study both yielded good prediction results. Thus, NPRWR is expected to be an effective and accurate disease-miRNA association prediction model.
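For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of a single random walk with restart on one network. It is not the NPRWR model, which runs the walk on two networks and adds a projection step; the function name, the restart probability of 0.3 and the toy transition matrix are assumptions chosen for illustration.

```python
def random_walk_with_restart(transition, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * seed until p stops changing.

    `transition` is a column-stochastic matrix W (each column sums to 1) and
    `seed` is the initial probability vector, which should sum to 1.
    """
    n = len(seed)
    p = list(seed)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
               + restart * seed[i] for i in range(n)]
        # Stop once the L1 change between successive vectors is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Column-normalised adjacency of the path graph 0 - 1 - 2 (hypothetical data).
transition = [[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]]
seed = [1.0, 0.0, 0.0]  # the walk restarts at node 0
scores = random_walk_with_restart(transition, seed)
```

The resulting scores rank nodes by proximity to the seed: node 1, a direct neighbor of the seed, scores higher than node 2, which is two hops away.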
Affiliation(s)
- Ang Li
  - Hunan Institute of Technology, School of Computer Science and Technology, Hengyang, China
- Yingwei Deng
  - Hunan Institute of Technology, School of Computer Science and Technology, Hengyang, China
  - Hainan Key Laboratory for Computational Science and Application, Haikou, China
- Yan Tan
  - Hunan Institute of Technology, School of Computer Science and Technology, Hengyang, China
- Min Chen
  - Hunan Institute of Technology, School of Computer Science and Technology, Hengyang, China
|
45
|
Ding Y, Lei X, Liao B, Wu FX. Predicting miRNA-Disease Associations Based on Multi-View Variational Graph Auto-Encoder with Matrix Factorization. IEEE J Biomed Health Inform 2021; 26:446-457. [PMID: 34111017 DOI: 10.1109/jbhi.2021.3088342] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 11/10/2022]
Abstract
MicroRNAs (miRNAs) have been proved to play critical roles in diverse biological processes, including the human disease development process. Exploring the potential associations between miRNAs and diseases can help us better understand complex disease mechanisms. Given that traditional biological experiments are expensive and time-consuming, computational models can serve as efficient means to uncover potential miRNA-disease associations. This study presents a new computational model based on variational graph auto-encoder with matrix factorization (VGAMF) for miRNA-disease association prediction. More specifically, VGAMF first integrates four different types of information about miRNAs into an miRNA comprehensive similarity network and two types of information about diseases into a disease comprehensive similarity network, respectively. Then, VGAMF gets the non-linear representations of miRNAs and diseases, respectively, from those two comprehensive similarity networks with variational graph auto-encoders. Simultaneously, a non-negative matrix factorization is conducted on the miRNA-disease association matrix to get the linear representations of miRNAs and diseases. Finally, a fully connected neural network combines linear and non-linear representations of miRNAs and diseases to get the final predicted association score for all miRNA-disease pairs. In the 10-fold cross-validation experiments, VGAMF achieves an average AUC of 0.9280 on HMDD v2.0 and 0.9470 on HMDD v3.2, which outperforms other competing methods. Besides, the case studies on colon cancer and esophageal cancer further demonstrate the effectiveness of VGAMF in predicting novel miRNA-disease associations.
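The AUC values reported in abstracts such as this one can be understood as the probability that a known association is scored above a non-association. The sketch below computes that quantity directly from labels and scores and averages it over folds. It is a generic illustration, not the evaluation code of any cited paper; the function names and the two toy folds are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the AUC over (labels, scores) pairs, one pair per cross-validation fold."""
    values = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(values) / len(values)


# Two toy folds of held-out predictions (hypothetical data).
fold_a = ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # one positive ranked below a negative
fold_b = ([1, 0], [0.8, 0.2])                  # perfectly ranked
average = mean_auc([fold_a, fold_b])
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; rank-based formulas give the same value more efficiently on large test sets.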
|
46
|
Li J, Peng D, Xie Y, Dai Z, Zou X, Li Z. Novel Potential Small Molecule-MiRNA-Cancer Associations Prediction Model Based on Fingerprint, Sequence, and Clinical Symptoms. J Chem Inf Model 2021; 61:2208-2219. [PMID: 33899462 DOI: 10.1021/acs.jcim.0c01458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/23/2023]
Abstract
As an important biomarker in organisms, miRNA is closely related to various small molecules and diseases. Research on small molecule-miRNA-cancer associations is helpful for the development of cancer treatment drugs and the discovery of pathogenesis. Because experimental approaches are usually time-consuming, laborious, and expensive, there is an urgent need for theoretical methods that identify potential small molecule-miRNA-cancer associations. To address this problem, we developed a new computational method in which features derived from structure, sequence, and symptoms were used to characterize the small molecule, miRNA, and cancer, respectively. A feature vector was constructed to characterize each small molecule-miRNA-cancer association by concatenating these features, and a random forest algorithm was used to build a model for recognizing potential associations. Based on 5-fold cross-validation on the benchmark data set, the model achieved an accuracy of 93.20 ± 0.52%, a precision of 93.22 ± 0.51%, a recall of 93.20 ± 0.53%, and an F1-measure of 93.20 ± 0.52%. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.9873 and 0.9870, respectively. The prediction ability and application performance of the developed method were further evaluated and verified through an independent data set test and a case study. Some potential small molecules and miRNAs related to cancer have been identified and are worthy of further experimental research. It is anticipated that our model could serve as a useful high-throughput virtual screening tool for drug research and development. All source codes can be downloaded from https://github.com/LeeKamlong/Multi-class-SMMCA.
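The four metrics quoted in this abstract can be derived from prediction counts, as the sketch below shows for the binary case. This is an assumption made for brevity: the cited work may use a multi-class or averaged variant, and the function name and toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy predictions: 2 true positives, 1 false positive, 1 false negative (hypothetical data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In a 5-fold setting, the function would be applied to each held-out fold, and the mean and standard deviation of each metric reported, which is the form of the "93.20 ± 0.52%" figures above.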
Affiliation(s)
- Jinlong Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Dongdong Peng
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Yun Xie
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Zong Dai
  - School of Biomedical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Xiaoyong Zou
  - School of Chemistry, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
  - Key Laboratory of Digital Quality Evaluation of Chinese Materia Medica of State Administration of Traditional Chinese Medicine, Guangzhou 510006, People's Republic of China
|
48
|
Zhao XM, Wu FX. Deep networks and network representation in bioinformatics. Methods 2021:S1046-2023(21)00102-X. [PMID: 33894378 DOI: 10.1016/j.ymeth.2021.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence and Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, China.
- Fang-Xiang Wu
  - Division of Biomedical Engineering, Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
|
50
|
Wu QW, Xia JF, Ni JC, Zheng CH. GAERF: predicting lncRNA-disease associations by graph auto-encoder and random forest. Brief Bioinform 2021; 22:6067881. [PMID: 33415333 DOI: 10.1093/bib/bbaa391] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 09/23/2020] [Revised: 11/26/2020] [Accepted: 11/30/2020] [Indexed: 12/11/2022]
Abstract
Predicting disease-related long non-coding RNAs (lncRNAs) helps to find new biomarkers for the prevention, diagnosis and treatment of complex human diseases. In this paper, we propose a machine learning-based classification approach, GAERF, which identifies disease-related lncRNAs by combining a graph auto-encoder (GAE) and a random forest (RF). First, we combined the relationships among lncRNAs, miRNAs and diseases into a heterogeneous network. Then, low-dimensional representation vectors of nodes were learned from the network by the GAE, which reduces the dimension and heterogeneity of the biological data. Taking these feature vectors as input, we trained an RF classifier to predict new lncRNA-disease associations (LDAs). Experimental results show that the learned representations characterize lncRNA-disease pairs accurately. GAERF achieves superior performance owing to the ensemble learning method, outperforming other methods significantly. Moreover, case studies further demonstrated that GAERF is an effective method to predict LDAs.
Affiliation(s)
- Qing-Wen Wu
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Jun-Feng Xia
  - Institute of Physical Science and Information Technology, Anhui University, Hefei, China
- Jian-Cheng Ni
  - School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
- Chun-Hou Zheng
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
|