1
Yao HB, Hou ZJ, Zhang WG, Li H, Chen Y. Prediction of MicroRNA-Disease Potential Association Based on Sparse Learning and Multilayer Random Walks. J Comput Biol 2024; 31:241-256. [PMID: 38377572 DOI: 10.1089/cmb.2023.0266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024] Open
Abstract
More and more studies have shown that microRNAs (miRNAs) play an indispensable role in the study of complex diseases in humans. Traditional biological experiments to detect miRNA-disease associations are expensive and time-consuming. Therefore, it is necessary to propose efficient and meaningful computational models to predict miRNA-disease associations. In this study, we aim to propose a miRNA-disease association prediction model based on sparse learning and multilayer random walks (SLMRWMDA). The miRNA-disease association matrix is decomposed and reconstructed by the sparse learning method to obtain richer association information, and at the same time, the initial probability matrix for the random walk with restart algorithm is obtained. The disease similarity network, miRNA similarity network, and miRNA-disease association network are used to construct heterogeneous networks, and the stable probability is obtained based on the topological structure features of diseases and miRNAs through a multilayer random walk algorithm to predict miRNA-disease potential association. The experimental results show that the prediction accuracy of this model is significantly improved compared with the previous related models. We evaluated the model using global leave-one-out cross-validation (global LOOCV) and fivefold cross-validation (5-fold CV). The area under the curve (AUC) value for the LOOCV is 0.9368. The mean AUC value for 5-fold CV is 0.9335 and the variance is 0.0004. In the case study, the results show that SLMRWMDA is effective in inferring the potential association of miRNA-disease.
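The random walk with restart step that the abstract refers to can be illustrated generically. The sketch below is not the SLMRWMDA implementation: it runs a plain single-network walk on a hypothetical four-node toy graph, and the function name, the restart probability of 0.5, and the convergence tolerance are assumptions chosen for illustration only.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    n = len(adj)
    # Column-normalise the adjacency matrix so each column holds transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    # The initial probability vector is the normalised seed vector.
    total = sum(seed)
    p0 = [s / total for s in seed]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3, with the walk restarting at node 0.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = random_walk_with_restart(adj, seed=[1, 0, 0, 0])
```

Nodes closer to the seed receive a higher stable probability, which is the property such models use to rank candidate associations.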
Affiliation(s)
- Hai-Bin Yao
- Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
- Zhen-Jie Hou
- Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
- Wen-Guang Zhang
- Life Sciences, Inner Mongolia Agricultural University, Hohhot, China
- Han Li
- Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
- Yan Chen
- Computer Science and Artificial Intelligence and Aliyun School of Big Data, Changzhou University, Changzhou, China
2
Wang S, Li Y, Zhang Y, Pang S, Qiao S, Zhang Y, Wang F. Generative Adversarial Matrix Completion Network based on Multi-Source Data Fusion for miRNA-Disease Associations Prediction. Brief Bioinform 2023; 24:bbad270. [PMID: 37482409 DOI: 10.1093/bib/bbad270] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/19/2023] [Revised: 06/16/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023] Open
Abstract
Numerous biological studies have shown that considering disease-associated micro RNAs (miRNAs) as potential biomarkers or therapeutic targets offers new avenues for the diagnosis of complex diseases. Computational methods have gradually been introduced to reveal disease-related miRNAs. Considering that previous models have not fused sufficiently diverse similarities, that their inappropriate fusion methods may lead to poor quality of the comprehensive similarity network and that their results are often limited by insufficiently known associations, we propose a computational model called Generative Adversarial Matrix Completion Network based on Multi-source Data Fusion (GAMCNMDF) for miRNA-disease association prediction. We create a diverse network connecting miRNAs and diseases, which is then represented using a matrix. The main task of GAMCNMDF is to complete the matrix and obtain the predicted results. The main innovations of GAMCNMDF are reflected in two aspects: GAMCNMDF integrates diverse data sources and employs a nonlinear fusion approach to update the similarity networks of miRNAs and diseases. Also, some additional information is provided to GAMCNMDF in the form of a 'hint' so that GAMCNMDF can work successfully even when complete data are not available. Compared with other methods, the outcomes of 10-fold cross-validation on two distinct databases validate the superior performance of GAMCNMDF with statistically significant results. It is worth mentioning that we apply GAMCNMDF in the identification of underlying small molecule-related miRNAs, yielding outstanding performance results in this specific domain. In addition, two case studies about two important neoplasms show that GAMCNMDF is a promising prediction method.
Affiliation(s)
- ShuDong Wang
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- YunYin Li
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- YuanYuan Zhang
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- ShanChen Pang
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- SiBo Qiao
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- Yu Zhang
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
- FuYu Wang
- College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum (East China), 66 Changjiang Xi Lu, 266580, Shandong, China
3
Gu C, Li X. Prediction of disease-related miRNAs by voting with multiple classifiers. BMC Bioinformatics 2023; 24:177. [PMID: 37122001 PMCID: PMC10150488 DOI: 10.1186/s12859-023-05308-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/26/2023] [Indexed: 05/02/2023] Open
Abstract
There is strong evidence that mutations and dysregulation of miRNAs are associated with a variety of diseases, including cancer. However, the experimental methods used to identify disease-related miRNAs are expensive and time-consuming. Effective computational approaches to identify disease-related miRNAs are in high demand and would aid in the detection of miRNA biomarkers for disease diagnosis, treatment, and prevention. In this study, we develop an ensemble learning framework to reveal the potential associations between miRNAs and diseases (ELMDA). The ELMDA framework does not rely on the known associations when calculating miRNA and disease similarities and uses multi-classifier voting to predict disease-related miRNAs. The average AUC of the ELMDA framework was 0.9229 on the HMDD v2.0 database in fivefold cross-validation. All potential associations in the HMDD v2.0 database were predicted, and 90% of the top 50 results were verified with the updated HMDD v3.2 database. The ELMDA framework was applied to gastric neoplasms, prostate neoplasms and colon neoplasms, and 100%, 94%, and 90%, respectively, of the top 50 potential miRNAs were validated by the HMDD v3.2 database. Moreover, the ELMDA framework can predict isolated disease-related miRNAs. In conclusion, ELMDA appears to be a reliable method to uncover disease-associated miRNAs.
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
4
Le DH. A network-based method for predicting disease-associated enhancers. PLoS One 2021; 16:e0260432. [PMID: 34879086 PMCID: PMC8654176 DOI: 10.1371/journal.pone.0260432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 11/09/2021] [Indexed: 11/18/2022] Open
Abstract
Background Enhancers regulate transcription of target genes, causing a change in expression level. Thus, the aberrant activity of enhancers can lead to diseases. To date, a large number of enhancers have been identified, yet a small portion of them have been found to be associated with diseases. This raises a pressing need to develop computational methods to predict associations between diseases and enhancers. Results In this study, we assumed that enhancers sharing target genes could be associated with similar diseases to predict the association. Thus, we built an enhancer functional interaction network by connecting enhancers significantly sharing target genes, then developed a network diffusion method RWDisEnh, based on a random walk with restart algorithm, on networks of diseases and enhancers to globally measure the degree of the association between diseases and enhancers. RWDisEnh performed best when the disease similarities are integrated with the enhancer functional interaction network by known disease-enhancer associations in the form of a heterogeneous network of diseases and enhancers. It was also superior to another network diffusion method, i.e., PageRank with Priors, and a neighborhood-based one, i.e., MaxLink, which simply chooses the closest neighbors of known disease-associated enhancers. Finally, we showed that RWDisEnh could predict novel enhancers, which are either directly or indirectly associated with diseases. Conclusions Taken together, RWDisEnh could be a potential method for predicting disease-enhancer associations.
Affiliation(s)
- Duc-Hau Le
- School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
5
Pang S, Zhuang Y, Wang X, Wang F, Qiao S. EOESGC: predicting miRNA-disease associations based on embedding of embedding and simplified graph convolutional network. BMC Med Inform Decis Mak 2021; 21:319. [PMID: 34789236 PMCID: PMC8597227 DOI: 10.1186/s12911-021-01671-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2021] [Accepted: 10/29/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A large number of biological studies have shown that miRNAs are inextricably linked to many complex diseases. Studying miRNA-disease associations could provide a root-cause understanding of the underlying pathogenesis, which in turn promotes drug development. However, traditional biological experiments are very time-consuming and costly. Therefore, we propose an efficient model to address this challenge. RESULTS In this work, we propose a deep learning model called EOESGC to predict potential miRNA-disease associations based on embedding of embedding and a simplified graph convolutional network. Firstly, integrated disease similarity, integrated miRNA similarity, and the miRNA-disease association network are used to construct a coupled heterogeneous graph, and the edges with low similarity are removed to simplify the graph structure and ensure the effectiveness of edges. Secondly, the embedding of embedding (EOE) model is used to learn edge information in the coupled heterogeneous graph. The training rule of the model is that associated nodes are close to each other and unassociated nodes are far away from each other. Based on this rule, the learned edge information is added to the node embeddings as supplementary information to enrich node information. Then, the node embeddings trained by the EOE model serve as new features of miRNAs and diseases, and information aggregation is performed by a simplified graph convolution model, in which each level of convolution can aggregate multi-hop neighbor information. In this step, we only use the miRNA-disease association network to further simplify the graph structure, thus reducing the computational complexity. Finally, the feature embeddings of both miRNA and disease are concatenated and fed into an MLP for prediction. In the evaluation, the AUC, AUPR, and F1-score of our model under 5-fold cross-validation are 0.9658, 0.8543 and 0.8644, respectively. Compared with the latest published models, our model shows better results. In addition, we predict the top 20 potential miRNAs for breast cancer and lung cancer, most of which are validated in the dbDEMC and HMDD v3.2 databases. CONCLUSION The comprehensive experimental results show that EOESGC can effectively identify potential miRNA-disease associations.
Affiliation(s)
- Shanchen Pang
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
- Yu Zhuang
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
- Xinzeng Wang
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, China
- Fuyu Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
- Sibo Qiao
- College of Computer Science and Technology, China University of Petroleum, Qingdao, China
6
Nguyen VT, Le TTK, Than K, Tran DH. Predicting miRNA-disease associations using improved random walk with restart and integrating multiple similarities. Sci Rep 2021; 11:21071. [PMID: 34702958 PMCID: PMC8548500 DOI: 10.1038/s41598-021-00677-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/13/2021] [Accepted: 10/15/2021] [Indexed: 12/20/2022] Open
Abstract
Predicting beneficial and valuable miRNA-disease associations (MDAs) through biological laboratory experiments is costly and time-consuming. Developing an effective and meaningful computational method for predicting MDAs is essential and has attracted many computer scientists in recent years. In this paper, we propose a new computational method to predict miRNA-disease associations using improved random walk with restart and integrating multiple similarities (RWRMMDA). We used a WKNKN algorithm as a pre-processing step to address the sparsity and incompleteness of the data and to reduce the negative impact of a large number of missing associations. Two heterogeneous networks, one in disease space and one in miRNA space, were built by integrating multiple similarity networks, and different walk probabilities could be assigned to each linked neighbor of a disease or miRNA node according to its degree in the respective network. Finally, an improved extended random walk with restart algorithm based on the miRNA similarity-based and disease similarity-based heterogeneous networks was used to calculate miRNA-disease association prediction probabilities. The experiments showed that our proposed method achieved global LOOCV AUC (area under the ROC curve) and AUPR (area under the precision-recall curve) values of 0.9882 and 0.9066, respectively. The best AUC and AUPR values under fivefold cross-validation were 0.9855 and 0.8642, respectively, as confirmed by statistical tests. The method outperformed the NTSHMDA, PMFMDA, IMCMDA and MCLPMDA methods in both AUC and AUPR. In case studies of breast neoplasms, hepatocellular carcinoma and stomach neoplasms, it inferred 1, 12 and 7 new associations, respectively, among the top 40 predicted miRNAs for each disease. All of these newly inferred associations have been confirmed in other databases or in the literature.
Affiliation(s)
- Van Tinh Nguyen
- Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
- Faculty of Information Technology, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Vietnam
- Thi Tu Kien Le
- Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
- Khoat Than
- Hanoi University of Science and Technology, Hanoi, Vietnam
- Dang Hung Tran
- Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam.
7
Shen F, Cai W, Gan X, Feng J, Chen Z, Guo M, Wei F, Cao J, Xu B. Prediction of Genetic Factors of Hyperthyroidism Based on Gene Interaction Network. Front Cell Dev Biol 2021; 9:700355. [PMID: 34409035 PMCID: PMC8365469 DOI: 10.3389/fcell.2021.700355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2021] [Accepted: 06/02/2021] [Indexed: 12/25/2022] Open
Abstract
The number of hyperthyroidism patients has been increasing in recent years. As a disease that can lead to cardiovascular disease, it poses considerable health risks. Since hyperthyroidism can induce many other diseases, studying its genetic factors will promote the early diagnosis and treatment of hyperthyroidism and its related diseases. Previous studies have used genome-wide association studies (GWAS) to identify genes related to hyperthyroidism. However, these studies only identify sites that are statistically significantly related to the disease and ignore the complex regulatory relationships between genes. In addition, mutation is not the only genetic factor that causes hyperthyroidism. Identifying hyperthyroidism-related genes from gene interactions would help researchers discover the disease mechanism. In this paper, we propose a novel machine learning method for identifying hyperthyroidism-related genes based on a gene interaction network. The method, called "RW-RVM," is a combination of Random Walk (RW) and Relevance Vector Machines (RVM). RW was used to encode the gene interaction network. The features of genes were the regulatory relationships between genes and non-coding RNAs. Finally, multiple RVMs were applied to identify hyperthyroidism-related genes. The results of 10-fold cross-validation show that the area under the receiver operating characteristic curve (AUC) of our method reached 0.9, and the area under the precision-recall curve (AUPR) was 0.87. Seventy-eight novel genes were found to be related to hyperthyroidism. We checked two of these novel genes against the existing literature, which supported the accuracy of our results and method.
Affiliation(s)
- Fei Shen
- Department of Thyroid Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China; Department of Thyroid Surgery, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Wensong Cai
- Department of Thyroid Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China; Department of Thyroid Surgery, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Xiaoxiong Gan
- Department of Thyroid Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China; Department of Thyroid Surgery, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Jianhua Feng
- Department of Thyroid Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China; Department of Thyroid Surgery, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Zhen Chen
- Department of Thyroid Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China; Department of Thyroid Surgery, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Mengli Guo
- Department of Thyroid Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China; Department of Thyroid Surgery, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Fang Wei
- Department of General Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China
- Jie Cao
- Department of General Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China
- Bo Xu
- Department of Thyroid Surgery, School of Medicine, Guangzhou First People's Hospital, South China University of Technology, Guangzhou, China; Department of Thyroid Surgery, Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
8
Pelia R, Venkateswaran S, Matthews JD, Haberman Y, Cutler DJ, Hyams JS, Denson LA, Kugathasan S. Profiling non-coding RNA levels with clinical classifiers in pediatric Crohn's disease. BMC Med Genomics 2021; 14:194. [PMID: 34325702 PMCID: PMC8323253 DOI: 10.1186/s12920-021-01041-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/04/2020] [Accepted: 07/22/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Crohn's disease (CD) is a heritable chronic inflammatory disorder. Non-coding RNAs (ncRNAs) play an important role in epigenetic regulation by affecting gene expression, but can also directly affect protein function, thus having a substantial impact on biological processes. We investigated whether ncRNAs at diagnosis are dysregulated during CD at different CD locations and future disease behaviors to determine if ncRNA signatures can serve as an index to outcomes. METHODS Using subjects belonging to the RISK cohort, we analyzed ncRNA from the ileal biopsies of 345 CD and 71 non-IBD controls, and ncRNA from rectal biopsies of 329 CD and 61 non-IBD controls. Sequence alignment was done (STAR package) using Human Genome version 38 (hg38) as the reference. The differential expression (DE) analysis was performed with the edgeR package, and DE ncRNAs were identified with a threshold of fold change (FC) > 2 and FDR < 0.05 after multiple test corrections. RESULTS In total, we identified 130 CD-specific DE ncRNAs (89 in ileum and 41 in rectum) when compared to non-IBD controls. Similarly, 35 DE ncRNAs were identified between B1 and B2 in ileum, whereas no differences among CD disease behaviors were noticed in rectum. We also found inflammation-specific ncRNAs between inflamed and non-inflamed groups in ileal biopsies. Overall, we observed that expression of mir1244-2, mir1244-3, mir1244-4, and RN7SL2 was increased during CD, regardless of disease behavior, location, or inflammatory status. Lastly, we tested ncRNA expression at baseline as a potential tool to predict the disease status, disease behaviors and disease inflammation at 3-year follow-up. CONCLUSIONS We have identified ncRNAs that are specific to disease location, disease behavior, and disease inflammation in CD. Both ileal-specific and rectal-specific ncRNAs change over the course of CD, specifically during disease progression in the intestinal mucosa. Collectively, our findings show changes in ncRNA during CD that may have clinical utility in early identification and characterization of disease progression.
Affiliation(s)
- Ranjit Pelia
- Division of Pediatric Gastroenterology, Department of Pediatrics, Emory University School of Medicine and Children's Healthcare of Atlanta, 1760 Haygood Drive, W-427, Atlanta, GA, 30322, USA
- Suresh Venkateswaran
- Division of Pediatric Gastroenterology, Department of Pediatrics, Emory University School of Medicine and Children's Healthcare of Atlanta, 1760 Haygood Drive, W-427, Atlanta, GA, 30322, USA
- Jason D Matthews
- Division of Pediatric Gastroenterology, Department of Pediatrics, Emory University School of Medicine and Children's Healthcare of Atlanta, 1760 Haygood Drive, W-427, Atlanta, GA, 30322, USA
- Yael Haberman
- Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Sheba Medical Center, Tel-HaShomer, Affiliated With the Tel-Aviv University, Tel-Aviv, Israel
- David J Cutler
- Department of Human Genetics, Emory University, Atlanta, GA, USA
- Lee A Denson
- Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Subra Kugathasan
- Division of Pediatric Gastroenterology, Department of Pediatrics, Emory University School of Medicine and Children's Healthcare of Atlanta, 1760 Haygood Drive, W-427, Atlanta, GA, 30322, USA.
- Department of Human Genetics, Emory University, Atlanta, GA, USA.
9
Dai Q, Chu Y, Li Z, Zhao Y, Mao X, Wang Y, Xiong Y, Wei DQ. MDA-CF: Predicting MiRNA-Disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med 2021; 136:104706. [PMID: 34371319 DOI: 10.1016/j.compbiomed.2021.104706] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/17/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, which provide a new perspective in diagnosis and treatment of multiple diseases. Since the experimental methods are always costly and resource-consuming, prediction of disease-related miRNAs using computational methods is in great need. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and the autoencoder was utilized for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0 with an average AUC of 0.9258. Moreover, MDA-CF was implemented to investigate colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, 88% of the top 50 potential miRNAs were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method to uncover disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
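The average AUC figures quoted in these abstracts come from scoring held-out miRNA-disease pairs in each cross-validation fold and averaging the per-fold AUC. As a minimal sketch of that metric only (not the MDA-CF evaluation code), the block below computes AUC as the fraction of positive-negative pairs ranked correctly. The function names, labels, and scores are hypothetical toy values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the AUC over a list of (labels, scores) pairs, one per fold."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)


# Hypothetical held-out labels and predicted scores for a single fold.
fold_labels = [1, 1, 0, 0, 1, 0]
fold_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(fold_labels, fold_scores)
```

The pairwise loop is quadratic in the number of samples. It is adequate for a toy fold, while a rank-based formulation would be preferable for full association matrices.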
Affiliation(s)
- Qiuying Dai
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhiqi Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yusong Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China; Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China.
10
SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization. PLoS Comput Biol 2021; 17:e1009165. [PMID: 34252084 PMCID: PMC8345837 DOI: 10.1371/journal.pcbi.1009165] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 04/05/2021] [Revised: 08/06/2021] [Accepted: 06/08/2021] [Indexed: 11/21/2022] Open
Abstract
miRNAs are small non-coding RNAs that are involved in a number of complicated biological processes. Many studies have suggested that miRNAs are closely associated with many human diseases. In this study, we proposed a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). In order to effectively combine different disease and miRNA similarity data, we applied the similarity network fusion algorithm to obtain integrated disease similarity (composed of disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity) and integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity). In addition, L2 regularization terms and similarity constraint terms were added to the traditional Nonnegative Matrix Factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 based on global leave-one-out cross-validation and five-fold cross-validation, respectively. Furthermore, case studies on two common human diseases were carried out to demonstrate the prediction accuracy of SCMFMDA: 47 and 46 of the top 50 predicted miRNAs were confirmed by experimental reports, which indicated that SCMFMDA was effective for predicting relationships between miRNAs and diseases.
Many studies have suggested that miRNAs are closely associated with many human diseases, so predicting potential associations between miRNAs and diseases can contribute to the diagnosis and treatment of diseases. Several models for discovering unknown miRNA-disease associations make the prediction more productive and effective. We proposed SCMFMDA to obtain more accurate prediction results by applying similarity network fusion to fuse multi-source disease and miRNA information and utilizing similarity constrained matrix factorization to make predictions based on biological information. Global leave-one-out cross-validation and five-fold cross-validation were applied to evaluate our model. SCMFMDA achieved AUCs of 0.9675 and 0.9447, which were clearly higher than those of previous computational models. Furthermore, we implemented case studies on significant human diseases, including colon neoplasms and lung neoplasms, for which 47 and 46 of the top 50 predictions were confirmed by experimental reports. All results indicated that SCMFMDA could be regarded as an effective way to discover unverified miRNA-disease connections.
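The Gaussian interaction profile kernel similarity named above can be sketched as follows. This is a generic illustration, not SCMFMDA's code: it assumes the commonly used form K(i, j) = exp(-γ‖IP(i) − IP(j)‖²) with the bandwidth γ set to a parameter divided by the mean squared norm of the interaction profiles. The function name, the `gamma_prime` default of 1.0, and the toy association matrix are assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of an association matrix."""
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles (assumed convention).
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = []
    for i in range(n):
        row = []
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            row.append(math.exp(-gamma * sq_dist))
        kernel.append(row)
    return kernel


# Hypothetical toy matrix: rows are miRNAs, columns are diseases, 1 = known association.
associations = [[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1]]
mirna_similarity = gip_kernel(associations)
```

Applying the same function to the transposed matrix would give the corresponding disease similarity. Two miRNAs with identical association profiles receive a similarity of 1.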
|
11
|
Ding Y, Lei X, Liao B, Wu FX. Predicting miRNA-Disease Associations Based on Multi-View Variational Graph Auto-Encoder with Matrix Factorization. IEEE J Biomed Health Inform 2021; 26:446-457. [PMID: 34111017 DOI: 10.1109/jbhi.2021.3088342] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 11/10/2022]
Abstract
MicroRNAs (miRNAs) have been proved to play critical roles in diverse biological processes, including the human disease development process. Exploring the potential associations between miRNAs and diseases can help us better understand complex disease mechanisms. Given that traditional biological experiments are expensive and time-consuming, computational models can serve as efficient means to uncover potential miRNA-disease associations. This study presents a new computational model based on variational graph auto-encoder with matrix factorization (VGAMF) for miRNA-disease association prediction. More specifically, VGAMF first integrates four different types of information about miRNAs into an miRNA comprehensive similarity network and two types of information about diseases into a disease comprehensive similarity network, respectively. Then, VGAMF gets the non-linear representations of miRNAs and diseases, respectively, from those two comprehensive similarity networks with variational graph auto-encoders. Simultaneously, a non-negative matrix factorization is conducted on the miRNA-disease association matrix to get the linear representations of miRNAs and diseases. Finally, a fully connected neural network combines linear and non-linear representations of miRNAs and diseases to get the final predicted association score for all miRNA-disease pairs. In the 10-fold cross-validation experiments, VGAMF achieves an average AUC of 0.9280 on HMDD v2.0 and 0.9470 on HMDD v3.2, which outperforms other competing methods. Besides, the case studies on colon cancer and esophageal cancer further demonstrate the effectiveness of VGAMF in predicting novel miRNA-disease associations.
|
12
|
Chu Y, Wang X, Dai Q, Wang Y, Wang Q, Peng S, Wei X, Qiu J, Salahub DR, Xiong Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021; 22:6261915. [PMID: 34009265 DOI: 10.1093/bib/bbab165] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 02/10/2021] [Revised: 04/02/2021] [Accepted: 04/08/2021] [Indexed: 11/13/2022] Open
Abstract
Accurate identification of miRNA-disease associations (MDAs) helps to understand the etiology and mechanisms of various diseases. However, experimental methods are costly and time-consuming. Thus, it is urgent to develop computational methods for the prediction of MDAs. Based on graph theory, MDA prediction is regarded as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve training efficiency and accuracy. This method models both the potential connections of the feature space and the structural relationships of MDA data. The nodes of the graphs are represented by the disease semantic similarity, miRNA functional similarity and Gaussian interaction profile kernel similarity. Moreover, we considered six tasks simultaneously on the MDA prediction problem for the first time, which ensures that under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that the MDA-GCNFTG method achieves satisfactory performance on all six tasks and is significantly superior to classic machine learning methods and state-of-the-art MDA prediction methods. Moreover, the effectiveness of GCNs via the graph sampling strategy and of the feature and topology graph in MDA-GCNFTG has also been demonstrated. More importantly, case studies for two diseases and three miRNAs were conducted and achieved satisfactory performance.
Affiliation(s)
- Yanyi Chu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Xuhong Wang
  - School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, China
- Qiuying Dai
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Yanjing Wang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Qiankun Wang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, China
- Dennis Russell Salahub
  - Department of Chemistry, University of Calgary, Fellow Royal Society of Canada and Fellow of the American Association for the Advancement of Science, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
|
13
|
Xiao Y, Xiao Z, Feng X, Chen Z, Kuang L, Wang L. A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs. BMC Bioinformatics 2020; 21:555. [PMID: 33267800 PMCID: PMC7709313 DOI: 10.1186/s12859-020-03906-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/22/2019] [Accepted: 11/25/2020] [Indexed: 12/25/2022] Open
Abstract
Background Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is useful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, there exist all kinds of limitations in these state-of-the-art prediction methods as well. Results In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. In FVTLDA, its major novelty lies in the integration of direct and indirect features related to lncRNA-disease associations such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related-lncRNAs and lncRNAs without known related-diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease nor requires any negative samples, which guarantee that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single model prediction techniques, we combine FVTLDA with the Multiple Linear Regression (MLR) and the Artificial Neural Network (ANN) for data analysis respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (fivefold CV), 10-Fold Cross Validation (tenfold CV) and Leave-One-Out Cross Validation (LOOCV), separately, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV respectively. 
Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that there are 8, 8 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with ANN, having been verified by recent literature. Comparing with the representative prediction model of KATZLDA, comparison results illustrate that FVTLDA with MLR and FVTLDA with ANN can achieve the average case study contrast scores of 0.8429 and 0.8515 respectively, which are both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.
Affiliation(s)
- Yubin Xiao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, People's Republic of China
- Zheng Xiao
  - Hunan Province Key Laboratory of Tumor Cellular and Molecular Pathology, Cancer Research Institute, University of South China, Hengyang, 421001, Hunan, People's Republic of China
- Xiang Feng
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China
- Zhiping Chen
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China
- Linai Kuang
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, People's Republic of China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410001, People's Republic of China
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, 411105, People's Republic of China
|
14
|
Wang L, Chen Y, Zhang N, Chen W, Zhang Y, Gao R. QIMCMDA: MiRNA-Disease Association Prediction by q-Kernel Information and Matrix Completion. Front Genet 2020; 11:594796. [PMID: 33193744 PMCID: PMC7643770 DOI: 10.3389/fgene.2020.594796] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2020] [Accepted: 09/21/2020] [Indexed: 12/27/2022] Open
Abstract
Studies have shown that microRNAs (miRNAs) are closely associated with many human diseases, but the role and potential molecular mechanisms of miRNAs in disease development are not yet fully understood. Ordinary biological experiments are often costly, whereas computational methods can predict potential miRNA-disease associations quickly and effectively at a lower cost and can serve as a useful reference for experimental methods. For miRNA-disease association prediction, we have proposed a new method, a matrix completion algorithm based on q-kernel information (QIMCMDA). We use fivefold cross-validation and leave-one-out cross-validation (LOOCV) to demonstrate the effectiveness of QIMCMDA. LOOCV shows that the AUC can reach 0.9235, and its performance is significantly better than that of other commonly used methods. In addition, we applied QIMCMDA to case studies of three human diseases, and the results show that our method performs well in inferring potential interactions between miRNAs and diseases. It is expected that QIMCMDA will become an excellent supplement in the field of biomedical research in the future.
Affiliation(s)
- Lin Wang
  - School of Mathematics and Statistics, Shandong University, Jinan, China
- Yaguang Chen
  - School of Mathematics and Statistics, Shandong University, Jinan, China
- Naiqian Zhang
  - School of Mathematics and Statistics, Shandong University, Jinan, China
- Wei Chen
  - School of Mathematics and Statistics, Shandong University, Jinan, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Jinan, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan, China
|
15
|
Ding Y, Tian LP, Lei X, Liao B, Wu FX. Variational graph auto-encoders for miRNA-disease association prediction. Methods 2020; 192:25-34. [PMID: 32798654 DOI: 10.1016/j.ymeth.2020.08.004] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 06/11/2020] [Revised: 08/03/2020] [Accepted: 08/08/2020] [Indexed: 02/07/2023] Open
Abstract
Cumulative experimental studies have demonstrated the critical roles of microRNAs (miRNAs) in the diverse fundamental and important biological processes, and in the development of numerous complex human diseases. Thus, exploring the relationships between miRNAs and diseases is helpful with understanding the mechanisms, the detection, diagnosis, and treatment of complex diseases. As the identification of miRNA-disease associations via traditional biological experiments is time-consuming and expensive, an effective computational prediction method is appealing. In this study, we present a deep learning framework with variational graph auto-encoder for miRNA-disease association prediction (VGAE-MDA). VGAE-MDA first gets the representations of miRNAs and diseases from the heterogeneous networks constructed by miRNA-miRNA similarity, disease-disease similarity, and known miRNA-disease associations. Then, VGAE-MDA constructs two sub-networks: miRNA-based network and disease-based network. Combining the representations based on the heterogeneous network, two variational graph auto-encoders (VGAE) are deployed for calculating the miRNA-disease association scores from two sub-networks, respectively. Lastly, VGAE-MDA obtains the final predicted association score for a miRNA-disease pair by integrating the scores from these two trained networks. Unlike the previous model, the VGAE-MDA can mitigate the effect of noises from random selection of negative samples. Besides, the use of graph convolutional neural (GCN) network can naturally incorporate the node features from the graph structure while the variational autoencoder (VAE) makes use of latent variables to predict associations from the perspective of data distribution. The experimental results show that VGAE-MDA outperforms the state-of-the-art approaches in miRNA-disease association prediction. Besides, the effectiveness of our model has been further demonstrated by case studies.
Affiliation(s)
- Yulian Ding
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Li-Ping Tian
  - School of Information, Beijing Wuzi University, Beijing 101125, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Bo Liao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
  - Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
|
16
|
Ding Y, Chen B, Lei X, Liao B, Wu FX. Predicting novel CircRNA-disease associations based on random walk and logistic regression model. Comput Biol Chem 2020; 87:107287. [PMID: 32446243 DOI: 10.1016/j.compbiolchem.2020.107287] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/04/2020] [Accepted: 05/09/2020] [Indexed: 12/24/2022]
Abstract
Circular RNAs (circRNAs), a large group of small endogenous noncoding RNA molecules, have been proved to modulate protein-coding genes in the human genome. In recent years, many experimental studies have demonstrated that circRNAs are dysregulated in a number of diseases, and they can serve as biomarkers for disease diagnosis and prognosis. However, it is expensive and time-consuming to identify circRNA-disease associations by biological experiments and few computational models have been proposed for novel circRNA-disease association prediction. In this study, we develop a computational model based on the random walk and the logistic regression (RWLR) to predict circRNA-disease associations. Firstly, a circRNA-circRNA similarity network is constructed by calculating their functional similarity of circRNA based on circRNA-related gene ontology. Then, a random walk with restart is implemented on the circRNA similarity network, and the features of each pair of circRNA-disease are extracted based on the results of the random walk and the circRNA-disease association matrix. Finally, a logistic regression model is used to predict novel circRNA-disease associations. Leave one out validation (LOOCV), five-fold cross validation (5CV) and ten-fold cross validation (10CV) are adopted to evaluate the prediction performance of RWLR, by comparing with the latest two methods PWCDA and DWNN-RLS. The experiment results show that our RWLR has higher AUC values of LOOCV, 5CV and 10CV than the other two latest methods, which demonstrates that RWLR has a better performance than other computational methods. What's more, case studies also illustrate the reliability and effectiveness of RWLR for circRNA-disease association prediction.
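The random walk with restart step described in the abstract above can be sketched generically as follows. This is a minimal illustration of the standard iteration p(t+1) = (1 - r) W p(t) + r p(0) on a column-normalized similarity matrix, not the RWLR implementation. The restart probability, tolerance, function names and toy network are assumptions made for the example.

```python
def column_normalize(weights):
    """Scale each column of a square matrix to sum to 1 (zero columns stay zero)."""
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return [
        [weights[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol."""
    n = len(weights)
    transition = column_normalize(weights)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)  # restart mass is spread evenly over the seeds
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy similarity network over four hypothetical circRNAs, seeded at node 0.
toy_similarity = [
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.5, 0.2],
    [0.1, 0.5, 0.0, 0.9],
    [0.0, 0.2, 0.9, 0.0],
]
rwr_scores = random_walk_with_restart(toy_similarity, seeds=[0])
```

The resulting steady-state scores rank nodes by proximity to the seeds. In a pipeline like the one the abstract outlines, such scores would then be turned into features for a downstream classifier.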
Affiliation(s)
- Yulian Ding
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 1L5, Canada
- Bolin Chen
  - School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Bo Liao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 1L5, Canada
  - Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
|
17
|
Ding Y, Wang F, Lei X, Liao B, Wu FX. Deep belief network-Based Matrix Factorization Model for MicroRNA-Disease Associations Prediction. Evol Bioinform Online 2020; 16:1176934320919707. [PMID: 32523330 PMCID: PMC7235669 DOI: 10.1177/1176934320919707] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/29/2019] [Accepted: 03/11/2020] [Indexed: 12/11/2022] Open
Abstract
MicroRNAs (miRNAs) are small single-stranded noncoding RNAs that have been shown to play a critical role in regulating gene expression. In past decades, cumulative experimental studies have verified that miRNAs are implicated in many complex human diseases and might be potential biomarkers for various types of diseases. With the increase of miRNA-related data and the development of analysis methodologies, some computational methods have been developed for predicting miRNA-disease associations, which are more economical and time-saving than traditional biological experimental approaches. In this study, a novel computational model, deep belief network (DBN)-based matrix factorization (DBN-MF), is proposed for miRNA-disease association prediction. First, the raw interaction features of miRNAs and diseases were obtained from the miRNA-disease adjacency matrix. Second, 2 DBNs were used for unsupervised learning of the features of miRNAs and diseases, respectively, based on the raw interaction features. Finally, a classifier consisting of 2 DBNs and a cosine score function was trained with the initial weights of the DBNs from the previous step. During the training, the miRNA-disease adjacency matrix was factorized into 2 feature matrices for the representation of miRNAs and diseases, and the final prediction label was obtained according to the feature matrices. The experimental results show that the proposed model outperforms the state-of-the-art approaches in miRNA-disease association prediction based on 10-fold cross-validation. In addition, the effectiveness of our model was further demonstrated by case studies.
Affiliation(s)
- Yulian Ding
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Fei Wang
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Bo Liao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
|
18
|
Wu M, Yang Y, Wang H, Ding J, Zhu H, Xu Y. IMPMD: An Integrated Method for Predicting Potential Associations Between miRNAs and Diseases. Curr Genomics 2020; 20:581-591. [PMID: 32581646 PMCID: PMC7290057 DOI: 10.2174/1389202920666191023090215] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/25/2019] [Revised: 08/07/2019] [Accepted: 10/16/2019] [Indexed: 01/06/2023] Open
Abstract
Background With the rapid development of biological research, microRNAs (miRNAs) have increasingly attracted worldwide attention. A growing number of biological studies and experiments have shown that miRNAs are related to the occurrence and development of many key biological processes that cause complex human diseases. Thus, identifying the associations between miRNAs and diseases is helpful for diagnosing diseases. Although some studies have found a considerable number of associations between miRNAs and diseases, many associations remain to be identified. Experimental methods to uncover miRNA-disease associations are time-consuming and expensive. Therefore, effective computational methods are urgently needed to predict new associations. Methodology In this work, we propose an integrated method for predicting potential associations between miRNAs and diseases (IMPMD). The enhanced similarity for miRNAs is obtained by combining functional similarity, Gaussian similarity and Jaccard similarity. For diseases, it is obtained by combining semantic similarity, Gaussian similarity and Jaccard similarity. Then, we use these two enhanced similarities to construct the features and calculate a cumulative score to choose robust features. Finally, general linear regression is applied to assign weights to the Support Vector Machine, K-Nearest Neighbor and Logistic Regression algorithms. Results IMPMD obtains an AUC of 0.9386 in 10-fold cross-validation, which is better than most of the previous models. To further evaluate our model, we apply IMPMD in two types of case studies, for lung cancer and breast cancer. Of the top 50 related miRNAs, 49 (lung cancer) and 50 (breast cancer) are validated by experimental discoveries. Conclusion We built a software tool named IMPMD, which can be freely downloaded from https://github.com/Sunmile/IMPMD.
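As a small illustration of the Jaccard similarity mentioned in the abstract above, the sketch below compares two entities by the overlap of their known association sets. Treating each miRNA as the set of diseases it is linked to is an assumption made here for the example, and the identifiers are placeholders, not data from IMPMD.

```python
def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Placeholder association sets for two hypothetical miRNAs.
diseases_of_mirna_a = {"disease_1", "disease_2", "disease_3"}
diseases_of_mirna_b = {"disease_2", "disease_3", "disease_4"}
toy_jaccard = jaccard_similarity(diseases_of_mirna_a, diseases_of_mirna_b)
```

Here the two sets share two of four distinct diseases, so the similarity is 0.5. Unlike a Gaussian kernel, this measure ignores shared absences, so it is not inflated by the many diseases to which neither miRNA is linked.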
Affiliation(s)
- Meiqi Wu
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
- Yingxi Yang
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
- Hui Wang
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
- Jun Ding
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
- Huan Zhu
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
- Yan Xu
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
|
19
|
An improved random forest-based computational model for predicting novel miRNA-disease associations. BMC Bioinformatics 2019; 20:624. [PMID: 31795954 PMCID: PMC6889672 DOI: 10.1186/s12859-019-3290-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/18/2019] [Accepted: 11/21/2019] [Indexed: 01/29/2023] Open
Abstract
Background A large body of evidence shows that miRNA regulates the expression of its target genes at the post-transcriptional level and that the dysregulation of miRNA is related to many complex human diseases. Accurately discovering disease-related miRNAs is conducive to exploring the pathogenesis and treatment of diseases. However, because experimental methods are time-consuming and expensive, predicting miRNA-disease associations with computational models has become a more economical and effective means. Results Inspired by the work of predecessors, we proposed an improved computational model based on random forest (RF) for identifying miRNA-disease associations (IRFMDA). First, the integrated similarity of diseases and the integrated similarity of miRNAs were calculated by combining the semantic similarity and Gaussian interaction profile kernel (GIPK) similarity of diseases, and the functional similarity and GIPK similarity of miRNAs, respectively. Then, the integrated similarity of diseases and the integrated similarity of miRNAs were combined to represent each miRNA-disease relationship pair. Next, the miRNA-disease relationship pairs contained in the HMDD (v2.0) database were considered positive samples, and randomly constructed miRNA-disease relationship pairs not included in HMDD (v2.0) were considered negative samples. Feature selection based on the variable importance score of RF was then performed to choose more useful features to represent samples and improve the model's ability to infer miRNA-disease associations. Finally, an RF regression model was trained on the reduced sample space to score the unknown miRNA-disease associations. The AUCs of IRFMDA under local leave-one-out cross-validation (LOOCV), global LOOCV and 5-fold cross-validation reached 0.8728, 0.9398 and 0.9363, respectively, which were better than those of several excellent models for predicting miRNA-disease associations.
Moreover, case studies on oesophageal cancer, lymphoma and lung cancer showed that 94 (oesophageal cancer), 98 (lymphoma) and 100 (lung cancer) of the top 100 disease-associated miRNAs predicted by IRFMDA were supported by the experimental data in the dbDEMC (v2.0) database. Conclusions Cross-validation and case studies demonstrated that IRFMDA is an excellent miRNA-disease association prediction model, and can provide guidance and help for experimental studies on the regulatory mechanism of miRNAs in complex human diseases in the future.
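The sample construction described in the abstract above, with known pairs as positives and randomly drawn unlisted pairs as negatives, can be sketched as follows. This is an illustrative sketch only: the function name, the index-based pair encoding, the fixed seed and the balanced one-to-one sampling ratio are assumptions, not details taken from IRFMDA.

```python
import random


def sample_negative_pairs(n_mirnas, n_diseases, positives, n_samples, seed=0):
    """Draw distinct (miRNA, disease) index pairs that are absent from `positives`.

    Unlisted pairs are only presumed negative: some may be true associations
    that have not been reported yet.
    """
    known = set(positives)
    candidates = [
        (m, d)
        for m in range(n_mirnas)
        for d in range(n_diseases)
        if (m, d) not in known
    ]
    if n_samples > len(candidates):
        raise ValueError("not enough unlabeled pairs to sample from")
    return random.Random(seed).sample(candidates, n_samples)


# Toy setting: 4 miRNAs, 3 diseases, 3 known associations (made-up indices).
toy_positives = [(0, 0), (1, 2), (3, 1)]
toy_negatives = sample_negative_pairs(4, 3, toy_positives, n_samples=len(toy_positives))
```

Because the negatives are drawn at random from unlabeled pairs, results can vary with the draw, which is one reason to fix the seed or to repeat the sampling when comparing models.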
|
20
|
Huang Z, Liu L, Gao Y, Shi J, Cui Q, Li J, Zhou Y. Benchmark of computational methods for predicting microRNA-disease associations. Genome Biol 2019; 20:202. [PMID: 31594544 PMCID: PMC6781296 DOI: 10.1186/s13059-019-1811-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 05/17/2019] [Accepted: 09/03/2019] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND A series of miRNA-disease association prediction methods have been proposed to prioritize potential disease-associated miRNAs. Independent benchmarking of these methods is warranted to assess their effectiveness and robustness. RESULTS Based on more than 8000 novel miRNA-disease associations from the latest HMDD v3.1 database, we perform systematic comparison among 36 readily available prediction methods. Their overall performances are evaluated with rigorous precision-recall curve analysis, where 13 methods show acceptable accuracy (AUPRC > 0.200) while the top two methods achieve a promising AUPRC over 0.300, and most of these methods are also highly ranked when considering only the causal miRNA-disease associations as the positive samples. The potential of performance improvement is demonstrated by combining different predictors or adopting a more updated miRNA similarity matrix, which would result in up to 16% and 46% of AUPRC augmentations compared to the best single predictor and the predictors using the previous similarity matrix, respectively. Our analysis suggests a common issue of the available methods, which is that the prediction results are severely biased toward well-annotated diseases with many associated miRNAs known and cannot further stratify the positive samples by discriminating the causal miRNA-disease associations from the general miRNA-disease associations. CONCLUSION Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate miRNA-disease association predictors for their purpose, but also suggest the future directions for the development of more robust miRNA-disease association predictors.
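The AUPRC figures quoted in the abstract above are easier to interpret with a concrete calculation. The sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve. It is not the benchmark's evaluation code, ties between scores are not treated specially, and the labels and scores are made up.

```python
def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample.

    labels: 1 for a known association, 0 otherwise.
    scores: predicted association scores, higher meaning more likely.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Toy example: two positives ranked 1st and 3rd among five candidates.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)
```

For a predictor that ranks at random, the expected precision equals the fraction of positives in the data, so on heavily imbalanced association data the chance-level AUPRC is close to zero rather than 0.5 as for ROC AUC.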
Collapse
Affiliation(s)
- Zhou Huang
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
- Leibo Liu
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, 300401, China
- Yuanxu Gao
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
- Jiangcheng Shi
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
- Qinghua Cui
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
  - Center of Bioinformatics, Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Jianwei Li
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, 300401, China
- Yuan Zhou
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
|
21
|
Wiczling P, Daghir-Wojtkowiak E, Kaliszan R, Markuszewski MJ, Limon J, Koczkowska M, Stukan M, Kuźniacka A, Ratajska M. Bayesian multilevel model of micro RNA levels in ovarian-cancer and healthy subjects. PLoS One 2019; 14:e0221764. [PMID: 31465488 PMCID: PMC6715278 DOI: 10.1371/journal.pone.0221764] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/13/2018] [Accepted: 08/14/2019] [Indexed: 12/31/2022]
Abstract
In transcriptomics, micro RNAs (miRNAs) have gained much interest, especially as potential disease indicators. However, despite their great promise for clinical application, many inconsistent results have been published. Our aim was to compare miRNA expression levels in ovarian cancer patients and healthy subjects using a Bayesian multilevel model and to assess their potential usefulness in diagnosis. We analyzed case-control observational data on expression profiling of 49 preselected miRNA-based ovarian cancer indicators in 119 controls and 59 patients. A Bayesian multilevel model was used to characterize the effect of disease on miRNA levels, controlling for differences in age and body weight. The differences in miRNA levels by health status, expressed on the scale of the data variability, were discussed in the context of their potential usefulness in diagnosis. Additionally, the cross-validated area under the ROC curve (AUC) was used to assess the expected out-of-sample discrimination index of different sets of miRNAs. The proposed model allowed us to describe the set of miRNA levels in patients and controls. Three highly correlated miRNAs (miR-101-3p, miR-142-5p, and miR-148a-3p) ranked the highest, with almost identical effect sizes ranging from 0.45 to 1.0. For those miRNAs, the credible interval for AUC ranged from 0.63 to 0.67, indicating limited discrimination potential. Little benefit was observed from adding information from other miRNAs. There were several miRNAs in the dataset (miR-604, hsa-miR-221-5p) for which inferences were uncertain. For those miRNAs, more experimental effort is needed to fully assess their effect in the context of new hits discovery and usefulness as disease indicators. The proposed multilevel Bayesian model can be used to characterize a panel of miRNA profiles and to assess the difference in expression levels between healthy individuals and those with cancer.
Affiliation(s)
- Paweł Wiczling
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Gen. J. Hallera, Gdańsk, Poland
- Emilia Daghir-Wojtkowiak
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Gen. J. Hallera, Gdańsk, Poland
- Roman Kaliszan
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Gen. J. Hallera, Gdańsk, Poland
- Michał Jan Markuszewski
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Gen. J. Hallera, Gdańsk, Poland
- Janusz Limon
  - Department of Biology and Genetics, Medical University of Gdańsk, Dębinki, Gdańsk, Poland
- Magdalena Koczkowska
  - Department of Biology and Genetics, Medical University of Gdańsk, Dębinki, Gdańsk, Poland
- Maciej Stukan
  - Department of Gynecological Oncology, Gdynia Oncology Centre, Powstania Styczniowego, Gdynia, Poland
- Alina Kuźniacka
  - Department of Biology and Genetics, Medical University of Gdańsk, Dębinki, Gdańsk, Poland
- Magdalena Ratajska
  - Department of Biology and Genetics, Medical University of Gdańsk, Dębinki, Gdańsk, Poland
|