1
Zhou F, Yin MM, Jiao CN, Zhao JX, Zheng CH, Liu JX. Predicting miRNA-Disease Associations Through Deep Autoencoder With Multiple Kernel Learning. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5570-5579. [PMID: 34860656 DOI: 10.1109/tnnls.2021.3129772] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Indexed: 06/13/2023]
Abstract
Determining microRNA (miRNA)-disease associations (MDAs) is integral to the prevention, diagnosis, and treatment of complex diseases. However, wet-lab experiments to discern MDAs are inefficient and expensive. Hence, developing reliable and efficient data-integrative models for predicting MDAs is of great significance. In the present work, a novel deep learning method for predicting MDAs through a deep autoencoder with multiple kernel learning (DAEMKL) is presented. First, DAEMKL applies multiple kernel learning (MKL) in miRNA space and disease space to construct a miRNA similarity network and a disease similarity network, respectively. Then, for each disease or miRNA, its feature representation is learned from the miRNA similarity network and the disease similarity network via a regression model. After that, the integrated miRNA and disease feature representations are input into a deep autoencoder (DAE), and novel MDAs are predicted through the reconstruction error. The AUC results show that DAEMKL achieves outstanding performance. In addition, case studies of three complex diseases further indicate that DAEMKL has excellent predictive performance and can discover a large number of underlying MDAs. On the whole, DAEMKL is an effective method for identifying MDAs.
2
Li Y, Zhang M, Shang J, Li F, Ren Q, Liu JX. iLncDA-RSN: identification of lncRNA-disease associations based on reliable similarity networks. Front Genet 2023; 14:1249171. [PMID: 37614816 PMCID: PMC10442839 DOI: 10.3389/fgene.2023.1249171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 07/27/2023] [Indexed: 08/25/2023]
Abstract
Identification of disease-associated long non-coding RNAs (lncRNAs) is crucial for unveiling the underlying genetic mechanisms of complex diseases. Multiple types of similarity networks of lncRNAs (or diseases) can complementarily and comprehensively characterize their similarities. Hence, in this study, we present a computational model, iLncDA-RSN, based on reliable similarity networks for identifying potential lncRNA-disease associations (LDAs). Specifically, to construct reliable similarity networks of lncRNAs and diseases, miRNA heuristic information on lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then Gaussian interaction profile (GIP) kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are derived from the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network, and the disease semantic similarity network to construct the reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and their dimensionality is then reduced by the elastic net. Finally, two random forests are used together on different lncRNA-disease association feature sets to identify potential LDAs. iLncDA-RSN is evaluated by five-fold cross-validation, and the results show that it outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of iLncDA-RSN in identifying potential LDAs.
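The two association-derived similarities named in this abstract can be sketched in a few lines. The block below is a minimal illustration using the commonly used definitions of Jaccard similarity and the GIP kernel, not the authors' implementation; the function names, the `gamma_prime` parameter, and the toy data are assumptions made for the example.

```python
import math


def jaccard_similarity(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel between the rows of a binary association matrix.

    K[i][j] = exp(-gamma * ||row_i - row_j||^2), where gamma is gamma_prime
    divided by the mean squared norm of the rows (the usual normalisation).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((x - y) ** 2 for x, y in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical toy data: 3 lncRNAs x 4 diseases, plus miRNA partners per lncRNA.
lnc_disease = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]
lnc_mirnas = [{"miR-a", "miR-b"}, {"miR-b", "miR-c"}, {"miR-d"}]

gip = gip_kernel(lnc_disease)
jac = [[jaccard_similarity(a, b) for b in lnc_mirnas] for a in lnc_mirnas]
```

Both matrices are symmetric with ones on the diagonal, so either can serve as the adjacency matrix of a similarity network. How the paper combines them through random walk with restart is not specified in the abstract and is not reproduced here.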
Affiliation(s)
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao, China
3
Shen Y, Liu JX, Yin MM, Zheng CH, Gao YL. BMPMDA: Prediction of MiRNA-Disease Associations Using a Space Projection Model Based on Block Matrix. Interdiscip Sci 2023; 15:88-99. [PMID: 36335274 DOI: 10.1007/s12539-022-00542-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022] [Indexed: 11/07/2022]
Abstract
With the rapid development of bioinformatics technology, miRNA-disease associations (MDAs) are gradually being uncovered. Convenient and efficient prediction methods that avoid the resource consumption of traditional wet-lab experiments are still needed. In this study, a space projection model based on block matrices is presented for predicting MDAs (BMPMDA). Specifically, two block matrices are first composed from the known association matrix and the similarity information to increase comprehensiveness. To preserve the integrity of information in the heterogeneous network, matrix completion (MC) is utilized to mine potential MDAs. To account for the neighborhood information of data points, linear neighborhood similarity (LNS) is used as the similarity measure. Next, LNS is projected onto the corresponding completed association matrix to derive the projection score. The AUC and AUPR values for BMPMDA reach 0.9691 and 0.6231, respectively. Additionally, the majority of novel MDAs in three disease cases are confirmed in existing databases and the literature. These results suggest that BMPMDA can serve as a reliable prediction model for biological research.
Affiliation(s)
- Yi Shen
- Qufu Normal University, Rizhao, 276800, China
- Chun-Hou Zheng
- Co-Innovation Center for Information Supply and Assurance Technology, Anhui University, Hefei, 230000, China
- Ying-Lian Gao
- Library of Qufu Normal University, Qufu Normal University, Rizhao, 276800, China.
4
Zhou F, Yin MM, Zhao JX, Shang J, Liu JX. A Method Based On Dual-Network Information Fusion to Predict MiRNA-Disease Associations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:52-60. [PMID: 34882558 DOI: 10.1109/tcbb.2021.3133006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
MicroRNAs (miRNAs) are single-stranded small RNAs. An increasing number of studies have shown that miRNAs play a vital role in many important biological processes. However, experimental methods for identifying unknown miRNA-disease associations (MDAs) are time-consuming and costly, and only a small percentage of MDAs have been verified by researchers. Therefore, there is a great need for fast and efficient methods to predict novel MDAs. In this paper, a new computational method based on Dual-Network Information Fusion (DNIF) is developed to predict potential MDAs. Specifically, on the one hand, two enhanced sub-models are integrated to build an effective prediction framework; on the other hand, the prediction performance of the algorithm is improved by fully fusing multiple sources of omics information, including the validated miRNA-disease association network, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile (GIP) kernel similarity. As a result, DNIF achieves excellent performance under 5-fold cross-validation (average AUC of 0.9571). In case studies of three important human diseases, the model achieves satisfactory performance in predicting potential miRNAs for the diseases. The experimental results demonstrate that DNIF could serve as an effective computational method to accelerate the identification of MDAs.
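The "average AUC under 5-fold cross-validation" reported here is the mean of the per-fold areas under the ROC curve. The sketch below shows one way to compute it using the rank interpretation of AUC (the probability that a random positive scores higher than a random negative). The helper names and the two toy folds are assumptions for illustration; the abstract does not describe how DNIF samples negatives or builds folds.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_fold_auc(folds):
    """Average the AUC over held-out folds given as (labels, scores) pairs."""
    return sum(roc_auc(labels, scores) for labels, scores in folds) / len(folds)


# Hypothetical held-out predictions for two folds (a real run would have five).
toy_folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 0, 0], [0.8, 0.8, 0.1]),
]
average_auc = mean_fold_auc(toy_folds)
```

The pairwise loop is quadratic in the number of samples, which is fine for a toy check; a sort-based rank computation would be the usual choice at the scale of a full association matrix.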
5
SGAEMDA: Predicting miRNA-Disease Associations Based on Stacked Graph Autoencoder. Cells 2022; 11:3984. [PMID: 36552748 PMCID: PMC9776508 DOI: 10.3390/cells11243984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 11/30/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]
Abstract
MicroRNA (miRNA)-disease association (MDA) prediction is critical for disease prevention, diagnosis, and treatment. Traditional wet-lab experiments for identifying MDAs, however, are inefficient and costly. Therefore, we propose a multi-layer collaborative unsupervised training model called SGAEMDA (Stacked Graph Autoencoder-Based Prediction of Potential miRNA-Disease Associations). First, from the original miRNA and disease data, we define two types of initial features: similarity features and association features. Second, a stacked graph autoencoder is used to learn unsupervised low-dimensional representations of meaningful higher-order similarity features, and the association features are concatenated with the learned low-dimensional representations to obtain the final miRNA-disease pair features. Finally, a multilayer perceptron (MLP) is used to predict scores for unknown miRNA-disease associations. SGAEMDA achieved mean areas under the ROC curve of 0.9585 and 0.9516 in 5-fold and 10-fold cross-validation, respectively, which are significantly higher than those of the baseline methods. Furthermore, case studies show that SGAEMDA can accurately predict candidate miRNAs for brain, breast, colon, and kidney neoplasms.
6
Zhou F, Yin MM, Jiao CN, Cui Z, Zhao JX, Liu JX. Bipartite graph-based collaborative matrix factorization method for predicting miRNA-disease associations. BMC Bioinformatics 2021; 22:573. [PMID: 34837953 PMCID: PMC8627000 DOI: 10.1186/s12859-021-04486-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/27/2020] [Accepted: 11/17/2021] [Indexed: 01/15/2023]
Abstract
BACKGROUND: With the rapid development of various advanced biotechnologies, researchers in related fields have realized that microRNAs (miRNAs) play critical roles in many serious human diseases. However, experimental identification of new miRNA-disease associations (MDAs) is expensive and time-consuming. Practitioners have shown growing interest in methods for predicting potential MDAs. In recent years, an increasing number of computational methods for predicting novel MDAs have been developed, making a huge contribution to research on human diseases and saving considerable time. In this paper, we propose an efficient computational method, named bipartite graph-based collaborative matrix factorization (BGCMF), which is highly advantageous for predicting novel MDAs.
RESULTS: By combining two improved recommendation methods, a new model for predicting MDAs is generated. Because some new miRNAs and diseases do not have any known associations, we adopt the bipartite graph together with the collaborative matrix factorization method to complete the prediction. BGCMF achieves a desirable result, with an AUC of up to 0.9514 ± 0.0007 in the five-fold cross-validation experiments.
CONCLUSIONS: Five-fold cross-validation is used to evaluate the capabilities of our method, and simulation experiments are implemented to predict new MDAs. The AUC value of our method is higher than those of some state-of-the-art methods. Finally, many associations between new miRNAs and new diseases are successfully predicted in the simulation experiments, indicating that BGCMF is a useful method for predicting more potential miRNAs with roles in various diseases.
Affiliation(s)
- Feng Zhou
- The School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Meng-Meng Yin
- The School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Cui-Na Jiao
- The School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Zhen Cui
- The School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Jing-Xiu Zhao
- The School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu
- The School of Computer Science, Qufu Normal University, Rizhao, 276826, China.
7
Nguyen VT, Le TTK, Than K, Tran DH. Predicting miRNA-disease associations using improved random walk with restart and integrating multiple similarities. Sci Rep 2021; 11:21071. [PMID: 34702958 PMCID: PMC8548500 DOI: 10.1038/s41598-021-00677-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/13/2021] [Accepted: 10/15/2021] [Indexed: 12/20/2022]
Abstract
Identifying valuable miRNA-disease associations (MDAs) through biological laboratory experiments is costly and time-consuming. Developing effective computational methods for predicting MDAs is therefore essential and has attracted many computer scientists in recent years. In this paper, we propose a new computational method to predict miRNA-disease associations using an improved random walk with restart and integrating multiple similarities (RWRMMDA). We use the WKNKN algorithm as a pre-processing step to address the sparsity and incompleteness of the data and to reduce the negative impact of a large number of missing associations. Two heterogeneous networks, in disease space and miRNA space respectively, are built by integrating multiple similarity networks, and different walk probabilities can be assigned to each linked neighbor node of a disease or miRNA node according to its degree in the respective network. Finally, an improved extended random walk with restart algorithm based on the miRNA similarity-based and disease similarity-based heterogeneous networks is used to calculate miRNA-disease association prediction probabilities. The experiments show that the proposed method achieves global LOOCV AUC (area under the ROC curve) and AUPR (area under the precision-recall curve) values of 0.9882 and 0.9066, respectively. The best AUC and AUPR values under five-fold cross-validation are 0.9855 and 0.8642, respectively, which are supported by statistical tests. The method outperforms NTSHMDA, PMFMDA, IMCMDA, and MCLPMDA in both AUC and AUPR values. In case studies of breast neoplasms, hepatocellular carcinoma, and stomach neoplasms, it inferred 1, 12, and 7 new associations, respectively, among the top 40 predicted miRNAs for each disease. All of these newly inferred associations have been confirmed in different databases or the literature.
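For orientation, the baseline that this method builds on can be sketched as follows. This is the standard random walk with restart on a single network, iterating p = (1 - r) W p + r p0 with a column-normalised adjacency matrix W; it does not include the paper's degree-dependent walk probabilities, the WKNKN step, or the heterogeneous networks. The function names, the restart value, and the toy graph are assumptions for the example.

```python
def column_normalize(adj):
    """Divide each column by its sum so that every non-empty column sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p = (1 - restart) * W p + restart * p0 until the L1 change < tol."""
    if not seeds:
        raise ValueError("at least one seed node is required")
    n = len(adj)
    w = column_normalize(adj)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: a path 0 - 1 - 2 - 3, with node 0 as the only seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seeds=[0])
```

The stationary scores decrease with distance from the seed, which is what makes them usable as a ranking of candidate nodes. In the prediction setting the seeds would be the nodes already known to be associated with the query disease or miRNA.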
Affiliation(s)
- Van Tinh Nguyen
- Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
- Faculty of Information Technology, Hanoi University of Industry, 298 Cau Dien Street, Bac Tu Liem District, Hanoi, Vietnam
- Thi Tu Kien Le
- Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
- Khoat Than
- Hanoi University of Science and Technology, Hanoi, Vietnam
- Dang Hung Tran
- Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam.
8
Dai Q, Chu Y, Li Z, Zhao Y, Mao X, Wang Y, Xiong Y, Wei DQ. MDA-CF: Predicting MiRNA-Disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med 2021; 136:104706. [PMID: 34371319 DOI: 10.1016/j.compbiomed.2021.104706] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/17/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, which provide a new perspective in diagnosis and treatment of multiple diseases. Since the experimental methods are always costly and resource-consuming, prediction of disease-related miRNAs using computational methods is in great need. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and the autoencoder was utilized for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0 with an average AUC of 0.9258. Moreover, MDA-CF was implemented to investigate colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, 88% of the top 50 potential miRNAs were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method to uncover disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
Affiliation(s)
- Qiuying Dai
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhiqi Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yusong Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China; Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China.
9
SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization. PLoS Comput Biol 2021; 17:e1009165. [PMID: 34252084 PMCID: PMC8345837 DOI: 10.1371/journal.pcbi.1009165] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 04/05/2021] [Revised: 08/06/2021] [Accepted: 06/08/2021] [Indexed: 11/21/2022]
Abstract
miRNAs are small non-coding RNAs that are involved in a number of complicated biological processes. Many studies have suggested that miRNAs are closely associated with many human diseases. In this study, we propose a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). To effectively combine different disease and miRNA similarity data, we apply the similarity network fusion algorithm to obtain an integrated disease similarity (composed of disease functional similarity, disease semantic similarity, and disease Gaussian interaction profile kernel similarity) and an integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity, and miRNA Gaussian interaction profile kernel similarity). In addition, L2 regularization terms and similarity constraint terms are added to the traditional nonnegative matrix factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 based on global leave-one-out cross-validation and five-fold cross-validation, respectively. Furthermore, case studies on two common human diseases were carried out to demonstrate the prediction accuracy of SCMFMDA. For colon neoplasms and lung neoplasms, 47 and 46 of the top 50 predicted miRNAs, respectively, were confirmed by experimental reports, indicating that SCMFMDA is effective for predicting relationships between miRNAs and diseases.
Many studies have suggested that miRNAs are closely associated with many human diseases, so predicting potential associations between miRNAs and diseases can contribute to the diagnosis and treatment of diseases. Several models for discovering unknown miRNA-disease associations have made prediction more productive and effective. We proposed SCMFMDA to obtain more accurate predictions by applying similarity network fusion to fuse multi-source disease and miRNA information and by using similarity constrained matrix factorization to make predictions based on biological information. Global leave-one-out cross-validation and five-fold cross-validation were applied to evaluate our model. SCMFMDA achieved AUCs of 0.9675 and 0.9447, which were clearly higher than those of previous computational models. Furthermore, in case studies on colon neoplasms and lung neoplasms, 47 and 46 of the top 50 predicted miRNAs were confirmed by experimental reports. All results indicate that SCMFMDA can be regarded as an effective way to discover unverified miRNA-disease connections.
10
Chu Y, Wang X, Dai Q, Wang Y, Wang Q, Peng S, Wei X, Qiu J, Salahub DR, Xiong Y, Wei DQ. MDA-GCNFTG: identifying miRNA-disease associations based on graph convolutional networks via graph sampling through the feature and topology graph. Brief Bioinform 2021; 22:6261915. [PMID: 34009265 DOI: 10.1093/bib/bbab165] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/10/2021] [Revised: 04/02/2021] [Accepted: 04/08/2021] [Indexed: 11/13/2022]
Abstract
Accurate identification of miRNA-disease associations (MDAs) helps to understand the etiology and mechanisms of various diseases. However, experimental methods are costly and time-consuming, so computational methods for predicting MDAs are urgently needed. Based on graph theory, MDA prediction is regarded as a node classification task in the present study. To solve this task, we propose a novel method, MDA-GCNFTG, which predicts MDAs based on Graph Convolutional Networks (GCNs) via graph sampling through the Feature and Topology Graph to improve training efficiency and accuracy. This method models both the potential connections in feature space and the structural relationships of MDA data. The nodes of the graphs are represented by the disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity. Moreover, we consider six tasks simultaneously on the MDA prediction problem for the first time, which ensures that under both balanced and unbalanced sample distributions, MDA-GCNFTG can predict not only new MDAs but also new diseases without known related miRNAs and new miRNAs without known related diseases. The results of 5-fold cross-validation show that MDA-GCNFTG achieves satisfactory performance on all six tasks and is significantly superior to classic machine learning methods and state-of-the-art MDA prediction methods. Moreover, the effectiveness of GCNs via the graph sampling strategy and of the feature and topology graph in MDA-GCNFTG has also been demonstrated. Finally, case studies for two diseases and three miRNAs were conducted and achieved satisfactory performance.
Affiliation(s)
- Yanyi Chu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Xuhong Wang
- School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University, China
- Qiuying Dai
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Yanjing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Qiankun Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, China
- Dennis Russell Salahub
- Department of Chemistry, University of Calgary, Fellow Royal Society of Canada and Fellow of the American Association for the Advancement of Science, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
11
Zhu R, Wang Y, Liu JX, Dai LY. IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier. BMC Bioinformatics 2021; 22:175. [PMID: 33794766 PMCID: PMC8017839 DOI: 10.1186/s12859-021-04104-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 10/20/2020] [Accepted: 03/24/2021] [Indexed: 12/23/2022]
Abstract
Background: Identifying lncRNA-disease associations not only helps to better comprehend the underlying mechanisms of various human diseases at the lncRNA level but also speeds up the identification of potential biomarkers for disease diagnoses, treatments, prognoses, and drug response predictions. However, as the amount of archived biological data continues to grow, it has become increasingly difficult to detect potential human lncRNA-disease associations from these enormous biological datasets using traditional biological experimental methods. Consequently, developing new and effective computational methods to predict potential human lncRNA-disease associations is essential.
Results: Using a combination of incremental principal component analysis (IPCA) and random forest (RF) algorithms and by integrating multiple similarity matrices, we propose a new algorithm (IPCARF) based on integrated machine learning technology for predicting lncRNA-disease associations. First, we used two different models to compute a semantic similarity matrix of diseases from a directed acyclic graph of diseases. Second, a characteristic vector for each lncRNA-disease pair is obtained by integrating disease similarity, lncRNA similarity, and Gaussian kernel similarity. Then, the best feature subspace is obtained by applying IPCA to decrease the dimension of the original feature set. Finally, we train an RF model to predict potential lncRNA-disease associations. The experimental results show that the IPCARF algorithm effectively improves the AUC metric when predicting potential lncRNA-disease associations. Before the parameter optimization procedure, the AUC value of the IPCARF algorithm under 10-fold cross-validation reached 0.8529; after selecting the optimal parameters using the grid search algorithm, the AUC reached 0.8611.
Conclusions: We compared IPCARF with the existing LRLSLDA, LRLSLDA-LNCSIM, TPGLDA, NPCMF, and ncPred prediction methods, which have shown excellent performance in predicting lncRNA-disease associations. The results of the 10-fold cross-validation procedures show that the predictions of the IPCARF method are better than those of the other compared methods.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04104-9.
Affiliation(s)
- Rong Zhu
- School of Computer Science, Qufu Normal University, Rizhao, China; Department of Internet of Things Engineering, Wuxi Taihu University, Wuxi, China
- Yong Wang
- Experimental Teaching Center, Qufu Normal University, Rizhao, China
- Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao, China
- Ling-Yun Dai
- School of Computer Science, Qufu Normal University, Rizhao, China.
12
Gao MM, Cui Z, Gao YL, Wang J, Liu JX. Multi-Label Fusion Collaborative Matrix Factorization for Predicting LncRNA-Disease Associations. IEEE J Biomed Health Inform 2021; 25:881-890. [PMID: 32324583 DOI: 10.1109/jbhi.2020.2988720] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022]
Abstract
Science and technology are developing at an ever faster pace. Many experts and scholars have demonstrated that human diseases are related to lncRNAs, but only a few associations have been confirmed, and many unknown associations remain to be found. Because finding associations experimentally takes a lot of time, an efficient way to predict the associations between lncRNAs and diseases is particularly important. In this paper, we propose a multi-label fusion collaborative matrix factorization (MLFCMF) approach for predicting lncRNA-disease associations (LDAs). Firstly, the lncRNA space and disease space are optimized by multi-label learning to enhance the intrinsic link between lncRNAs and diseases and to tap potential information. Multi-label learning can encode a variety of data information from the sample space. Secondly, to learn multi-label information in the data space, a fusion method is used to handle the relationship between multiple labels. More comprehensive information is obtained by weighing the effects of different labels. The addition of the Gaussian interaction profile (GIP) kernel can increase the network similarity. Finally, the lncRNA-disease associations are predicted by collaborative matrix factorization. Ten-fold cross-validation is used to evaluate the MLFCMF method, which obtains an AUC value of 0.8612. The simulation experiment results for ovarian cancer, colorectal cancer, and lung cancer are analyzed in detail. These results indicate that MLFCMF is an effective model for predicting lncRNA-disease associations.
13
Liu JX, Cui Z, Gao YL, Kong XZ. WGRCMF: A Weighted Graph Regularized Collaborative Matrix Factorization Method for Predicting Novel LncRNA-Disease Associations. IEEE J Biomed Health Inform 2021; 25:257-265. [PMID: 32287024 DOI: 10.1109/jbhi.2020.2985703] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
In recent years, many human diseases have been determined to be associated with certain lncRNAs. Only a small percentage of all lncRNA-disease associations (LDAs) have been discovered by researchers. Predicting novel LDAs is time-consuming and costly. It is crucial to propose a method that can effectively identify potential LDAs based on the available datasets. Although some current methods can effectively predict potential LDAs, the prediction accuracy needs to be improved, and there are few known associations. Moreover, there are notable errors in the method of constructing the network and the bipartite graph, which interfere with the final results. A weighted graph regularized collaborative matrix factorization (WGRCMF) method is proposed to predict novel LDAs. We introduce graph regularization terms into collaborative matrix factorization. Because manifold learning can recover low-dimensional manifold structures from high-dimensional sampled data, we can find low-dimensional manifolds in high-dimensional space. In addition, a weight matrix is introduced to prevent unknown associations from contributing to the final prediction matrix. The prediction accuracy of this method is better than that of other methods. We carried out the corresponding simulation experiments on several cancers. According to the experimental results, the proposed method is feasible and effective.
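The role of the weight matrix described in this abstract can be illustrated with a minimal sketch. It is not the WGRCMF implementation: the graph regularization terms are omitted, the optimizer is plain stochastic gradient descent, and the function names, the binary mask `W`, and the toy matrix `Y` are all assumptions made for illustration. Entries with `W[i][j] == 0` are skipped, so they contribute nothing to the loss or to the updates.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_loss(Y, W, A, B):
    """Sum of W[i][j] * (Y[i][j] - A[i].B[j])^2 over all entries."""
    return sum(
        W[i][j] * (Y[i][j] - dot(A[i], B[j])) ** 2
        for i in range(len(Y))
        for j in range(len(Y[0]))
    )


def factorize(Y, W, rank=2, lr=0.05, reg=0.01, steps=500, seed=0):
    """Fit Y ~ A B^T using only the entries where W is non-zero."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    A = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    B = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if W[i][j] == 0:
                    continue  # masked entry: no contribution to the fit
                err = Y[i][j] - dot(A[i], B[j])
                for k in range(rank):
                    a, b = A[i][k], B[j][k]
                    A[i][k] += lr * (err * b - reg * a)
                    B[j][k] += lr * (err * a - reg * b)
    return A, B


# Hypothetical 3 x 3 association matrix; entry (2, 2) is treated as unknown.
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
W = [[1, 1, 1], [1, 1, 1], [1, 1, 0]]
A, B = factorize(Y, W)
score_for_unknown = dot(A[2], B[2])  # predicted score for the masked pair
```

Because the masked entry never enters the loss, changing its stored value in `Y` leaves the fitted factors and the weighted loss unchanged, which is the behaviour the abstract attributes to the weight matrix.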
|
14
|
Wu TR, Yin MM, Jiao CN, Gao YL, Kong XZ, Liu JX. MCCMF: collaborative matrix factorization based on matrix completion for predicting miRNA-disease associations. BMC Bioinformatics 2020; 21:454. [PMID: 33054708 PMCID: PMC7556955 DOI: 10.1186/s12859-020-03799-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/18/2020] [Accepted: 10/02/2020] [Indexed: 02/06/2023] Open
Abstract
Background: MicroRNAs (miRNAs) are non-coding RNAs with regulatory functions. Many studies have shown that miRNAs are closely associated with human diseases. Traditional methods for exploring the relationship between miRNAs and diseases are time-consuming, and their accuracy needs to be improved. In view of the shortcomings of previous models, a method called collaborative matrix factorization based on matrix completion (MCCMF) is proposed to predict unknown miRNA-disease associations. Results: The complete miRNA-disease matrix is obtained by matrix completion. Moreover, the Gaussian Interaction Profile kernel is added to the miRNA functional similarity matrix and the disease semantic similarity matrix. Then the Weighted K Nearest Known Neighbors method is used to preprocess the association matrix, so that the model is closer to reality. Finally, collaborative matrix factorization is applied to obtain the prediction results. MCCMF obtains a satisfactory result in fivefold cross-validation, with an AUC of 0.9569 (0.0005). Conclusions: The AUC value of MCCMF is higher than that of other advanced methods in the fivefold cross-validation experiment. To evaluate the performance of MCCMF comprehensively, accuracy, precision, recall and f-measure are also reported. The final experimental results demonstrate that MCCMF outperforms other methods in predicting miRNA-disease associations. In the end, the effectiveness and practicability of MCCMF are further verified by studying three specific diseases.
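The Gaussian Interaction Profile kernel mentioned in this abstract can be sketched as follows. The abstract does not give the formula, so this assumes the commonly used definition, K(i, j) = exp(-gamma * ||y_i - y_j||^2), with gamma set to a bandwidth parameter divided by the mean squared norm of the profiles. The function name, the `gamma_prime` parameter, and the toy profiles are illustrative assumptions, not details from the paper.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of `profiles`.

    Each row is the binary association profile of one miRNA (or one disease).
    Assumed convention: gamma = gamma_prime / mean squared profile norm.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all profiles are empty; bandwidth is undefined")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical profiles: 3 miRNAs x 3 diseases.
profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
K = gip_kernel(profiles)
```

Entities with identical association profiles receive a similarity of 1, and the similarity decays as the profiles diverge.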
Affiliation(s)
- Tian-Ru Wu: School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Meng-Meng Yin: School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Cui-Na Jiao: School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Ying-Lian Gao: School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Xiang-Zhen Kong: School of Computer Science, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu: School of Computer Science, Qufu Normal University, Rizhao, 276826, China
|
15
|
Le DH, Tran TTH. RWRMTN: a tool for predicting disease-associated microRNAs based on a microRNA-target gene network. BMC Bioinformatics 2020; 21:244. [PMID: 32539680 PMCID: PMC7296691 DOI: 10.1186/s12859-020-03578-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/04/2019] [Accepted: 06/01/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: The misregulation of microRNA (miRNA) has been shown to cause diseases. Recently, we have proposed a computational method based on a random walk framework on a miRNA-target gene network to predict disease-associated miRNAs. The prediction performance of our method is better than that of some existing state-of-the-art network- and machine learning-based methods since it exploits the mutual regulation between miRNAs and their target genes in the miRNA-target gene interaction networks. RESULTS: To facilitate the use of this method, we have developed a Cytoscape app, named RWRMTN, to predict disease-associated miRNAs. RWRMTN can work on any miRNA-target gene network. Highly ranked miRNAs are supported with evidence from the literature. They can also be visualized based on their rankings and in relation to the query disease and their target genes. In addition, automation functions are integrated, which allow RWRMTN to be used in workflows from external environments. We demonstrate the ability of RWRMTN to predict breast and lung cancer-associated miRNAs via workflows in Cytoscape and other environments. CONCLUSIONS: Considering that few computational methods have been developed as software tools for convenient use, RWRMTN is among the first GUI-based tools for the prediction of disease-associated miRNAs that can be used in workflows in different environments.
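The random walk framework named in this abstract can be illustrated with a generic random walk with restart on a small undirected graph. This is a sketch under stated assumptions, not the RWRMTN code: it uses a single homogeneous network rather than a miRNA-target gene network, and the function name, the restart probability of 0.5, and the toy path graph are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is the column-normalized adjacency matrix and p0 is uniform over seeds.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    if any(s == 0 for s in col_sums):
        raise ValueError("every node needs at least one edge")
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart)
            * sum(adjacency[i][j] / col_sums[j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path graph 0 - 1 - 2 - 3 with node 0 as the seed.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds=[0])
```

Nodes are then ranked by their steady-state score; in this toy graph the score falls off with distance from the seed.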
Affiliation(s)
- Duc-Hau Le: Department of Computational Biomedicine, Vingroup Big Data Institute, No 7, Bang Lang 1 Street, Viet Hung Ward, Long Bien District, Hanoi, Vietnam
- Trang T H Tran: Department of Computational Biomedicine, Vingroup Big Data Institute, No 7, Bang Lang 1 Street, Viet Hung Ward, Long Bien District, Hanoi, Vietnam
|
16
|
Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020; 18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022] Open
Abstract
Knowledge graphs can support many biomedical applications. These graphs represent biomedical concepts and relationships in the form of nodes and edges. In this review, we discuss how these graphs are constructed and applied with a particular focus on how machine learning approaches are changing these processes. Biomedical knowledge graphs have often been constructed by integrating databases that were populated by experts via manual curation, but we are now seeing a more robust use of automated systems. A number of techniques are used to represent knowledge graphs, but often machine learning methods are used to construct a low-dimensional representation that can support many different applications. This representation is designed to preserve a knowledge graph's local and/or global structure. Additional machine learning methods can be applied to this representation to make predictions within genomic, pharmaceutical, and clinical domains. We frame our discussion first around knowledge graph construction and then around unifying representational learning techniques and unifying applications. Advances in machine learning for biomedicine are creating new opportunities across many domains, and we note potential avenues for future work with knowledge graphs that appear particularly promising.
Affiliation(s)
- David N. Nicholson: Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, United States
- Casey S. Greene: Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, United States
|
17
|
A random forest based computational model for predicting novel lncRNA-disease associations. BMC Bioinformatics 2020; 21:126. [PMID: 32216744 PMCID: PMC7099795 DOI: 10.1186/s12859-020-3458-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 12/19/2019] [Accepted: 03/18/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Accumulated evidence shows that the abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Accurately identifying disease-associated lncRNAs helps to study the mechanism of lncRNAs in diseases and to explore new therapies. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most of the existing models ignore the interference of noisy and redundant information among these data resources. RESULTS: To improve the ability of LDA prediction models, we implemented a random forest and feature selection based LDA prediction model (RFLDA for short). First, the RFLDA integrates the experiment-supported miRNA-disease associations (MDAs) and LDAs, the disease semantic similarity (DSS), the lncRNA functional similarity (LFS) and the lncRNA-miRNA interactions (LMI) as input features. Then, the RFLDA chooses the most useful features for training the prediction model by feature selection based on the random forest variable importance score, which takes into account not only the effect of each individual feature on the prediction results but also the joint effects of multiple features. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. With an area under the receiver operating characteristic curve (AUC) of 0.976 and an area under the precision-recall curve (AUPR) of 0.779 under 5-fold cross-validation, the RFLDA performs better than several state-of-the-art LDA prediction models. Moreover, case studies on three cancers demonstrate that 43 of the 45 lncRNAs predicted by the RFLDA are validated by experimental data, and the other two predicted lncRNAs are supported by other LDA prediction models. CONCLUSIONS: Cross-validation and case studies indicate that the RFLDA has excellent ability to identify potential disease-associated lncRNAs.
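Several entries in this list report an AUC under cross-validation. As a reminder of what that number measures, here is a minimal sketch that computes the AUC as the probability that a randomly chosen known association is scored above a randomly chosen unknown one, counting ties as one half. The function name and the toy labels and scores are assumptions for illustration; they are not data from any of the papers.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out fold: 1 = known association, 0 = unknown pair.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.2, 0.1]
fold_auc = roc_auc(labels, scores)
```

In k-fold cross-validation this value is computed on each held-out fold and the results are averaged.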
|
18
|
Tan H, Sun Q, Li G, Xiao Q, Ding P, Luo J, Liang C. Multiview Consensus Graph Learning for lncRNA-Disease Association Prediction. Front Genet 2020; 11:89. [PMID: 32153646 PMCID: PMC7047769 DOI: 10.3389/fgene.2020.00089] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/15/2019] [Accepted: 01/27/2020] [Indexed: 12/11/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) are a class of noncoding RNA molecules longer than 200 nucleotides. Recent studies have uncovered their functional roles in diverse cellular processes and tumorigenesis. Therefore, identifying novel disease-related lncRNAs might deepen our understanding of disease etiology. However, due to the relatively small number of verified associations between lncRNAs and diseases, it remains a challenging task to reliably and effectively predict the associated lncRNAs for given diseases. In this paper, we propose a novel multiview consensus graph learning method to infer potential disease-related lncRNAs. Specifically, we first construct a set of similarity matrices for lncRNAs and diseases by taking advantage of the known associations. We then iteratively learn a consensus graph from the multiple input matrices and simultaneously optimize the predicted association probability based on a multi-label learning framework. To convey the utility of our method, three state-of-the-art methods are compared with our method on three widely used datasets. The experiment results illustrate that our method could obtain the best prediction performance under different cross validation schemes. The case study analysis implemented for uterine cervical neoplasms further confirmed the utility of our method in identifying lncRNAs as potential prognostic biomarkers in practice.
Affiliation(s)
- Haojiang Tan: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Quanmeng Sun: School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Guanghui Li: School of Information Engineering, East China Jiaotong University, Nanchang, China
- Qiu Xiao: College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Pingjian Ding: School of Computer Science, University of South China, Hengyang, China
- Jiawei Luo: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Cheng Liang: School of Information Science and Engineering, Shandong Normal University, Jinan, China
|
19
|
Ha J, Park C, Park C, Park S. IMIPMF: Inferring miRNA-disease interactions using probabilistic matrix factorization. J Biomed Inform 2020; 102:103358. [DOI: 10.1016/j.jbi.2019.103358] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/14/2019] [Revised: 11/11/2019] [Accepted: 12/12/2019] [Indexed: 12/09/2022]
|
20
|
Lee B, Zhang S, Poleksic A, Xie L. Heterogeneous Multi-Layered Network Model for Omics Data Integration and Analysis. Front Genet 2020; 10:1381. [PMID: 32063919 PMCID: PMC6997577 DOI: 10.3389/fgene.2019.01381] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 09/28/2019] [Accepted: 12/18/2019] [Indexed: 01/08/2023] Open
Abstract
Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These big data provide an unprecedented opportunity in biology, but impose great challenges in data integration, data mining, and knowledge discovery due to the complexity, heterogeneity, dynamics, uncertainty, and high dimensionality inherent in omics data. Networks have been widely used to represent relations between entities in biological systems, such as protein-protein interaction, gene regulation, and brain connectivity (i.e. network construction), as well as to infer novel relations given a reconstructed network (aka link prediction). In particular, the heterogeneous multi-layered network (HMLN) has proven successful in integrating diverse biological data to represent the hierarchy of a biological system. The HMLN provides unparalleled opportunities but imposes new computational challenges in establishing causal genotype-phenotype associations and understanding environmental impact on organisms. In this review, we focus on recent advances in developing computational methods for inferring novel biological relations from the HMLN. First, we discuss the properties of biological HMLNs. Second, we survey four categories of state-of-the-art methods (matrix factorization, random walk, knowledge graph, and deep learning). Third, we demonstrate their applications to omics data integration and analysis. Finally, we outline strategies for future directions in the development of new HMLN models.
Affiliation(s)
- Bohyun Lee: Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States
- Shuo Zhang: Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States
- Aleksandar Poleksic: Department of Computer Science, The University of Northern Iowa, Cedar Falls, IA, United States
- Lei Xie: Ph.D. Program in Computer Science, The City University of New York, New York, NY, United States; Ph.D. Program in Biochemistry and Biology, The City University of New York, New York, NY, United States; Department of Computer Science, Hunter College, The City University of New York, New York, NY, United States; Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, Ithaca, NY, United States
|
21
|
Cui Z, Liu JX, Gao YL, Zheng CH, Wang J. RCMF: a robust collaborative matrix factorization method to predict miRNA-disease associations. BMC Bioinformatics 2019; 20:686. [PMID: 31874608 PMCID: PMC6929455 DOI: 10.1186/s12859-019-3260-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/15/2022] Open
Abstract
Background: Predicting miRNA-disease associations (MDAs) is time-consuming and expensive, and improving the accuracy of prediction results is urgent. It is therefore crucial to develop a novel computational technique to predict new MDAs. Although some existing methods can effectively predict novel MDAs, they still have shortcomings. In particular, when the disease matrix is processed, its sparsity is an important factor affecting the final results. Results: A robust collaborative matrix factorization (RCMF) method is proposed to predict novel MDAs. The L2,1-norm is introduced into our method, which achieves a higher AUC value than other advanced methods. Conclusions: 5-fold cross-validation is used to evaluate our method, and simulation experiments are used to predict novel associations on the Gold Standard Dataset. Our prediction accuracy is better than that of other existing advanced methods. Therefore, our approach is effective and feasible for predicting novel MDAs.
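For readers unfamiliar with the L2,1-norm named in this abstract, the sketch below computes it as the sum of the Euclidean norms of the rows of a matrix. The row-wise convention is an assumption here (some papers sum over columns instead), and the function name and the example matrix are illustrative only.

```python
import math


def l21_norm(matrix):
    """L2,1-norm: sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def frobenius_norm(matrix):
    """Frobenius norm, shown only for comparison."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


# Hypothetical residual matrix with one all-zero row.
residual = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
l21 = l21_norm(residual)
fro = frobenius_norm(residual)
```

Because each row enters through its unsquared norm, a single large row is penalized less heavily than under the squared Frobenius loss, which is the usual motivation for calling L2,1-based objectives robust.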
Affiliation(s)
- Zhen Cui: School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu: School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China; Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, China
- Ying-Lian Gao: Qufu Normal University Library, Qufu Normal University, Rizhao, 276826, China
- Chun-Hou Zheng: Co-Innovation Center for Information Supply & Assurance Technology, Anhui University, Hefei, 230601, China
- Juan Wang: School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
|