1
Su Y, Liu J, Wu Q, Gao Z, Wang J, Li H, Zheng C. AMPFLDAP: Adaptive Message Passing and Feature Fusion on Heterogeneous Network for LncRNA-Disease Associations Prediction. Interdiscip Sci 2024:10.1007/s12539-024-00610-5. [PMID: 38581626 DOI: 10.1007/s12539-024-00610-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 01/03/2024] [Accepted: 01/03/2024] [Indexed: 04/08/2024]
Abstract
Exploring the intricate connections between long noncoding RNAs (lncRNAs) and diseases, referred to as lncRNA-disease associations (LDAs), plays a pivotal role in unraveling the underlying molecular mechanisms of diseases and devising practical treatment approaches. Computational methods for predicting LDAs are needed to avoid superfluous experimental effort. Graph-based learning models have gained substantial popularity in predicting these associations, primarily because of their capacity to leverage node attributes and relationships within the network. Nevertheless, there remains much room for enhancing the performance of these techniques by incorporating and harmonizing the node attributes more effectively. In this context, we introduce a novel model, Adaptive Message Passing and Feature Fusion (AMPFLDAP), for predicting LDAs within a heterogeneous network. First, we construct a heterogeneous network involving lncRNAs, microRNAs (miRNAs), and diseases based on established associations, with Gaussian interaction profile kernel similarity as the similarity measure. Then, an adaptive topological message passing mechanism is proposed to handle information aggregation in heterogeneous networks, and the topological features of nodes are extracted with this mechanism. Moreover, an attention mechanism is applied to integrate topological and semantic information into multimodal features of biomolecules, which are then used to predict potential LDAs. The experimental results demonstrate that AMPFLDAP outperforms seven state-of-the-art methods. Furthermore, to validate its efficacy in practical scenarios, we conducted detailed case studies involving three distinct diseases, which demonstrated AMPFLDAP's effectiveness in predicting LDAs.
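The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used form exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to a bandwidth parameter divided by the mean squared norm of the interaction profiles. The function name `gip_kernel_similarity`, the parameter `gamma_prime`, and the toy matrix are hypothetical and are not taken from the AMPFLDAP paper.

```python
import math


def gip_kernel_similarity(profiles, gamma_prime=1.0):
    """Return a GIP kernel similarity matrix for the rows of `profiles`.

    Each row is the binary interaction profile of one entity, for example
    one lncRNA against all diseases. `gamma_prime` is an assumed bandwidth
    parameter (1.0 is a common default in the literature).
    """
    n = len(profiles)
    squared_norms = [sum(v * v for v in row) for row in profiles]
    mean_squared_norm = sum(squared_norms) / n
    if mean_squared_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_squared_norm

    similarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            similarity[i][j] = math.exp(-gamma * dist_sq)
    return similarity


# Toy association matrix: 3 lncRNAs (rows) x 3 diseases (columns).
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
gip = gip_kernel_similarity(associations)
```

Rows with identical profiles get similarity 1, and the similarity decays as profiles diverge, which is why this measure is often used when no sequence or expression data are available.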
Affiliation(s)
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jingjing Liu
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Qingwen Wu
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Zhen Gao
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Jing Wang
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
- Haitao Li
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chunhou Zheng
  - Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
2
Lu C, Xie M. LDAEXC: LncRNA-Disease Associations Prediction with Deep Autoencoder and XGBoost Classifier. Interdiscip Sci 2023:10.1007/s12539-023-00573-z. [PMID: 37308797 DOI: 10.1007/s12539-023-00573-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/06/2022] [Revised: 05/14/2023] [Accepted: 05/15/2023] [Indexed: 06/14/2023]
Abstract
Extensive scientific evidence has revealed that long non-coding RNAs (lncRNAs) are involved in the progression of complex human diseases and in biological life activities. Therefore, identifying novel and potential disease-related lncRNAs is helpful for the diagnosis, prognosis and therapy of many complex human diseases. Since traditional laboratory experiments are costly and time-consuming, a large number of computational algorithms have been proposed for predicting the relationships between lncRNAs and diseases. However, there is still much room for improvement. In this paper, we introduce an accurate framework named LDAEXC to infer LncRNA-Disease Associations with a deep autoencoder and an XGBoost Classifier. LDAEXC utilizes different similarity views of lncRNAs and human diseases to construct features for each data source. Then, the reduced features are obtained by feeding the constructed feature vectors into a deep autoencoder, and finally an XGBoost classifier is used to calculate the latent lncRNA-disease association scores from the reduced features. The fivefold cross-validation experiments on four datasets showed that LDAEXC reached AUC scores of 0.9676 ± 0.0043, 0.9449 ± 0.022, 0.9375 ± 0.0331 and 0.9556 ± 0.0134, respectively, significantly higher than other advanced computational methods of the same kind. Extensive experimental results and case studies of two complex diseases (colon and breast cancers) further indicated the practicability and excellent prediction performance of LDAEXC in inferring unknown lncRNA-disease associations.
LDAEXC utilizes disease semantic similarity, lncRNA expression similarity, and Gaussian interaction profile kernel similarity of lncRNAs and diseases for feature construction. The constructed features are fed to a deep autoencoder to extract reduced features, and an XGBoost classifier is used to predict the lncRNA-disease associations based on the reduced features. The fivefold and tenfold cross-validation experiments on a benchmark dataset showed that LDAEXC could achieve AUC scores of 0.9676 and 0.9682, respectively, significantly higher than other state-of-the-art methods of the same kind.
Affiliation(s)
- Cuihong Lu
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, China
3
Teng Z, Shi L, Yu H, Wu C, Tian Z. Measuring functional similarity of lncRNAs based on variable K-mer profiles of nucleotide sequences. Methods 2023; 212:21-30. [PMID: 36813016 DOI: 10.1016/j.ymeth.2023.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 02/10/2023] [Accepted: 02/17/2023] [Indexed: 02/22/2023]
Abstract
Long non-coding RNAs (lncRNAs) are a class of essential non-coding RNAs with a length of more than 200 nt. Recent studies have indicated that lncRNAs have various complex regulatory functions, which have a great impact on many fundamental biological processes. However, measuring the functional similarity between lncRNAs by traditional wet-lab experiments is time-consuming and labor-intensive, so computational approaches have become an effective choice for tackling this problem. Meanwhile, most sequence-based computational methods measure the functional similarity of lncRNAs with fixed-length vector representations, which cannot capture the features of larger k-mers. Therefore, it is urgent to improve the prediction performance for the potential regulatory functions of lncRNAs. In this study, we propose a novel approach called MFSLNC to comprehensively measure the functional similarity of lncRNAs based on variable k-mer profiles of nucleotide sequences. MFSLNC employs dictionary tree storage, which can comprehensively represent lncRNAs with long k-mers. The functional similarity between lncRNAs is evaluated by the Jaccard similarity. MFSLNC verified the similarity between two lncRNAs with the same mechanism, detecting homologous sequence pairs between human and mouse. In addition, MFSLNC is applied to lncRNA-disease association prediction in combination with the association prediction model WKNKN. Moreover, by comparison with classical methods based on lncRNA-mRNA association data, we showed that our method calculates the similarity of lncRNAs more effectively. The AUC of the prediction is 0.867, which is good performance compared with similar models.
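As a rough illustration of Jaccard similarity over variable-length k-mer profiles, the sketch below collects all k-mers within a length range into a plain Python set and compares two sequences. This is a simplification under stated assumptions: MFSLNC stores k-mers in a dictionary tree, whereas the helper names `kmer_set` and `jaccard_similarity` and the range `k_min` to `k_max` are hypothetical choices made only for this example.

```python
def kmer_set(sequence, k_min=3, k_max=5):
    """Collect every k-mer of length k_min..k_max that occurs in `sequence`."""
    seq = sequence.upper()
    kmers = set()
    for k in range(k_min, k_max + 1):
        for start in range(len(seq) - k + 1):
            kmers.add(seq[start:start + k])
    return kmers


def jaccard_similarity(seq_a, seq_b, k_min=3, k_max=5):
    """Jaccard index |A & B| / |A | B| of the two variable k-mer sets."""
    a = kmer_set(seq_a, k_min, k_max)
    b = kmer_set(seq_b, k_min, k_max)
    if not a and not b:
        return 0.0  # neither sequence is long enough to yield a k-mer
    return len(a & b) / len(a | b)


# Toy sequences, not real lncRNA transcripts.
identical = jaccard_similarity("ACGUACGU", "ACGUACGU")
partial = jaccard_similarity("ACGU", "ACGA", k_min=3, k_max=3)
disjoint = jaccard_similarity("AAAA", "CCCC", k_min=3, k_max=3)
```

A set-based version like this uses more memory than a trie for long k-mers, which is presumably one reason the paper chose dictionary tree storage.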
Affiliation(s)
- Zhixia Teng
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Linyue Shi
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Haihao Yu
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150040, China
- Chengyan Wu
  - Baotou Teacher's College, Inner Mongolia University of Science and Technology, Baotou 014030, China
- Zhen Tian
  - College of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
4
Recent advances in predicting lncRNA-disease associations based on computational methods. Drug Discov Today 2023; 28:103432. [PMID: 36370992 DOI: 10.1016/j.drudis.2022.103432] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/30/2022] [Revised: 10/19/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022]
Abstract
Mutations in and dysregulation of long non-coding RNAs (lncRNAs) are closely associated with the development of various human complex diseases, but only a few lncRNAs have been experimentally confirmed to be associated with human diseases. Predicting new potential lncRNA-disease associations (LDAs) will help us to understand the pathogenesis of human diseases and to detect disease markers, as well as in disease diagnosis, prevention and treatment. Computational methods can effectively narrow down the screening scope of biological experiments, thereby reducing the duration and cost of such experiments. In this review, we outline recent advances in computational methods for predicting LDAs, focusing on LDA databases, lncRNA/disease similarity calculations, and advanced computational models. In addition, we analyze the limitations of various computational models and discuss future challenges and directions for development.
5
Li S, Chang M, Tong L, Wang Y, Wang M, Wang F. Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front Genet 2023; 13:1023615. [PMID: 36744179 PMCID: PMC9895102 DOI: 10.3389/fgene.2022.1023615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 10/10/2022] [Indexed: 01/21/2023]
Abstract
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked with the occurrence and development of the two cancers. Therefore, it is essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) that integrates random walk with restart and logistic matrix factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. First, we fuse disease semantic and Gaussian association profile similarities, and lncRNA functional and Gaussian association profile similarities. Second, we design a negative selection algorithm based on random walk to extract negative lncRNA-disease associations (LDAs). Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare our proposed LDA-RWLMF method with four classical LDA prediction methods, namely LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results of 5-fold cross validation on the MNDR dataset show that LDA-RWLMF achieves the best AUC value of 0.9312, outperforming the above four LDA prediction methods. Finally, after determining the performance of LDA-RWLMF, we rank all lncRNA biomarkers for each of the two cancers. We find that 48 and 50 lncRNAs have the highest association scores with breast cancer and colorectal cancer, respectively, among all lncRNAs known to associate with them in the MNDR dataset. We predict that the lncRNAs HULC and HAR1A could be potential biomarkers for breast cancer and colorectal cancer, respectively, and need biomedical experimental validation.
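Random walk with restart, one of the two components named in this abstract, can be sketched generically as the iteration p ← (1 − r)·W·p + r·p0 on a column-normalised adjacency matrix. The code below is a plain illustration of that iteration, not the negative-selection procedure of LDA-RWLMF. The function name, the restart probability of 0.3, and the toy path graph are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol.

    `adjacency` is a square list of lists with non-negative weights, and
    `seeds` lists the node indices where the walker restarts.
    """
    n = len(adjacency)
    # Column-normalise so that W[i][j] is the probability of stepping j -> i.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with the walker restarting at node 0.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rwr_scores = random_walk_with_restart(path_graph, seeds=[0])
```

Nodes that are far from the seeds receive low steady-state scores, which is the kind of signal a method could use to pick likely negative samples.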
6
Du XX, Liu Y, Wang B, Zhang JF. lncRNA-disease association prediction method based on the nearest neighbor matrix completion model. Sci Rep 2022; 12:21653. [PMID: 36522410 PMCID: PMC9755128 DOI: 10.1038/s41598-022-25730-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022]
Abstract
State-of-the-art medical studies proved that long noncoding ribonucleic acids (lncRNAs) are closely related to various diseases. However, their large-scale detection in biological experiments is problematic and expensive. To aid screening and improve the efficiency of biological experiments, this study introduced a prediction model based on the nearest neighbor concept for lncRNA-disease association prediction. We used a new similarity algorithm in the model that fused potential associations. The experimental validation of the proposed algorithm proved its superiority over the available Cosine, Pearson, and Jaccard similarity algorithms. Satisfactory results in the comparative leave-one-out cross-validation test (with AUC = 0.96) confirmed its excellent predictive performance. Finally, the proposed model's reliability was confirmed by performing predictions using a new dataset, yielding AUC = 0.92.
Affiliation(s)
- Xiao-xin Du
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China (GRID: grid.412616.6; ISNI: 0000 0001 0002 2355)
- Yan Liu
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China (GRID: grid.412616.6; ISNI: 0000 0001 0002 2355)
- Bo Wang
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China (GRID: grid.412616.6; ISNI: 0000 0001 0002 2355)
- Jian-fei Zhang
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China (GRID: grid.412616.6; ISNI: 0000 0001 0002 2355)
7
Wang MN, Lei LL, He W, Ding DW. SPCMLMI: A structural perturbation-based matrix completion method to predict lncRNA–miRNA interactions. Front Genet 2022; 13:1032428. [DOI: 10.3389/fgene.2022.1032428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 10/28/2022] [Indexed: 11/17/2022]
Abstract
Accumulating evidence indicates that the interaction between lncRNA and miRNA is crucial for gene regulation: it can regulate gene transcription, further affecting the occurrence and development of many complex diseases. Accurate identification of interactions between lncRNAs and miRNAs is helpful for the diagnosis and therapeutics of complex diseases. However, the number of known interactions of lncRNA with miRNA is still very limited, and identifying their interactions through biological experiments is time-consuming and expensive. There is an urgent need to develop more accurate and efficient computational methods to infer lncRNA–miRNA interactions. In this work, we developed a matrix completion approach based on structural perturbation to infer lncRNA–miRNA interactions (SPCMLMI). Specifically, we first calculated the similarities of lncRNA and miRNA, including the lncRNA expression profile similarity, miRNA expression profile similarity, lncRNA sequence similarity, and miRNA sequence similarity. Second, a bilayer network was constructed by integrating the known interaction network, lncRNA similarity network, and miRNA similarity network. Finally, a structural perturbation-based matrix completion method was used to predict potential interactions of lncRNA with miRNA. To evaluate the prediction performance of SPCMLMI, five-fold cross validation and a series of comparison experiments were implemented. SPCMLMI achieved AUCs of 0.8984 and 0.9891 on two different datasets, which is superior to the other compared methods. Case studies for lncRNA XIST and miRNA hsa-miR-195-5p further confirmed the effectiveness of our method in inferring lncRNA–miRNA interactions. Furthermore, we found that the structural consistency of the bilayer network was higher than that of other related networks. The results suggest that SPCMLMI can be used as a useful tool to predict interactions between lncRNAs and miRNAs.
8
Zhang Y, Ye F, Gao X. MCA-Net: Multi-Feature Coding and Attention Convolutional Neural Network for Predicting lncRNA-Disease Association. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2907-2919. [PMID: 34283719 DOI: 10.1109/tcbb.2021.3098126] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/13/2023]
Abstract
In the era of big data, it is difficult to accurately predict the associations between lncRNAs and diseases with traditional biological experiments because they are time-consuming and subjective. In this paper, we propose a novel deep learning method for predicting lncRNA-disease associations using a multi-feature coding and attention convolutional neural network (MCA-Net). We first calculate six similarity features to extract different types of lncRNA and disease feature information. Second, a multi-feature coding method is proposed to construct the feature vectors of lncRNA-disease association samples by integrating the six similarity features. Furthermore, an attention convolutional neural network is developed to identify lncRNA-disease associations under 10-fold cross-validation. Finally, we evaluate the performance of MCA-Net from different perspectives, including the effects of the model parameters, distinct deep learning models, and the necessity of the attention mechanism. We also compare MCA-Net with several state-of-the-art methods on three publicly available datasets, i.e., LncRNADisease, Lnc2Cancer, and LncRNADisease2.0. The results show that MCA-Net outperforms the state-of-the-art methods on all three datasets. In addition, case studies on breast cancer and lung cancer further verify that MCA-Net is effective and accurate for lncRNA-disease association prediction.
9
Yao D, Zhang T, Zhan X, Zhang S, Zhan X, Zhang C. Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations. Front Genet 2022; 13:995532. [PMID: 36092871 PMCID: PMC9448985 DOI: 10.3389/fgene.2022.995532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Accepted: 08/01/2022] [Indexed: 11/20/2022]
Abstract
Growing evidence has shown that the abnormal expression of long non-coding RNA (lncRNA) is relevant to a variety of human diseases. Therefore, accurate identification of disease-related lncRNAs can help to understand lncRNA expression at the molecular level and to explore more effective treatments for diseases. Plenty of lncRNA-disease association prediction models have been proposed, but it is still a challenge to recognize unknown lncRNA-disease associations. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. First, geometric complement heterogeneous information was used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Second, lncRNA and disease features consisting of their respective similarity coefficients were fused into the input feature space. Third, an autoencoder was adopted to project the raw high-dimensional features into a low-dimensional space to learn representations for lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features were fused into the input feature space to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the AUC (area under the receiver operating characteristic curve) is 0.9897 and the AUPR (area under the precision-recall curve) is 0.7040, indicating that the performance of our model is better than that of several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that our model has a good ability to predict disease-related lncRNAs.
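Because most abstracts in this list report AUC values, a small reference computation may help when reading them. The sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical; none of the numbers come from the cited papers.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for known associations and 0 for negatives;
    `scores` holds the predicted association scores in the same order.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Toy example: one positive is ranked below one negative.
auc_example = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
auc_perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

The quadratic pairwise loop is fine for small examples; for large benchmark datasets a sort-based implementation from an established library would be the practical choice. Note that a high AUC can coexist with a much lower AUPR when negatives heavily outnumber positives, as in the figures quoted above.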
Affiliation(s)
- Dengju Yao
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
  - *Correspondence: Dengju Yao,
- Tao Zhang
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaojuan Zhan
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
- Shuli Zhang
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaorong Zhan
  - Department of Endocrinology and Metabolism, Hospital of South University of Science and Technology, Shenzhen, China
- Chao Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
10
Wang B, Liu R, Zheng X, Du X, Wang Z. lncRNA-disease association prediction based on matrix decomposition of elastic network and collaborative filtering. Sci Rep 2022; 12:12700. [PMID: 35882886 PMCID: PMC9325687 DOI: 10.1038/s41598-022-16594-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
Abstract
In recent years, with the continuous development and innovation of high-throughput biotechnology, more and more evidence shows that lncRNAs play an essential role in biological life activities and are related to the occurrence of various diseases. However, because traditional biological experiments are costly and time-consuming, the number of experimentally verified lncRNA-disease associations is small. Computer-aided study is therefore an important way to investigate lncRNA-disease associations. Using existing data to build a prediction model and predict unknown lncRNA-disease associations can make biological experiments more targeted and improve their accuracy. Therefore, we need an accurate and efficient method to predict the relationship between lncRNAs and diseases and to help biologists with the diagnosis and treatment of diseases. Most current lncRNA-disease association prediction methods consider neither the model instability caused by real data nor the possibility that the predictive model overfits. This paper proposes a lncRNA-disease association prediction model (ENCFLDA) that combines an elastic network with matrix decomposition and collaborative filtering. The method uses existing lncRNA-miRNA association data and miRNA-disease association data to predict associations between unknown lncRNAs and diseases. First, since the known lncRNA-disease association matrix is very sparse, cosine similarity and KNN are used to update the lncRNA-disease association matrix. The matrix is then updated by matrix decomposition combined with an elastic net algorithm, to increase the stability of the overall prediction model and eliminate overfitting. The final prediction matrix is then obtained through lncRNA-based collaborative filtering. Simulation experiments show that the AUC of ENCFLDA reaches 0.9148 under the LOOCV framework, which is higher than the prediction result of the latest model.
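The first step described above, densifying a sparse association matrix with cosine similarity and KNN, can be sketched as follows. The abstract does not give the exact update rule, so this example makes an assumption: each row is replaced by the element-wise maximum of itself and the similarity-weighted average of its k most cosine-similar rows. The names `cosine_similarity` and `knn_fill` and the toy matrix are hypothetical and should not be read as the ENCFLDA implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_fill(matrix, k=2):
    """Densify each row using its k nearest rows under cosine similarity.

    Assumed rule: new_row = max(old_row, weighted average of neighbour rows),
    applied element-wise, so known associations are never lowered.
    """
    n = len(matrix)
    filled = []
    for i in range(n):
        neighbours = sorted(
            ((cosine_similarity(matrix[i], matrix[j]), j) for j in range(n) if j != i),
            reverse=True,
        )[:k]
        total = sum(sim for sim, _ in neighbours)
        row = list(matrix[i])
        if total > 0:
            for col in range(len(row)):
                estimate = sum(sim * matrix[j][col] for sim, j in neighbours) / total
                row[col] = max(row[col], estimate)
        filled.append(row)
    return filled


# Toy matrix: 3 lncRNAs (rows) x 3 diseases (columns).
sparse = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
dense = knn_fill(sparse, k=2)
```

In this toy case the second lncRNA inherits a score for the second disease from its only similar neighbour, while the third row, which shares nothing with the others, is left unchanged.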
Affiliation(s)
- Bo Wang
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- RunJie Liu
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- XiaoDong Zheng
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- XiaoXin Du
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
- ZhengFei Wang
  - College of Computer and Control, Qiqihar University, Qiqihar, 161006, China
11
Zhou L, Tang Y, Yan G. A New Estimation Method for the Biological Interaction Predicting Problems. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1415-1423. [PMID: 33406043 DOI: 10.1109/tcbb.2021.3049642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Over the past decades, computational methods have been developed to predict various interactions in biological problems. Usually these methods treat the prediction problems as semi-supervised or positive-unlabeled (PU) learning problems. Researchers have focused on the prediction of unlabeled samples, hoping to find novel interactions in the datasets they collected. However, most of the computational methods could only predict a small proportion of the undiscovered interactions, and the total number was unknown. In this paper, we developed an estimation method with deep learning to calculate the number of undiscovered interactions in the unlabeled samples, derived its asymptotic interval estimation, and applied it successfully to a compound synergism dataset, a drug-target interaction (DTI) dataset and a microRNA-disease interaction dataset. Moreover, this method can reveal which dataset contains more undiscovered interactions and can thus guide experimental validation. Furthermore, we compared our method with some mixture proportion estimators and demonstrated the efficacy of our method. Finally, we proved that AUC and AUPR are related to the number of undiscovered interactions, which can be regarded as another evaluation indicator for computational methods.
12
Yang L, Li LP, Yi HC. DeepWalk based method to predict lncRNA-miRNA associations via lncRNA-miRNA-disease-protein-drug graph. BMC Bioinformatics 2022; 22:621. [PMID: 35216549 PMCID: PMC8875942 DOI: 10.1186/s12859-022-04579-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/26/2021] [Accepted: 01/18/2022] [Indexed: 01/12/2023]
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) play a crucial role in diverse biological processes and have been confirmed to be involved in various diseases. The physiological roles and functions of lncRNAs remain largely uncharacterized. MicroRNAs (miRNAs), which are usually 20-24 nucleotides long, have several critical regulatory roles in cells. LncRNAs can act as sponges that adsorb miRNAs and thereby indirectly regulate transcription and translation. Thus, the identification of lncRNA-miRNA associations is essential and valuable. RESULTS: In our work, we present DWLMI to infer the potential associations between lncRNAs and miRNAs by representing them as vectors via a lncRNA-miRNA-disease-protein-drug graph. Specifically, DeepWalk is used to learn the behavior representation of vertices. Fingerprint, k-mer and MeSH descriptor methods are mainly used to learn the attribute representation of vertices. By combining these two kinds of information, unknown lncRNA-miRNA associations can be predicted by a random forest classifier. Under five-fold cross-validation, the proposed DWLMI model obtained an average prediction accuracy of 95.22% with a sensitivity of 94.35% and an AUC of 98.56%. CONCLUSIONS: The experimental results demonstrated that DWLMI can effectively predict potential lncRNA-miRNA associated pairs, and the results can provide new insight for non-coding RNA researchers in the field of combining biological big data with deep learning.
Affiliation(s)
- Long Yang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Li-Ping Li
  - College of Grassland and Environmental Science, Xinjiang Agricultural University, Urumqi, 830052, China
- Hai-Cheng Yi
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
13
Chen M, Deng Y, Li A, Tan Y. Inferring Latent Disease-lncRNA Associations by Label-Propagation Algorithm and Random Projection on a Heterogeneous Network. Front Genet 2022; 13:798632. [PMID: 35186029 PMCID: PMC8854791 DOI: 10.3389/fgene.2022.798632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022]
Abstract
Long noncoding RNA (lncRNA), a type of non-coding RNA more than 200 nucleotides long, is related to various complex diseases. Precisely identifying potential lncRNA–disease associations is important for understanding disease pathogenesis, developing new drugs, and designing individualized diagnosis and treatment methods for different human diseases. Compared with complex and costly biological experiments, computational methods can quickly and effectively predict potential lncRNA–disease associations. Thus, developing computational methods for lncRNA–disease prediction is a promising avenue. However, owing to the low prediction accuracy of state-of-the-art methods, it remains very challenging to identify lncRNA–disease associations accurately and effectively. This article proposes an integrated method called LPARP, which is based on a label-propagation algorithm and random projection, to address the issue. Specifically, the label-propagation algorithm is first used to obtain estimated scores of lncRNA–disease associations, and then random projections are used to accurately predict disease-related lncRNAs. The empirical experiments showed that LPARP achieved good prediction on three gold datasets and is superior to existing state-of-the-art prediction methods. It can also be used to make predictions for isolated diseases and new lncRNAs. Case studies of bladder cancer, esophageal squamous-cell carcinoma, and colorectal cancer further prove the reliability of the method. The proposed LPARP algorithm can predict potential lncRNA–disease interactions stably and effectively with fewer data. LPARP can be used as an effective and reliable tool for biomedical research.
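The label-propagation step mentioned in this abstract can be illustrated with the classic iteration F ← α·S·F + (1 − α)·Y, where S is a symmetrically normalised similarity matrix and Y holds the known labels. This is a generic textbook form and an assumption on our part; the abstract does not state which variant LPARP uses, and the random projection stage is not shown. The function name, the value α = 0.8, and the toy similarity matrix are hypothetical.

```python
import math


def label_propagation(similarity, labels, alpha=0.8, tol=1e-10, max_iter=1000):
    """Propagate known labels over a similarity graph.

    Iterates F <- alpha * S F + (1 - alpha) * Y, where
    S = D^(-1/2) W D^(-1/2) is the normalised similarity matrix.
    """
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    norm = [
        [
            similarity[i][j] / math.sqrt(degree[i] * degree[j])
            if degree[i] and degree[j]
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = [float(y) for y in labels]
    for _ in range(max_iter):
        updated = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n)) + (1 - alpha) * labels[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Toy similarity among 3 lncRNAs; only lncRNA 0 is known to be linked to the disease.
lnc_similarity = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.2],
    [0.0, 0.2, 0.0],
]
known = [1, 0, 0]
propagated = label_propagation(lnc_similarity, known)
```

Here the unlabeled lncRNA that is strongly similar to the known one ends up with a higher estimated score than the weakly connected one, which is the behaviour that makes label propagation useful for ranking candidates.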
|
14
|
Wang L, Zhong C. gGATLDA: lncRNA-disease association prediction based on graph-level graph attention network. BMC Bioinformatics 2022; 23:11. [PMID: 34983363 PMCID: PMC8729153 DOI: 10.1186/s12859-021-04548-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/18/2021] [Accepted: 12/21/2021] [Indexed: 01/20/2023] Open
Abstract
Background Long non-coding RNAs (lncRNAs) are related to human diseases by regulating gene expression. Identifying lncRNA-disease associations (LDAs) will contribute to the diagnosis, treatment, and prognosis of diseases. However, identifying LDAs through biological experiments is time-consuming, costly and inefficient. Therefore, the development of efficient and high-accuracy computational methods for predicting LDAs is of great significance. Results In this paper, we propose a novel computational method (gGATLDA) to predict LDAs based on a graph-level graph attention network. Firstly, we extract the enclosing subgraphs of each lncRNA-disease pair. Secondly, we construct the feature vectors by integrating lncRNA similarity and disease similarity as node attributes in subgraphs. Finally, we train a graph neural network (GNN) model by feeding the subgraphs and feature vectors to it, and use the trained GNN model to predict potential lncRNA-disease association scores. The experimental results show that our method achieves higher area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPR), accuracy and F1-score than the state-of-the-art methods in five-fold cross-validation. Case studies show that our method can effectively identify lncRNAs associated with breast cancer, gastric cancer, prostate cancer, and renal cancer. Conclusion The experimental results indicate that our method is a useful approach for predicting potential LDAs.
Affiliation(s)
- Li Wang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Cheng Zhong
- School of Computer, Electronics and Information, Guangxi University, Nanning, China
- Key Laboratory of Parallel and Distributed Computing in Guangxi Colleges and Universities, Guangxi University, Nanning, China
|
15
|
Zhang Y, Chen M, Huang L, Xie X, Li X, Jin H, Wang X, Wei H. Fusion of KATZ measure and space projection to fast probe potential lncRNA-disease associations in bipartite graphs. PLoS One 2021; 16:e0260329. [PMID: 34807960 PMCID: PMC8608294 DOI: 10.1371/journal.pone.0260329] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/12/2021] [Accepted: 11/06/2021] [Indexed: 11/19/2022] Open
Abstract
It is well known that numerous long noncoding RNAs (lncRNAs) are closely related to the physiological and pathological processes of human diseases and can serve as potential biomarkers. Therefore, lncRNA-disease associations identified by computational methods as targeted candidates reduce the cost of biological experiments aimed at further in-depth study. However, inherent problems such as inaccurate construction of similarity networks and inadequate numbers of known lncRNA–disease associations mean that many mature computational methods developed over many years still exhibit some limitations. This motivates us to explore a new computational method, KATZSP, that fuses the KATZ measure and space projection to rapidly probe potential lncRNA-disease associations. KATZSP comprises the following key steps: combining all the global information to change the Boolean network of known lncRNA–disease associations into weighted networks; replacing the similarity calculation with counting the number of walks that connect lncRNA nodes and disease nodes in bipartite graphs; and obtaining the space projection scores to refine the primary prediction scores. The process of fusing the KATZ measure and space projection is simple, requiring only one attenuation factor. The leave-one-out cross validation (LOOCV) experimental results showed that, compared with other state-of-the-art methods (NCPLDA, LDAI-ISPS and IIRWR), KATZSP had a higher predictive accuracy, as measured by the area-under-the-curve (AUC) value, on the three datasets built, and KATZSP also worked well in inferring potential associations related to new lncRNAs (or isolated diseases). The results of real case studies (such as pancreatic cancer, lung cancer and colorectal cancer) further confirmed that KATZSP has superior predictive ability and can serve as a guide for traditional biological experiments.
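The walk-counting step described in this abstract can be sketched as a truncated Katz measure on a bipartite graph. This is a generic illustration and not the KATZSP implementation: the function names, the attenuation factor value, the truncation length and the toy matrix are assumptions, and the weighting and space-projection steps are not shown. In a bipartite graph every walk from an lncRNA node to a disease node has odd length, so only odd powers contribute.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def katz_bipartite(A, beta=0.1, max_len=3):
    """Truncated Katz scores between lncRNAs (rows) and diseases (columns).

    score[i][j] = sum over odd k <= max_len of beta**k * (number of walks
    of length k from lncRNA i to disease j). beta is the attenuation factor.
    """
    At = transpose(A)
    walks = [list(row) for row in A]  # walk counts for length 1
    scores = [[0.0] * len(A[0]) for _ in A]
    k = 1
    while k <= max_len:
        for i, row in enumerate(walks):
            for j, count in enumerate(row):
                scores[i][j] += beta ** k * count
        # Extend every walk by two steps: disease -> lncRNA -> disease.
        walks = matmul(matmul(walks, At), A)
        k += 2
    return scores


# Toy association matrix (hypothetical): 2 lncRNAs x 2 diseases.
A = [[1, 0], [1, 1]]
scores = katz_bipartite(A, beta=0.1, max_len=3)
```

Here lncRNA 0 has no known link to disease 1, but the single length-3 walk through disease 0 and lncRNA 1 gives the pair a small non-zero score.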
Affiliation(s)
- Yi Zhang
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China
- Min Chen
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Li Huang
- Academy of Arts and Design, Tsinghua University, Beijing, China
- The Future Laboratory, Tsinghua University, Beijing, China
- Xiaolan Xie
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xin Li
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Hong Jin
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xiaohua Wang
- Pharmacy School, Guilin Medical University, Guilin, China
- Hanyan Wei
- Pharmacy School, Guilin Medical University, Guilin, China
|
16
|
Xuan P, Zhan L, Cui H, Zhang T, Nakaguchi T, Zhang W. Graph Triple-Attention Network for Disease-related LncRNA Prediction. IEEE J Biomed Health Inform 2021; 26:2839-2849. [PMID: 34813484 DOI: 10.1109/jbhi.2021.3130110] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/06/2022]
Abstract
Abnormal expressions of long non-coding RNAs (lncRNAs) are associated with various human diseases. Identifying disease-related lncRNAs can help clarify complex disease pathogeneses. The latest methods for lncRNA-disease association prediction rely on diverse data about lncRNAs and diseases. These methods, however, cannot adequately integrate the neighbour topological information of lncRNA and disease nodes. Moreover, more intrinsic features of lncRNA-disease node pairs can be explored to better predict the latent associations between lncRNAs and diseases. We developed a novel method, named GTAN, to predict the association propensities between lncRNAs and diseases. GTAN integrates various information about lncRNAs and diseases, including similarities, associations and interactions among lncRNAs, diseases and miRNAs, and exploits neighbour topology and attribute representations of a pair of lncRNA-disease nodes. We adopted in GTAN a graph neural network architecture with three attention mechanisms and multi-layer convolutional neural networks. First, a neighbour-level self-attention mechanism is constructed to learn the importance of each neighbour for an interested lncRNA or disease node. Second, topology-level attention is proposed to enhance contextual dependencies among multiple local topology representations of the lncRNA or disease node. An attention-enhanced graph neural network framework is then established to learn a topology representation of top-ranked neighbours for a pair of lncRNA-disease nodes. GTAN also has attribute-level attention to distinguish various contributions of attributes of the lncRNA-disease pair. Finally, attribute representation is learned by multi-layer CNN to integrate detailed features and representative features of the pair. Extensive experimental results demonstrated that GTAN outperformed state-of-the-art methods. 
The improved recall rates also showed GTAN's capacity for retrieving more actual lncRNA-disease associations among the top-ranked candidates. The ablation studies confirmed the important contributions of the three attention mechanisms. Case studies on lung cancer, prostate cancer and colon cancer further showed GTAN's ability to discover potential lncRNA candidates related to diseases.
|
17
|
Zhao X, Yang Y, Yin M. MHRWR: Prediction of lncRNA-Disease Associations Based on Multiple Heterogeneous Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2577-2585. [PMID: 32086216 DOI: 10.1109/tcbb.2020.2974732] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 06/10/2023]
Abstract
In the last few years, accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) participate in the regulation of target gene expression and play an important role in biological processes and human disease development. Thus, prediction of the associations between lncRNAs and diseases has become a hot research topic in the field of complex human diseases. Most existing methods consider the information of two networks (lncRNA, disease) while neglecting other networks. In this study, we designed a multi-layer network by integrating the similarity networks of lncRNAs, diseases and genes, and the known association networks of lncRNA-disease, lncRNA-gene, and disease-gene, and then we developed a model called MHRWR for predicting potential lncRNA-disease associations based on random walk with restart. The performance of MHRWR was evaluated on experimentally verified lncRNA-disease associations using leave-one-out cross validation. MHRWR obtained a reliable AUC value of 0.91344, which significantly outperformed some previous methods. To further validate the reproducibility of performance, we used MHRWR to verify lncRNAs related to colon cancer, colorectal cancer and lung adenocarcinoma in the case studies. The code of MHRWR is available at: https://github.com/yangyq505/MHRWR.
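Random walk with restart, the technique this abstract builds on, can be sketched on a single small network. This is a generic illustration and not the MHRWR code from the repository above: the function name, the restart probability, the column normalisation and the toy graph are assumptions, and the multi-layer network construction is not shown. The update is p <- (1 - r) * P * p + r * p0, where p0 puts all mass on the seed node.

```python
def random_walk_with_restart(W, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on one weighted network.

    W: square non-negative weight matrix (list of lists).
    seed: index of the start node (e.g. a disease of interest).
    restart: assumed probability of jumping back to the seed at each step.
    """
    n = len(W)
    # Column-normalise W so that each column of P sums to 1.
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    P = [
        [W[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(P[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 (hypothetical), walker seeded at node 0.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
p = random_walk_with_restart(W, seed=0)
```

Nodes closer to the seed end up with a higher stationary probability, which is what makes the scores usable for ranking candidates.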
|
18
|
Zheng K, You ZH, Wang L, Li YR, Zhou JR, Zeng HT. MISSIM: An Incremental Learning-Based Model With Applications to the Prediction of miRNA-Disease Association. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1733-1742. [PMID: 32749964 DOI: 10.1109/tcbb.2020.3013837] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]
Abstract
In the past few years, prediction models have shown remarkable performance in most biological association prediction tasks. These tasks traditionally use a fixed dataset, and the model, once trained, is deployed as is. These models often encounter training issues such as sensitivity to hyperparameter tuning and "catastrophic forgetting" when new data are added. However, with the development of biomedicine and the accumulation of biological data, new predictive models are required to face the challenge of adapting to change. To this end, we propose a computational approach based on the Broad learning system (BLS) to predict potential disease-associated miRNAs that retains the ability to distinguish prior training associations when new data need to be adapted. In particular, we introduce incremental learning to the field of biological association prediction for the first time and propose a new method for quantifying sequence similarity. In the performance evaluation, the AUC in the 5-fold cross-validation was 0.9400 +/- 0.0041. To better assess the effectiveness of MISSIM, we compared it with various classifiers and former prediction models. Its performance is superior to that of the previous methods. Besides, the case studies on identifying miRNAs associated with breast neoplasms, lung neoplasms and esophageal neoplasms show that 34, 36 and 35 out of the top 40 associations predicted by MISSIM are confirmed by recent biomedical resources. These results provide convincing evidence that this approach has potential value and prospects for promoting biomedical research productivity.
|
19
|
Xie G, Zhu Y, Lin Z, Sun Y, Gu G, Wang W, Chen H. HOPMCLDA: predicting lncRNA-disease associations based on high-order proximity and matrix completion. Mol Omics 2021; 17:760-768. [PMID: 34251001 DOI: 10.1039/d1mo00138h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
In recent years, emerging evidence has shown that long noncoding RNAs (lncRNAs) have important roles in the biological processes of complex diseases. However, experiments to determine the associations between diseases and lncRNAs are time consuming and costly. Therefore, there is a need to develop effective computational methods for exploring potential lncRNA-disease associations. In this study, we present a computational prediction method based on high-order proximity and matrix completion to predict lncRNA-disease associations (HOPMCLDA). HOPMCLDA integrates explicit similarity and high-order proximity information on lncRNAs and diseases and constructs a heterogeneous disease-lncRNA network to utilize similarity information. Finally, nuclear norm regularization is carried out on the heterogeneous network for the recovery of a lncRNA-disease association matrix. By implementing leave-one-out cross validation (LOOCV) and five-fold cross validation (5-fold CV), we compare HOPMCLDA with five other methods. HOPMCLDA outperforms the other methods, with area under the receiver operating characteristic curve values of 0.8755 and 0.8353 ± 0.0045 using LOOCV and 5-fold CV, respectively. Furthermore, case studies of three human diseases (gastric cancer, osteosarcoma, and hepatocellular carcinoma) confirm the reliable predictive performance of HOPMCLDA.
Affiliation(s)
- Guobo Xie
- School of Computers, Guangdong University of Technology, Guangzhou, China.
- Yinting Zhu
- School of Computers, Guangdong University of Technology, Guangzhou, China.
- Zhiyi Lin
- School of Computers, Guangdong University of Technology, Guangzhou, China.
- Yuping Sun
- School of Computers, Guangdong University of Technology, Guangzhou, China.
- Guosheng Gu
- School of Computers, Guangdong University of Technology, Guangzhou, China.
- Weiming Wang
- School of Computers, Guangdong University of Technology, Guangzhou, China.
- Hui Chen
- School of Computers, Guangdong University of Technology, Guangzhou, China.
|
20
|
Li Z, Jiang K, Qin S, Zhong Y, Elofsson A. GCSENet: A GCN, CNN and SENet ensemble model for microRNA-disease association prediction. PLoS Comput Biol 2021; 17:e1009048. [PMID: 34081706 PMCID: PMC8205154 DOI: 10.1371/journal.pcbi.1009048] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/22/2020] [Revised: 06/15/2021] [Accepted: 05/06/2021] [Indexed: 12/21/2022] Open
Abstract
Recently, an increasing number of studies have demonstrated that miRNAs are involved in human diseases, indicating that miRNAs might be a potential pathogenic factor for various diseases. Therefore, figuring out the relationship between miRNAs and diseases plays a critical role in not only the development of new drugs, but also the formulation of individualized diagnosis and treatment. As the prediction of miRNA-disease association via biological experiments is expensive and time-consuming, computational methods have a positive effect on revealing the association. In this study, a novel prediction model integrating GCN, CNN and Squeeze-and-Excitation Networks (GCSENet) was constructed for the identification of miRNA-disease association. The model first captured features by GCN based on a heterogeneous graph including diseases, genes and miRNAs. Then, considering the different effects of genes on each type of miRNA and disease, as well as the different effects of the miRNA-gene and disease-gene relationships on miRNA-disease association, a feature weight was set and a combination of miRNA-gene and disease-gene associations was added as feature input for the convolution operation in CNN. Furthermore, the squeeze and excitation blocks of SENet were applied to determine the importance of each feature channel and enhance useful features by means of the attention mechanism, thus achieving a satisfactory prediction of miRNA-disease association. The proposed method was compared against other state-of-the-art methods. It achieved an AUROC score of 95.02% and an AUPR score of 95.55% in a 10-fold cross-validation, which led to the finding that the proposed method is superior to these popular methods on most of the performance evaluation indexes. Identifying miRNA-disease associations accelerates the understanding towards pathogenicity, which is beneficial for the development of treatment tools for diseases. 
Unlike existing methods, our GCSENet captures the deep relationship between miRNA and disease through three heterogeneous graphs (disease, gene and miRNA) to promote an accurate prediction result. We performed 10-fold cross validation to evaluate the performance of GCSENet, which can outperform many classic methods. Furthermore, we carried out case studies on four important diseases to evaluate how well our model's predicted associations are supported by experimental evidence in the literature. The results show that most predicted miRNAs (48 for lung neoplasms, 48 for heart failure, 48 for breast cancer and 50 for glioblastoma) in the top 50 predictions were confirmed in HMDD v3.0. This shows that GCSENet can make reliable predictions and guide experiments to uncover more miRNA-disease associations.
Affiliation(s)
- Zhong Li
- Department of Mathematical Sciences, School of Science, Zhejiang Sci-Tech University, Hangzhou, China
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Solna, Sweden
- Kaiyancheng Jiang
- Department of Mathematical Sciences, School of Science, Zhejiang Sci-Tech University, Hangzhou, China
- Shengwei Qin
- Department of Mathematical Sciences, School of Science, Zhejiang Sci-Tech University, Hangzhou, China
- Yijun Zhong
- Department of Mathematical Sciences, School of Science, Zhejiang Sci-Tech University, Hangzhou, China
- Arne Elofsson
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Solna, Sweden
|
21
|
Du B, Tang L, Liu L, Zhou W. Predicting LncRNA-Disease Association Based on Generative Adversarial Network. Curr Gene Ther 2021; 22:144-151. [PMID: 33998988 DOI: 10.2174/1566523221666210506131055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Revised: 02/19/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Increasing research reveals that long non-coding RNAs (lncRNAs) play an important role in various biological processes of human diseases. Nonetheless, only a handful of lncRNA-disease associations have been experimentally verified. The study of lncRNA-disease association prediction based on the computational model has provided a preliminary basis for biological experiments to a great degree so as to cut down the huge cost of wet lab experiments. OBJECTIVE This study aims to learn the real distribution of lncRNA-disease association from a limited number of known lncRNA-disease association data. This paper proposes a new lncRNA-disease association prediction model called LDA-GAN based on a generative adversarial network (GAN). METHOD Aiming at the problems of slow convergence rate, training instabilities, and unavailability of discrete data in traditional GAN, LDA-GAN utilizes the Gumbel-softmax technology to construct a differentiable process for simulating discrete sampling. Meanwhile, the generator and the discriminator of LDA-GAN are integrated to establish the overall optimization goal based on the pairwise loss function. RESULTS Experiments on standard datasets demonstrate that LDA-GAN achieves not only high stability and high efficiency in the process of confrontation learning but also gives full play to the semi-supervised learning advantage of generative adversarial learning framework for unlabeled data, which further improves the prediction accuracy of lncRNA-disease association. Besides, case studies show that LDA-GAN can accurately generate potential diseases for several lncRNAs.
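The Gumbel-softmax relaxation named in the METHOD section can be illustrated with a forward-sampling sketch. It is not the LDA-GAN code: the function name, the temperature value and the example logits are assumptions, and plain Python has no automatic differentiation, so this shows only how a soft, nearly one-hot sample is drawn. Each logit is perturbed with Gumbel noise g = -log(-log(u)), u uniform on (0, 1), and the result is passed through a softmax with temperature tau.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed categorical sample (forward pass only).

    logits: unnormalised log-probabilities of the categories.
    tau: temperature; smaller values push the sample towards one-hot.
    rng: any object with a random() method returning floats in [0, 1).
    """
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    top = max(perturbed)
    exps = [math.exp(z - top) for z in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate diseases of one lncRNA.
sample = gumbel_softmax([0.1, 1.2, -0.3], tau=1.0, rng=random.Random(0))
```

The output is a probability vector rather than a hard index, which is what allows gradients to pass through the sampling step in frameworks that support differentiation.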
Affiliation(s)
- Biao Du
- School of Information, Yunnan Normal University, Kunming, China
- Lin Tang
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- Lin Liu
- School of Information, Yunnan Normal University, Kunming, China
- Wei Zhou
- School of Software, Yunnan University, Kunming, China
|
22
|
Chen Q, Lai D, Lan W, Wu X, Chen B, Liu J, Chen YPP, Wang J. ILDMSF: Inferring Associations Between Long Non-Coding RNA and Disease Based on Multi-Similarity Fusion. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1106-1112. [PMID: 31443046 DOI: 10.1109/tcbb.2019.2936476] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 06/10/2023]
Abstract
The dysregulation and mutation of long non-coding RNAs (lncRNAs) have been shown to result in a variety of human diseases. Identifying potential disease-related lncRNAs may benefit disease diagnosis, treatment and prognosis. A number of methods have been proposed to predict potential lncRNA-disease relationships. However, most of them may give rise to incorrect results because they rely on a single similarity measure. This article proposes a novel framework (ILDMSF) that fuses lncRNA similarities, measured from lncRNA-related genes and known lncRNA-disease interactions, with disease similarities, measured from disease semantic interactions and known lncRNA-disease interactions. Further, a support vector machine is employed to identify potential lncRNA-disease associations based on the integrated similarity. Leave-one-out cross validation is performed to compare ILDMSF with other state-of-the-art methods. The experimental results demonstrate that our method is promising for exploring potential correlations between lncRNAs and diseases.
|
23
|
Yu B, Cui R, Lan Y, Zhang J, Liu B. Long non-coding RNA H19 as a diagnostic marker in peripheral blood of patients with sepsis. Am J Transl Res 2021; 13:2923-2930. [PMID: 34017457 PMCID: PMC8129366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2020] [Accepted: 12/12/2020] [Indexed: 06/12/2023]
Abstract
OBJECTIVE To evaluate the diagnostic and prognostic value of circulating long non-coding RNA H19 (H19) in sepsis. METHODS A total of 104 septic patients admitted to our hospital from June 2018 to April 2019 were enrolled as the disease group, and another 92 healthy individuals were selected as the control group. The relative expression of H19 in peripheral blood was quantified by quantitative real-time polymerase chain reaction, and the diagnostic value in sepsis was assessed by receiver operating characteristic curve analysis. Pearson correlation coefficient was used to analyze the correlation between H19 and other inflammatory markers. RESULTS Compared with the control group, the expression of peripheral blood H19 in the disease group was significantly down-regulated. The area under the curve (AUC) of H19 for diagnosing sepsis was 0.849. The expression of H19 in the survival group was significantly up-regulated compared with that in the death group, and the AUC in the survival group was 0.865. The relative expression of H19 was negatively correlated with PCT, CRP, IL-6, TNF-α, CK-MB, and cTnI. Multivariate logistic regression showed that patients with high lactic acid, coagulation dysfunction, high levels of PCT, CRP, IL-6, TNF-α, CK-MB, and cTnI, but low H19 expression had an increased risk of sepsis. CONCLUSION Peripheral blood H19 may be used for early diagnosis, clinical assessment, and prognosis of sepsis.
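The area under the ROC curve reported in this abstract has a simple rank-based definition that can be sketched directly: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores below are assumptions for illustration and are not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted scores, higher meaning "more likely positive".
    labels: 1 for positive cases, 0 for negative cases.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # 0.75
```

When a marker is lower in the disease group, the score passed to such a function would need to be negated or otherwise reversed so that higher still means "more likely positive".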
Affiliation(s)
- Baojiang Yu
- Department of Clinical Laboratory, Weihai Hospital of Traditional Chinese Medicine, Weihai 264200, Shandong, China
- Rong Cui
- Department of Clinical Laboratory, Weihai Hospital of Traditional Chinese Medicine, Weihai 264200, Shandong, China
- Yuehong Lan
- Xiaoji Center Health Center, Yantai 265100, Shandong, China
- Jinjing Zhang
- Department of Clinical Laboratory, Weihai Hospital of Traditional Chinese Medicine, Weihai 264200, Shandong, China
- Bing Liu
- Department of Clinical Laboratory, Weihai Hospital of Traditional Chinese Medicine, Weihai 264200, Shandong, China
|
24
|
Gao MM, Cui Z, Gao YL, Wang J, Liu JX. Multi-Label Fusion Collaborative Matrix Factorization for Predicting LncRNA-Disease Associations. IEEE J Biomed Health Inform 2021; 25:881-890. [PMID: 32324583 DOI: 10.1109/jbhi.2020.2988720] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022]
Abstract
As we all know, science and technology are developing faster and faster. Many experts and scholars have demonstrated that human diseases are related to lncRNAs, but only a few associations have been confirmed, and many unknown associations remain to be found. Finding associations takes a lot of time, so an efficient way to predict the associations between lncRNAs and diseases is particularly important. In this paper, we propose a multi-label fusion collaborative matrix factorization (MLFCMF) approach for predicting lncRNA-disease associations (LDAs). Firstly, the lncRNA space and disease space are optimized by multi-label learning to enhance the intrinsic link between lncRNAs and diseases and to tap potential information. Multi-label learning can encode a variety of data information from the sample space. Secondly, to learn multi-label information in the data space, the fusion method is used to handle the relationship between multiple labels. More comprehensive information is obtained by weighing the effects of different labels. The addition of the Gaussian interaction profile (GIP) kernel can increase the network similarity. Finally, the lncRNA-disease associations are predicted by collaborative matrix factorization. Ten-fold cross-validation is used to evaluate the MLFCMF method, which obtains an AUC value of 0.8612. The simulation experiment results include a detailed analysis of ovarian cancer, colorectal cancer, and lung cancer. These results show that MLFCMF is an effective model for predicting lncRNA-disease associations.
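The Gaussian interaction profile (GIP) kernel mentioned in this abstract can be sketched in its commonly used form: K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), where IP(i) is row i of the association matrix and gamma is a bandwidth parameter divided by the mean squared norm of the profiles. This is a generic illustration rather than the MLFCMF code; the function name, the default bandwidth of 1 and the toy matrix are assumptions.

```python
import math


def gip_kernel(A, gamma_prime=1.0):
    """GIP kernel similarity between the rows of an association matrix.

    A: binary association matrix, one interaction profile per row
       (e.g. lncRNAs as rows, diseases as columns).
    gamma_prime: assumed bandwidth parameter before normalisation.
    """
    n = len(A)
    sq_norms = [sum(v * v for v in row) for row in A]
    mean_sq_norm = sum(sq_norms) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(A[i], A[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy profiles (hypothetical): lncRNAs 0 and 1 share the same diseases,
# lncRNA 2 is associated with a different one.
A = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
K = gip_kernel(A)
```

Rows with identical profiles get similarity 1, and the similarity decays as the profiles diverge; applying the same function to the transposed matrix gives disease-disease similarity.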
|
25
|
Xie G, Huang B, Sun Y, Wu C, Han Y. RWSF-BLP: a novel lncRNA-disease association prediction model using random walk-based multi-similarity fusion and bidirectional label propagation. Mol Genet Genomics 2021; 296:473-483. [PMID: 33590345 DOI: 10.1007/s00438-021-01764-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/06/2020] [Accepted: 01/28/2021] [Indexed: 12/13/2022]
Abstract
An increasing number of studies and experiments have demonstrated that long noncoding RNAs (lncRNAs) have a massive impact on various biological processes. Predicting potential associations between lncRNAs and diseases not only can improve our understanding of the molecular mechanisms of human diseases but also can facilitate the identification of biomarkers for disease diagnosis, treatment, and prevention. However, identifying such associations through experiments is costly and demanding, thereby prompting researchers to develop computational methods to complement these experiments. In this paper, we constructed a novel model called RWSF-BLP (a novel lncRNA-disease association prediction model using Random Walk-based multi-Similarity Fusion and Bidirectional Label Propagation), which applies an efficient random walk-based multi-similarity fusion (RWSF) method to fuse different similarity matrices and utilizes bidirectional label propagation to predict potential lncRNA-disease associations. Leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold-CV) were implemented to evaluate the performance of RWSF-BLP. Results showed that RWSF-BLP achieved reliable AUCs of 0.9086 and 0.9115 ± 0.0044 under LOOCV and 5-fold-CV, respectively, and outperformed four other canonical methods. Case studies on lung cancer and leukemia demonstrated that potential lncRNA-disease associations can be predicted through our method. Therefore, our method can accurately infer potential lncRNA-disease associations and may be a good choice in future biomedical research.
Affiliation(s)
- Guobo Xie
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Bin Huang
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Yuping Sun
- School of Computer Science, Guangdong University of Technology, Guangzhou, China.
- Changhai Wu
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
- Yuqiong Han
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
|
26
|
Liu JX, Cui Z, Gao YL, Kong XZ. WGRCMF: A Weighted Graph Regularized Collaborative Matrix Factorization Method for Predicting Novel LncRNA-Disease Associations. IEEE J Biomed Health Inform 2021; 25:257-265. [PMID: 32287024 DOI: 10.1109/jbhi.2020.2985703] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
In recent years, many human diseases have been determined to be associated with certain lncRNAs. Only a small percentage of all lncRNA-disease associations (LDAs) have been discovered by researchers. Predicting novel LDAs is time-consuming and costly. It is crucial to propose a method that can effectively identify potential LDAs to solve this problem based on the available datasets. Although some current methods can effectively predict potential LDAs, the prediction accuracy needs to be improved, and there are few known associations. Moreover, there are notable errors in the method of constructing the network and the bipartite graph, which interfere with the final results. A weighted graph regularized collaborative matrix factorization (WGRCMF) method is proposed to predict novel LDAs. We introduce the graph regularization terms into the collaborative matrix factorization. Considering that manifold learning can recover low-dimensional manifold structures from high-dimensional sampled data, we can find low-dimensional manifolds in high-dimensional space. In addition, a weight matrix is also introduced into the method, the significance of which is to prevent unknown associations from contributing to the final prediction matrix. Finally, the prediction accuracy of this method is better than those of other methods. In several cancer cases, we implemented the corresponding simulation experiments. According to the experimental results, the proposed method is feasible and effective.
|
27
|
Seifuddin F, Pirooznia M. Bioinformatics Approaches for Functional Prediction of Long Noncoding RNAs. Methods Mol Biol 2021; 2254:1-13. [PMID: 33326066 DOI: 10.1007/978-1-0716-1158-6_1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
There is accumulating evidence that long noncoding RNAs (lncRNAs) play crucial roles in biological processes and diseases. In recent years, computational models have been widely used to predict potential lncRNA-disease relations. In this chapter, we systematically describe various computational algorithms and prediction tools that have been developed to elucidate the roles of lncRNAs in diseases, coding potential/functional characterization, or ascertaining their involvement in critical biological processes as well as provide a comprehensive summary of these applications.
Affiliation(s)
- Fayaz Seifuddin
  - Bioinformatics and Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Mehdi Pirooznia
  - Bioinformatics and Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
|
28
|
HAUBRW: Hybrid algorithm and unbalanced bi-random walk for predicting lncRNA-disease associations. Genomics 2020; 112:4777-4787. [PMID: 33348478 DOI: 10.1016/j.ygeno.2020.08.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/03/2020] [Revised: 08/01/2020] [Accepted: 08/17/2020] [Indexed: 01/24/2023]
Abstract
A growing body of research shows that long non-coding RNAs (lncRNAs) play a key role in many important biological processes. However, the number of disease-related lncRNAs found by researchers remains relatively small, and experimental identification is time-consuming and labor-intensive. In this study, we propose a novel method, namely HAUBRW, to predict undiscovered lncRNA-disease associations. First, the hybrid algorithm, which combines the heat spread algorithm and the probability diffusion algorithm, redistributes the resources. Second, an unbalanced bi-random walk is used to infer undiscovered lncRNA-disease associations. Seven advanced models, i.e., BRWLDA, DSCMF, RWRlncD, IDLDA, KATZ, Ping's, and Yang's, were compared with our method, and simulation results show that the AUC of our method is higher than those of the other models. In addition, case studies have shown that HAUBRW can effectively predict candidate lncRNAs for breast cancer, osteosarcoma, and cervical cancer. Therefore, our approach may be a good choice in future biomedical research.
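As a rough illustration of the second step, the sketch below runs a generic bi-random walk with a different number of steps on the lncRNA side and the disease side. It is not the HAUBRW implementation: the hybrid resource-redistribution step is omitted, and the normalisation, the decay factor `alpha`, the step counts, and the toy matrices are all assumptions.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def row_stochastic(S):
    """Scale every row of S to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in S:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out

def col_stochastic(S):
    """Scale every column of S to sum to 1."""
    transposed = [list(col) for col in zip(*S)]
    return [list(row) for row in zip(*row_stochastic(transposed))]

def blend(M, seed, alpha):
    """Mix the walked matrix with the restart (seed) matrix."""
    return [[alpha * m + (1 - alpha) * s for m, s in zip(m_row, s_row)]
            for m_row, s_row in zip(M, seed)]

def bi_random_walk(A, S_lnc, S_dis, alpha=0.8, left_steps=2, right_steps=3):
    """Walk on the lncRNA network for left_steps iterations and on the
    disease network for right_steps iterations, averaging whichever
    walks are still active at each iteration."""
    total = sum(sum(row) for row in A)
    seed = [[x / total for x in row] for row in A]  # known associations
    L = col_stochastic(S_lnc)   # left-multiplied transition matrix
    D = row_stochastic(S_dis)   # right-multiplied transition matrix
    R = seed
    for t in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if t <= left_steps:
            walks.append(blend(matmul(L, R), seed, alpha))
        if t <= right_steps:
            walks.append(blend(matmul(R, D), seed, alpha))
        R = [[sum(vals) / len(walks) for vals in zip(*rows)]
             for rows in zip(*walks)]
    return R

# Hypothetical data: 2 lncRNAs, 3 diseases.
A = [[1, 0, 0],
     [0, 1, 0]]
S_lnc = [[1.0, 0.5],
         [0.5, 1.0]]
S_dis = [[1.0, 0.2, 0.6],
         [0.2, 1.0, 0.1],
         [0.6, 0.1, 1.0]]
walk_scores = bi_random_walk(A, S_lnc, S_dis)
```

Disease 2 has no known lncRNA in this toy input, yet it receives non-zero scores because it is similar to diseases that do.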
|
29
|
Zhang Y, Ye F, Xiong D, Gao X. LDNFSGB: prediction of long non-coding RNA and disease association using network feature similarity and gradient boosting. BMC Bioinformatics 2020; 21:377. [PMID: 32883200 PMCID: PMC7469344 DOI: 10.1186/s12859-020-03721-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/25/2020] [Accepted: 08/21/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND A large number of experimental studies show that the mutation and regulation of long non-coding RNAs (lncRNAs) are associated with various human diseases. Accurate prediction of lncRNA-disease associations can provide a new perspective for the diagnosis and treatment of diseases. The main function of many lncRNAs is still unclear and using traditional experiments to detect lncRNA-disease associations is time-consuming. RESULTS In this paper, we develop a novel and effective method for the prediction of lncRNA-disease associations using network feature similarity and gradient boosting (LDNFSGB). In LDNFSGB, we first construct a comprehensive feature vector to effectively extract the global and local information of lncRNAs and diseases through considering the disease semantic similarity (DISSS), the lncRNA function similarity (LNCFS), the lncRNA Gaussian interaction profile kernel similarity (LNCGS), the disease Gaussian interaction profile kernel similarity (DISGS), and the lncRNA-disease interaction (LNCDIS). Particularly, two methods are used to calculate the DISSS (LNCFS) for considering the local and global information of disease semantics (lncRNA functions) respectively. An autoencoder is then used to reduce the dimensionality of the feature vector to obtain the optimal feature parameter from the original feature set. Furthermore, we employ the gradient boosting algorithm to obtain the lncRNA-disease association prediction. CONCLUSIONS In this study, hold-out, leave-one-out cross-validation, and ten-fold cross-validation methods are implemented on three publicly available datasets to evaluate the performance of LDNFSGB. Extensive experiments show that LDNFSGB dramatically outperforms other state-of-the-art methods. The case studies on six diseases, including cancers and non-cancers, further demonstrate the effectiveness of our method in real-world applications.
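The Gaussian interaction profile kernel similarity mentioned above (LNCGS, DISGS) is commonly computed as exp(-γ‖IP(i) - IP(j)‖²), where IP(i) is the row of known associations for entity i and γ is a constant divided by the mean squared norm of all profiles. The abstract does not state the exact normalisation used in LDNFSGB, so the sketch below follows that common formulation as an assumption; the function name and toy matrix are hypothetical.

```python
import math

def gip_kernel_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel similarity between rows.

    assoc[i][j] is 1 if entity i (e.g. a lncRNA) has a known association
    with entity j (e.g. a disease), else 0.
    """
    n = len(assoc)
    # Squared norm of every interaction profile (row).
    sq_norms = [sum(x * x for x in row) for row in assoc]
    mean_sq_norm = sum(sq_norms) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Bandwidth normalised by the average profile norm.
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * dist_sq)
    return sim

# Toy lncRNA-by-disease matrix (hypothetical data).
toy = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
lnc_sim = gip_kernel_similarity(toy)
```

Passing the transposed matrix instead gives the disease-side similarity computed from the same associations.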
Affiliation(s)
- Yuan Zhang
  - School of Mathematics and Computational Science, Xiangtan University, Xiangtan 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- Fei Ye
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- Dapeng Xiong
  - Department of Computational Biology, Ithaca, New York 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York 14853, USA
- Xieping Gao
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
  - College of Medical Imaging and Inspection, Xiangnan University, Chenzhou 423000, China
|
30
|
Fan W, Shang J, Li F, Sun Y, Yuan S, Liu JX. IDSSIM: an lncRNA functional similarity calculation model based on an improved disease semantic similarity method. BMC Bioinformatics 2020; 21:339. [PMID: 32736513 PMCID: PMC7430881 DOI: 10.1186/s12859-020-03699-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/30/2020] [Accepted: 07/23/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND It has been widely accepted that long non-coding RNAs (lncRNAs) play important roles in the development and progression of human diseases. Many association prediction models have been proposed for predicting lncRNA functions and identifying potential lncRNA-disease associations. Nevertheless, little effort has been made to measure lncRNA functional similarity, which is an essential part of association prediction models. RESULTS In this study, we presented an lncRNA functional similarity calculation model, IDSSIM for short, based on an improved disease semantic similarity method, the highlight of which is the introduction of an information content contribution factor into the semantic value calculation to take into account both the hierarchical structures of disease directed acyclic graphs and the disease specificities. IDSSIM and three state-of-the-art models, i.e., LNCSIM1, LNCSIM2, and ILNCSIM, were evaluated by applying their disease semantic similarity matrices and lncRNA functional similarity matrices, together with the corresponding matrices of human lncRNA-disease associations from either the lncRNADisease database or the MNDR database, to the association prediction method WKNKN for lncRNA-disease association prediction. In addition, case studies of breast cancer and adenocarcinoma were performed to validate the effectiveness of IDSSIM. CONCLUSIONS Results demonstrated that, in terms of ROC curves and AUC values, IDSSIM is superior to the compared models and can effectively improve the accuracy of disease semantic similarity, thereby increasing the association prediction ability of the IDSSIM-WKNKN model; in terms of case studies, most of the potential disease-associated lncRNAs predicted by IDSSIM can be confirmed by databases and the literature, implying that IDSSIM can serve as a promising tool for predicting lncRNA functions, identifying potential lncRNA-disease associations, and pre-screening candidate lncRNAs for biological experiments.
The IDSSIM code, all experimental data and prediction results are available online at https://github.com/CDMB-lab/IDSSIM.
Affiliation(s)
- Wenwen Fan
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Junliang Shang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Feng Li
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Yan Sun
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Shasha Yuan
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
- Jin-Xing Liu
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
|
31
|
Yan C, Zhang Z, Bao S, Hou P, Zhou M, Xu C, Sun J. Computational Methods and Applications for Identifying Disease-Associated lncRNAs as Potential Biomarkers and Therapeutic Targets. Mol Ther Nucleic Acids 2020; 21:156-171. [PMID: 32585624 PMCID: PMC7321789 DOI: 10.1016/j.omtn.2020.05.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/13/2020] [Revised: 04/06/2020] [Accepted: 05/18/2020] [Indexed: 12/12/2022]
Abstract
Long non-coding RNAs (lncRNAs) have been recognized as critical components of a broad genomic regulatory network and play pivotal roles in physiological and pathological processes. Identification of disease-associated lncRNAs is becoming increasingly crucial for fundamentally improving our understanding of molecular mechanisms of disease and developing novel biomarkers and therapeutic targets. Considering lower efficiency and higher time and labor cost of biological experiments, computer-aided inference of disease-associated RNAs has become a promising avenue for facilitating the study of lncRNA functions and provides complementary value for experimental studies. In this study, we first summarize data and knowledge resources publicly available for the study of lncRNA-disease associations. Then, we present an updated systematic overview of dozens of computational methods and models for inferring lncRNA-disease associations proposed in recent years. Finally, we explore the perspectives and challenges for further studies. Our study provides a guide for biologists and medical scientists to look for dedicated resources and more competent tools for accelerating the unraveling of disease-associated lncRNAs.
Affiliation(s)
- Congcong Yan
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P.R. China
- Zicheng Zhang
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P.R. China
- Siqi Bao
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P.R. China
- Ping Hou
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P.R. China
- Meng Zhou
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P.R. China
- Chongyong Xu
  - Department of Radiology, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou 325027, P.R. China
- Jie Sun
  - School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P.R. China
|
32
|
Wang L, You ZH, Huang DS, Zhou F. Combining High Speed ELM Learning with a Deep Convolutional Neural Network Feature Encoding for Predicting Protein-RNA Interactions. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:972-980. [PMID: 30296240 DOI: 10.1109/tcbb.2018.2874267] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 05/25/2023]
Abstract
Emerging evidence has shown that RNA plays a crucial role in many cellular processes, and their biological functions are primarily achieved by binding with a variety of proteins. High-throughput biological experiments provide a lot of valuable information for the initial identification of RNA-protein interactions (RPIs), but with the increasing complexity of RPIs networks, this method gradually falls into expensive and time-consuming situations. Therefore, there is an urgent need for high speed and reliable methods to predict RNA-protein interactions. In this study, we propose a computational method for predicting the RNA-protein interactions using sequence information. The deep learning convolution neural network (CNN) algorithm is utilized to mine the hidden high-level discriminative features from the RNA and protein sequences and feed it into the extreme learning machine (ELM) classifier. The experimental results with 5-fold cross-validation indicate that the proposed method achieves superior performance on benchmark datasets (RPI1807, RPI2241, and RPI369) with the accuracy of 98.83, 90.83, and 85.63 percent, respectively. We further evaluate the performance of the proposed model by comparing it with the state-of-the-art SVM classifier and other existing methods on the same benchmark data set. In addition, we predicted the independent NPInter v2.0 data set using the model trained on RPI369. The experimental results show that our model can serve as a useful tool for predicting RNA-protein interactions.
|
33
|
Zhu X, Wang X, Zhao H, Pei T, Kuang L, Wang L. BHCMDA: A New Biased Heat Conduction Based Method for Potential MiRNA-Disease Association Prediction. Front Genet 2020; 11:384. [PMID: 32425979 PMCID: PMC7212362 DOI: 10.3389/fgene.2020.00384] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/11/2020] [Accepted: 03/27/2020] [Indexed: 01/04/2023]
Abstract
Recent studies have indicated that microRNAs (miRNAs) are closely related to various complex human diseases. Based on the assumption that functionally similar miRNAs are more likely to be associated with phenotypically similar diseases, researchers have proposed a variety of computational models that integrate known miRNA-disease associations, disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity to discover potential miRNA-disease relationships. Taking account of the limitations of previous computational models, a new computational model based on biased heat conduction for MiRNA-Disease Association prediction (BHCMDA) was proposed in this paper, which achieves an AUC of 0.8890 in LOOCV (Leave-One-Out Cross Validation) and mean AUCs of 0.9060 and 0.8931 under twofold and fivefold cross validation, respectively. In addition, BHCMDA was further applied to case studies of three important human cancers, and the results showed that 88% (Esophageal Neoplasms), 92% (Colonic Neoplasms), and 92% (Lymphoma) of the top 50 predicted miRNAs had been confirmed by the experimental literature, which also demonstrated the good performance of BHCMDA. Hence, BHCMDA would be a useful computational resource for potential miRNA-disease association prediction.
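AUC values such as those quoted above measure how often a held-out known association is scored above an unconfirmed pair. The sketch below computes AUC directly from that pairwise definition; it is a generic illustration, not the evaluation code of BHCMDA, and the function name and toy scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve as the probability that a positive
    example is scored above a negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical prediction scores: 1 = held-out known association,
# 0 = unconfirmed candidate pair.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = auc_from_scores(toy_labels, toy_scores)  # 3 of 4 pairs ordered correctly
```

In a leave-one-out setting, each known association would be hidden in turn, the model re-run, and the hidden pair's score ranked against the candidate pairs before the AUC is computed.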
Affiliation(s)
- Xianyou Zhu
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
- Xuzai Wang
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Haochen Zhao
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Tingrui Pei
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Linai Kuang
  - College of Computer Science and Technology, Hengyang Normal University, Hengyang, China
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
- Lei Wang
  - Key Laboratory of Hunan Province for Internet of Things and Information Security, Xiangtan University, Xiangtan, China
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
|
34
|
A random forest based computational model for predicting novel lncRNA-disease associations. BMC Bioinformatics 2020; 21:126. [PMID: 32216744 PMCID: PMC7099795 DOI: 10.1186/s12859-020-3458-1] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 12/19/2019] [Accepted: 03/18/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND Accumulated evidence shows that the abnormal regulation of long non-coding RNA (lncRNA) is associated with various human diseases. Accurately identifying disease-associated lncRNAs is helpful to study the mechanism of lncRNAs in diseases and explore new therapies of diseases. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most of the existing models ignore the interference of noisy and redundancy information among these data resources. RESULTS To improve the ability of LDA prediction models, we implemented a random forest and feature selection based LDA prediction model (RFLDA in short). First, the RFLDA integrates the experiment-supported miRNA-disease associations (MDAs) and LDAs, the disease semantic similarity (DSS), the lncRNA functional similarity (LFS) and the lncRNA-miRNA interactions (LMI) as input features. Then, the RFLDA chooses the most useful features to train prediction model by feature selection based on the random forest variable importance score that takes into account not only the effect of individual feature on prediction results but also the joint effects of multiple features on prediction results. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. In terms of the area under the receiver operating characteristic curve (AUC) of 0.976 and the area under the precision-recall curve (AUPR) of 0.779 under 5-fold cross-validation, the performance of the RFLDA is better than several state-of-the-art LDA prediction models. Moreover, case studies on three cancers demonstrate that 43 of the 45 lncRNAs predicted by the RFLDA are validated by experimental data, and the other two predicted lncRNAs are supported by other LDA prediction models. CONCLUSIONS Cross-validation and case studies indicate that the RFLDA has excellent ability to identify potential disease-associated lncRNAs.
|
35
|
Zhang Y, Chen M, Li A, Cheng X, Jin H, Liu Y. LDAI-ISPS: LncRNA-Disease Associations Inference Based on Integrated Space Projection Scores. Int J Mol Sci 2020; 21:E1508. [PMID: 32098405 PMCID: PMC7073162 DOI: 10.3390/ijms21041508] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/31/2019] [Revised: 02/18/2020] [Accepted: 02/19/2020] [Indexed: 12/14/2022]
Abstract
Long non-coding RNAs (long ncRNAs, lncRNAs) of all kinds have been implicated in a range of cell developmental processes and diseases, although they are not translated into proteins. Inferring disease-associated lncRNAs by computational methods can help to understand the pathogenesis of diseases, but current computational methods still have not achieved remarkable predictive performance, with limitations such as the inaccurate construction of similarity networks and inadequate numbers of known lncRNA-disease associations. In this research, we proposed lncRNA-disease associations inference based on integrated space projection scores (LDAI-ISPS), composed of the following key steps: changing the Boolean network of known lncRNA-disease associations into weighted networks by combining all the global information (e.g., disease semantic similarities, lncRNA functional similarities, and known lncRNA-disease associations); and obtaining the space projection scores via vector projections of the weighted networks to form the final prediction scores without biases. The leave-one-out cross validation (LOOCV) results showed that, compared with other methods, LDAI-ISPS had a higher accuracy, with an area-under-the-curve (AUC) value of 0.9154 for inferring diseases, an AUC value of 0.8865 for inferring new lncRNAs (whose associations with diseases are unknown), and an AUC value of 0.7518 for inferring isolated diseases (whose associations with lncRNAs are unknown). A case study also confirmed the predictive performance of LDAI-ISPS as an aid to traditional biological experiments in inferring potential lncRNA-disease associations and isolated diseases.
Affiliation(s)
- Yi Zhang
  - School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Min Chen
  - Hunan Institute of Technology, School of Computer Science and Technology, Hengyang 421002, China
- Ang Li
  - Hunan Institute of Technology, School of Computer Science and Technology, Hengyang 421002, China
- Xiaohui Cheng
  - School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Hong Jin
  - School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Yarong Liu
  - School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
|
36
|
Li J, Zhao Y, Zhou S, Zhou Y, Lang L. Inferring lncRNA Functional Similarity Based on Integrating Heterogeneous Network Data. Front Bioeng Biotechnol 2020; 8:27. [PMID: 32117916 PMCID: PMC7015864 DOI: 10.3389/fbioe.2020.00027] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/16/2019] [Accepted: 01/13/2020] [Indexed: 01/26/2023]
Abstract
Although lncRNAs lack the potential to be translated into proteins directly, their complicated and diverse functions make them a window into decoding the mechanisms of human physiological activities. Accumulating experimental studies have identified associations between lncRNA dysfunction and many important complex diseases. However, experimentally confirmed lncRNA functions are still very limited. It is urgent to build effective computational models for rapidly predicting unknown lncRNA functions on a large scale. To this end, a valid similarity measure between known and unknown lncRNAs plays a vital role. In this paper, an original model was developed to calculate functional similarities between lncRNAs by integrating heterogeneous network data. In this model, a novel integrated network was constructed based on the data of four single lncRNA functional similarity networks (miRNA-based similarity network, disease-based similarity network, GTEx expression-based network and NONCODE expression-based network). Using the lncRNA pairs that share target mRNAs as the benchmark, the results show that this integrated network is more effective than any single network, with an AUC of 0.736 in cross validation, while the AUCs of the four single networks were 0.703, 0.733, 0.611, and 0.602. To implement our model, a web server named IHNLncSim was constructed for inferring lncRNA functional similarity based on integrating heterogeneous network data. Moreover, modules for network visualization and disease-based lncRNA function enrichment analysis were added to IHNLncSim. It is anticipated that IHNLncSim could be an effective bioinformatics tool for studies of lncRNA regulatory function. IHNLncSim is freely available at http://www.lirmed.com/ihnlncsim.
Affiliation(s)
- Jianwei Li
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Yingshu Zhao
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Siyuan Zhou
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Yuan Zhou
  - MOE Key Lab of Cardiovascular Sciences, Department of Biomedical Informatics, Center for Noncoding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing, China
- Liying Lang
  - Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
|
37
|
Wang Y, Yu G, Wang J, Fu G, Guo M, Domeniconi C. Weighted matrix factorization on multi-relational data for LncRNA-disease association prediction. Methods 2020; 173:32-43. [DOI: 10.1016/j.ymeth.2019.06.015] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/28/2019] [Revised: 06/01/2019] [Accepted: 06/13/2019] [Indexed: 02/07/2023]
|
38
|
Chen X, Sun YZ, Guan NN, Qu J, Huang ZA, Zhu ZX, Li JQ. Computational models for lncRNA function prediction and functional similarity calculation. Brief Funct Genomics 2020; 18:58-82. [PMID: 30247501 DOI: 10.1093/bfgp/ely031] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 05/10/2018] [Revised: 07/17/2018] [Accepted: 08/30/2018] [Indexed: 02/01/2023]
Abstract
From transcriptional noise to dark matter of biology, the rapidly changing view of long non-coding RNA (lncRNA) leads to deep understanding of human complex diseases induced by abnormal expression of lncRNAs. There is urgent need to discern potential functional roles of lncRNAs for further study of pathology, diagnosis, therapy, prognosis, prevention of human complex disease and disease biomarker detection at lncRNA level. Computational models are anticipated to be an effective way to combine current related databases for predicting most potential lncRNA functions and calculating lncRNA functional similarity on the large scale. In this review, we firstly illustrated the biological function of lncRNAs from five biological processes and briefly depicted the relationship between mutations or dysfunctions of lncRNAs and human complex diseases involving cancers, nervous system disorders and others. Then, 17 publicly available lncRNA function-related databases containing four types of functional information content were introduced. Based on these databases, dozens of developed computational models are emerging to help characterize the functional roles of lncRNAs. We therefore systematically described and classified both 16 lncRNA function prediction models and 9 lncRNA functional similarity calculation models into 8 types for highlighting their core algorithm and process. Finally, we concluded with discussions about the advantages and limitations of these computational models and future directions of lncRNA function prediction and functional similarity calculation. We believe that constructing systematic functional annotation systems is essential to strengthen the prediction accuracy of computational models, which will accelerate the identification process of novel lncRNA functions in the future.
Affiliation(s)
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Ya-Zhou Sun
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Na-Na Guan
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Jia Qu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Zhi-An Huang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Ze-Xuan Zhu
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Jian-Qiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
|
39
|
Fan Y, Cui J, Zhu Q. Heterogeneous graph inference based on similarity network fusion for predicting lncRNA–miRNA interaction. RSC Adv 2020; 10:11634-11642. [PMID: 35496629 PMCID: PMC9050493 DOI: 10.1039/c9ra11043g] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/31/2019] [Accepted: 03/14/2020] [Indexed: 12/28/2022]
Abstract
LncRNA and miRNA are two non-coding RNA types that are popular in current research. LncRNA interacts with miRNA to regulate gene transcription, further affecting human health and disease. Accurate identification of lncRNA–miRNA interactions contributes to the in-depth study of the biological functions and mechanisms of non-coding RNA. However, relying on biological experiments to obtain interaction information is time-consuming and expensive. Considering the rapid accumulation of gene information and the scarcity of computational methods, effective computational models for predicting lncRNA–miRNA interactions are urgently needed. In this work, we propose a heterogeneous graph inference method based on similarity network fusion (SNFHGILMI) to predict potential lncRNA–miRNA interactions. First, we calculated multiple similarity data, including lncRNA sequence similarity, miRNA sequence similarity, lncRNA Gaussian kernel similarity, and miRNA Gaussian kernel similarity. Second, the similarity network fusion method was employed to integrate the data and obtain the similarity networks of lncRNA and miRNA. Then, we constructed a bipartite network by combining the known interaction network and the similarity networks of lncRNA and miRNA. Finally, the heterogeneous graph inference method was introduced to construct a prediction model. On the real dataset, the model SNFHGILMI achieved AUCs of 0.9501 and 0.9426 ± 0.0035 based on LOOCV and 5-fold cross validation, respectively. Furthermore, case studies also demonstrate that SNFHGILMI is a high-performance prediction method that can accurately predict new lncRNA–miRNA interactions. The Matlab code and readme file of SNFHGILMI can be downloaded from https://github.com/cj-DaSE/SNFHGILMI.
Affiliation(s)
- Yongxian Fan: School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Juan Cui: School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- QingQi Zhu: School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
|
40
|
Guan NN, Wang CC, Zhang L, Huang L, Li JQ, Piao X. In silico prediction of potential miRNA-disease association using an integrative bioinformatics approach based on kernel fusion. J Cell Mol Med 2019; 24:573-587. [PMID: 31747722 PMCID: PMC6933403 DOI: 10.1111/jcmm.14765] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/05/2019] [Revised: 08/13/2019] [Accepted: 09/20/2019] [Indexed: 12/18/2022]
Abstract
Accumulating experimental evidence has demonstrated that microRNAs (miRNAs) have a huge impact on numerous critical biological processes and they are associated with different complex human diseases. Nevertheless, the task to predict potential miRNAs related to diseases remains difficult. In this paper, we developed a Kernel Fusion-based Regularized Least Squares for MiRNA-Disease Association prediction model (KFRLSMDA), which applied kernel fusion technique to fuse similarity matrices and then utilized regularized least squares to predict potential miRNA-disease associations. To prove the effectiveness of KFRLSMDA, we adopted leave-one-out cross-validation (LOOCV) and 5-fold cross-validation and then compared KFRLSMDA with 10 previous computational models (MaxFlow, MiRAI, MIDP, RKNNMDA, MCMDA, HGIMDA, RLSMDA, HDMP, WBSMDA and RWRMDA). Outperforming other models, KFRLSMDA achieved AUCs of 0.9246 in global LOOCV, 0.8243 in local LOOCV and average AUC of 0.9175 ± 0.0008 in 5-fold cross-validation. In addition, respectively, 96%, 100% and 90% of the top 50 potential miRNAs for breast neoplasms, colon neoplasms and oesophageal neoplasms were confirmed by experimental discoveries. We also predicted potential miRNAs related to hepatocellular cancer by removing all known related miRNAs of this cancer and 98% of the top 50 potential miRNAs were verified. Furthermore, we predicted potential miRNAs related to lymphoma using the data set in the old version of the HMDD database and 80% of the top 50 potential miRNAs were confirmed. Therefore, it can be concluded that KFRLSMDA has reliable prediction performance.
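The two stages named in this abstract, kernel fusion followed by regularized least squares, can be sketched as follows. This is a minimal illustration, not the KFRLSMDA implementation: the fusion rule (a weighted average), the averaging of the two score matrices, and all function and parameter names (`fuse_kernels`, `rls_scores`, `predict`, `lam`) are assumptions made for the example.

```python
import numpy as np

def fuse_kernels(kernels, weights=None):
    """Fuse same-shaped similarity matrices by a weighted average (assumed rule)."""
    kernels = [np.asarray(k, dtype=float) for k in kernels]
    if weights is None:
        weights = np.ones(len(kernels))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * k for w, k in zip(weights, kernels))

def rls_scores(K, Y, lam=1.0):
    """Kernel regularized least squares: F = K (K + lam * I)^-1 Y."""
    n = K.shape[0]
    return K @ np.linalg.solve(K + lam * np.eye(n), Y)

def predict(K_mirna, K_disease, A, lam=1.0):
    """Average the scores obtained in the miRNA space and the disease space.

    A is the (n_mirna, n_disease) known association matrix.
    """
    F_m = rls_scores(K_mirna, A, lam)          # classifier in miRNA space
    F_d = rls_scores(K_disease, A.T, lam).T    # classifier in disease space
    return 0.5 * (F_m + F_d)
```

Unknown miRNA-disease pairs would then be ranked by their entries in the returned score matrix.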
Affiliation(s)
- Na-Na Guan: College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Chun-Chun Wang: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Li Zhang: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Li Huang: Academy of Arts and Design, Tsinghua University, Beijing, China; The Future Laboratory, Tsinghua University, Beijing, China
- Jian-Qiang Li: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Xue Piao: School of Medical Informatics, Xuzhou Medical University, Xuzhou, China
|
41
|
Chen X, Xie D, Zhao Q, You ZH. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform 2019; 20:515-539. [PMID: 29045685 DOI: 10.1093/bib/bbx130] [Citation(s) in RCA: 392] [Impact Index Per Article: 78.4] [Received: 07/06/2017] [Revised: 08/13/2017] [Indexed: 12/22/2022]
Abstract
Plenty of microRNAs (miRNAs) were discovered at a rapid pace in plants, green algae, viruses and animals. As one of the most important components in the cell, miRNAs play a growing important role in various essential and important biological processes. For the recent few decades, amounts of experimental methods and computational models have been designed and implemented to identify novel miRNA-disease associations. In this review, the functions of miRNAs, miRNA-target interactions, miRNA-disease associations and some important publicly available miRNA-related databases were discussed in detail. Specially, considering the important fact that an increasing number of miRNA-disease associations have been experimentally confirmed, we selected five important miRNA-related human diseases and five crucial disease-related miRNAs and provided corresponding introductions. Identifying disease-related miRNAs has become an important goal of biomedical research, which will accelerate the understanding of disease pathogenesis at the molecular level and molecular tools design for disease diagnosis, treatment and prevention. Computational models have become an important means for novel miRNA-disease association identification, which could select the most promising miRNA-disease pairs for experimental validation and significantly reduce the time and cost of the biological experiments. Here, we reviewed 20 state-of-the-art computational models of predicting miRNA-disease associations from different perspectives. Finally, we summarized four important factors for the difficulties of predicting potential disease-related miRNAs, the framework of constructing powerful computational models to predict potential miRNA-disease associations including five feasible and important research schemas, and future directions for further development of computational models.
Affiliation(s)
- Xing Chen: School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Di Xie: School of Mathematics, Liaoning University
- Qi Zhao: School of Mathematics, Liaoning University
- Zhu-Hong You: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science
|
42
|
Xuan P, Pan S, Zhang T, Liu Y, Sun H. Graph Convolutional Network and Convolutional Neural Network Based Method for Predicting lncRNA-Disease Associations. Cells 2019; 8:E1012. [PMID: 31480350 PMCID: PMC6769579 DOI: 10.3390/cells8091012] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 06/17/2019] [Revised: 08/19/2019] [Accepted: 08/26/2019] [Indexed: 12/11/2022]
Abstract
Aberrant expressions of long non-coding RNAs (lncRNAs) are often associated with diseases and identification of disease-related lncRNAs is helpful for elucidating complex pathogenesis. Recent methods for predicting associations between lncRNAs and diseases integrate their pertinent heterogeneous data. However, they failed to deeply integrate topological information of heterogeneous network comprising lncRNAs, diseases, and miRNAs. We proposed a novel method based on the graph convolutional network and convolutional neural network, referred to as GCNLDA, to infer disease-related lncRNA candidates. The heterogeneous network containing the lncRNA, disease, and miRNA nodes, is constructed firstly. The embedding matrix of a lncRNA-disease node pair was constructed according to various biological premises about lncRNAs, diseases, and miRNAs. A new framework based on a graph convolutional network and a convolutional neural network was developed to learn network and local representations of the lncRNA-disease pair. On the left side of the framework, the autoencoder based on graph convolution deeply integrated topological information within the heterogeneous lncRNA-disease-miRNA network. Moreover, as different node features have discriminative contributions to the association prediction, an attention mechanism at node feature level is constructed. The left side learnt the network representation of the lncRNA-disease pair. The convolutional neural networks on the right side of the framework learnt the local representation of the lncRNA-disease pair by focusing on the similarities, associations, and interactions that are only related to the pair. Compared to several state-of-the-art prediction methods, GCNLDA had superior performance. Case studies on stomach cancer, osteosarcoma, and lung cancer confirmed that GCNLDA effectively discovers the potential lncRNA-disease associations.
Affiliation(s)
- Ping Xuan: School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Shuxiang Pan: School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Tiangang Zhang: School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Yong Liu: School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hao Sun: School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
|
43
|
Computational Identification of Cross-Talking ceRNAs. Adv Exp Med Biol 2019; 1094:97-108. [PMID: 30191491 DOI: 10.1007/978-981-13-0719-5_10] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/18/2022]
Abstract
Competing endogenous RNAs (ceRNAs) are kinds of RNAs that regulate each other at post-transcription level through competing for miRNA regulators. CeRNA-ceRNA networks provide another type of function for protein-coding mRNAs, which link non-coding RNAs such as miRNA, long non-coding RNA, pseudogenes and circular RNAs. In this chapter, we will introduce the definition of ceRNAs, mainly provide the computational method to predict ceRNA interactions in general condition and complex diseases. In addition, we also illustrated several computational methods that are commonly used to identify the perturbed ceRNA networks in human diseases compared to normal conditions. Finally, we also summarized the principles of methods that integrated ceRNA theory to identify human disease biomarkers. Understanding of RNA-RNA crosstalk will provide significant insights into gene regulatory network that has been implicated in human development and/or diseases.
|
44
|
Li Y, Li LP, Wang L, Yu CQ, Wang Z, You ZH. An Ensemble Classifier to Predict Protein-Protein Interactions by Combining PSSM-based Evolutionary Information with Local Binary Pattern Model. Int J Mol Sci 2019; 20:ijms20143511. [PMID: 31319578 PMCID: PMC6679202 DOI: 10.3390/ijms20143511] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/02/2019] [Revised: 07/04/2019] [Accepted: 07/15/2019] [Indexed: 01/03/2023]
Abstract
Protein plays a critical role in the regulation of biological cell functions. Among them, whether proteins interact with each other has become a fundamental problem, because proteins usually perform their functions by interacting with other proteins. Although a large amount of protein–protein interactions (PPIs) data has been produced by high-throughput biotechnology, the disadvantage of biological experimental technique is time-consuming and costly. Thus, computational methods for predicting protein interactions have become a research hot spot. In this research, we propose an efficient computational method that combines Rotation Forest (RF) classifier with Local Binary Pattern (LBP) feature extraction method to predict PPIs from the perspective of Position-Specific Scoring Matrix (PSSM). The proposed method has achieved superior performance in predicting Yeast, Human, and H. pylori datasets with average accuracies of 92.12%, 96.21%, and 86.59%, respectively. In addition, we also evaluated the performance of the proposed method on the four independent datasets of C. elegans, H. pylori, H. sapiens, and M. musculus datasets. These obtained experimental results fully prove that our model has good feasibility and robustness in predicting PPIs.
Affiliation(s)
- Yang Li: School of Information Engineering, Xijing University, Xi'an 710123, China
- Li-Ping Li: School of Information Engineering, Xijing University, Xi'an 710123, China
- Lei Wang: College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China
- Chang-Qing Yu: School of Information Engineering, Xijing University, Xi'an 710123, China
- Zheng Wang: School of Information Engineering, Xijing University, Xi'an 710123, China
- Zhu-Hong You: School of Information Engineering, Xijing University, Xi'an 710123, China
|
45
|
LLCLPLDA: a novel model for predicting lncRNA-disease associations. Mol Genet Genomics 2019; 294:1477-1486. [PMID: 31250107 DOI: 10.1007/s00438-019-01590-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/15/2019] [Accepted: 06/21/2019] [Indexed: 12/19/2022]
Abstract
Long noncoding RNAs play a significant role in the occurrence of diseases. Thus, studying the relationship prediction between lncRNAs and disease is becoming more popular. Researchers hope to determine effective treatments by revealing the occurrence and development of diseases at the molecular level. However, the traditional biological experimental way to verify the association between lncRNAs and disease is very time-consuming and expensive. Therefore, we developed a method called LLCLPLDA to predict potential lncRNA-disease associations. First, locality-constrained linear coding (LLC) is leveraged to project the features of lncRNAs and diseases to local-constraint features, and then, a label propagation (LP) strategy is used to mix up the initial association matrix and the obtained features of lncRNAs and diseases. To demonstrate the performance of our method, we compared LLCLPLDA with five methods in the leave-one-out cross-validation and fivefold cross-validation scheme, and the experimental results show that the proposed method outperforms the other five methods. Additionally, we conducted case studies on three diseases: cervical cancer, gliomas, and breast cancer. The top five predicted lncRNAs for cervical cancer and gliomas were verified, and four of the five lncRNAs for breast cancer were also confirmed.
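The label propagation (LP) step mentioned in this abstract can be illustrated with the standard iterative scheme F ← αSF + (1 − α)Y on a symmetrically normalized similarity matrix S. The sketch below is a generic version of that scheme, not the LLCLPLDA code: it omits the locality-constrained linear coding stage entirely, and the names `normalize_similarity`, `label_propagation` and `alpha` are assumptions for the example.

```python
import numpy as np

def normalize_similarity(S):
    """Symmetric normalisation D^-1/2 S D^-1/2 of a non-negative similarity matrix."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    d[d == 0] = 1.0  # isolated nodes keep a zero row instead of dividing by zero
    inv_sqrt = 1.0 / np.sqrt(d)
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]

def label_propagation(S, Y, alpha=0.5, max_iter=1000, tol=1e-10):
    """Iterate F <- alpha * S_norm @ F + (1 - alpha) * Y until F stops changing.

    S is an (n, n) similarity matrix among lncRNAs (or diseases) and Y is
    the (n, m) matrix of known associations used as initial labels.
    """
    S_norm = normalize_similarity(S)
    Y = np.asarray(Y, dtype=float)
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (S_norm @ F) + (1.0 - alpha) * Y
        if np.max(np.abs(F_next - F)) < tol:
            return F_next
        F = F_next
    return F
```

For 0 < α < 1 the iteration converges to the closed form (1 − α)(I − αS)⁻¹Y, so the loop can be replaced by a linear solve when the matrices are small.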
|
46
|
Fu G, Wang J, Domeniconi C, Yu G. Matrix factorization-based data fusion for the prediction of lncRNA-disease associations. Bioinformatics 2019; 34:1529-1537. [PMID: 29228285 DOI: 10.1093/bioinformatics/btx794] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 09/02/2017] [Accepted: 12/05/2017] [Indexed: 12/21/2022]
Abstract
Motivation: Long non-coding RNAs (lncRNAs) play crucial roles in complex disease diagnosis, prognosis, prevention and treatment, but only a small portion of lncRNA-disease associations have been experimentally verified. Various computational models have been proposed to identify lncRNA-disease associations by integrating heterogeneous data sources. However, existing models generally ignore the intrinsic structure of data sources or treat them as equally relevant, while they may not be. Results: To accurately identify lncRNA-disease associations, we propose a Matrix Factorization based LncRNA-Disease Association prediction model (MFLDA in short). MFLDA decomposes data matrices of heterogeneous data sources into low-rank matrices via matrix tri-factorization to explore and exploit their intrinsic and shared structure. MFLDA can select and integrate the data sources by assigning different weights to them. An iterative solution is further introduced to simultaneously optimize the weights and low-rank matrices. Next, MFLDA uses the optimized low-rank matrices to reconstruct the lncRNA-disease association matrix and thus to identify potential associations. In 5-fold cross validation experiments to identify verified lncRNA-disease associations, MFLDA achieves an area under the receiver operating characteristic curve (AUC) of 0.7408, at least 3% higher than those given by state-of-the-art data fusion based computational models. An empirical study on identifying masked lncRNA-disease associations again shows that MFLDA can identify potential associations more accurately than competing models. A case study on identifying lncRNAs associated with breast, lung and stomach cancers show that 38 out of 45 (84%) associations predicted by MFLDA are supported by recent biomedical literature and further proves the capability of MFLDA in identifying novel lncRNA-disease associations.
MFLDA is a general data fusion framework, and as such it can be adopted to predict associations between other biological entities. Availability and implementation: The source code for MFLDA is available at: http://mlda.swu.edu.cn/codes.php?name=MFLDA. Contact: gxyu@swu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangyuan Fu: College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Jun Wang: College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Carlotta Domeniconi: Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Guoxian Yu: College of Computer and Information Science, Southwest University, Chongqing 400715, China
|
47
|
Piro RM, Marsico A. Network-Based Methods and Other Approaches for Predicting lncRNA Functions and Disease Associations. Methods Mol Biol 2019; 1912:301-321. [PMID: 30635899 DOI: 10.1007/978-1-4939-8982-9_12] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
The discovery that a considerable portion of eukaryotic genomes is transcribed and gives rise to long noncoding RNAs (lncRNAs) provides an important new perspective on the transcriptome and raises questions about the centrality of these lncRNAs in gene-regulatory processes and diseases. The rapidly increasing number of mechanistically investigated lncRNAs has provided evidence for distinct functional classes, such as enhancer-like lncRNAs, which modulate gene expression via chromatin looping, and noncoding competing endogenous RNAs (ceRNAs), which act as microRNA decoys. Despite great progress in the last years, the majority of lncRNAs are functionally uncharacterized and their implication for disease biogenesis and progression is unknown. Here, we summarize recent developments in lncRNA function prediction in general and lncRNA-disease associations in particular, with emphasis on in silico methods based on network analysis and on ceRNA function prediction. We believe that such computational techniques provide a valuable aid to prioritize functional lncRNAs or disease-relevant lncRNAs for targeted, experimental follow-up studies.
Affiliation(s)
- Rosario Michael Piro: Institut für Informatik, Freie Universität Berlin, Berlin, Germany; Institut für Medizinische Genetik und Humangenetik, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Annalisa Marsico: Institut für Informatik, Freie Universität Berlin, Berlin, Germany; Max-Planck-Institut für molekulare Genetik, Berlin, Germany
|
48
|
Xuan P, Cao Y, Zhang T, Kong R, Zhang Z. Dual Convolutional Neural Networks With Attention Mechanisms Based Method for Predicting Disease-Related lncRNA Genes. Front Genet 2019; 10:416. [PMID: 31130990 PMCID: PMC6509943 DOI: 10.3389/fgene.2019.00416] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 02/15/2019] [Accepted: 04/16/2019] [Indexed: 12/30/2022]
Abstract
A lot of studies indicated that aberrant expression of long non-coding RNA genes (lncRNAs) is closely related to human diseases. Identifying disease-related lncRNAs (disease lncRNAs) is critical for understanding the pathogenesis and etiology of diseases. Most of the previous methods focus on prioritizing the potential disease lncRNAs based on shallow learning methods. The methods fail to extract the deep and complex feature representations of lncRNA-disease associations. Furthermore, nearly all the methods ignore the discriminative contributions of the similarity, association, and interaction relationships among lncRNAs, disease, and miRNAs for the association prediction. A dual convolutional neural networks with attention mechanisms based method is presented for predicting the candidate disease lncRNAs, and it is referred to as CNNLDA. CNNLDA deeply integrates the multiple source data like the lncRNA similarities, the disease similarities, the lncRNA-disease associations, the lncRNA-miRNA interactions, and the miRNA-disease associations. The diverse biological premises about lncRNAs, miRNAs, and diseases are combined to construct the feature matrix from the biological perspectives. A novel framework based on the dual convolutional neural networks is developed to learn the global and attention representations of the lncRNA-disease associations. The left part of the framework exploits the various information contained by the feature matrix to learn the global representation of lncRNA-disease associations. The different connection relationships among the lncRNA, miRNA, and disease nodes and the different features of these nodes have the discriminative contributions for the association prediction. Hence we present the attention mechanisms from the relationship level and the feature level respectively, and the right part of the framework learns the attention representation of associations. 
The experimental results based on the cross validation indicate that CNNLDA yields superior performance than several state-of-the-art methods. Case studies on stomach cancer, lung cancer, and colon cancer further demonstrate CNNLDA's ability to discover the potential disease lncRNAs.
Affiliation(s)
- Ping Xuan: School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Yangkun Cao: School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Tiangang Zhang: School of Mathematical Science, Heilongjiang University, Harbin, China
- Rui Kong: Department of Pancreatic and Biliary Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Zhaogong Zhang: School of Computer Science and Technology, Heilongjiang University, Harbin, China
|
49
|
A Novel Method for Predicting Disease-Associated LncRNA-MiRNA Pairs Based on the Higher-Order Orthogonal Iteration. Comput Math Methods Med 2019; 2019:7614850. [PMID: 31191710 PMCID: PMC6525924 DOI: 10.1155/2019/7614850] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/31/2018] [Revised: 01/25/2019] [Accepted: 02/10/2019] [Indexed: 12/30/2022]
Abstract
A lot of research studies have shown that many complex human diseases are associated not only with microRNAs (miRNAs) but also with long noncoding RNAs (lncRNAs). However, most of the current existing studies focus on the prediction of disease-related miRNAs or lncRNAs, and to our knowledge, until now, there are few literature studies reported to pay attention to the study of impact of miRNA-lncRNA pairs on diseases, although more and more studies have shown that both lncRNAs and miRNAs play important roles in cell proliferation and differentiation during the recent years. The identification of disease-related genes provides great insight into the underlying pathogenesis of diseases at a system level. In this study, a novel model called PADLMHOOI was proposed to predict potential associations between diseases and lncRNA-miRNA pairs based on the higher-order orthogonal iteration, and in order to evaluate its prediction performance, the global and local LOOCV were implemented, respectively, and simulation results demonstrated that PADLMHOOI could achieve reliable AUCs of 0.9545 and 0.8874 in global and local LOOCV separately. Moreover, case studies further demonstrated the effectiveness of PADLMHOOI to infer unknown disease-related lncRNA-miRNA pairs.
|
50
|
Tang C, Zhou H, Zheng X, Zhang Y, Sha X. Dual Laplacian regularized matrix completion for microRNA-disease associations prediction. RNA Biol 2019; 16:601-611. [PMID: 30676207 PMCID: PMC6546388 DOI: 10.1080/15476286.2019.1570811] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/14/2018] [Revised: 11/30/2018] [Accepted: 01/03/2019] [Indexed: 01/21/2023]
Abstract
Since lots of miRNA-disease associations have been verified, it is meaningful to discover more miRNA-disease associations for serving disease diagnosis and prevention of human complex diseases. However, it is not practical to identify potential associations using traditional biological experimental methods since the process is expensive and time consuming. Therefore, it is necessary to develop efficient computational methods to accomplish this task. In this work, we introduced a matrix completion model with dual Laplacian regularization (DLRMC) to infer unknown miRNA-disease associations in heterogeneous omics data. Specifically, DLRMC transformed the task of miRNA-disease association prediction into a matrix completion problem, in which the potential missing entries of the miRNA-disease association matrix were calculated, the missing association can be obtained based on the prediction scores after the completion procedure. Meanwhile, the miRNA functional similarity and the disease semantic similarity were fully exploited to serve the miRNA-disease association matrix completion by using a dual Laplacian regularization term. In the experiments, we conducted global and local Leave-One-Out Cross Validation (LOOCV) and case studies to evaluate the efficacy of DLRMC on the Human miRNA-disease associations dataset obtained from the HMDDv2.0 database. As a result, the AUCs of DLRMC is 0.9174 and 0.8289 in global LOOCV and local LOOCV, respectively, which significantly outperform a variety of previous methods. In addition, in the case studies on four significant diseases related to human health including Colon Neoplasms, Kidney neoplasms, Lymphoma and Prostate neoplasms, 90%, 92%, 92% and 94% out of the top 50 predicted miRNAs has been confirmed, respectively.
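The idea of completing an association matrix under two graph Laplacian penalties can be sketched with a simplified objective: a squared error on the observed entries plus one smoothness term over the miRNA similarity graph and one over the disease similarity graph. This is an illustrative simplification, not the DLRMC model: it drops any low-rank term the authors may use, it solves the problem by plain gradient descent, and the names `laplacian`, `objective`, `complete`, `lam_r` and `lam_c` are assumptions.

```python
import numpy as np

def laplacian(S):
    """Unnormalised graph Laplacian L = D - S of a symmetric similarity matrix."""
    S = np.asarray(S, dtype=float)
    return np.diag(S.sum(axis=1)) - S

def objective(X, A, mask, L_r, L_c, lam_r, lam_c):
    """Masked squared error plus row-side and column-side smoothness penalties."""
    fit = np.sum((mask * (X - A)) ** 2)
    smooth_rows = lam_r * np.trace(X.T @ L_r @ X)
    smooth_cols = lam_c * np.trace(X @ L_c @ X.T)
    return fit + smooth_rows + smooth_cols

def complete(A, mask, S_r, S_c, lam_r=0.1, lam_c=0.1, n_iter=500):
    """Gradient descent on the objective above.

    A: (n_mirna, n_disease) association matrix, mask: 1 where the entry is
    observed and 0 where it is missing, S_r / S_c: miRNA and disease
    similarity matrices.
    """
    A = np.asarray(A, dtype=float)
    mask = np.asarray(mask, dtype=float)
    L_r, L_c = laplacian(S_r), laplacian(S_c)
    # Step size 1 / (Lipschitz bound of the gradient), so the objective never increases.
    bound = 2.0 * (1.0 + lam_r * np.linalg.norm(L_r, 2) + lam_c * np.linalg.norm(L_c, 2))
    step = 1.0 / bound
    X = mask * A
    for _ in range(n_iter):
        grad = 2.0 * mask * (X - A) + 2.0 * lam_r * (L_r @ X) + 2.0 * lam_c * (X @ L_c)
        X = X - step * grad
    return X
```

Missing entries receive scores from similar miRNAs and similar diseases through the two Laplacian terms, and those scores can then be ranked to propose candidate associations.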
Affiliation(s)
- Chang Tang: School of Computer Science, China University of Geosciences, Wuhan, China
- Hua Zhou: Department of Hematology, The Affiliated Huai’an Hospital of Xuzhou Medical University, Huai’an, China
- Xiao Zheng: Wuhan University of Technology Hospital, Wuhan University of Technology, Wuhan, China
- Yanming Zhang: Department of Hematology, The Affiliated Huai’an Hospital of Xuzhou Medical University, Huai’an, China
- Xiaofeng Sha: Department of Oncology, Huai’an Hongze District People’s Hospital, Huai’an, China
|